[j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

adamv0025 at netconsultings.com
Sun Jul 8 15:28:09 EDT 2018


> Of Alexandre Guimaraes
> Sent: Saturday, July 07, 2018 1:01 PM
> 
Hi Alexandre,
From the level of detail you provided, I'm afraid it seems like some of your troubles are rooted in somewhat suboptimal design choices.

> My Usage Cent
> 
> My core Network, P and PE, are 100% Juniper
> 
> We started using VPLS, based on BGP sessions; at that time we were working at
> a maximum of 2 or 3 new provisions per day.
> We won a big project contract and reached 90/100 per month.
> VPLS became an issue on all fronts...
> 
> Planning / low ports - the price of 10G ports on MX and rack space usage
> 
This is a good business case for an aggregation network built out of, say, those EX switches you mentioned, aggregating low-speed customer links into bundles of 10/40GE links towards the PEs.
That then lets you use a PE slot to its full potential, as dictated by the fabric, making better use of the chassis.
The carrier Ethernet features on the PE that let you realize such L2 service aggregation are flexible VLAN tag manipulation (push/pop/translate one or two tags) and per-interface VLAN ranges.
The EX switches don't support per-interface VLAN ranges, but I still think that ~4000 customers (or service VLANs) per aggregation switch is enough.
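Just to illustrate the tag manipulation bit, a minimal sketch of a PE-facing customer unit behind such an agg switch (interface, unit and VLAN IDs below are made up, assuming a double-tagged handoff where the agg switch pushes the S-VLAN):

interfaces {
    xe-0/0/0 {
        flexible-vlan-tagging;
        encapsulation flexible-ethernet-services;
        unit 1234 {
            /* S-VLAN 500 added by the agg switch, customer C-VLAN 1234 */
            encapsulation vlan-ccc;
            vlan-tags outer 500 inner 1234;
            /* strip the S-tag on ingress and push it back on egress,
               so only the customer tag rides the pseudowire */
            input-vlan-map pop;
            output-vlan-map push;
        }
    }
}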
 
> Provisioning... vlan remap, memory usage of the routers and 2000/2500
> circuits/customers per MX
> 
Templates and automation in provisioning will really make the difference once you go past a certain scale or customer onboarding rate.
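Even before you bolt on an external provisioning system, the built-in config groups take you part of the way; a rough sketch with a wildcard group for customer-facing units (group name and defaults are just examples):

groups {
    CUST-UNIT-DEFAULTS {
        interfaces {
            <xe-*> {
                flexible-vlan-tagging;
                encapsulation flexible-ethernet-services;
                unit <*> {
                    encapsulation vlan-ccc;
                }
            }
        }
    }
}
/* applied globally here for brevity; in practice you'd scope the
   wildcard (or the apply-groups) to the customer-facing ports only */
apply-groups CUST-UNIT-DEFAULTS;

Then the per-customer change is reduced to the unit number, the vlan-tags and the service side, which is easy to template and push from whatever tooling you already run.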

> Tshoot, a headache to find the signaling problem when, for example, a fiber
> degrades: all BGP sessions start flapping, things become crazy and the
> impact increases each minute.
> 
I think BGP sessions are no different from LSP sessions in this regard, perhaps just routed differently (not PE-to-PE but PE-to-RR).
Running BFD on your core links for rapid fault detection, together with interface hold-down or dampening to stabilize the network, could have helped with this.
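As a sketch of what I mean, assuming ISIS in the core (interface names, intervals and the IGP choice are just placeholders to show the knobs):

protocols {
    isis {
        interface xe-1/0/0.0 {
            /* sub-second detection of a dead or one-way core link */
            bfd-liveness-detection {
                minimum-interval 100;
                multiplier 3;
            }
        }
    }
}
interfaces {
    xe-1/0/0 {
        /* debounce a degraded/flapping link so it doesn't keep
           yanking the control plane up and down */
        hold-time up 2000 down 0;
    }
}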

> Operating, the vpls routing table becomes a pain in the ass when you use
> multipoint connections and, for whatever Lucifer reason, those multipoints become
> unreachable and the vpls table and all routing tables become huge to analyze.
> 
On the huge routing table sizes:
I think the problem of huge tables is something we all have to bear when in the business of L2/L3 VPN services.
But in Ethernet services, only p2mp and mp2mp services require standard l2-switch-like MAC learning and thus exhibit this scaling problem; there's no need for MAC learning in p2p services.
So I guess you could have simply disabled MAC learning on the instances that were only intended to support p2p services.
It's also good practice to limit, contractually, how many resources each VPN customer can use; in L2 services that is, for instance, the number of MACs per interface or per bridge-domain, etc.
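A stripped-down sketch of both knobs in a VPLS instance (instance and interface names and the numbers are made up, and the usual RD/RT/site or mesh-group config is omitted):

routing-instances {
    CUST-P2P {
        instance-type vpls;
        interface xe-0/0/1.100;
        protocols {
            vpls {
                /* pure p2p service, no need to build a MAC table at all */
                no-mac-learning;
            }
        }
    }
    CUST-MP {
        instance-type vpls;
        interface xe-0/0/2.200;
        protocols {
            vpls {
                /* cap how much of the MAC table a single customer can eat */
                mac-table-size {
                    2048;
                }
                interface-mac-limit {
                    512;
                }
            }
        }
    }
}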

> Regarding L2circuits using LDP.
> 
But hey, I'm glad it worked out for you with the LDP-signalled PWs, and yes, I do agree the config is simpler with LDP.
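For comparison, the whole signalling side of one LDP-signalled PW on the PE is roughly just the below (neighbor address, VC ID and interface are made up), on top of LDP running towards the loopbacks and the vlan-ccc customer unit shown earlier:

protocols {
    l2circuit {
        neighbor 192.0.2.2 {
            interface xe-0/0/1.100 {
                /* must match the VC ID configured on the far-end PE */
                virtual-circuit-id 100;
            }
        }
    }
}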

adam 


