[j-nsp] Segment Routing Real World Deployment (was: VPC mc-lag)

Tue Jul 10 17:33:03 EDT 2018

> From: Mark Tinka [mailto:mark.tinka at seacom.mu]
> Sent: Tuesday, July 10, 2018 11:33 AM
> 
> 
> On 9/Jul/18 17:25, adamv0025 at netconsultings.com wrote:
> 
> Well that really depends on the type of fault, let me explain:
> 
> All agreed.
> 
> My point was that if there is enough redundancy inside the core network to
> deal with fibre failures that can keep iBGP sessions up, it will also keep LDP
> sessions up, but most importantly, traffic will continue to flow.
> 
Well, yes, that was my point in all the examples in the post you quoted :)

> The issue I have is when sessions remain up (because of high Keepalive
> timers), but there is no actual data plane. This is why I am saying that it would
> be remiss of us to give the community the impression that session uptime is
> better than what happens at the transport level, particularly where link
> redundancy may not be clear enough to abstract the relationship between
> both.
> 
I guess the takeaway from all this is that one should not try to solve hop-by-hop failures with end-to-end failovers.
(Not even when one of the endpoints itself fails, but that's a separate discussion about service egress protection.)
Hop-by-hop (transport) failures should be handled without the end-to-end service even noticing; that is the proper (non-leaky) abstraction, and it is what allows you to scale.

Now, building a robust transport network with appropriate redundancy and failover mechanisms is the responsibility of each operator.
One can tune the IGP, or take new-path computation out of the equation completely with FRR options: LDP with LFA- or even rLFA-based FRR (with targeted LDP sessions all over the place), SR with TI-LFA, or RSVP-TE FRR. With FRR there's really no need to tune the IGP for fast SPF computation, but there may still be a need to tune it for fast LSA/LSP propagation (think BGP PIC "core").
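To make the LFA part concrete, here is a minimal Python sketch of the basic link-protecting loop-free condition from RFC 5286: a neighbor N of source S is a loop-free alternate towards destination D if dist(N, D) < dist(N, S) + dist(S, D), i.e. N will not hand the traffic straight back to S. The topology, node names and function names are hypothetical, link costs are assumed symmetric, and node protection, ECMP subtleties and SRLGs are ignored. Neighbor C below shows the case where plain LFA gives no coverage, which is where rLFA or TI-LFA come in.

```python
import heapq

# Hypothetical topology: node -> {neighbor: IGP cost}, costs assumed symmetric.
TOPOLOGY = {
    "S": {"A": 1, "B": 1, "C": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 2},
    "C": {"S": 1, "E": 1},
    "E": {"C": 1, "D": 5},
    "D": {"A": 1, "B": 2, "E": 5},
}


def dijkstra(graph, src):
    """Return shortest-path distances from src to every reachable node."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def loop_free_alternates(graph, s, d):
    """Neighbors of s (excluding primary next hops) that satisfy
    dist(N, D) < dist(N, S) + dist(S, D)."""
    dist_s = dijkstra(graph, s)
    lfas = []
    for n, link_cost in graph[s].items():
        dist_n = dijkstra(graph, n)
        n_to_d = dist_n.get(d, float("inf"))
        if link_cost + n_to_d == dist_s[d]:
            continue  # n is a primary next hop, not an alternate
        if n_to_d < dist_n[s] + dist_s[d]:
            lfas.append(n)
    return lfas


print(loop_free_alternates(TOPOLOGY, "S", "D"))
```

Here A is the primary next hop, B qualifies as an LFA (its own shortest path to D does not go back through S), and C does not, because its best path to D is via S.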
With FRR, it then all boils down to how fast link failures can be detected. Loss of light will always be the fastest method, but good practice is to also run some sort of keepalive (BFD, CFM, or IGP keepalives).
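As a rough illustration of what "how fast" means for BFD, here is a small Python sketch of the asynchronous-mode detection time as the local system computes it per RFC 5880 (section 6.8.4): the remote Detect Mult multiplied by the greater of the local RequiredMinRxInterval and the remote DesiredMinTxInterval. The function name and the timer values are illustrative assumptions, not recommendations for any platform.

```python
def bfd_detection_time_ms(local_required_min_rx_ms,
                          remote_desired_min_tx_ms,
                          remote_detect_mult):
    """Asynchronous-mode BFD detection time seen by the local system.

    The agreed interval is the slower (larger) of what we are willing to
    receive and what the peer wants to send; the peer's multiplier applies.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Symmetric 50 ms timers with a multiplier of 3.
print(bfd_detection_time_ms(50, 50, 3))
# The local side only accepts 300 ms, so it dominates despite the fast peer.
print(bfd_detection_time_ms(300, 50, 3))
```

The second case shows why both ends need matching timers: the slower side sets the pace, so a fast peer alone does not buy faster detection.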
However, the faster link-down and link-up detection is, the faster a link can flap. That's where one should use dampening features (IP event dampening, BFD dampening, or hold-down timers for bringing an interface up) to protect your IGP (or BGP dampening for Nexus users, since the Nexus platform doesn't have any other dampening; the exception is/was the 3548).
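The general idea behind exponential-decay dampening (as used by IP event dampening and BGP route flap dampening) can be sketched in Python as below: each flap adds a fixed penalty, the penalty decays with a configured half-life, the interface is suppressed once the penalty exceeds a suppress threshold, and it is released once the penalty decays below a reuse threshold. The class name, parameter names and numbers are hypothetical assumptions rather than any vendor's defaults, and details such as a maximum suppress time or penalty ceiling are deliberately left out.

```python
class FlapDampener:
    """Simplified exponential-decay flap dampener (hypothetical model)."""

    def __init__(self, half_life_s=5.0, flap_penalty=1000.0,
                 suppress_threshold=2000.0, reuse_threshold=1000.0):
        self.half_life_s = half_life_s
        self.flap_penalty = flap_penalty
        self.suppress_threshold = suppress_threshold
        self.reuse_threshold = reuse_threshold
        self.penalty = 0.0
        self.suppressed = False
        self.last_update_s = 0.0

    def _decay_to(self, now_s):
        """Decay the accumulated penalty up to now_s; release if below reuse."""
        elapsed_s = now_s - self.last_update_s
        self.penalty *= 0.5 ** (elapsed_s / self.half_life_s)
        self.last_update_s = now_s
        if self.suppressed and self.penalty < self.reuse_threshold:
            self.suppressed = False

    def flap(self, now_s):
        """Record one interface flap at time now_s (seconds)."""
        self._decay_to(now_s)
        self.penalty += self.flap_penalty
        if self.penalty > self.suppress_threshold:
            self.suppressed = True

    def is_suppressed(self, now_s):
        """Should the IGP ignore this interface at time now_s?"""
        self._decay_to(now_s)
        return self.suppressed


d = FlapDampener()
for t in (0.0, 1.0, 2.0):  # three flaps in quick succession
    d.flap(t)
print(d.is_suppressed(2.0), d.is_suppressed(10.0))
```

With these assumed numbers, two flaps one second apart stay below the suppress threshold, the third pushes the interface into suppression, and it is released again several seconds later once the penalty has decayed below the reuse threshold, so the IGP sees one event instead of a burst of flaps.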

adam

netconsultings.com
::carrier-class solutions for the telecommunications industry::