[c-nsp] Resilience in order of few hundreds of milliseconds

Tue Mar 6 15:55:24 EST 2007

This is my experience with playing with fast hellos:

I wouldn't be as concerned with traffic load affecting the reliability
of fast IGP keepalives (although that's still a concern). My
understanding is that there is built-in QoS for IGP packets that pushes
them to the front of the queue on a congested interface, which keeps
the adjacency up under most conditions.

I'd be more afraid of relying on a software-driven process keeping up
under high load on the main CPU of the platform in question. If your
CPU load spikes to the point where it can't process the hellos in a
timely fashion, you'll start dropping adjacencies all over the place,
causing a lot of reconvergence and more CPU churn from SPF
recalculations. Normally, in a high-CPU condition like this where the
IGP does not drop adjacencies, your FIB in hardware will keep traffic
moving despite the CPU spike. For this reason I set keepalives to be
somewhat aggressive but not overly aggressive. If you set them too low,
you will have an increased risk of false positives and network
instability.

Hardware-based detection of a failed link would be ideal. POS obviously
comes to mind. BFD is there for simulating the POS recovery time,
although I know you said it wasn't an option... plus, last I heard, BFD
still runs in software on some major platforms like the 6500.

If you have a platform where everything is run off the main CPU (like
the 7200), do NOT play with really low IGP timers. At least not up to
an NPE-400. I don't have experience with the NPE-G1 or NPE-G2 yet, but
the architecture doesn't change in nature by being faster. It's still
interrupt driven by the main CPU.
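
To put rough numbers on that trade-off, here is a minimal Python sketch.
It is not taken from any router or vendor tool; the names (HelloTimers,
hello_interval_for, max_safe_stall_ms, survives_stall) are made up for
illustration. It assumes the rule of thumb Tony gives below (hello
interval of about a third of the desired detection time, i.e. a
multiplier of 3) and a deliberately simple worst-case model in which no
hellos are processed while the CPU is stalled and the stall begins just
before the next hello was due.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelloTimers:
    """Hypothetical container for IGP hello timers (illustration only)."""
    hello_interval_ms: int  # how often hellos are sent
    multiplier: int         # hello intervals that make up the dead interval

    @property
    def dead_interval_ms(self) -> int:
        # The neighbor is declared down after this long without a hello.
        return self.hello_interval_ms * self.multiplier


def hello_interval_for(detection_ms: int, multiplier: int = 3) -> int:
    """Hello interval needed to detect a failure within detection_ms."""
    return detection_ms // multiplier


def max_safe_stall_ms(timers: HelloTimers) -> int:
    """Longest CPU stall the adjacency is guaranteed to ride out.

    Worst case: the stall starts just before the next hello is due, so
    one hello interval of the dead interval has already been used up.
    """
    return timers.dead_interval_ms - timers.hello_interval_ms


def survives_stall(timers: HelloTimers, cpu_stall_ms: int) -> bool:
    """True if the adjacency stays up even in the worst case above."""
    return cpu_stall_ms <= max_safe_stall_ms(timers)


if __name__ == "__main__":
    aggressive = HelloTimers(hello_interval_ms=hello_interval_for(300),
                             multiplier=3)
    relaxed = HelloTimers(hello_interval_ms=1000, multiplier=3)
    for name, timers in (("aggressive", aggressive), ("relaxed", relaxed)):
        print(name, timers.dead_interval_ms, max_safe_stall_ms(timers),
              survives_stall(timers, cpu_stall_ms=250))
```

Under these assumptions, a 300 ms detection target means 100 ms hellos,
which leaves only about 200 ms of guaranteed headroom for a busy CPU,
while 1 s hellos with the same multiplier tolerate the same 250 ms stall
easily. Real platforms queue and prioritize hellos differently, so treat
this as back-of-the-envelope arithmetic rather than a prediction.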

Just my $0.02. (And feel free to correct me if I am incorrect in my 
understanding of something).

Alaerte.Vidali at nokia.com wrote:
> Tony, Rodney,
> 
> Thanks for the analysis. I see you agree on the limitation of how to
> detect the failure.
> As I said, the solution must be without using FRR or BFD. 
> 
> When there is FRR support, we can use RSVP hellos; I am wondering why
> Cisco does not support RSVP hellos as a mechanism to indicate failure
> on the logical tunnel interface from headend to tailend. The headend
> would detect failure on option 1 and try option 2. (Or, if there are
> two tunnel interfaces with diverse paths and multipath routing, when
> one logical interface goes down, traffic would be carried by the other
> interface.)
> Any comments?
> 
> 
> Moving to another idea, what do you think about this OSPF tuning?
> -Send OSPF hellos every 100ms, setting TspfDelay and TspfHold to 0. 
> 
> (It is not necessary to reach 50 to 100ms as in FRR; 200 to 400ms would
> be fine, but network stability under high traffic is very important, so
> false positives are not acceptable.)
> 
> Br,
> Alaerte
> 
> 
> ========================================================================
> -----Original Message-----
> From: ext Tony Li [mailto:tli at cisco.com] 
> Sent: Tuesday, March 06, 2007 4:52 PM
> To: Vidali Alaerte (Nokia-NET/RioDeJaneiro)
> Cc: rodunn at cisco.com; cisco-nsp at puck.nether.net
> Subject: Re: [c-nsp] Resilience in order of few hundreds of milliseconds
> 
> 
> To achieve that, you'd have to be running a keepalive mechanism (e.g.  
> ping) down the tunnel at an interval that's about a third of the desired
> failure detection time.  I don't think you'd really like that.
> 
> Tony
> 
> 
> -----Original Message-----
> From: ext Rodney Dunn [mailto:rodunn at cisco.com] 
> Sent: Tuesday, March 06, 2007 4:56 PM
> To: Vidali Alaerte (Nokia-NET/RioDeJaneiro)
> Cc: rodunn at cisco.com; cisco-nsp at puck.nether.net
> Subject: Re: [c-nsp] Resilience in order of few hundreds of milliseconds
> 
> On Tue, Mar 06, 2007 at 11:53:54AM -0600, Alaerte.Vidali at nokia.com
> wrote:
>> Hi Rodney,
>>
>> Yes, I have something in mind. What is the lowest time we can get 
>> using a second precomputed path from headend to tailend on the same 
>> tunnel interface?
> 
> You could have a backup precomputed path, but you still need the
> notification that the primary path isn't valid anymore (i.e., think
> BFD-triggered FRR).
> 
>> The headend would sense a problem in the first option and switch to 
>> the second option.
> 
> How would it "sense" a failure at any point along the path?
> 
>> I did not test it.
>>
>> Tks,
>> Alaerte
>>
>> -----Original Message-----
>> From: ext Rodney Dunn [mailto:rodunn at cisco.com]
>> Sent: Tuesday, March 06, 2007 2:49 PM
>> To: Vidali Alaerte (Nokia-NET/RioDeJaneiro)
>> Cc: cisco-nsp at puck.nether.net
>> Subject: Re: [c-nsp] Resilience in order of few hundreds of 
>> milliseconds
>>
>> No way I can think of.
>>
>> Did you have something in mind?
>>
>> You have to have a backup path precomputed and ready to switch the 
>> frames to get that kind of failover.
>>
>> rodney
>>
>> On Tue, Mar 06, 2007 at 11:27:50AM -0600, Alaerte.Vidali at nokia.com
>> wrote:
>>>  Hi,
>>>
>>> Looking for alternatives for fast recovery of MPLS under failure 
>>> without using FRR.
>>> Any input appreciated.
>>>
>>> (BFD and OSPF timer tuning already considered)
>>>
>>> Best Regards,
>>> Alaerte
>>>
>>>
>>> _______________________________________________
>>> cisco-nsp mailing list  cisco-nsp at puck.nether.net 
>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>> archive at http://puck.nether.net/pipermail/cisco-nsp/
> 