[c-nsp] BFD bug in IOS SXF6

Chris Griffin cgriffin at ufl.edu
Fri Jan 12 16:11:49 EST 2007


Is lowering process-max-time now considered a best practice by Cisco 
for busy boxes that have to deal with short-duration timers?  I know 
our boxes get hammered when ISP peers go away and 100K or so prefixes 
change direction :-)  That also causes tons of false positives on our 
NMS boxes with their SNMP polling.  We're running SXF here too.
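
For reference, this is roughly what I'd expect the change to look like, 
assuming the knob Rodney means is the global process-max-time command 
(value in msec, default 200).  Untested here, so treat it as a sketch 
rather than a recommendation:

    ! assumption: global config command, value in milliseconds
    configure terminal
     process-max-time 20
    end
    ! confirm it took (a default value will not show up in the config)
    show running-config | include process-max-time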

Thanks
Chris

Rodney Dunn wrote:
> On Fri, Jan 12, 2007 at 02:36:20PM -0500, Richard A Steenbergen wrote:
>> On Fri, Jan 12, 2007 at 08:31:35AM -0500, Rodney Dunn wrote:
>>> Guys,
>>>
>>> Can you please try any BFD test on 12.2(33)SRA latest code and not SXF?
>>>
>>> There were some large changes to BFD and scheduling to improve this
>>> and they will not be back ported to 12.2(18)SXF.
>> Did it get pushed down to the linecards?
> 
> Not yet.
> 
>> This is the only way I'd even 
>> consider trusting it, as the RP CPU is thrashed far too often even under 
>> the best of circumstances. I can't use the MSFC3 in a BGP edge role (few 
>> transit feeds, a couple dozen peers, iBGP links, etc) even with policies 
>> optimized for replication on anything less than almost all 180 sec 
>> hold-times without getting false positive BGP session drops.
> 
> I'm not arguing with you but I'd like to see the data with lower timers
> to show a false positive.
> 
> Provide a sniffer trace and debug ip packet output for the peer when you
> see the session drop. Make sure you are not seeing input queue drops
> if you have a lot of routes or a large number of peers. The idea was
> more that the CPU should be handling a very small number of packets
> unless there are features causing punts.
> 
> You can tune process-max-time down from 200 msec to 20 msec
> to limit the time one process can run.
> 
> But for BFD it's not the same anymore. It's not a standard run-to-completion
> operation. There is a new concept of preemption that BFD uses to get
> fast activation on a failure. And the actual BFD packets themselves are
> not even handled at process level. They are consumed and accounted for
> at interrupt level.  I need to read up more on exactly how it got
> implemented but I know that was part of the changes and hardening in
> SRA.
> 
> Rodney
> 
>  
>> -- 
>> Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
>> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

-- 
Chris Griffin                           cgriffin at ufl.edu
Sr. Network Engineer - CCNP             Phone: (352) 392-2061
CNS - Network Services                  Fax:   (352) 392-9440
University of Florida/FLR               Gainesville, FL 32611

