[c-nsp] BFD bug in IOS SXF6

Rodney Dunn rodunn at cisco.com
Fri Jan 12 16:13:40 EST 2007


On Fri, Jan 12, 2007 at 04:11:49PM -0500, Chris Griffin wrote:
> Is lowering process-max-time now considered a best practice by Cisco
> for situations where busy boxes deal with short-duration timers?

I don't know about best practice, but for tight-timer scenarios
and large-scale control-plane stuff I've had to do it.
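To show why a lower per-process run limit helps with tight timers, here is a toy Python model. It assumes a purely cooperative scheduler: one CPU-bound process (say, BGP chewing through a large update) runs for at most `max_slice_ms` before voluntarily yielding, and a short timer-driven task that becomes ready mid-slice has to wait for that yield. This is a hypothetical simplification, not how IOS actually implements its scheduler, and `timer_wait_ms` / `worst_case_wait_ms` are made-up names.

```python
"""Toy model of a cooperative scheduler with a per-process run limit.

Assumption: a single busy process starts at t=0 and runs in back-to-back
slices of at most max_slice_ms, yielding between slices. A timer-driven
task that becomes ready at ready_at_ms is dispatched at the first yield
point at or after that time. Not a model of real IOS internals.
"""


def timer_wait_ms(ready_at_ms: int, busy_work_ms: int, max_slice_ms: int) -> int:
    """Return how long the timer task waits for the CPU under this model."""
    if max_slice_ms <= 0 or ready_at_ms < 0:
        raise ValueError("max_slice_ms must be positive and ready_at_ms non-negative")
    if ready_at_ms >= busy_work_ms:
        return 0  # busy process has already finished; CPU is free
    # Ceiling division: first slice boundary at or after ready_at_ms.
    # No other task has run yet, so boundaries fall on multiples of max_slice_ms.
    next_yield = -(-ready_at_ms // max_slice_ms) * max_slice_ms
    # The busy process also gives up the CPU for good once its work is done.
    next_yield = min(next_yield, busy_work_ms)
    return next_yield - ready_at_ms


def worst_case_wait_ms(busy_work_ms: int, max_slice_ms: int) -> int:
    """Worst wait over every whole-millisecond ready time during the busy period."""
    return max(timer_wait_ms(t, busy_work_ms, max_slice_ms) for t in range(busy_work_ms))


if __name__ == "__main__":
    # Hypothetical load: 5 seconds of CPU-bound route processing.
    busy_ms = 5000
    for limit in (200, 20):
        w = worst_case_wait_ms(busy_ms, limit)
        print(f"limit={limit} ms: worst-case wait {w} ms")
    # limit=200 ms: worst-case wait 199 ms
    # limit=20 ms: worst-case wait 19 ms
```

Under these assumptions the worst-case wait tracks the run limit, which is the point of dropping it from 200 ms to 20 ms. It says nothing about BFD on SRA, which per this thread no longer relies on ordinary run-to-completion scheduling.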

> I know 
> our boxes get hammered when ISP peers go away and 100K or so prefixes 
> change direction :-)  This also causes tons of false positives on our 
> NMS boxes with their SNMP polling.  Running SXF here too.

It might very well help there.
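For the prefix-churn case, the advice further down this thread to make sure you are not seeing input queue drops is easy to check from captured CLI output. The following is a minimal Python sketch, assuming the classic IOS `show interfaces` layout where each interface header line starts in column one ("X is up, ...") and the counters line reads `Input queue: size/max/drops/flushes`. The function names and sample text are made up for illustration. Because the drop counter is cumulative, the sketch compares two snapshots taken before and after an event.

```python
import re

# Assumed format: interface header lines start in column one, e.g.
# "GigabitEthernet1/1 is up, line protocol is up".
HEADER_RE = re.compile(r"^(\S+) is ")
# Assumed format: classic IOS counters line, e.g.
# "  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0"
INPUT_QUEUE_RE = re.compile(r"Input queue:\s*(\d+)/(\d+)/(\d+)/(\d+)")


def input_queue_drops(show_interfaces: str) -> dict[str, int]:
    """Map each interface name to its cumulative input queue drop counter."""
    drops: dict[str, int] = {}
    current = None
    for line in show_interfaces.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            continue
        counters = INPUT_QUEUE_RE.search(line)
        if counters and current is not None:
            drops[current] = int(counters.group(3))  # third field is drops
    return drops


def new_drops(before: str, after: str) -> dict[str, int]:
    """Return interfaces whose input queue drop counter grew between snapshots.

    Decreases (for example after `clear counters`) are ignored.
    """
    old = input_queue_drops(before)
    new = input_queue_drops(after)
    return {
        name: count - old.get(name, 0)
        for name, count in new.items()
        if count > old.get(name, 0)
    }


if __name__ == "__main__":
    # Made-up snapshots, trimmed to the lines the parser looks at.
    before = (
        "GigabitEthernet1/1 is up, line protocol is up\n"
        "  Input queue: 0/75/10/0 (size/max/drops/flushes); Total output drops: 0\n"
        "GigabitEthernet1/2 is up, line protocol is up\n"
        "  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0\n"
    )
    after = before.replace("0/75/10/0", "75/75/410/3")
    print(new_drops(before, after))  # {'GigabitEthernet1/1': 400}
```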

Rodney

> 
> Thanks
> Chris
> 
> Rodney Dunn wrote:
> > On Fri, Jan 12, 2007 at 02:36:20PM -0500, Richard A Steenbergen wrote:
> >> On Fri, Jan 12, 2007 at 08:31:35AM -0500, Rodney Dunn wrote:
> >>> Guys,
> >>>
> >>> Can you please run any BFD tests on the latest 12.2(33)SRA code rather than on SXF?
> >>>
> >>> There were some large changes to BFD and scheduling to improve this,
> >>> and they will not be backported to 12.2(18)SXF.
> >> Did it get pushed down to the linecards?
> > 
> > Not yet.
> > 
> >> This is the only way I'd even
> >> consider trusting it, as the RP CPU is thrashed far too often even under
> >> the best of circumstances. I can't use the MSFC3 in a BGP edge role (a few
> >> transit feeds, a couple dozen peers, iBGP links, etc.), even with policies
> >> optimized for replication, with anything less than 180-second hold times
> >> on almost all sessions without getting false-positive BGP session drops.
> > 
> > I'm not arguing with you, but I'd like to see data showing a false
> > positive with lower timers.
> > 
> > Provide a sniffer trace and "debug ip packet" output for the peer when
> > you see the session drop. Make sure you are not seeing input queue drops
> > if you have a lot of routes or a large number of peers. The idea was
> > more that the CPU should be handling a very small number of packets
> > unless there are features causing punts.
> > 
> > You can tune process-max-time down from 200 msec to 20 msec
> > to limit the time one process can run before it yields.
> > 
> > But for BFD it's not the same anymore. It's not a standard run-to-completion
> > operation. There is a new concept of preemption that BFD uses to get
> > fast activation on a failure. And the actual BFD packets themselves are
> > not even handled at process level. They are consumed and accounted for
> > at interrupt level. I need to read up more on exactly how it got
> > implemented, but I know that was part of the changes and hardening in
> > SRA.
> > 
> > Rodney
> > 
> >  
> >> -- 
> >> Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
> >> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
> > _______________________________________________
> > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/
> 
> -- 
> Chris Griffin                           cgriffin at ufl.edu
> Sr. Network Engineer - CCNP             Phone: (352) 392-2061
> CNS - Network Services                  Fax:   (352) 392-9440
> University of Florida/FLR               Gainesville, FL 32611

