[c-nsp] BFD bug in IOS SXF6
Rodney Dunn
rodunn at cisco.com
Fri Jan 12 16:13:40 EST 2007
On Fri, Jan 12, 2007 at 04:11:49PM -0500, Chris Griffin wrote:
> Is lowering the process-max-task now considered best practices by Cisco
> for situations where busy boxes deal with short duration timers?
I don't know about best practice, but for tight timer scenarios
and large scale control plane stuff I've had to do it.
> I know
> our boxes get hammered when ISP peers go away and 100K or so prefixes
> change direction :-) This also causes tons of false positives on our
> NMS boxes with their SNMP polling. Running SXF here too.
It might very well help there.
Rodney
>
> Thanks
> Chris
>
> Rodney Dunn wrote:
> > On Fri, Jan 12, 2007 at 02:36:20PM -0500, Richard A Steenbergen wrote:
> >> On Fri, Jan 12, 2007 at 08:31:35AM -0500, Rodney Dunn wrote:
> >>> Guys,
> >>>
> > >>> Can you please run any BFD tests on the latest 12.2(33)SRA code rather than SXF?
> >>>
> > >>> There were some large changes to BFD and scheduling to improve this
> > >>> and they will not be backported to 12.2(18)SXF.
> >> Did it get pushed down to the linecards?
> >
> > Not yet.
> >
> >> This is the only way I'd even
> >> consider trusting it, as the RP CPU is thrashed far too often even under
> >> the best of circumstances. I can't use the MSFC3 in a BGP edge role (few
> >> transit feeds, a couple dozen peers, iBGP links, etc) even with policies
> >> optimized for replication on anything less than almost all 180 sec
> >> hold-times without getting false positive BGP session drops.
> >
> > I'm not arguing with you but I'd like to see the data with lower timers
> > to show a false positive.
> >
> > Provide a sniffer trace and debug ip packet output for the peer when you see
> > the session drop. Make sure you are not seeing input queue drops
> > if you have a lot of routes or large number of peers. The idea was
> > more that the CPU should be handling a very small number of packets
> > unless there are features causing punts.
> >
> > You can tune the process-max-task time down from 200 msec to 20 msec
> > to limit the time one process can run.
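A toy model may help show why a lower per-process run limit matters for timer-driven work. The sketch below assumes a simplified cooperative scheduler in which each CPU-bound process queued ahead of a timer-driven process (say, a BGP keepalive sender) runs until it either finishes or hits the limit, then yields. It ignores process priorities and interrupt-level work, so it is not how IOS actually schedules; the function name and workloads are hypothetical. The `max_run_ms` parameter stands in for the 200 ms / 20 ms setting mentioned above, assumed here to correspond to the IOS global `process-max-time` command.

```python
def timer_process_wait_ms(hog_work_ms, max_run_ms):
    """Toy cooperative-scheduler model (hypothetical, not IOS internals).

    A timer-driven process becomes ready just behind a queue of CPU-bound
    processes. Each process ahead of it runs until it finishes its work or
    reaches max_run_ms, then voluntarily yields. Returns how long the
    timer-driven process waits before it first gets the CPU.
    """
    if max_run_ms <= 0:
        raise ValueError("max_run_ms must be positive")
    # Each process ahead consumes at most one slice before the timer process runs.
    return sum(min(work, max_run_ms) for work in hog_work_ms)


if __name__ == "__main__":
    # Hypothetical load: three processes that each want 500 ms of CPU,
    # e.g. route churn after a large peer goes away.
    busy = [500, 500, 500]
    for limit in (200, 20):
        wait = timer_process_wait_ms(busy, limit)
        print(f"max run time {limit} ms -> timer process waits {wait} ms")
```

With three hypothetical 500 ms workloads queued ahead, the model gives a 600 ms wait at a 200 ms limit and a 60 ms wait at 20 ms, which is the intuition behind tuning the value down for tight timers.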
> >
> > But for BFD it's not the same anymore. It's not a standard run-to-completion
> > operation. There is a new concept of preemption that BFD uses to get
> > fast activation on a failure. And the actual BFD packets themselves are
> > not even handled at process level. They are consumed and accounted for
> > under interrupt level. I need to read up more on exactly how it got
> > implemented but I know that was part of the changes and hardening in
> > SRA.
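For a sense of scale: in BFD asynchronous mode, RFC 5880 (section 6.8.4) defines the detection time as the remote system's Detect Mult multiplied by the larger of the local Required Min RX Interval and the remote Desired Min TX Interval. The sketch below computes that value and compares it with a hypothetical stall in BFD packet handling. The 50 ms / multiplier 3 timers and the function names are assumptions for illustration, intervals are in milliseconds here rather than the microseconds carried on the wire, and the stall check is a simplification that assumes no BFD packet is processed during the stall. It suggests why process-level handling behind a 200 ms run limit could produce false failures, and why consuming the packets at interrupt level helps.

```python
def bfd_detection_time_ms(remote_detect_mult, local_required_min_rx_ms,
                          remote_desired_min_tx_ms):
    """Asynchronous-mode detection time per RFC 5880, section 6.8.4:
    remote Detect Mult times the larger of our Required Min RX Interval
    and the peer's Desired Min TX Interval (all values in ms here)."""
    return remote_detect_mult * max(local_required_min_rx_ms,
                                    remote_desired_min_tx_ms)


def stall_causes_false_down(stall_ms, detection_ms):
    """Simplified check (hypothetical helper): True if BFD packet handling is
    stalled for at least the detection time, assuming no packet is processed
    during the stall, so the session would be declared down although the
    path is fine."""
    return stall_ms >= detection_ms


if __name__ == "__main__":
    # Hypothetical timers: 50 ms intervals, multiplier 3.
    detect = bfd_detection_time_ms(3, 50, 50)
    print(f"detection time: {detect} ms")
    for stall in (20, 200):
        print(f"{stall} ms stall -> false down: "
              f"{stall_causes_false_down(stall, detect)}")
```

With these assumed timers the detection time is 150 ms, so a 200 ms stall would trip it while a 20 ms stall would not.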
> >
> > Rodney
> >
> >
> >> --
> >> Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
> >> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
> > _______________________________________________
> > cisco-nsp mailing list cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/
>
> --
> Chris Griffin cgriffin at ufl.edu
> Sr. Network Engineer - CCNP Phone: (352) 392-2061
> CNS - Network Services Fax: (352) 392-9440
> University of Florida/FLR Gainesville, FL 32611