[c-nsp] BFD bug in IOS SXF6
Chris Griffin
cgriffin at ufl.edu
Fri Jan 12 16:11:49 EST 2007
Is lowering the process-max-task now considered a best practice by Cisco
for situations where busy boxes deal with short-duration timers? I know
our boxes get hammered when ISP peers go away and 100K or so prefixes
change direction :-) This also causes tons of false positives on our
NMS boxes with their SNMP polling. Running SXF here too.
Thanks
Chris
Rodney Dunn wrote:
> On Fri, Jan 12, 2007 at 02:36:20PM -0500, Richard A Steenbergen wrote:
>> On Fri, Jan 12, 2007 at 08:31:35AM -0500, Rodney Dunn wrote:
>>> Guys,
>>>
>>> Can you please try any BFD test on 12.2(33)SRA latest code and not SXF?
>>>
>>> There were some large changes to BFD and scheduling to improve this
>>> and they will not be back ported to 12.2(18)SXF.
>> Did it get pushed down to the linecards?
>
> Not yet.
>
>> This is the only way I'd even
>> consider trusting it, as the RP CPU is thrashed far too often even under
>> the best of circumstances. I can't use the MSFC3 in a BGP edge role (few
>> transit feeds, a couple dozen peers, iBGP links, etc) even with policies
>> optimized for replication on anything less than almost all 180 sec
>> hold-times without getting false positive BGP session drops.
>
> I'm not arguing with you but I'd like to see the data with lower timers
> to show a false positive.
>
> Provide a sniffer trace, debug ip packet for the peer when you see
> the session drop. Make sure you are not seeing input queue drops
> if you have a lot of routes or large number of peers. The idea was
> more that the CPU should be handling a very small number of packets
> unless there are features causing punts.
>
> You can tune the process-max-task time down from 200 msec to 20 msec
> to limit the time one process can run.
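>
> [A minimal sketch of that tuning, assuming the knob meant here is the
> global configuration command process-max-time, whose default is 200
> msec. The command name is an assumption, since the thread only calls it
> "process-max-task"; check it against the documentation for your release
> before using it.]
>
> ```
> ! Assumed command name: process-max-time (value in milliseconds).
> ! Lowers the time a process may run before it should yield the CPU,
> ! from the 200 msec default to 20 msec.
> configure terminal
>  process-max-time 20
> end
> ```
>
> [If the command is accepted, "show running-config | include
> process-max-time" should list the non-default value.]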
>
> But for BFD it's not the same anymore. It's not a standard run-to-completion
> operation. There is a new concept of preemption that BFD uses to get
> fast activation on a failure. And the actual BFD packets themselves are
> not even handled at process level. They are consumed and accounted for
> under interrupt level. I need to read up more on exactly how it got
> implemented but I know that was part of the changes and hardening in
> SRA.
>
> Rodney
>
>
>> --
>> Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
>> GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
> _______________________________________________
> cisco-nsp mailing list cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
--
Chris Griffin cgriffin at ufl.edu
Sr. Network Engineer - CCNP Phone: (352) 392-2061
CNS - Network Services Fax: (352) 392-9440
University of Florida/FLR Gainesville, FL 32611