[c-nsp] OSPF / BFD timer advice

Bruce Pinsky bep at whack.org
Sat Jul 14 16:01:38 EDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Oliver Boehmer (oboehmer) wrote:
> James Worley <> wrote on Thursday, July 12, 2007 4:05 PM:
> 
>> Hi Phil
>>
>> We have about 40 6509-sup720 running
>> 's72033-advipservicesk9_wan-mz.122-18.SXF5' this particular site is
>> configured in a 4 core model.
> 
> SXF's BFD implementation is still subject to false alarms during
> CPUHOG events or prolonged high-CPU conditions. You definitely want to
> configure "process-max-time 50" to work around some of the issues, but
> the implementation is much more robust in SRA (and will be in SXH as
> well). 250x4 (250 ms interval, multiplier 4) is not that aggressive,
> but it can still fail.
> 
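
For reference, the arithmetic behind "250x4" can be sketched in a few
lines of Python. This is only an illustration: the function and parameter
names are mine, and it assumes asynchronous-mode BFD where the detection
time is the remote multiplier times the larger of the local receive
interval and the remote transmit interval.

```python
def bfd_detection_time_ms(local_min_rx_ms, remote_tx_ms, remote_multiplier):
    """Detection time for one direction of an asynchronous BFD session.

    Assumption: the negotiated interval is the larger of what we are
    willing to receive and what the peer wants to send.
    """
    negotiated_interval_ms = max(local_min_rx_ms, remote_tx_ms)
    return negotiated_interval_ms * remote_multiplier


# "250x4" from the thread, assuming both ends are configured identically.
detection_ms = bfd_detection_time_ms(250, 250, 4)

# Default OSPF dead interval on broadcast and point-to-point links.
OSPF_DEFAULT_DEAD_MS = 40_000

print(detection_ms)                          # 1000
print(OSPF_DEFAULT_DEAD_MS // detection_ms)  # 40
```

So 250x4 means roughly one second to declare the neighbor down, about
40 times faster than the default OSPF dead timer.
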
>> I would agree with you that a 1sec OSPF dead timer is overkill,
>> especially as we have sub-second timers on BFD. My only question now
>> is what people suggest we set the OSPF timers to?
> 
> Default. Running tuned OSPF hello/dead timers on a BFD-protected link is
> over-engineered IMHO. BFD is much better suited to sending and processing
> fast hellos than any IGP will ever be.
> 
>> As I understand BFD, it is able to detect link failure. The OSPF timers
>> still need to be quick enough to trigger convergence should the problem
>> not be a link failure, or, put another way, to act as a backup to BFD.
> 
> You generally tune OSPF hello/dead to detect neighbor failures. BFD
> already does that job, so what's the point of running fast hellos?
> 
> Did you also tune the SPF/LSA throttle timers and the like? Quick
> neighbor detection doesn't help you achieve fast convergence if your
> OSPF nodes wait 5 seconds before even starting to calculate the new
> topology.
>  
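
To illustrate why the throttle timers matter, here is a simplified Python
model of an SPF throttle with an initial delay, a hold time that doubles
after each run, and a maximum wait. The function name and the 10/100/5000
ms values are illustrative assumptions, not recommendations, and the reset
after a quiet period is deliberately not modelled.

```python
def spf_wait_schedule(start_ms, hold_ms, max_wait_ms, events):
    """Return the wait (ms) applied before each of `events` back-to-back SPF runs.

    Simplified model: the first run waits start_ms; every later run waits
    the current hold time, which doubles after each run and is capped at
    max_wait_ms.
    """
    waits = []
    hold = hold_ms
    for i in range(events):
        if i == 0:
            waits.append(start_ms)
        else:
            waits.append(min(hold, max_wait_ms))
            hold = min(hold * 2, max_wait_ms)
    return waits


# Hypothetical tuned values: 10 ms start, 100 ms hold, 5000 ms max wait.
print(spf_wait_schedule(10, 100, 5000, 8))
# [10, 100, 200, 400, 800, 1600, 3200, 5000]

# A fixed 5-second initial delay, as mentioned in the thread.
print(spf_wait_schedule(5000, 5000, 5000, 1))
# [5000]
```

With a fixed 5-second delay, a one-second BFD detection is only a small
part of the total convergence time; with a short initial delay the first
SPF run follows the failure almost immediately and only repeated flaps
get backed off.
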
>> We are not entirely sure what is causing the issue. The syslog only
>> shows the OSPF neighbour as down. Strangely, the outage can last a few
>> minutes before syslog reports the neighbour as back up and the M-VPNs
>> are back up.
> 
> Well, your OSPF-5-ADJCHG should show something like "Neighbor Down: BFD
> node down" at the end if the error was detected by BFD, and "Neighbor
> Down: Dead timer expired" if OSPF detected this itself.
> 
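
If you have a pile of syslog to go through, a short Python sketch along
these lines can sort the adjacency drops by cause. It relies only on the
two reason strings Oli quotes above; the function name is mine, and the
sample lines (addresses, interface names, process number) are made up for
illustration.

```python
def classify_adjchg(line):
    """Classify an OSPF-5-ADJCHG "Neighbor Down" line by its stated reason.

    Returns "bfd", "dead-timer", "other", or None if the line is not an
    OSPF neighbor-down message.
    """
    if "OSPF-5-ADJCHG" not in line or "Neighbor Down" not in line:
        return None
    if "BFD node down" in line:
        return "bfd"
    if "Dead timer expired" in line:
        return "dead-timer"
    return "other"


# Made-up sample lines, for illustration only.
samples = [
    "%OSPF-5-ADJCHG: Process 1, Nbr 192.0.2.1 on Vlan10 from FULL to DOWN, "
    "Neighbor Down: BFD node down",
    "%OSPF-5-ADJCHG: Process 1, Nbr 192.0.2.2 on Vlan20 from FULL to DOWN, "
    "Neighbor Down: Dead timer expired",
    "%OSPF-5-ADJCHG: Process 1, Nbr 192.0.2.1 on Vlan10 from LOADING to FULL, "
    "Loading Done",
]

for line in samples:
    print(classify_adjchg(line))
# bfd
# dead-timer
# None
```

A run of "bfd" results points at BFD (or the CPU issues mentioned above),
while "dead-timer" results mean OSPF noticed the failure on its own.
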


And as Oli and others will tell you, fast hellos and BFD are simply
substitutes for slow (some would say poor) failure detection by the media
itself.  Your first line of defense should be to optimize the media's own
failure detection mechanisms.  What type of transport runs between the
core and distribution layers where these OSPF adjacencies reside?

- --
=========
bep

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.4 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGmSuhE1XcgMgrtyYRApBiAJ9LEkkNpHwakYFO3zGnxwegrgCQzACffVFk
nhnAkKM+Dn1HUT7aYKzEb14=
=46jq
-----END PGP SIGNATURE-----

