[c-nsp] 12.2(33)SRC*/SRD* Watchdog NMI Timeout Crash/BFD Issue

Mark Tinka mtinka at globaltransit.net
Wed Apr 15 06:07:56 EDT 2009


Hi all.

So we've been going back and forth on this issue with TAC, 
and I recall posting a few comments about it online several 
months back.

Here's an update for the archives and anyone that's 
interested:

So TAC and I initially worked through bug ID CSCek75694 
(Crash in Pseudo Preemption handler when BFD is configured) 
which linked over to bug ID CSCsq32269 (C7200 crash due to 
watchdog nmi). TAC came back to say this issue was fixed in 
12.2(33)SRC3, as well as other trains. However, this was not 
to be...

So we logged another case with TAC after SRC3 crashed on us 
the exact same way. We seem to have made some progress - bug 
ID CSCsz05181 (stack corruption crash with BFD configured) 
has just been filed.

To summarize, when BFD is enabled and some commands are run 
on a regular basis, e.g., "show bootvar" and "show c7200", 
the router crashes. It is not guaranteed that the router 
will crash when these circumstances all come together, but 
the more often the commands are run, the greater the chance 
of the router crashing.
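
As a rough illustration of why frequent polling matters, here 
is a minimal sketch that assumes each run of the commands 
independently carries the same small chance of crashing the 
router. That independence, the function name and the 0.001 
figure are all assumptions for illustration only; TAC has not 
given us any per-run number.

```python
def crash_probability(p_per_run: float, runs: int) -> float:
    """Chance of at least one crash after `runs` independent command runs."""
    if not 0.0 <= p_per_run <= 1.0:
        raise ValueError("p_per_run must be between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    # P(at least one crash) = 1 - P(no crash in any run)
    return 1.0 - (1.0 - p_per_run) ** runs


if __name__ == "__main__":
    p = 0.001  # hypothetical per-run crash chance, not a measured value
    # Hourly polling over a day, a 30-day month and a year
    for runs in (24, 24 * 30, 24 * 365):
        print(f"{runs:5d} runs -> {crash_probability(p, runs):.3f}")
```

Under these assumptions the chance of having seen at least one 
crash only ever grows with the number of runs, which matches 
what we observe: no single run is guaranteed to crash the box, 
but regular polling eventually does.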

In our case, RANCID runs these commands on a regular basis, 
which is what eventually triggers the crash.

The current workaround is to disable BFD (for us, RANCID 
takes higher priority).

But that's not all - we were wondering why, while the SRC* 
code for the NPE-G2 and 7201 is vulnerable to this bug, 
these platforms have never once crashed with BFD enabled and 
RANCID querying them.

Well, it turns out this issue only affects MIPS-based 
processors. While the issue isn't exactly BFD-specific, 
currently, BFD is the only feature known to trigger it.

The NPE-G2 and 7201 are not affected because these platforms 
do not use MIPS processors.

Still no news on which release will carry the (final) fix, 
but I'm hoping SRC5 at least :-). SRD4 is also affected, for 
anyone that's running it.

I'd suggest not running BFD on this code for the time being 
(particularly on the NPE-G1).

Cheers,

Mark.