[c-nsp] WS-X6708-10GE FM-2-BAD-MESSAGE + traceback

Peter Rathlev peter at rathlev.dk
Mon May 25 19:54:35 EDT 2009


We just experienced something nasty during a C6kSup720 upgrade from
SXF13 to SXI1. We've been upgrading redundant nodes in three PoPs (eight
nodes total) with no problems except one. This node was taken offline
(set-overload-bit, lowered local pref etc.) and then reloaded, just like
all the other nodes, none of which had any problems with this.

The last node came up but couldn't boot the 6708 card providing the core
connections. The console put out messages like this:

-Traceback= 425EABD4 42C2417C 42C24330 4134B0BC 4134B0A8
003593: May 25 20:27:30.854 CEST: %FM-2-BAD_MESSAGE: Error in internal messaging - context: 0x53E4ECA0, result: 0, reply_pak:0x0, slot6, online_status: ONLINE

The traceback line was always the same. The "context" part of the
FM-2-BAD_MESSAGE line changed for every line, but some values were
repeated between non-adjacent lines. The switch logged hundreds of
messages almost all at once (i.e. within ~100 ms).
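
For anyone wanting to check a saved console log for the same pattern, here
is a minimal Python sketch. It is not taken from any Cisco tool; the
function name summarize and the regular expression are my own assumptions,
based only on the single log line quoted above. It counts the
FM-2-BAD_MESSAGE lines, estimates how many milliseconds the burst spans,
and lists the "context" values that occur more than once.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout, inferred from the one log line quoted in this post:
#   "... HH:MM:SS.mmm TZ: %FM-2-BAD_MESSAGE: ... context: 0x..., ..."
LINE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) \S+: %FM-2-BAD_MESSAGE: .*?context: (0x[0-9A-Fa-f]+)"
)


def summarize(lines):
    """Summarize FM-2-BAD_MESSAGE lines from an iterable of log lines.

    Returns the number of matching messages, the time between the earliest
    and latest message in milliseconds, and the context values seen more
    than once. Only the time of day is parsed, so a burst that crosses
    midnight would be measured incorrectly.
    """
    times = []
    contexts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # traceback lines and unrelated messages are skipped
        times.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
        contexts[match.group(2)] += 1
    span_ms = (max(times) - min(times)).total_seconds() * 1000 if times else 0.0
    repeated = {ctx: n for ctx, n in contexts.items() if n > 1}
    return {"messages": len(times), "span_ms": span_ms, "repeated": repeated}
```

It could be run over a capture with something like
summarize(open("console.log")), where console.log is a hypothetical file
name for the saved console output.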

The 6708 module ended up in "PwrDown" state. I tried booting it again
(with "power enable") but just ended up in the same place with the same
messages. A full power down of the whole chassis resolved the problem.

All this wouldn't have been a problem in itself; the redundant node was
providing the relevant services while the failed node was down. The
nasty bit was that the failed node actually interfered with the network.
Example: Even though configured with "standby preempt delay minimum 300"
the node tried to take over HSRP gateway functionality. We've also seen
evidence of some kind of corruption in L2 switching. Even though the
6708 module never actually came online some of the neighbors saw
interfaces as up/up.

The question is then: Should we look more into this? We don't have much
spare time, so if we can safely assume this was a "one off" we'll just
leave it at that. Cisco.com says the "FM-2-BAD_MESSAGE" is a software
error, but not much else. We haven't (yet) had time to look at the show
tech output but will do so in the near future.

Any input much appreciated.

