[c-nsp] Sup 720 - a very high number SP - RP crash

krunal shah krun.shah at gmail.com
Fri Jul 23 09:25:15 EDT 2010


Hi,

I have been seeing a very high number of Supervisor 720 (WS-SUP720) crashes
in many customers' environments. Basically, the SP stops receiving
heartbeats from the RP.

The following errors are very common. They are seen sometimes for the SP and
sometimes for the RP.

For SP

%CPU_MONITOR-SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for
150 seconds [6/1]
%CPU_MONITOR-SP-3-TIMED_OUT: CPU_MONITOR messages have failed, resetting
system [6/1]

For RP

%CPU_MONITOR-6-NOT_HEARD: CPU_MONITOR messages have not been heard for %d
seconds [%d/%d]

CPU monitor messages have not been detected for a significant amount of
time. [dec] is the number of seconds. A timeout is likely to occur soon,
which will reset the system. This error can be caused by a badly seated
module or by high traffic in the EOBC channel.

Recommended Action: Verify that all modules are seated properly in the
chassis. Pull out the module mentioned in the message and inspect the
backplane and module for bent pins or hardware damage. If the message
persists after reseating all the modules, a hardware problem may exist, such
as a defective module or chassis.
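
To compare how often these messages show up across several boxes, the
collected syslog can be tallied with a small script. The following is only a
rough sketch in Python, not a Cisco tool. It assumes each message sits on one
line in the log, that a message without the "SP" tag comes from the RP (as in
the examples above), and that the trailing [x/y] pair can be kept verbatim as
a key. The function name summarize_cpu_monitor is made up for illustration.

```python
import re
from collections import Counter

# Matches "%CPU_MONITOR-SP-6-NOT_HEARD:" as well as "%CPU_MONITOR-6-NOT_HEARD:".
# The CPU tag (e.g. "SP") is optional; severity is a single digit.
CPU_MONITOR_RE = re.compile(
    r"%CPU_MONITOR-(?:(?P<cpu>[A-Z]+)-)?(?P<sev>\d)-(?P<name>[A-Z_]+):(?P<text>.*)"
)
# Trailing bracketed pair such as "[6/1]", kept as reported in the message.
PAIR_RE = re.compile(r"\[(\d+/\d+)\]")


def summarize_cpu_monitor(lines):
    """Count CPU_MONITOR messages keyed by (cpu, mnemonic, bracketed pair)."""
    counts = Counter()
    for line in lines:
        match = CPU_MONITOR_RE.search(line)
        if match is None:
            continue  # not a CPU_MONITOR message
        # Assumption: an untagged message is reported by the RP.
        cpu = match.group("cpu") or "RP"
        pair = PAIR_RE.search(match.group("text"))
        key = (cpu, match.group("name"), pair.group(1) if pair else None)
        counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "%CPU_MONITOR-SP-6-NOT_HEARD: CPU_MONITOR messages have not been "
        "heard for 150 seconds [6/1]",
        "%CPU_MONITOR-SP-3-TIMED_OUT: CPU_MONITOR messages have failed, "
        "resetting system [6/1]",
    ]
    for key, count in summarize_cpu_monitor(sample).items():
        print(key, count)
```

If the same bracketed pair dominates the tally on a given chassis, that would
fit the reseat-and-inspect advice above. Counts spread across many pairs
might point more toward the EOBC traffic explanation, but that is a guess.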

Is this a common problem that anybody else is seeing on their 6500s with the
Sup720? Is this a common hardware defect in the EOBC channel that blocks
communication between the RP and SP? If so, what are the preventive actions?

Krunal

