[c-nsp] RES: activ/standby cpu card status changed

Fri Feb 29 15:29:17 EST 2008

Nemeth,

Your SUP crashed because TestSPRPInbandPing failed more than 10
consecutive times. See the fix/workaround for CSCsc33990 below.

/eninja

CSCsc33990

Symptoms: A supervisor engine may unexpectedly reset when
TestSPRPInbandPing, which is part of Cisco Generic Online Diagnostics
(GOLD), fails 10 consecutive times.

The following syslog error messages are typically generated right before the
supervisor engine resets, and can also be found in the crashinfo files:

%CONST_DIAG-SP-3-HM_TEST_FAIL: Module <slot#> TestSPRPInbandPing consecutive failure count:5
%CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=10% RP=0% Traffic=0% netint_thr_active[0], Tx_Rate[4412], Rx_Rate[0]
%CONST_DIAG-SP-3-HM_TEST_FAIL: Module <slot#> TestSPRPInbandPing consecutive failure count:10
%CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=10% RP=0% Traffic=0% netint_thr_active[0], Tx_Rate[4652], Rx_Rate[0]
%CONST_DIAG-SP-2-HM_SUP_CRSH: Supervisor crashed due to unrecoverable errors, Reason: Failed TestSPRPInbandPing
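
If you want to watch for this before the supervisor resets, the
HM_TEST_FAIL messages above are easy to scan for in a saved syslog file.
Here is a rough Python sketch. It is not a Cisco tool. The function names
are made up for illustration, the regex only assumes the message shape
quoted in this thread (with or without the -STDBY suffix), and the default
threshold of 10 is taken from the caveat text above.

```python
import re

# Matches the HM_TEST_FAIL message shape quoted in this thread. Tokens are
# separated by loose whitespace so a message wrapped over two lines still
# matches. This pattern is an assumption based on the samples only.
FAIL_RE = re.compile(
    r"%CONST_DIAG-SP(?:-STDBY)?-3-HM_TEST_FAIL:\s+Module\s+(\S+)\s+"
    r"(\S+)\s+consecutive\s+failure\s+count:\s*(\d+)"
)


def max_failure_counts(log_text):
    """Return {(module, test): highest consecutive failure count seen}."""
    counts = {}
    for module, test, count in FAIL_RE.findall(log_text):
        key = (module, test)
        counts[key] = max(counts.get(key, 0), int(count))
    return counts


def at_risk(log_text, threshold=10):
    """Return sorted (module, test) pairs whose count reached the threshold."""
    counts = max_failure_counts(log_text)
    return sorted(key for key, value in counts.items() if value >= threshold)


# Two messages from the original post, with the "> > " quote marks removed.
sample = """\
*Feb 28 16:11:12 MET: %CONST_DIAG-SP-STDBY-3-HM_TEST_FAIL: Module 1
TestSPRPInbandPing consecutive failure count:7
*Feb 28 16:13:12 MET: %CONST_DIAG-SP-STDBY-3-HM_TEST_FAIL: Module 1
TestSPRPInbandPing consecutive failure count:14
"""

print(max_failure_counts(sample))  # {('1', 'TestSPRPInbandPing'): 14}
print(at_risk(sample))             # [('1', 'TestSPRPInbandPing')]
```

Mail quote prefixes ("> ") have to be stripped before feeding text to it,
otherwise wrapped messages will not match.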

Conditions: This symptom is observed on a Cisco Catalyst 6500 series switch
and Cisco 7600 series router that run an integrated Cisco IOS software
image. The trigger for the symptom may be possible corruption in TCAM
entries that are used to perform the TestSPRPInbandPing.

Workaround: Enter the "no diagnostic crash" global configuration command to
disable exceptions that are being triggered by failed diagnostic monitoring.
However, you should do this with discretion because it may also prevent the
system from taking proactive measures to mitigate problems that could impact
user traffic.

Further Information: The fix for this caveat is more of an enhancement
because it only prevents the system from being over-aggressive in taking
exceptions when the TestSPRPInbandPing fails under specific conditions.
Therefore, the fix for this caveat does not address all triggers that may
cause the TestSPRPInbandPing to fail. Please consult Cisco TAC for further
assistance if you experience the same problem after upgrading to a Cisco IOS
software image that contains the fix for this caveat.

On Fri, Feb 29, 2008 at 1:24 AM, Nemeth Laszlo <csirek at externet.hu> wrote:

> Hi!
>
> I put the crash file here:
>
> ftp://195.70.33.12/crashinfo_20080228-151329_cpu1
> ftp://195.70.33.12/crashinfo_20080228-151329_cpu2
>
>
> If anybody knows what the problem was, please don't keep quiet about it :)
>
> Is it possibly an IOS problem?
>
> Thanks
> Laci
>
>
> Leonardo Gama Souza wrote:
> > Hi.
> >
> > It sounds like your MSFC crashed.
> > You ought to look into the crashinfo file in order to figure out why.
> >
> > cheers,
> > Leonardo Gama.
> >
> > ------------------------------------------------------------------------
> > *From:* cisco-nsp-bounces at puck.nether.net on behalf of Nemeth Laszlo
> > *Sent:* Thu 28/2/2008 13:43
> > *To:* cisco-nsp at puck.nether.net
> > *Subject:* [c-nsp] activ/standby cpu card status changed
> >
> > Hi!
> >
> > My 7604 router has 2 WS-SUP32-10GE-3B CPU cards in RPR-PLUS mode.
> >
> > System image file is "sup-bootdisk:s3223-ipservices_wan-mz.122-18.SXF9.bin"
> >
> > I got these syslog messages, and afterwards the CPU cards swapped
> > roles: the standby became active and the active became standby. The
> > CPU stayed at 100% for 15 minutes. I saw a network L2 loop, but I
> > don't know whether the L2 loop was caused by the CPU change or the
> > CPU change was caused by the L2 loop. I use RSTP. This router and
> > two others are members of a small 10G ring.
> >
> > I can't find these error messages on cisco.com.
> >
> > We had a similar problem on 1 January 2008, when a CPU state change
> > happened too (the CPU was at 100% like now; at other times it runs
> > at 0-2%).
> >
> > Any idea?
> >
> > Thanks
> > Laci
> >
> > core2#sh redundancy history  | inc state
> > Feb 28 16:13:33 *my state = ACTIVE(13) *peer state = DISABLED(1)
> > Feb 28 16:17:12 *my state = ACTIVE(13) *peer state = UNKNOWN(0)
> > Feb 28 16:17:21 *my state = ACTIVE(13) *peer state = STANDBY COLD(4)
> > Feb 28 16:18:09 *my state = ACTIVE(13) *peer state = STANDBY COLD-CONFIG(5)
> > Feb 28 16:18:19 *my state = ACTIVE(13) *peer state = STANDBY HOT(8)
> >
> > core2#sh redundancy switchover
> > Switchovers this system has experienced          : 1
> > Last switchover reason                           : Active crashed.
> > Uptime since this supervisor switched to active  : 8 weeks, 1 day, 18 hours, 50 minutes
> > Total system uptime from reload                  : 28 weeks, 1 day, 1 hour, 29 minutes
> >
> > core2#sh redundancy switchover history
> > Index  Previous  Current  Switchover             Switchover
> >         active    active   reason                 time
> > -----  --------  -------  ----------             ----------
> >     1       1        2     active unit failed     22:44:19 MET Tue Jan 1 2008
> >
> >
> >
> > *Feb 28 16:11:12 MET: %CONST_DIAG-SP-STDBY-3-HM_TEST_FAIL: Module 1 TestSPRPInbandPing consecutive failure count:7
> > *Feb 28 16:11:12 MET: %CONST_DIAG-SP-STDBY-6-HM_TEST_INFO: CPU util(5sec): SP=7% RP=0% Traffic=0% netint_thr_active[0], Tx_Rate[70], Rx_Rate[4946], dev=1[IPv4, fail=7]
> > *Feb 28 16:13:12 MET: %CONST_DIAG-SP-STDBY-3-HM_TEST_FAIL: Module 1 TestSPRPInbandPing consecutive failure count:14
> > *Feb 28 16:13:12 MET: %CONST_DIAG-SP-STDBY-6-HM_TEST_INFO: CPU util(5sec): SP=2% RP=0% Traffic=0% netint_thr_active[0], Tx_Rate[70], Rx_Rate[8290], dev=1[IPv4, fail=14]
> > Feb 28 16:13:33 MET: %LINEPROTO-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/1, changed state to down
> > Feb 28 16:13:33 MET: %BGP-5-ADJCHANGE: neighbor xx.xxx.xxx.xxx Down Interface flap
> > Feb 28 16:13:33 MET: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode
> > Feb 28 16:13:33 MET: %LINK-SP-3-UPDOWN: Interface TenGigabitEthernet1/1, changed state to down
> > Feb 28 16:13:33 MET: %LINEPROTO-SP-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/1, changed state to down
> > Feb 28 16:17:11 MET: %PFREDUN-SP-6-ACTIVE: Standby initializing for RPR-PLUS mode
> > Feb 28 16:17:11 MET: %SYS-SP-3-LOGGER_FLUSHED: System was paused for 00:00:00 to ensure console debugging output.
> >
> > -
> > _______________________________________________
> > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/
> >