[c-nsp] RES: activ/standby cpu card status changed
Nemeth Laszlo
csirek at externet.hu
Mon Mar 3 05:08:10 EST 2008
Hello!
I found this bug on the Cisco TAC site, where it is listed as fixed in 12.2(18)SXF2.
But I run s3223-ipservices_wan-mz.122-18.SXF9.bin, so has this bug come back?
Thanks
Laci
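
To make the version question concrete, here is a minimal Python sketch that compares the running image name with the "fixed in" release. It assumes that "122-18" in the image file name encodes 12.2(18) and that rebuild numbers within one train (SXF2, SXF9) are ordered numerically; the function names are made up for illustration. A True result only says the image should already contain the fix. As the caveat's "Further Information" notes, that fix does not cover every trigger of a TestSPRPInbandPing failure.

```python
import re

# Release string as written in the caveat, e.g. "12.2(18)SXF2".
RELEASE_RE = re.compile(r"(\d+)\.(\d+)\((\d+)\)([A-Z]+)(\d+)")
# Image file name, e.g. "s3223-ipservices_wan-mz.122-18.SXF9.bin".
# Assumption: "122-18" encodes 12.2(18) (two-digit major, one-digit minor).
IMAGE_RE = re.compile(r"\.(\d{2})(\d)-(\d+)\.([A-Z]+)(\d+)\.bin$")


def parse_release(text):
    """Return (major, minor, maintenance, train, rebuild), or None if unparsed."""
    match = RELEASE_RE.search(text) or IMAGE_RE.search(text)
    if match is None:
        return None
    major, minor, maint, train, rebuild = match.groups()
    return int(major), int(minor), int(maint), train, int(rebuild)


def is_at_or_after(running, fixed_in):
    """True if both releases share a train and the running rebuild is not older."""
    run, fix = parse_release(running), parse_release(fixed_in)
    if run is None or fix is None:
        raise ValueError("unrecognised release string")
    if run[:4] != fix[:4]:
        raise ValueError("releases are on different trains; not comparable here")
    return run[4] >= fix[4]


if __name__ == "__main__":
    image = "s3223-ipservices_wan-mz.122-18.SXF9.bin"
    print(is_at_or_after(image, "12.2(18)SXF2"))
```
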
e ninja wrote:
> Nemeth,
>
> Your SUP crashed because TestSPRPInbandPing failed more than 10
> consecutive times. See the fix/workaround for CSCsc33990 below.
>
> /eninja
>
>
> CSCsc33990
>
> Symptoms: A supervisor engine may unexpectedly reset when the
> TestSPRPInbandPing as part of the Cisco Generic Online Diagnostics
> (GOLD) fails for 10 consecutive times.
>
> The following syslog error messages are typically generated right before
> the supervisor engine resets, and can also be found in the crashinfo files:
>
> %CONST_DIAG-SP-3-HM_TEST_FAIL: Module <slot#> TestSPRPInbandPing
> consecutive failure count:5
> %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=10% RP=0% Traffic=0%
> netint_thr_active[0], Tx_Rate[4412], Rx_Rate[0]
> %CONST_DIAG-SP-3-HM_TEST_FAIL: Module <slot#> TestSPRPInbandPing
> consecutive failure count:10
> %CONST_DIAG-SP-6-HM_TEST_INFO: CPU util(5sec): SP=10% RP=0% Traffic=0%
> netint_thr_active[0], Tx_Rate[4652], Rx_Rate[0]
> %CONST_DIAG-SP-2-HM_SUP_CRSH: Supervisor crashed due to unrecoverable
> errors, Reason: Failed TestSPRPInbandPing
>
> Conditions: This symptom is observed on a Cisco Catalyst 6500 series
> switch and Cisco 7600 series router that run an integrated Cisco IOS
> software image. The trigger for the symptom may be possible corruption
> in TCAM entries that are used to perform the TestSPRPInbandPing.
>
> Workaround: Enter the no diagnostic crash global configuration command
> to disable exceptions that are being triggered by failed diagnostic
> monitoring. However, you should do this with discretion because it may
> also prevent the system from taking proactive measure to mitigate
> problems that could impact user traffic.
>
> Further Information: The fix for this caveat is more of an enhancement
> because it only prevents the system from being over-aggressive in taking
> exceptions when the TestSPRPInbandPing fails under specific conditions.
> Therefore, the fix for this caveat does not address all triggers that
> may cause the TestSPRPInbandPing to fail. Please consult Cisco TAC for
> further assistance if you experience the same problem after upgrading to
> a Cisco IOS software image that contains the fix for this caveat.
>
>
>
>
>
> On Fri, Feb 29, 2008 at 1:24 AM, Nemeth Laszlo <csirek at externet.hu> wrote:
>
> Hi!
>
> I put the crash files here:
>
> ftp://195.70.33.12/crashinfo_20080228-151329_cpu1
> ftp://195.70.33.12/crashinfo_20080228-151329_cpu2
>
>
> If anybody knows what the problem was, please don't keep it to yourself :)
>
> Could it be an IOS problem?
>
> Thanks
> Laci
>
>
> Leonardo Gama Souza wrote:
> > Hi.
> >
> > It sounds like your MSFC crashed.
> > You ought to look into the crashinfo file in order to figure out why.
> >
> > cheers,
> > Leonardo Gama.
> >
> >
> > ------------------------------------------------------------------------
> > *From:* cisco-nsp-bounces at puck.nether.net on behalf of Nemeth Laszlo
> > *Sent:* Thu 28/2/2008 13:43
> > *To:* cisco-nsp at puck.nether.net
> > *Subject:* [c-nsp] activ/standby cpu card status changed
> >
> > Hi!
> >
> > My 7604 router has two WS-SUP32-10GE-3B CPU cards in RPR-PLUS mode.
> >
> > System image file is "sup-bootdisk:s3223-ipservices_wan-mz.122-18.SXF9.bin"
> >
> > I got these syslog messages, and after them the CPU cards swapped roles:
> > the standby became active and the active became standby. The CPU ran at
> > 100% for 15 minutes. I saw an L2 loop in the network, but I don't know
> > whether the loop was caused by the CPU switchover or the switchover was
> > caused by the loop. I use RSTP. This router and two others are members
> > of a small 10G ring.
> >
> > I can't find these error messages on cisco.com.
> >
> > We had a similar problem on 1 January 2008, when a CPU state change
> > happened too (the CPU was at 100% like now; at other times it runs at 0-2%).
> >
> > Any idea?
> >
> > Thanks
> > Laci
> >
> > core2#sh redundancy history | inc state
> > Feb 28 16:13:33 *my state = ACTIVE(13) *peer state = DISABLED(1)
> > Feb 28 16:17:12 *my state = ACTIVE(13) *peer state = UNKNOWN(0)
> > Feb 28 16:17:21 *my state = ACTIVE(13) *peer state = STANDBY COLD(4)
> > Feb 28 16:18:09 *my state = ACTIVE(13) *peer state = STANDBY COLD-CONFIG(5)
> > Feb 28 16:18:19 *my state = ACTIVE(13) *peer state = STANDBY HOT(8)
> >
> > core2#sh redundancy switchover
> > Switchovers this system has experienced : 1
> > Last switchover reason : Active crashed.
> > Uptime since this supervisor switched to active : 8 weeks, 1 day, 18 hours, 50 minutes
> > Total system uptime from reload : 28 weeks, 1 day, 1 hour, 29 minutes
> > core2#sh redundancy switchover history
> > Index  Previous  Current  Switchover          Switchover
> >        active    active   reason              time
> > -----  --------  -------  ----------          ----------
> > 1      1         2        active unit failed  22:44:19 MET Tue Jan 1 2008
> >
> >
> >
> > *Feb 28 16:11:12 MET: %CONST_DIAG-SP-STDBY-3-HM_TEST_FAIL: Module 1 TestSPRPInbandPing consecutive failure count:7
> > *Feb 28 16:11:12 MET: %CONST_DIAG-SP-STDBY-6-HM_TEST_INFO: CPU util(5sec): SP=7% RP=0% Traffic=0%
> > netint_thr_active[0], Tx_Rate[70], Rx_Rate[4946], dev=1[IPv4, fail=7]
> > *Feb 28 16:13:12 MET: %CONST_DIAG-SP-STDBY-3-HM_TEST_FAIL: Module 1 TestSPRPInbandPing consecutive failure count:14
> > *Feb 28 16:13:12 MET: %CONST_DIAG-SP-STDBY-6-HM_TEST_INFO: CPU util(5sec): SP=2% RP=0% Traffic=0%
> > netint_thr_active[0], Tx_Rate[70], Rx_Rate[8290], dev=1[IPv4, fail=14]
> > Feb 28 16:13:33 MET: %LINEPROTO-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/1, changed state to down
> > Feb 28 16:13:33 MET: %BGP-5-ADJCHANGE: neighbor xx.xxx.xxx.xxx Down Interface flap
> > Feb 28 16:13:33 MET: %PFREDUN-SP-6-ACTIVE: Standby processor removed or reloaded, changing to Simplex mode
> > Feb 28 16:13:33 MET: %LINK-SP-3-UPDOWN: Interface TenGigabitEthernet1/1, changed state to down
> > Feb 28 16:13:33 MET: %LINEPROTO-SP-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/1, changed state to down
> > Feb 28 16:17:11 MET: %PFREDUN-SP-6-ACTIVE: Standby initializing for RPR-PLUS mode
> > Feb 28 16:17:11 MET: %SYS-SP-3-LOGGER_FLUSHED: System was paused for 00:00:00 to ensure console debugging output.
> >
> > -
> > _______________________________________________
> > cisco-nsp mailing list cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/
> >
>
>
>
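
The HM_TEST_FAIL messages quoted in this thread can be pulled out of a saved log with a short script. The following is a rough Python sketch rather than a Cisco tool: the message pattern is taken from the lines quoted above, the threshold of 10 comes from the CSCsc33990 symptom text, and the function names are hypothetical. It strips mail quote markers and rejoins wrapped lines before matching.

```python
import re

# Pattern based on the HM_TEST_FAIL lines quoted in this thread; the optional
# "-STDBY" covers messages logged by the standby supervisor.
FAIL_RE = re.compile(
    r"%CONST_DIAG-SP(?:-STDBY)?-3-HM_TEST_FAIL:\s+Module\s+(\S+)\s+"
    r"TestSPRPInbandPing\s+consecutive\s+failure\s+count:\s*(\d+)"
)

# The caveat text says the supervisor may reset after 10 consecutive failures.
RESET_THRESHOLD = 10


def inband_ping_failures(text):
    """Return [(module, consecutive_failure_count), ...] in log order."""
    # Drop leading mail quote markers such as "> > ".
    unquoted = re.sub(r"(?m)^(?:>[ \t]?)+", "", text)
    # Collapse all whitespace so messages wrapped across lines match again.
    flat = " ".join(unquoted.split())
    return [(module, int(count)) for module, count in FAIL_RE.findall(flat)]


def reached_threshold(failures, threshold=RESET_THRESHOLD):
    """Keep only the entries whose count is at or above the threshold."""
    return [(module, count) for module, count in failures if count >= threshold]


if __name__ == "__main__":
    sample = """\
> > *Feb 28 16:11:12 MET: %CONST_DIAG-SP-STDBY-3-HM_TEST_FAIL: Module 1
> > TestSPRPInbandPing consecutive failure count:7
> > *Feb 28 16:13:12 MET: %CONST_DIAG-SP-STDBY-3-HM_TEST_FAIL: Module 1
> > TestSPRPInbandPing consecutive failure count:14
"""
    failures = inband_ping_failures(sample)
    print(failures)
    print(reached_threshold(failures))
```
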