[c-nsp] Packet Memory buffer test detected errors

Nikolay Pavlov quetzal at zone3000.net
Fri Jul 14 04:56:13 EDT 2006


On Thursday,  6 July 2006 at 22:33:12 +0300, Nikolay Pavlov wrote:
> On Thursday,  6 July 2006 at 17:48:20 +0100, David Freedman wrote:

Hi David and list. 
I have a POST message for this device:

cscat4506-sw11-NY#more bootflash:post-2006.06.01.16.27.41-failed.txt

Power-on-self-test for Module 1:  WS-X4014
 Port/Test Status: (. = Pass, F = Fail)
 Reset Reason: Software/User


Port Traffic: L2 Serdes Loopback ...
 0: .  1: F  2: F  3: .  4: .  5: .  6: .  7: .  8: F  9: F 10: F 11: .
12: . 13: . 14: . 15: F 16: F 17: F 18: . 19: . 20: . 21: . 22: F 23: F
24: . 25: . 26: . 27: . 28: F 29: F 30: . 31: .


Port Traffic: L2 Asic Loopback ...
 0: F  1: F  2: F  3: F  4: F  5: F  6: F  7: F  8: F  9: F 10: F 11: F
12: F 13: F 14: F 15: F 16: F 17: F 18: F 19: F 20: F 21: F 22: F 23: F
24: F 25: F 26: F 27: F 28: F 29: F 30: F 31: F


Port Traffic: L3 Asic Loopback ...
 0: F  1: F  2: F  3: F  4: F  5: F  6: F  7: F  8: F  9: F 10: F 11: F
12: F 13: F 14: F 15: F 16: F 17: F 18: F 19: F 20: F 21: F 22: F 23: F
24: F 25: F 26: F 27: F 28: F 29: F 30: F 31: F au: .


Switch Subsystem Memory ...
 1: .  2: .  3: .  4: .  5: .  6: .  7: .  8: .  9: . 10: . 11: . 12: .
13: . 14: . 15: . 16: . 17: . 18: . 19: . 20: . 21: . 22: . 23: . 24: .
25: . 26: . 27: . 28: . 29: . 30: . 31: . 32: . 33: . 34: . 35: . 36: .
37: . 38: . 39: . 40: . 41: . 42: . 43: . 44: . 45: . 46: . 47: . 48: .
49: . 50: . 51: . 52: . 53: . 54: .


Module 1 Failed

It seems that this is only a Supervisor module problem, is that right?
Will it disappear if I install a new one?
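
For anyone comparing several of these POST dumps, a minimal Python sketch
that lists the failed ports per test might help. It assumes the layout shown
above (a test name ending in "...", followed by "port: status" cells); the
function name parse_post is made up for illustration and is not a Cisco tool.

```python
import re

# One "port: status" cell, e.g. " 1: F" or "au: ." ('.' = pass, 'F' = fail).
CELL = re.compile(r"(\w+):\s*([.F])(?=\s|$)")

def parse_post(text):
    """Return {test name: [failed port labels]} for a POST result dump."""
    failed = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith("..."):
            # A header such as "Port Traffic: L2 Asic Loopback ..." starts a new test.
            current = line[:-3].strip()
            failed[current] = []
        elif current is not None:
            # Collect only the cells marked F under the current test.
            for port, status in CELL.findall(line):
                if status == "F":
                    failed[current].append(port)
    return failed
```

Feeding it the contents of the post-...-failed.txt file gives one list per
test, which makes it easy to see that here every ASIC loopback port fails
while the Switch Subsystem Memory test passes.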


> > from DDTS CSCdz57255:
> > 
> > --------------------------------------------------------------------
> > 
> > Symptoms:
> > 
> > Devices connected to a Catalyst 4000 or 4500 family IOS switch may sometimes
> > experience poor network connectivity. Some packets are dropped by the switch
> > due to a faulty component (sram) on the supervisor. Larger packets are affected
> > more than the smaller ones. The transient sram failure is very rare and has
> > been seen on only a very small number of supervisors. A similar problem has
> > been identified on Catalyst 4000 Supervisor Engine II. Please see CSCdy46288
> > for details.
> > 
> > Note that poor connectivity can be a result of various network
> > misconfigurations as well and replacing the supervisor in those cases will not
> > fix the problem. Hence it is highly recommended that the following steps be
> > taken to confirm the transient SRAM problem.
> > 
> > The following indications will be present. Please capture the Output of the
> > below requested tests.
> > 
> > 1. Successive iterations of the command "show platform cpu packets
> > statistics all" for Cisco IOS Releases 12.1(12c)EW and higher will show the
> > counter for VlanZeroBadCrc under the "Packets Dropped In Processing by
> > Reason" steadily increasing. The increase over several minutes will be in
> > the range of hundreds or thousands. If a small number of VlanZeroBadCrc are
> > seen and the number is not increasing, then that is not an indication of a
> > problem.
> 
> Heh.. Thanks for the quick response, David, but VlanZeroBadCrc is always zero
> for me.. So it is interesting what kind of problem I have. :)
> 
> > 
> > For Cisco IOS Releases 12.1(11b)EW1 and lower, successive iterations of the
> > command "show platform cpuport all" will show the VlanZeroBadCrc counter
> > (for 12.1(11b)EW1) or the VlanZero counter (for 12.1(8a)EW1) under the
> > "Packets Dropped In Processing by Reason" steadily increasing in the range
> > of hundreds or thousands over several minutes.
> > 
> > The following information should be captured immediately:
> > 
> > show platform cpu packets statistics all or
> > show platform cpuport all (several iterations)
> > show platform software interface all (several iterations)
> > 
> > 2. Perform a soft reset by issuing the reload command. In the presence of
> > this bug, the Supervisor will fail POST. The POST results should be captured
> > to a text file.
> > 3. If the customer is running an image equal to or higher than 12.2(18)EW,
> > capture the output of the following command: show diagnostics result module
> > all detail
> > 4. Perform a power cycle (power off/on) of the switch. On booting up, the
> > Supervisor will pass POST and there will be no further symptoms of the
> > problem. The POST results should be captured to a text file, and a
> > show tech should be collected.
> > 
> > All indications must be present in order to conclude that the problem was due
> > to this bug.
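
The first indication (VlanZeroBadCrc "steadily increasing" by hundreds or
thousands over several minutes) can be written down as a small check. This is
only a sketch: the function name and the min_increase threshold of 100 are
assumptions, reading "hundreds" as "at least 100", and the readings have to be
copied by hand from successive runs of the show command.

```python
def counter_steadily_increasing(samples, min_increase=100):
    """samples: VlanZeroBadCrc readings from successive runs, oldest first.

    True only if every reading is higher than the previous one and the total
    growth reaches min_increase (an assumed threshold, not a Cisco figure).
    """
    if len(samples) < 2:
        return False  # one reading cannot show a trend
    rising = all(later > earlier for earlier, later in zip(samples, samples[1:]))
    return rising and samples[-1] - samples[0] >= min_increase
```

A counter that stays at zero, or at a small constant value, returns False,
which matches the note above that a small, non-increasing count is not an
indication of a problem. A True result covers only this one indication; the
quoted text requires all of them.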
> > 
> > If you are running 12.2(18)EW or later and encounter a message as below:
> > "%C4K_L3HWFORWARDING-3-FTECONSISTENCYCHECKFAILED: FwdTableEntry Consistency
> > Check Failed: index 98339" then, most probably the supervisor has encountered
> > an SRAM corruption for the memory used as "forwarding memory". Please refer to
> > the bug CSCed49194.
> > 
> > 
> > Conditions:
> > 
> > This problem has been traced to an SRAM component failure which is transient
> > in nature. The incidence of this failure is extremely rare and is well below
> > the predicted failure rates for this component. If you believe that you have
> > encountered this bug please open a case with the TAC (Technical Assistance
> > Center) and attach all the above captured information to the case. Boards
> > exhibiting this failure should be replaced using RMA.
> > 
> > 
> > Workaround:
> > 
> > For the software releases earlier than 12.1(19)EW, hard resetting the switch
> > by powering it OFF & ON is the only workaround.
> > 
> > For the software releases 12.1(19)EW and later but prior to 12.2(18)EW, the
> > SRAM workaround incorporated is "partial". For the software release 12.1(21)E
> > and earlier, the SRAM workaround is "partial" too. The software detects, logs
> > and takes appropriate action, depending upon the configuration mode. Here the
> > SRAM workaround can be configured in any of the 3 modes: normal,
> > conservative or aggressive, using the following command:
> > 
> > (config) diagnostic monitor action <conservative | normal | aggressive>
> > 
> > conservative : Directed memory tests are not run, so does not reliably detect
> > the failure. Does not reset the switch on error detection, but does syslog the
> > message.
> > 
> > normal : Directed memory tests are run, so reliably detects the failure. Does
> > not reset the switch on error detection, but does syslog the message.
> > 
> > aggressive : Directed memory tests are run, so reliably detects the failure.
> > Soft-resets the switch on error detection & syslogs the message. On bootup,
> > the supervisor remains in the faulty state. This action allows for either a
> > redundant supervisor engine or network-level redundancy to take over.
> > 
> > For software release 12.2(18)EW onwards, an SRAM workaround is incorporated to
> > automatically detect the failure and take action to recover from the failed
> > state depending upon the configuration. This SRAM workaround can be configured
> > in any of the 3 modes: conservative, normal or aggressive as described below:
> > 
> > (config) diagnostic monitor action <conservative | normal | aggressive>
> > 
> > conservative : Directed memory tests are not run, so does not reliably detect
> > the failure. Does not reset the switch on error detection, but does syslog the
> > message.
> > 
> > normal : Directed memory tests are run, so reliably detects the failure. On
> > detection of the failure, supervisor resets and on bootup, removes the
> > affected memory from the usage and continues to function with the
> > available "good" memory. It syslogs the message at regular intervals.
> > 
> > aggressive : Directed memory tests are run, so reliably detects the failure.
> > Soft-resets the switch on error detection & syslogs the message. On bootup,
> > the supervisor fails to come online. This action allows for either a redundant
> > supervisor engine or network-level redundancy to take over.
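
Since the two mode lists above are nearly identical, a small lookup table
makes the difference easier to spot. This is only a summary sketch of the
quoted text: the labels "partial" and "full" and the field names are
assumptions for illustration ("full" is not a term used in the release note),
and only two yes/no properties are captured.

```python
# Behaviour of "diagnostic monitor action <mode>" as described in the quoted
# release note. Keys and field names are illustrative, not Cisco terminology.
MODES = {
    "partial": {  # 12.1(19)EW and later, but prior to 12.2(18)EW
        "conservative": {"directed_tests": False, "resets": False},
        "normal":       {"directed_tests": True,  "resets": False},
        "aggressive":   {"directed_tests": True,  "resets": True},
    },
    "full": {     # 12.2(18)EW onwards
        "conservative": {"directed_tests": False, "resets": False},
        "normal":       {"directed_tests": True,  "resets": True},
        "aggressive":   {"directed_tests": True,  "resets": True},
    },
}

def changed_between(old="partial", new="full"):
    """Return the modes whose recorded behaviour differs between two levels."""
    return [mode for mode in MODES[old] if MODES[old][mode] != MODES[new][mode]]
```

With these two properties, changed_between() reports only "normal": from
12.2(18)EW onwards that mode resets the supervisor and carries on with the
remaining good memory instead of just logging.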
> > 
> > For detailed explanation, please refer to the release-notes for the bug
> > CSCed61591.
> > 
> > In any case, on the detection of the problem, the supervisor needs to be
> > RMA'ed.
> > 
> > All diagnostics and all actions can be completely disabled (even if there is a
> > standby supervisor present) with this CLI:
> > 
> > (config) no diagnostic monitor action
> > 
> > 
> > 
> > _______________________________________________
> > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/
> 

-- 
========================================================================= 
= Best regards, Nikolay Pavlov. <<<------------------------------------ = 
========================================================================= 

