[c-nsp] Packet Memory buffer test detected errors
Nikolay Pavlov
quetzal at zone3000.net
Thu Jul 6 15:33:12 EDT 2006
On Thursday, 6 July 2006 at 17:48:20 +0100, David Freedman wrote:
> from DDTS CSCdz57255:
>
> --------------------------------------------------------------------
>
> Symptoms:
>
> Devices connected to a Catalyst 4000 or 4500 family IOS switch may
> sometimes experience poor network connectivity. Some packets are
> dropped by the switch due to a faulty component (SRAM) on the
> supervisor. Larger packets are affected more than the smaller ones.
> The transient SRAM failure is very rare and has been seen on only a
> very small number of supervisors. A similar problem has been
> identified on Catalyst 4000 Supervisor Engine II. Please see
> CSCdy46288 for details.
>
> Note that poor connectivity can be a result of various network
> misconfigurations as well, and replacing the supervisor in those
> cases will not fix the problem. Hence it is highly recommended that
> the following steps be taken to confirm the transient SRAM problem.
>
> The following indications will be present. Please capture the output
> of the tests requested below.
>
> 1. Successive iterations of the command "show platform cpu packets
> statistics all" for Cisco IOS Releases 12.1(12c)EW and higher will
> show the counter for VlanZeroBadCrc under the "Packets Dropped In
> Processing by Reason" steadily increasing. The increase over several
> minutes will be in the range of hundreds or thousands. If a small
> number of VlanZeroBadCrc are seen and the number is not increasing,
> then that is not an indication of a problem.
Heh.. Thanks for the quick response, David, but VlanZeroBadCrc is always
zero for me.. So I wonder what kind of problem I have? :)
>
> For Cisco IOS Releases 12.1(11b)EW1 and lower, successive iterations
> of the command "show platform cpuport all" will show the
> VlanZeroBadCrc counter (for 12.1(11b)EW1) or the VlanZero counter
> (for 12.1(8a)EW1) under the "Packets Dropped In Processing by Reason"
> steadily increasing in the range of hundreds or thousands over
> several minutes.
>
> The following information should be captured immediately:
>
> show platform cpu packets statistics all, or
> show platform cpuport all (several iterations)
> show platform software interface all (several iterations)
>
> 2. Perform a soft reset by issuing the reload command. In the
> presence of this bug, the Supervisor will fail POST. The POST results
> should be captured to a text file.
> 3. If the customer is running an image equal to or higher than
> 12.2(18)EW, capture the output of the following command:
> show diagnostics result module all detail
> 4. Perform a power cycle (power off/on) of the switch. On booting up,
> the Supervisor will pass POST and there will be no further symptoms
> of the problem. The POST results should be captured to a text file,
> and a show tech should be collected.
>
> All indications must be present in order to conclude that the problem
> was due to this bug.
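
The counter check from step 1 can be sketched in a few lines of Python. This is only an illustration of the rule as quoted: the function name and the threshold of 100 (my reading of "hundreds") are assumptions, and the readings are made-up numbers typed in by hand, not parsed switch output.

```python
def steadily_increasing(samples, min_total_increase=100):
    """Return True if the counter rises at every reading and the
    overall rise is at least min_total_increase.

    min_total_increase=100 is an assumed reading of "hundreds";
    the quoted text gives no exact figure.
    """
    if len(samples) < 2:
        return False  # a single reading cannot show a trend
    rises_every_time = all(b > a for a, b in zip(samples, samples[1:]))
    total_increase = samples[-1] - samples[0]
    return rises_every_time and total_increase >= min_total_increase


# Hypothetical VlanZeroBadCrc readings taken over several minutes.
print(steadily_increasing([0, 350, 900, 1400]))  # True
print(steadily_increasing([7, 7, 7, 7]))         # False: small and static
```

A counter that stays at zero, as in my case, fails the check on the first comparison.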
>
> If you are running 12.2(18)EW or later and encounter a message as
> below:
> "%C4K_L3HWFORWARDING-3-FTECONSISTENCYCHECKFAILED: FwdTableEntry Consistency Check Failed: index 98339"
> then, most probably, the supervisor has encountered an SRAM
> corruption for the memory used as "forwarding memory". Please refer
> to the bug CSCed49194.
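
To look for that message in a saved log, something like the following Python sketch might do. It assumes the message appears on one line with the wording quoted above (apart from whitespace); the function name and the sample log lines are hypothetical.

```python
import re

# Pattern built from the message text quoted in the bug note.
FTE_CHECK_FAILED = re.compile(
    r"%C4K_L3HWFORWARDING-3-FTECONSISTENCYCHECKFAILED:\s*"
    r"FwdTableEntry Consistency Check Failed:\s*index\s+(\d+)"
)


def failed_indexes(log_text):
    """Return the forwarding-table indexes reported in log_text."""
    return [int(m.group(1)) for m in FTE_CHECK_FAILED.finditer(log_text)]


# Hypothetical log excerpt; only the second line matches.
sample_log = (
    "some unrelated log line\n"
    "%C4K_L3HWFORWARDING-3-FTECONSISTENCYCHECKFAILED: "
    "FwdTableEntry Consistency Check Failed: index 98339\n"
)
print(failed_indexes(sample_log))  # [98339]
```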
>
>
> Conditions:
>
> This problem has been traced to an SRAM component failure which is
> transient in nature. The incidence of this failure is extremely rare
> and is well below the predicted failure rates for this component. If
> you believe that you have encountered this bug, please open a case
> with the TAC (Technical Assistance Center) and attach all the above
> captured information to the case. Boards exhibiting this failure
> should be replaced using RMA.
>
>
> Workaround:
>
> For the software releases earlier than 12.1(19)EW, hard resetting the
> switch by powering it OFF & ON is the only workaround.
>
> For the software releases 12.1(19)EW and later but prior to
> 12.2(18)EW, the SRAM workaround incorporated is "partial". For the
> software release 12.1(21)E and earlier, the SRAM workaround is
> "partial" too. The software detects, logs and takes appropriate
> action, depending upon the configuration mode. Here the SRAM
> workaround can be configured in any of the 3 modes: normal,
> conservative or aggressive, using the following command:
>
> (config) diagnostic monitor action <conservative | normal | aggressive>
>
> conservative : Directed memory tests are not run, so does not
> reliably detect the failure. Does not reset the switch on error
> detection, but does syslog the message.
>
> normal : Directed memory tests are run, so reliably detects the
> failure. Does not reset the switch on error detection, but does
> syslog the message.
>
> aggressive : Directed memory tests are run, so reliably detects the
> failure. Soft-resets the switch on error detection & syslogs the
> message. On bootup, the supervisor remains in the faulty state. This
> action allows for either a redundant supervisor engine or
> network-level redundancy to take over.
>
> For software release 12.2(18)EW onwards, an SRAM workaround is
> incorporated to automatically detect the failure and take action to
> recover from the failed state depending upon the configuration. This
> SRAM workaround can be configured in any of the 3 modes:
> conservative, normal or aggressive, as described below:
>
> (config) diagnostic monitor action <conservative | normal | aggressive>
>
> conservative : Directed memory tests are not run, so does not
> reliably detect the failure. Does not reset the switch on error
> detection, but does syslog the message.
>
> normal : Directed memory tests are run, so reliably detects the
> failure. On detection of the failure, supervisor resets and on
> bootup, removes the affected memory from the usage and continues to
> function with the available "good" memory. It syslogs the message at
> regular intervals.
>
> aggressive : Directed memory tests are run, so reliably detects the
> failure. Soft-resets the switch on error detection & syslogs the
> message. On bootup, the supervisor fails to come online. This action
> allows for either a redundant supervisor engine or network-level
> redundancy to take over.
>
> For a detailed explanation, please refer to the release notes for
> the bug CSCed61591.
>
> In any case, on the detection of the problem, the supervisor needs to be
> RMA'ed.
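
The two mode descriptions are easy to mix up, so here is a small Python lookup table that restates them side by side. It is a sketch for comparison only: the type name, field names and release-group labels are mine, and the behaviour strings simply paraphrase the quoted text.

```python
from typing import NamedTuple


class ModeBehavior(NamedTuple):
    directed_tests: bool   # are directed memory tests run?
    resets_on_error: bool  # is the switch reset when an error is detected?
    after_reboot: str      # what the quoted text says happens on bootup


# Labels for the two release ranges described in the workaround text.
PARTIAL_RELEASES = "12.1(19)EW up to, not including, 12.2(18)EW"
LATER_RELEASES = "12.2(18)EW onwards"

# Every mode syslogs the message, so that is not repeated per entry.
BEHAVIOR = {
    (PARTIAL_RELEASES, "conservative"): ModeBehavior(False, False, "no reset occurs"),
    (PARTIAL_RELEASES, "normal"): ModeBehavior(True, False, "no reset occurs"),
    (PARTIAL_RELEASES, "aggressive"): ModeBehavior(
        True, True, "supervisor remains in the faulty state"),
    (LATER_RELEASES, "conservative"): ModeBehavior(False, False, "no reset occurs"),
    (LATER_RELEASES, "normal"): ModeBehavior(
        True, True, "affected memory removed, continues on good memory"),
    (LATER_RELEASES, "aggressive"): ModeBehavior(
        True, True, "supervisor fails to come online"),
}

# The only mode whose reset behaviour differs between the two ranges.
for releases in (PARTIAL_RELEASES, LATER_RELEASES):
    print(releases, "->", BEHAVIOR[(releases, "normal")])
```

The main difference is "normal" mode: it only logs in the "partial" releases, but resets and carries on with the remaining good memory from 12.2(18)EW onwards.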
>
> All diagnostics and all actions can be completely disabled (even if
> there is a standby supervisor present) with this CLI:
>
> (config) no diagnostic monitor action
>
--
=========================================================================
= Best regards, Nikolay Pavlov. <<<------------------------------------ =
=========================================================================