[c-nsp] C6k diag failure in lab, need to worry?
Sukumar Subburayan (sukumars)
sukumars at cisco.com
Thu Apr 10 13:14:03 EDT 2008
Agreed. We will strive to do whatever we can.
Just want to point out that this is not a "crash", but a second reset on
bootup.
As Peter pointed out, it can happen, and in that roughly 1% of boots it
extends the bootup time.
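A rough back-of-the-envelope sketch of what that costs. It assumes a per-boot probability of about 1 in 100 (from the "once in 100+ reboots" estimate, so probably an upper bound), about 5 minutes of extra boot time per occurrence (Peter's estimate), and independent boots. These are assumptions, not measurements, and the function names are illustrative only.

```python
# Back-of-the-envelope cost of a rare extra reset during bootup.
# Assumed inputs, not measurements: the reset hits about 1 in 100 boots
# (an upper bound from "once in 100+ reboots") and each occurrence adds
# about 5 minutes of boot time (Peter's estimate for the lab box).
# Boots are also assumed to be independent of each other.

RESET_PROBABILITY = 1 / 100  # assumed chance per boot
EXTRA_DELAY_S = 5 * 60       # assumed extra boot time per occurrence, in seconds


def expected_extra_resets(boots: int, p: float = RESET_PROBABILITY) -> float:
    """Expected number of boots (out of `boots`) that hit the second reset."""
    return boots * p


def expected_delay_per_boot(p: float = RESET_PROBABILITY,
                            delay_s: float = EXTRA_DELAY_S) -> float:
    """Average extra boot time per reboot, in seconds."""
    return p * delay_s


def prob_seen_at_least_once(boots: int, p: float = RESET_PROBABILITY) -> float:
    """Chance of seeing the reset at least once in `boots` independent boots."""
    return 1 - (1 - p) ** boots


if __name__ == "__main__":
    print(f"{expected_extra_resets(100_000_000):,.0f}")  # 1,000,000
    print(f"{expected_delay_per_boot():.1f} s")          # 3.0 s
    print(f"{prob_seen_at_least_once(10):.3f}")          # 0.096
    print(f"{prob_seen_at_least_once(100):.3f}")         # 0.634
```

Under these assumptions the "1 million out of 100 million" arithmetic holds, the average cost is only about 3 seconds per boot, and a handful of lab reboots (say 10) has only about a 10% chance of reproducing it, which fits with not being able to trigger it again.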
sukumar
> -----Original Message-----
> From: e ninja [mailto:eninja at gmail.com]
> Sent: Wednesday, April 09, 2008 11:11 PM
> To: Sukumar Subburayan (sukumars)
> Cc: Peter Rathlev; cisco-nsp
> Subject: Re: [c-nsp] C6k diag failure in lab, need to worry?
>
> Sukumar,
>
> " You can ignore this one, as it _should_ not have any
> impact, after the second reload." is not an acceptable answer.
>
> 1 crash in every 100 reboots = 1 million crashes out of every
> 100 million reboots. In our quest for perfection, we should
> strive to investigate and rectify every unexpected deviation
> from the norm.
>
> Peter,
>
> Open a TAC case and submit all the captures for Cisco BU to
> investigate and rectify so that all other customers can
> benefit from the solution.
>
> /eninja
>
> On Wed, Apr 9, 2008 at 10:16 AM, Sukumar Subburayan (sukumars) <sukumars at cisco.com> wrote:
>
>
> Peter,
>
> You can ignore this one, as it should not have any impact,
> after the second reload.
>
> We have seen this very rarely (once in 100+ reboots, on very few
> systems): an ASIC was not initialized properly, and diagnostics
> caught the condition and reset the supervisor.
>
> sukumar
>
> > -----Original Message-----
> > From: cisco-nsp-bounces at puck.nether.net
> > [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Peter Rathlev
> > Sent: Wednesday, April 09, 2008 8:40 AM
> > To: cisco-nsp
> > Subject: [c-nsp] C6k diag failure in lab, need to worry?
> >
> > 'ello,
> >
> > We just had a "funny" experience with a C6k/720 in our lab.
> > We were testing SXF13 AIS, and during a reload we saw the following:
> >
> > 00:01:36: %SCHED-SP-7-WATCH: Attempt to monitor uninitialized watched bitfield (address 0).
> > -Process= "Shutdown", ipl= 0, pid= 256
> > -Traceback= 402C3A18 404ED840 4029C954 4029C940
> > 00:01:40: %DIAG-SP-3-MAJOR: Module 5: Online Diagnostics detected a Major Error. Please use 'show diagnostic result <target>' to see test results.
> > 00:01:40: %CONST_DIAG-SP-3-BOOTUP_TEST_FAIL: Module 5: TestAclDeny failed
> > 00:01:41: %OIR-SP-6-INSCARD: Card inserted in slot 5, interfaces are now online
> > Reload scheduled for 07:05:31 PST Wed Apr 9 2008 (in 13 seconds)
> >
> > Module 5 is the supervisor. Afterwards it reloaded and the error
> > did not recur, even across several subsequent reboots. It's a
> > Sup720-3B with a single WS-X6708-10GE and a WS-SVC-FWM-1. It never
> > reaches the point of starting GOLD for the DFC.
> >
> > I didn't have time to run "show diagnostic result" before the
> > reboot, and afterwards it says it never got a failure on
> > TestAclDeny:
> >
> > fw1#sh diagnostic res mod 5 test 18 det
> > Current bootup diagnostic level: minimal
> > Test results: (. = Pass, F = Fail, U = Untested)
> > ___________________________________________________________________________
> > 18) TestAclDeny ---------------------> .
> > Error code ------------------> 3 (DIAG_SKIPPED)
> > Total run count -------------> 1
> > Last test execution time ----> Apr 09 2008 07:08:26
> > First test failure time -----> n/a
> > Last test failure time ------> n/a
> > Last test pass time ---------> Apr 09 2008 07:08:26
> > Total failure count ---------> 0
> > Consecutive failure count ---> 0
> > ___________________________________________________________________________
> > fw1#
> >
> > None of the other tests show any failures either: "show
> > diagnostic result module 5 detail | incl failure" gives only
> > "0" and "n/a" values. By the way, I can run "diagnostic start
> > module 5 test 18" as often as I like without any failures; I
> > just get "%DIAG-SP-6-TEST_OK: Module 5: TestAclDeny{ID=18} has
> > completed successfully".
> >
> > Is this something we should dig into and report to TAC, or
> > should we just ignore this ~5 minute delay in a lab reboot? We
> > can't seem to reproduce it. :'(
> >
> > Shortly before, the box had been "upgraded" from SXF6 AIS to
> > SXF13 AES due to some miscommunication, and this was its first
> > boot on SXF13 AIS, but I can't imagine that this had any impact.
> >
> > Regards,
> > Peter
> >
> >
> > _______________________________________________
> > cisco-nsp mailing list cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/
> >