RE: [nsp] GSR LCs in stuck in FABL RUN state

From: Martin, Christian (cmartin@gnilink.net)
Date: Tue Aug 21 2001 - 09:37:08 EDT


Well, I had to come in and rebuild the router (luckily the box was onsite!).
I started by installing new CSCs and removing the secondary GRP. I reloaded,
and the same problem was occurring. I also started seeing LOS alarms for the
SFC in slot 19. Then the alarm would clear. So I installed a new GRP, and
all the linecards activated (IOS RUN) except one (FABL RUN). So I copied the
original config on and reloaded once again, and Voila! The router booted
just fine. I took the original GRP into the lab and installed it, and the cards
stayed in FABL RUN state, so it looks as if the GRP got hosed during the
fabric download stage. As I understand it, this can happen during microcode and
fabric-loader upgrades, as well as during field diags if the power is
cycled. Trouble is, I did soft reboots initially, and saw the issue. I
did go from 12.0(11)S3 to 12.0(17)S1, so this is in line with your
assertion, Majdi.

During the madness, I executed the whole gamut of maintenance commands -
test mbus power, hw-module slot x reload, upgrade slot x, diag x verbose,
show controller xbar, show controller sca, etc, etc. Most of these caused
the router to hang. I have opened a case with the TAC to see if they can
diagnose the problem further, but there is little information as nothing
ever crashed and the router is stable now. So, we'll see. The scariest
thing was that the cards just stayed in FABL RUN. It was as if they were not
synced with the SFC/CSCs, so they couldn't pull the IOS across the fabric.
Does anyone know if you can adjust the config register or some other value
to get verbose output during bootup? At least this would help me to
determine where things are hanging.

Anyway, thanks for the replies!

chris

> -----Original Message-----
> From: Majdi S. Abbas [mailto:msa@samurai.sfo.dead-dog.com]
> Sent: Tuesday, August 21, 2001 9:16 AM
> To: cisco-nsp@puck.nether.net
> Cc: 'cisco-nsp@puck.nether.net'
> Subject: Re: [nsp] GSR LCs in stuck in FABL RUN state
>
>
> On Tue, Aug 21, 2001 at 03:45:16AM -0400, Martin, Christian wrote:
> > Has anyone ever seen this? I just upgraded IOS on 4 12xxx. On the fourth,
> > the router hung while writing the config upon reboot. I reloaded the router
> > again, and noticed that some of the cards were in FABL RUN state for over a
> > half hour. I had 'service upgrade all' and the hidden 'service download-fl'
> > configured on all of them. So I removed these commands and reloaded again.
> > Now, I cannot get anything across the MBUS. Show diag returns errors. And
> > a show hardware displays this madness...
> >
> > WARNING: Old SCA found on CSC in slot 16
> > Contact your technical support representative
> > WARNING: Old SCA found on CSC in slot 17
> > Contact your technical support representative
> > RES-GTE-GSR2#
> >
> > I've never had any problems before with GSR upgrades. Any ideas?
>
> Let me guess, you upgraded to 12.0.16.6(S) or higher from 15S or
> below (13.5S also seems to do this)?
>
> Almost every time we do that, the CSCs start 'dying' and we have
> these problems. Cisco insists on having us RMA the CSCs, but it's just
> as easy to fix this problem in software:
>
> test mbus power X off (X == slot)
> test mbus power X on
>
> Bounce the cards until they come up, then do an
> upgrade all slot X, and in config mode, microcode reload the card.
> Sometimes they'll stick again, but just bounce them until they come
> back up. After that they are usually okay.
>
> I have probably had to do this hundreds of times.
>
> --msa
>
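
For anyone who ends up scripting the "bounce the cards until they come up"
routine quoted above, here is a minimal Python sketch of the retry loop. It
is only an illustration under stated assumptions: send and is_up are
hypothetical callables that you would supply yourself (for example, wrappers
around a console session), and nothing here talks to a router. The only
command strings used are the ones quoted in the thread. The "microcode
reload" step in config mode is left out because its exact syntax is not
given here.

```python
import time
from typing import Callable


def bounce_until_up(slot: int,
                    send: Callable[[str], None],
                    is_up: Callable[[int], bool],
                    max_tries: int = 10,
                    settle_seconds: float = 0.0) -> int:
    """Power-cycle one slot until is_up(slot) reports the card is up.

    send  -- hypothetical: delivers one command string to the router.
    is_up -- hypothetical: returns True once the card has left FABL RUN
             (how that is detected is up to the caller).

    Returns the number of power cycles used. Raises RuntimeError if the
    card is still not up after max_tries cycles.
    """
    for attempt in range(1, max_tries + 1):
        # Command strings as quoted in the thread; X is the slot number.
        send(f"test mbus power {slot} off")
        send(f"test mbus power {slot} on")
        time.sleep(settle_seconds)  # give the card time to boot
        if is_up(slot):
            # Once the card is up, run the upgrade step from the thread.
            send(f"upgrade all slot {slot}")
            return attempt
    raise RuntimeError(f"slot {slot} still not up after {max_tries} tries")
```

In practice settle_seconds would need to be long enough for a linecard to
finish booting, and max_tries caps the loop so a card that really is dead
gets flagged instead of being bounced forever.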


