[c-nsp] OT:SUSHI REGISTER RESET ERROR

Fri Aug 14 00:13:42 EDT 2009

Jack,

Several things can lead to the symptoms you describe, which is why it is
important that you shed further light on the events that led to the problem
(e.g., what changed? Is this a lab or production device? Do you have show
command captures? Which IOS version?).

When posting to public fora, it is always a good idea to describe the steps
to recreate the problem so that a clear picture of the issue is projected
from the get-go to aid troubleshooting and resolution. This will also help
the manufacturer learn a thing or two about it and hopefully fix the root
cause.

Anyhow, your SFC in slot 18 reported SUSHI errors, which apparently
compromised the fabric integrity, and removing it seems to have resolved the
problem. As designed, the backup CSC kicked in as a Switch Fabric Card and
relinquished its backup CSC duties, hence the "nonredundant fabric" output
you see in sh cont fia. Your backup CSC will continue to function as an SFC
and your fabric will remain nonredundant until you install a working SFC in
slot 18.
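
The "Switch cards present 0x001B" line in the quoted sh cont fia output can be
read as a bitmask. The sketch below decodes it under the assumption that bit 0
corresponds to slot 16 and that there are five fabric slots (16 to 20). That
mapping is inferred only from the slot list printed next to the mask, not from
Cisco documentation, and the function name is hypothetical.

```python
def slots_from_mask(mask: int, base_slot: int = 16, width: int = 5) -> list[int]:
    """Return the slot numbers whose bits are set in mask.

    Assumption: bit 0 maps to base_slot (16 here), inferred from the
    'Slots 16 17 19 20' text shown beside 0x001B in the output.
    """
    return [base_slot + bit for bit in range(width) if mask & (1 << bit)]


# Mask taken from the quoted 'sh controller fia' output.
present = slots_from_mask(0x001B)

# A mask with all five assumed fabric slots populated.
all_fabric_slots = slots_from_mask(0x001F)

# Slots expected but not reported as present.
missing = sorted(set(all_fabric_slots) - set(present))

print(present)  # [16, 17, 19, 20]
print(missing)  # [18]
```

Under that assumption the only cleared bit is the one for slot 18, which
matches the card that was taken out of service.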

Each SFC/CSC card provides a 10-Gbps full-duplex connection to all LCs, and
the 10-Gbps switch fabric does not operate in one-quarter bandwidth mode.

http://www.cisco.com/en/US/docs/routers/12000/12016s/maintenance/guides/16084csa.html#wp56884
http://www.cisco.com/en/US/products/hw/routers/ps167/products_tech_note09186a00801e1da7.shtml

-Eninja

PS. Someone on Cisco's c12k team may want to check the code for notes on
when and why "SUSHI REGISTER RESET ERROR" is raised and attempt to recreate
this seemingly critical problem, as it has no precedent, at least in the
public domain.
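
For anyone attempting a recreate, the recurrence interval of the error is easy
to pull out of the log. The sketch below is only an illustration: it uses three
of the %FABRIC-3-ERR_HANDLE lines from the original post, rejoined onto single
lines (an assumption about how the router printed them before the mail client
wrapped them), and the regular expression and function name are hypothetical.

```python
import re
from datetime import datetime

# Three log lines from the original post, unwrapped onto single lines.
LOG = """\
048725: .Aug 11 20:09:17.191 IST: %FABRIC-3-ERR_HANDLE: Reconfigure all fabric cards due to SUSHI REGISTER RESET ERROR error from slot 18
048728: .Aug 11 20:09:21.413 IST: %FABRIC-3-ERR_HANDLE: Reconfigure all fabric cards due to SUSHI REGISTER RESET ERROR error from slot 18
048731: .Aug 11 20:09:25.627 IST: %FABRIC-3-ERR_HANDLE: Reconfigure all fabric cards due to SUSHI REGISTER RESET ERROR error from slot 18
"""

# Capture the time of day and the reporting slot of each ERR_HANDLE message.
PATTERN = re.compile(
    r"(\d{2}:\d{2}:\d{2}\.\d{3}) \w+: %FABRIC-3-ERR_HANDLE: .* from slot (\d+)"
)


def error_events(log: str) -> list[tuple[datetime, int]]:
    """Return (time, slot) pairs for every ERR_HANDLE line in the log text."""
    events = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%H:%M:%S.%f")
            events.append((when, int(match.group(2))))
    return events


events = error_events(LOG)

# Seconds between consecutive errors.
gaps = [
    round((later[0] - earlier[0]).total_seconds(), 3)
    for earlier, later in zip(events, events[1:])
]

print(gaps)                            # [4.222, 4.214]
print({slot for _, slot in events})    # {18}
```

On these lines the error repeats roughly every 4.2 seconds and always names
slot 18, which is the kind of detail worth including in a problem report.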

On Wed, Aug 12, 2009 at 10:15 PM, jack daniels <jckdaniels12 at gmail.com> wrote:

> Hi All,
>
> I found this error was coming on SLOT 18 which is SFC.
>
> EARLIER OUTPUT WAS -
>
> sh led
> SLOT 0  : RUN IOS
> SLOT 6  : WAITRTRY
> SLOT 7  :  RP ACTV
> SLOT 8  : INITMEM
> SLOT 9  : RUN IOS
> SLOT 15 : WAITRTRY
>
>
>
> FOR TROUBLESHOOT , then I saw -
>
> 1) output of sh gsr
>
>
>  Slot 18 type  = Switch Fabric Card 16XOC192
>         state = Card Powered<<<<<<<<<<<<<<<<<<<<<
> Slot 19 type  = Switch Fabric Card 16XOC192
>         state = Card Powered<<<<<<<<<<<<<<<<<<<<<<<
> Slot 20 type  = Switch Fabric Card 16XOC192
>         state = Card Powered<<<<<<<<<<<<<<<<<<<<<<<
>
>
> 2) Again executed show gsr command and found -
>
>
> Slot 17 type  = Clock Scheduler Card OC192 Dual Priority
>         state = Card NOT Powered; Power cycle fabric cards   PRIMARY
> CLOCK<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> Slot 18 type  = Switch Fabric Card 16XOC192
>         state = Card NOT Powered; Power cycle fabric
> cards<<<<<<<<<<<<<<<<<<<<<<<
> Slot 19 type  = Switch Fabric Card 16XOC192
>         state = Card NOT Po<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
>
> 3) After shutting down SFC – slot 18 <<<<<<<<<<<<<<<<<<<<<<<<<<<
>
> sh led
> SLOT 0  : RUN IOS
> SLOT 6  : RUN IOS
> SLOT 7  :  RP STBY
> SLOT 8  :  RP ACTV
> SLOT 9  : RUN IOS
> SLOT 15 : RUN IOS
>
>
> At the moment all cards show powered up and in RUN IOS mode.
>
> 4) sh controller fia
> Fabric configuration: 10Gbps bandwidth, nonredundant
> fabric<<<<<<<<<<<<<<<<<<<<<<<
> Master Scheduler: Slot 17     Backup Scheduler: Slot 16
> Fab epoch no 235        Halt count 0
>
> From Fabric FIA Errors
> -----------------------
> redund overflow 0          cell drops 0
> cell parity     0
> Switch cards present    0x001B    Slots  16 17 19 20
> Switch cards monitored  0x001B    Slots  16 17 19 20
>
>
>
>
> Can someone explain why, after shutting down the SFC in slot 18, LCs 0, 6,
> 15 and 7 all came up in IOS RUN mode and started working?
>
> I think each LC is connected in 10 Gbps mode via 4 links to the switch
> fabric. What I know is that for full-bandwidth mode (10 Gbps half duplex)
> you require at least 2 SFCs online and working. But all the SFCs went to the
> power-down and then the power-up state, so why were a few LCs still online?
>
> Please also explain the significance of running 2 SFCs versus 1 SFC.
>
> Regards
>
> On 8/13/09, e ninja <eninja at gmail.com> wrote:
>>
>> Jack,
>>
>> What changed prior to the errors? Also, is this a lab or production
>> device?
>>
>> Either way, reply all (or unicast) the complete sh tech and sh log along
>> with a sh controller fia from an attach session to all LCs.
>>
>> -Eninja
>>
>>
>> On Tue, Aug 11, 2009 at 10:54 PM, jack daniels <jckdaniels12 at gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> I'm getting below error in gsr chassis 12416 , please suggest
>>>
>>> 048724: .Aug 11 20:09:13.853 IST: %MBUS-6-SWITCHED_FABCLK: Slot 9 primary
>>> clock switched to clock 0
>>> 048725: .Aug 11 20:09:17.191 IST: %FABRIC-3-ERR_HANDLE: Reconfigure all
>>> fabric cards due to SUSHI REGISTER RESET ERROR error from slot 18
>>> 048726: .Aug 11 20:09:18.067 IST: %MBUS-6-SWITCHED_FABCLK: Slot 0 primary
>>> clock switched to clock 0
>>> 048727: .Aug 11 20:09:18.067 IST: %MBUS-6-SWITCHED_FABCLK: Slot 9 primary
>>> clock switched to clock 0
>>> 048728: .Aug 11 20:09:21.413 IST: %FABRIC-3-ERR_HANDLE: Reconfigure all
>>> fabric cards due to SUSHI REGISTER RESET ERROR error from slot 18
>>> 048729: .Aug 11 20:09:22.289 IST: %MBUS-6-SWITCHED_FABCLK: Slot 0 primary
>>> clock switched to clock 0
>>> 048730: .Aug 11 20:09:22.289 IST: %MBUS-6-SWITCHED_FABCLK: Slot 9 primary
>>> clock switched to clock 0
>>> 048731: .Aug 11 20:09:25.627 IST: %FABRIC-3-ERR_HANDLE: Reconfigure all
>>> fabric cards due to SUSHI REGISTER RESET ERROR error from slot 18
>>> 048732: .Aug 11 20:09:26.502 IST: %MBUS-6-SWITCHED_FABCLK: Slot 0 primary
>>> clock switched to clock 0
>>> 048733: .Aug 11 20:09:26.502 IST: %MBUS-6-SWITCHED_FABCLK: Slot 9 primary
>>> clock switched to clock 0
>>> 048734: .Aug 11 20:09:29.841 IST: %FABRIC-3-ERR_HANDLE: Reconfigure all
>>> fabric cards due to SUSHI REGISTER RESET ERROR error from slot 18
>>> 048735: .Aug 11 20:09:30.716 IST: %MBUS-6-SWITCHED_FABCLK: Slot 0 primary
>>> clock switched to clock 0
>>> 048736: .Aug 11 20:09:30.716 IST: %MBUS-6-SWITCHED_FABCLK: Slot 9 primary
>>> clock switched to clock 0
>>> 048737: .Aug 11 20:09:34.054 IST: %FABRIC-3-ERR_HANDLE: Reconfigure all
>>> fabric cards due to SUSHI REGISTER RESET ERROR error from slot 18
>>>
>>>
>>> sh gsr
>>> Slot 0  type  = Modular SPA Interface Card
>>>        state = IOS RUN   Line Card Enabled
>>>        subslot 0/0: SPA-1X10GE-L-V2 (0x50C), status is ok
>>>        subslot 0/1: Empty
>>>        subslot 0/2: Empty
>>>        subslot 0/3: Empty
>>> Slot 6  type  = Modular SPA Interface Card
>>>        state = RTRYWAIT  Waiting to retry download after persistent
>>> failures
>>> Slot 7  type  = Performance Route Processor
>>>        state = ACTV RP   IOS Running  ACTIVE
>>> Slot 8  type  = Performance Route Processor
>>>        state = RP RDY    Route Processor Powered
>>> Slot 9  type  = Modular SPA Interface Card
>>>        state = IOS RUN   Line Card Enabled
>>>        subslot 9/0: Empty
>>>        subslot 9/1: Empty
>>>        subslot 9/2: Empty
>>>        subslot 9/3: Empty
>>> Slot 15 type  = Modular SPA Interface Card
>>>        state = RTRYWAIT  Waiting to retry download after persistent
>>> failures
>>> Slot 16 type  = Clock Scheduler Card OC192 Dual Priority
>>>        state = Card Powered
>>> Slot 17 type  = Clock Scheduler Card OC192 Dual Priority
>>>        state = Card Powered  PRIMARY CLOCK
>>> Slot 18 type  = Switch Fabric Card 16XOC192
>>>        state = Card Powered
>>> Slot 19 type  = Switch Fabric Card 16XOC192
>>>        state = Card Powered
>>> Slot 20 type  = Switch Fabric Card 16XOC192
>>>        state = Card Powered
>>> Slot 24 type  = Alarm Module(16)
>>>        state = Card Powered
>>> Slot 25 type  = Alarm Module(16)
>>>        state = Card Powered
>>> Slot 27 type  = Bus Board(16)
>>>        state = Card Powered
>>> Slot 28 type  = Blower Module(16)
>>>        state = Card Powered
>>> Slot 29 type  = Blower Module(16)
>>>
>>>        state = Card Powered
>>>
>>>
>>> sh led
>>> SLOT 0  : RUN IOS
>>> SLOT 6  : WAITRTRY
>>> SLOT 7  :  RP ACTV
>>> SLOT 8  : INITMEM
>>> SLOT 9  : RUN IOS
>>> SLOT 15 : WAITRTRY
>>>
>>> Regards
>>>
>>
>>
>