[c-nsp] 7600 sup7203bxl error msg

Wed Nov 8 19:07:30 EST 2006

Every time we get a parity error in the system controller, we try to
soft-reset the IBC and recover from the condition. Things should be fine
if this was a transient, one-off case.

However, if the issue is persistent, we will be constantly resetting the
IBC and hence dropping packets. The second syslog message is there to warn
you that you are seeing excessive IBC resets.
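
To tell a one-off event from a persistent one, the SYSTEM_CONTROLLER
messages can be counted over time from a saved syslog file. The following
is only a rough sketch: it assumes each message sits on a single line in
the format quoted below, that the year is supplied by the caller (the
timestamps do not carry one), and the function names, the 24-hour window
and the threshold of 3 are arbitrary illustration values, not anything
IOS itself uses.

```python
import re
from datetime import datetime, timedelta

MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Example of a line this is meant to match:
# Nov  7 11:24:46.535 AEST: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition ...
LINE_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})(?:\.\d+)?\s+\S+:\s+"
    r"%SYSTEM_CONTROLLER-(?P<cpu>SP|RP)-3-(?P<kind>ERROR|EXCESSIVE_RESET)"
)


def parse_events(lines, year):
    """Return (timestamp, cpu, kind) for each SYSTEM_CONTROLLER message."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        hour, minute, second = map(int, m["time"].split(":"))
        ts = datetime(year, MONTHS[m["mon"]], int(m["day"]),
                      hour, minute, second)
        events.append((ts, m["cpu"], m["kind"]))
    return events


def is_persistent(events, window=timedelta(hours=24), threshold=3):
    """True if at least `threshold` ERROR events fall inside any `window`.

    Both defaults are hypothetical values chosen for this sketch.
    """
    times = sorted(ts for ts, _, kind in events if kind == "ERROR")
    start = 0
    for end, t in enumerate(times):
        # Slide the left edge forward until the span fits in the window.
        while t - times[start] > window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False
```

Feeding it the lines of a log file, e.g. is_persistent(parse_events(f, 2006)),
gives a quick yes/no on whether the errors are recurring.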

According to your syslog, it is your SP's system controller that is
reporting the parity error.

Is the output of 'show ibc' below from the SP-side?

If not, can you get us the output of 'remote command switch show ibc'?
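
Once two captures of that output are saved, the "IBC resets" line can be
compared between them to see whether resets are still accumulating. This
is a minimal sketch that assumes the line looks like the one in the quoted
'show ibc' output; the function names are made up for illustration.

```python
import re

# Matches e.g. "IBC resets   = 3; last at 11:24:46.535 AEST Fri Nov 7 2006"
RESET_RE = re.compile(
    r"IBC resets\s*=\s*(\d+)(?:;\s*last at\s+(.+?))?\s*$", re.MULTILINE)


def ibc_resets(show_ibc_text):
    """Return (reset_count, last_reset_string_or_None), or None if absent."""
    m = RESET_RE.search(show_ibc_text)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def resets_since(before_text, after_text):
    """Number of IBC resets between two captures, or None if unparseable."""
    before = ibc_resets(before_text)
    after = ibc_resets(after_text)
    if before is None or after is None:
        return None
    return after[0] - before[0]
```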

sukumar

On Wed, 8 Nov 2006, matt carter wrote:

>
> hey all,
>
> hoping someone may have an insight into some errors i have not seen before
>
> Nov  7 11:24:46.535 AEST: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition
> detected: TM_DATA_PARITY_ERROR
>
> Nov  7 11:24:46.535 AEST: %SYSTEM_CONTROLLER-SP-3-EXCESSIVE_RESET: System
> Controller is getting reset so frequently
>
> first one is fair enough
>
> Explanation    The most common errors from the Mistral ASIC on the
> supervisor engine are TM_DATA_PARITY_ERROR and TM_NPP_PARITY_ERROR. Possible
> causes of these parity errors are random static discharge or other external
> factors.
> Recommended Action    If the error message is only seen once (or rarely),
> the recommendation is to monitor the switch syslog to confirm the error
> message was an isolated incident. If these error messages are reoccurring,
> open a case with the Technical Assistance Center
>
> but the wording of the second error seems to suggest this is not an
> "isolated incident", since my system controller is being reset "so
> frequently", which makes the two log messages somewhat contradictory.
> when i go looking at the SP stack i can only see the mistral error
> interrupt called once.
>
> from show stack
> Interrupt level stacks:
> Level    Called Unused/Size  Name
>  5           1   7168/9000  Mistral Error Interrupt
>
> when i check the ibc stats i can see the mistral hardware was definitely
> reset at the time of the incident, and has
> been reset 3 times, but there are 0 errors.
>
> anyone have ideas on how to proceed in terms of catching this if it happens
> again?
>
> from show ibc
> Interface information:
>        Interface IBC0/0(idb 0x43090238)
>        Hardware is Mistral IBC (revision 5)
>        5 minute rx rate 5000 bits/sec, 11 packets/sec
>        5 minute tx rate 12000 bits/sec, 22 packets/sec
>        4900820 packets input, 317526199 bytes
>        4788796 broadcasts received
>        10253179 packets output, 730435109 bytes
>        68213 broadcasts sent
>        0 Packets CEF Switched, 0 Packets Fast Switched
>        0 Packets SLB Switched, 0 Packets CWAN Switched
>        IBC resets   = 3; last at 11:24:46.535 AEST Fri Nov 7 2006
> MISTRAL ERROR COUNTERS
>        System address timeouts  = 0     BUS errors     = 0
>        IBC Address timeouts     = 0 (addr 0x0)
>        Page CRC errors          = 0     IBL CRC errors = 0
>        ECC Correctable errors   = 0
>        Packets with padding removed (0/0/0)   = 0
>        Packets expanded (0/0)   = 0
>        Packets attempted tail end expansion > 1 page and were dropped = 0
>        IP packets dropped with frag offset of 1 = 0
>        0 packets (aggregate) dropped on throttled interfaces
>        Hazard Illegal packet length     = 0     Illegal Offset       = 0
>        Hazard Packet underflow          = 0     Packet Overflow      = 0
>        IBL fill hang count              = 0     Unencapsed packets   = 0
>        LBIC RXQ Drop pkt count = 0            LBIC drop pkt count  = 0
>        LBIC Drop pkt stick     = 0