[c-nsp] 7600 sup7203bxl error msg

Sukumar Subburayan sukumars at cisco.com
Thu Nov 9 13:36:45 EST 2006


I think we have a bug here.

We wanted to add a syslog message that warns the user only when system 
controller soft resets are excessive (n resets within y seconds). This 
enhancement was requested via CSCed21601.

However, I looked at the bug, and it looks like we are simply printing this 
syslog message every time we soft reset the system controller, which is 
not correct.

For now, you can ignore the 2nd message. We will fix this in an upcoming 
release.
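
As a rough illustration of the behavior the enhancement was meant to have, 
here is a minimal Python sketch of a sliding-window check. It is not the 
IOS implementation; the class name ResetMonitor, the method record_reset, 
and the parameters max_resets (n) and window_s (y) are all hypothetical.

```python
from collections import deque


class ResetMonitor:
    """Flag excessive resets: max_resets or more within window_s seconds.

    Hypothetical sketch only; names and semantics are assumptions,
    not taken from IOS.
    """

    def __init__(self, max_resets, window_s):
        self.max_resets = max_resets
        self.window_s = window_s
        self._times = deque()  # timestamps of recent resets, oldest first

    def record_reset(self, now):
        """Record a soft reset at time `now` (seconds).

        Returns True only when the warning should be logged.
        """
        self._times.append(now)
        # Discard resets that have aged out of the window.
        while self._times and now - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.max_resets
```

With a check like this, a single transient reset would never produce the 
EXCESSIVE_RESET message; only a burst of resets inside the window would.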

Also, there is no correlation between 'Mistral Error Interrupt' and IBC 
resets in 'show ibc'. So, they don't have to match.
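
To look at the two counters side by side, a small Python sketch like the 
following can pull them out of saved command output. The regular 
expressions are assumptions based only on the 'show ibc' and 'show stack' 
output quoted in this thread; other releases may format these lines 
differently, and the function names are made up for illustration.

```python
import re

# Matches e.g. "IBC resets   = 3; last at 11:24:46.535 AEST Fri Nov 7 2006"
IBC_RESETS_RE = re.compile(r"IBC resets\s*=\s*(\d+)")

# Matches e.g. "5           1   7168/9000  Mistral Error Interrupt"
# Columns assumed: Level, Called, Unused/Size, Name.
INTERRUPT_RE = re.compile(
    r"^\s*\d+\s+(\d+)\s+\d+/\d+\s+Mistral Error Interrupt",
    re.MULTILINE,
)


def ibc_reset_count(show_ibc_text):
    """Return the 'IBC resets' counter from 'show ibc' text, or None."""
    m = IBC_RESETS_RE.search(show_ibc_text)
    return int(m.group(1)) if m else None


def mistral_interrupt_calls(show_stack_text):
    """Return the 'Called' count for Mistral Error Interrupt, or None."""
    m = INTERRUPT_RE.search(show_stack_text)
    return int(m.group(1)) if m else None
```

Both functions return None when the expected line is absent, so a missing 
counter is not mistaken for a zero.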

sukumar

On Thu, 9 Nov 2006, Dale W. Carder wrote:

> We too got this error on a SXF4 router last night.  Guess you're
> not alone :-)
>
> Nov  9 05:25:34: %SYSTEM_CONTROLLER-3-ERROR: Error condition detected: TM_DATA_PARITY_ERROR
> Nov  9 05:25:34: %SYSTEM_CONTROLLER-3-EXCESSIVE_RESET: System Controller is getting reset so frequently
>
> In our case, it looks like the problem came from the RP instead
> of the SP.  From "sh ibc" it looks like the number of IBC resets
> that occurred was 2, and any other error counter is 0.  "sh stack"
> says the Mistral Error Interrupt process has only been called once.
>
> Dale
>
> ----------------------------------
> Dale W. Carder - Network Engineer
> University of Wisconsin at Madison
> http://net.doit.wisc.edu/~dwcarder
>
>
> On Nov 8, 2006, at 6:07 PM, Sukumar Subburayan wrote:
>> Every time we get a parity error in the system controller, we try to soft
>> reset the IBC and recover from the condition. Things should be fine if
>> this was a transient, one-off case.
>> 
>> However, if the issue is persistent, we will be constantly resetting the
>> IBC and hence dropping packets. The second syslog message is to warn you
>> that you are seeing excessive IBC resets.
>> 
>> According to your syslog your SP's system controller is what is
>> reporting the parity error.
>> 
>> Is the output of 'show ibc' below from the SP-side?
>> 
>> If not, can you get us 'remote command switch show ibc'?
>> 
>> sukumar
>> 
>> 
>> 
>> On Wed, 8 Nov 2006, matt carter wrote:
>> 
>>> 
>>> hey all,
>>> 
>>> hoping someone may have an insight into some errors i have not seen
>>> before
>>> 
>>> Nov  7 11:24:46.535 AEST: %SYSTEM_CONTROLLER-SP-3-ERROR: Error condition detected: TM_DATA_PARITY_ERROR
>>> 
>>> Nov  7 11:24:46.535 AEST: %SYSTEM_CONTROLLER-SP-3-EXCESSIVE_RESET: System Controller is getting reset so frequently
>>> 
>>> first one is fair enough
>>> 
>>> Explanation    The most common errors from the Mistral ASIC on the
>>> supervisor engine are TM_DATA_PARITY_ERROR and TM_NPP_PARITY_ERROR.
>>> Possible causes of these parity errors are random static discharge or
>>> other external factors.
>>> Recommended Action    If the error message is only seen once (or
>>> rarely), the recommendation is to monitor the switch syslog to confirm
>>> the error message was an isolated incident. If these error messages are
>>> reoccurring, open a case with the Technical Assistance Center
>>> 
>>> but the construct of the second error seems to suggest this is not an
>>> "isolated incident" since my system controller is being reset "so
>>> frequently", which makes the two log messages somewhat contradictory.
>>> when i go looking at the SP stack i can only see the mistral error
>>> interrupt called once.
>>> 
>>> from show stack
>>> Interrupt level stacks:
>>> Level    Called Unused/Size  Name
>>> 5           1   7168/9000  Mistral Error Interrupt
>>> 
>>> when i check the ibc stats i can see the mistral hardware was definitely
>>> reset at the time of the incident, and has been reset 3 times, but there
>>> are 0 errors.
>>> 
>>> anyone have ideas on how to proceed in terms of catching this if it
>>> happens again?
>>> 
>>> from show ibc
>>> Interface information:
>>>       Interface IBC0/0(idb 0x43090238)
>>>       Hardware is Mistral IBC (revision 5)
>>>       5 minute rx rate 5000 bits/sec, 11 packets/sec
>>>       5 minute tx rate 12000 bits/sec, 22 packets/sec
>>>       4900820 packets input, 317526199 bytes
>>>       4788796 broadcasts received
>>>       10253179 packets output, 730435109 bytes
>>>       68213 broadcasts sent
>>>       0 Packets CEF Switched, 0 Packets Fast Switched
>>>       0 Packets SLB Switched, 0 Packets CWAN Switched
>>>       IBC resets   = 3; last at 11:24:46.535 AEST Fri Nov 7 2006
>>> MISTRAL ERROR COUNTERS
>>>       System address timeouts  = 0     BUS errors     = 0
>>>       IBC Address timeouts     = 0 (addr 0x0)
>>>       Page CRC errors          = 0     IBL CRC errors = 0
>>>       ECC Correctable errors   = 0
>>>       Packets with padding removed (0/0/0)   = 0
>>>       Packets expanded (0/0)   = 0
>>>       Packets attempted tail end expansion > 1 page and were dropped = 0
>>>       IP packets dropped with frag offset of 1 = 0
>>>       0 packets (aggregate) dropped on throttled interfaces
>>>       Hazard Illegal packet length     = 0     Illegal Offset       = 0
>>>       Hazard Packet underflow          = 0     Packet Overflow      = 0
>>>       IBL fill hang count              = 0     Unencapsed packets   = 0
>>>       LBIC RXQ Drop pkt count = 0            LBIC drop pkt count  = 0
>>>       LBIC Drop pkt stick     = 0

