[c-nsp] %IPC-SPSTBY-5-WATERMARK errors on dual-sup 6500 & SXI

Eninja eninja at gmail.com
Fri May 1 23:10:26 EDT 2009


Phil,

This doesn't seem like a hardware issue.

The answers are in the IOS errors. XDR (eXternal Data Representation),  
used by IPC for RP-to-RP and RP-to-LC communication, failed to allocate  
memory. XDR was probably carrying keepalive messages between the RPs  
when it choked, so the other RP stopped receiving keepalive responses.  
That forces a crash (as designed) and a takeover, on the assumption  
that the active RP was down.

Ask Cisco TAC to look into IPC buffers, memory allocation and bugs  
(because IOS should do a better job of allocating memory to this  
rather critical housekeeping function).

In the meantime, you may want to start reviewing what's different  
between your other working boxes and this one with regard to IPC  
(what does your 'show ipc ....' output say?) and the IOS image.
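
For comparing boxes, here is a minimal Python sketch, assuming you have the syslog saved as text with each wrapped entry joined back onto one line. It pulls the pending-message count out of the %IPC-...-5-WATERMARK lines quoted below and flags any port whose backlog never shrinks. The function names are made up for illustration and the regex is based only on the message format shown in this thread, so treat it as a starting point rather than a tested tool.

```python
import re
from collections import defaultdict

# Matches the WATERMARK message format quoted in this thread, once each
# wrapped log entry has been joined back onto a single line.
WATERMARK = re.compile(
    r"%IPC-(?:\w+-)?5-WATERMARK: (\d+) messages pending in (\w+) "
    r"for the port (\S+)"
)


def pending_by_port(lines):
    """Return {port name: [pending counts, in log order]}."""
    counts = defaultdict(list)
    for line in lines:
        match = WATERMARK.search(line)
        if match:
            counts[match.group(3)].append(int(match.group(1)))
    return dict(counts)


def never_drains(counts):
    """True if there are at least two samples and the backlog never shrinks."""
    return len(counts) >= 2 and all(
        later >= earlier for earlier, later in zip(counts, counts[1:])
    )
```

Run pending_by_port() over the log from each chassis and pass each port's list to never_drains(); a healthy box would be expected to log few or no WATERMARK lines, or to show counts that fall again.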

Eninja


On Apr 30, 2009, at 5:32 PM, Phil Mayers <p.mayers at imperial.ac.uk>  
wrote:

> All,
>
> We have a chassis with 2x sup720-3B and running SXI that, for the  
> second time, appears to have "lost" the standby SUP to the above  
> error messages.
>
> The first time, the pattern was:
>
> Mar 17 17:24:37.378 GMT: %XDR-6-XDRIPCNOTIFY: Message not sent to slot 6/0 (6) because of IPC error timeout. Disabling linecard. (Expected during linecard OIR or system reloads)
> Mar 17 17:24:42.826 GMT: %XDR-SPSTBY-3-XDRNOMEM: XDR failed to allocate memory during ipcQ chunks creation.
> -Traceback= 40252F70 4025350C 40932AB8 40DD8E9C 40426BA8 40427068 40427534 40427E38 40428608 40F465F4 40F3699C 40F36BB8 416E175C
>
> ...we did not notice these, but then a few days later the router began
> logging:
>
> Mar 21 07:17:51.798 GMT: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending in rcv for the port Card6/0:Request(2060000.7) seat 2060000
> Mar 21 07:18:21.967 GMT: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending in rcv for the port Card6/0:Request(2060000.7) seat 2060000
> Mar 21 07:18:52.126 GMT: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending in rcv for the port Card6/0:Request(2060000.7) seat 2060000
>
> ...with the number of IPC messages rising, basically forever.
>
> TAC advised a bunch of stuff that basically amounted to re-seating  
> the card, failing over to the sup to see if the sup or software was  
> faulty (yikes...), swapping the sups around in the slots, and so  
> forth. I re-seated the sup and it seemed stable, until a few days ago:
>
> Apr 21 01:26:18.815 BST: %RPC-SPSTBY-2-FAILED_USERHANDLE: Failed to send RPC request online_diag_sp_request:get_rp_cpu_info
> -Traceback= 40252F70 4025350C 40B43D3C 410D8528 410FCEF8 4109B750 4109C550 4109D140 4109AAD0 4109A8E4 4088E6C0 4088E6AC
>
> ...then...
>
> Apr 24 08:18:46.367 BST: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending in rcv for the port Card6/0:Request(2060000.7) seat 2060000
>
> ...again, rising forever.
>
> I'm going to re-open the TAC case and see what they say, but I was  
> wondering if anyone had come across this. There are some  
> similar-sounding messages in the SXI release notes, but we've got other  
> identically-configured boxes that don't display these symptoms, so  
> I'm fearing a hardware fault (which would be ironic - this sup came  
> from Cisco in response to an RMA...)
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/

