[c-nsp] %IPC-SPSTBY-5-WATERMARK errors on dual-sup 6500 & SXI
Phil Mayers
p.mayers at imperial.ac.uk
Thu Apr 30 12:32:01 EDT 2009
All,
We have a chassis with 2x Sup720-3B running SXI that, for the second
time, appears to have "lost" the standby sup to the above error messages.
The first time, the pattern was:
Mar 17 17:24:37.378 GMT: %XDR-6-XDRIPCNOTIFY: Message not sent to slot 6/0 (6) because of IPC error timeout. Disabling linecard. (Expected during linecard OIR or system reloads)
Mar 17 17:24:42.826 GMT: %XDR-SPSTBY-3-XDRNOMEM: XDR failed to allocate memory during ipcQ chunks creation.
-Traceback= 40252F70 4025350C 40932AB8 40DD8E9C 40426BA8 40427068 40427534 40427E38 40428608 40F465F4 40F3699C 40F36BB8 416E175C
...we didn't notice these at the time, but a few days later the router
began logging:
Mar 21 07:17:51.798 GMT: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending in rcv for the port Card6/0:Request(2060000.7) seat 2060000
Mar 21 07:18:21.967 GMT: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending in rcv for the port Card6/0:Request(2060000.7) seat 2060000
Mar 21 07:18:52.126 GMT: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending in rcv for the port Card6/0:Request(2060000.7) seat 2060000
...with the number of IPC messages rising, basically forever.
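For anyone who wants to keep an eye on this, here's a rough sketch (Python, not tested against a live syslog feed) that pulls the timestamp, pending count and port out of the WATERMARK lines quoted above and summarises them per port. It assumes each message arrives on a single syslog line; the function names are my own invention, not anything from Cisco tooling.

```python
import re
from typing import Dict, Iterable, List, NamedTuple

# Matches the %IPC-SPSTBY-5-WATERMARK format quoted above, e.g.
# "Mar 21 07:17:51.798 GMT: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending
#  in rcv for the port Card6/0:Request(2060000.7) seat 2060000"
# Assumption: one message per line (not wrapped).
WATERMARK_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+ \w+): %IPC-SPSTBY-5-WATERMARK: "
    r"(?P<pending>\d+) messages pending in rcv for the port (?P<port>\S+)"
)


class Watermark(NamedTuple):
    timestamp: str  # timestamp as logged, e.g. "Mar 21 07:17:51.798 GMT"
    pending: int    # number of messages pending in rcv
    port: str       # IPC port name, e.g. "Card6/0:Request(2060000.7)"


def parse_watermarks(lines: Iterable[str]) -> List[Watermark]:
    """Extract WATERMARK events from syslog lines, ignoring everything else."""
    events = []
    for line in lines:
        m = WATERMARK_RE.match(line.strip())
        if m:
            events.append(Watermark(m["ts"], int(m["pending"]), m["port"]))
    return events


def summarize(events: List[Watermark]) -> Dict[str, dict]:
    """Per-port message count, first/last timestamp and peak pending count."""
    by_port: Dict[str, List[Watermark]] = {}
    for e in events:
        by_port.setdefault(e.port, []).append(e)
    return {
        port: {
            "messages": len(evs),
            "first": evs[0].timestamp,
            "last": evs[-1].timestamp,
            "max_pending": max(ev.pending for ev in evs),
        }
        for port, evs in by_port.items()
    }


def is_growing(events: List[Watermark]) -> bool:
    """True if the pending count never drops and ends above where it started.

    Assumes all events are for the same port (filter first if not).
    """
    counts = [e.pending for e in events]
    if len(counts) < 2:
        return False
    return all(b >= a for a, b in zip(counts, counts[1:])) and counts[-1] > counts[0]
```

Fed the three lines above, summarize() reports three messages for Card6/0:Request(2060000.7), all at 1600 pending, so is_growing() is False there; it only flags a problem once the logged count itself climbs between messages.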
TAC advised a bunch of stuff that basically amounted to re-seating the
card, failing over to the standby sup to see if the sup or software was
faulty (yikes...), swapping the sups around in the slots, and so forth.
I re-seated the sup and it seemed stable, until a few days ago:
Apr 21 01:26:18.815 BST: %RPC-SPSTBY-2-FAILED_USERHANDLE: Failed to send RPC request online_diag_sp_request:get_rp_cpu_info
-Traceback= 40252F70 4025350C 40B43D3C 410D8528 410FCEF8 4109B750 4109C550 4109D140 4109AAD0 4109A8E4 4088E6C0 4088E6AC
...then...
Apr 24 08:18:46.367 BST: %IPC-SPSTBY-5-WATERMARK: 1600 messages pending in rcv for the port Card6/0:Request(2060000.7) seat 2060000
...again, rising forever.
I'm going to re-open the TAC case and see what they say, but I was
wondering if anyone had come across this. There are some
similar-sounding messages in the SXI release notes, but we've got other
identically-configured boxes that don't display these symptoms, so I
fear a hardware fault (which would be ironic, since this sup came from
Cisco in response to an RMA...).