[c-nsp] 3550-12 interrupts out of control, possibly hardware?
Andy Dills
andy at xecu.net
Thu Aug 16 12:36:19 EDT 2012
I've got a customer with a weird situation.
They have a pretty straightforward setup: two 7200s fronting two Cisco
3550-12s, distributing to a series of 48-port 3550s. It's a bit dated, but
works very well for their needs.
They have one special network attached to (only) one of the copper GigE
ports on (one of) the 3550-12s, which gets a decent amount of traffic
(~100 Mbps or so). It's a layer 3 connection.
Well, one of their 3550-12s died, taking down that network. They
immediately moved the port's IP configuration and the cable to the second
3550-12, restoring service. They then racked and configured a replacement
switch, but left that network on the second 3550-12, as it seemed fine.
However, once that network came under load this morning, the second
switch's CPU pegged (80-99%, normally 1-2%), causing packet drops and
latency.
At that point I got involved, and for the life of me I can't figure out
why this happened. Clearly it's interrupts, as no process in "sh proc cpu"
was using more than 1% of CPU. However, CEF was working fine, and the
usual checks for interrupt-driven CPU load showed nothing abnormal.
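For anyone following along, the interrupt share is the second figure in
the first line of "sh proc cpu" output. Here is a minimal sketch of pulling
it out, assuming the usual IOS header format. The sample line and the
helper name split_cpu are made up for illustration, not captured from
this switch:

```python
import re

# Assumed header format of "show processes cpu" on IOS:
#   CPU utilization for five seconds: <total>%/<interrupt>%; one minute: ...
HEADER_RE = re.compile(r"CPU utilization for five seconds: (\d+)%/(\d+)%")


def split_cpu(header):
    """Return (total, interrupt, process) percentages from the header line."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError("unrecognized 'show processes cpu' header")
    total = int(match.group(1))
    interrupt = int(match.group(2))
    # Whatever is not interrupt-level is process-level CPU.
    return total, interrupt, total - interrupt


# Hypothetical sample line, not taken from the affected 3550-12.
sample = ("CPU utilization for five seconds: 95%/92%; "
          "one minute: 90%; five minutes: 88%")

total, interrupt, process = split_cpu(sample)
print(f"total={total}% interrupt={interrupt}% process={process}%")
```

A large interrupt figure next to a small process figure is the pattern
described above: the load is in packet handling rather than in any one
listed process.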
So, after scratching our heads for a bit, I had them move the connection
back to the original, newly-replaced switch. Note that these switches are
configured 100% identically with the exception of IP address and hostname.
Same IOS versions. I mean literally, if you diff the two in rancid, those
are the only differences.
There have been zero problems since they moved the connection off of the
switch in question. Both switches are now at 1-2% CPU and things are
humming along fine.
So, my question is: What could be the possible causes of this? Could this
be a symptom of failing hardware, perhaps some bad memory requiring
constant CPU corrections?
Thanks,
Andy
---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972