[c-nsp] 3550-12 interrupts out of control, possibly hardware?

Andy Dills andy at xecu.net
Thu Aug 16 14:22:52 EDT 2012


In doing further investigation, looking at traffic graphs, I see that once 
they moved the network to the other switch, all of a sudden "vlan1" 
started seeing all of the traffic that was being routed to that network. 
Typically, the only traffic the switches see on vlan1 is traffic actually 
destined for the switch itself (config, ICMP, etc). The switch they are 
currently on does not see any of the routed traffic on vlan1, and now that 
I have had them move the connection, neither switch sees the big traffic 
spike on vlan1 any longer.

This is quite odd, because as I mentioned, the two switches are configured 
the same. Can anybody suggest an explanation, or a potential way of 
determining why the traffic that 

I'm wondering if it's some odd software bug related to them enabling "ip 
routing" on that second switch last night without reloading it 
afterwards. I just can't work out why traffic destined for an L3 port 
would transit the VLAN like that.
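
For reference, the interrupt-versus-process split that the original message (quoted below) relies on can be read from the header line of "sh proc cpu": in the five-second figure, the number before the slash is total CPU and the number after it is the share spent at interrupt level. The following is a minimal sketch of that arithmetic, assuming the usual IOS header format; the function name `split_cpu` and the sample numbers are hypothetical and were not taken from the switch in question.

```python
import re

# Header line printed at the top of IOS "show processes cpu".
# Five-second figure is "<total>%/<interrupt>%".
HEADER_RE = re.compile(
    r"CPU utilization for five seconds:\s*(\d+)%/(\d+)%;"
    r"\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


def split_cpu(header: str) -> dict:
    """Split the five-second CPU figure into interrupt and process shares."""
    m = HEADER_RE.search(header)
    if m is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "total": total,
        "interrupt": interrupt,
        # Whatever is not interrupt-level is attributable to scheduled processes.
        "process": total - interrupt,
        "one_minute": one_min,
        "five_minutes": five_min,
    }


# Hypothetical sample, made up for illustration only.
SAMPLE = (
    "CPU utilization for five seconds: 95%/93%; "
    "one minute: 90%; five minutes: 88%"
)

if __name__ == "__main__":
    print(split_cpu(SAMPLE))
```

A large interrupt share with a near-zero process share would be consistent with packets being punted to the CPU rather than switched in hardware, which is one reading of the vlan1 traffic spike described above.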

Thanks,
Andy

On Thu, 16 Aug 2012, Andy Dills wrote:

> 
> I've got a customer with a weird situation.
> 
> They have a pretty straightforward setup, two 7200s fronting two cisco 
> 3550-12s, distributing to a series of 48 port 3550s. It's a bit dated, but 
> works very well for their needs.
> 
> They have one special network attached to (only) one of the copper gige 
> ports on (one of) the 3550-12s which gets a decent amount of traffic 
> (~100mbps or so). It's a layer 3 connection.
> 
> Well, one of their 3550-12s died, taking down that network. They moved the 
> IP configuration of the port and moved the cable immediately, restoring 
> service, and racked/configured a replacement switch, but left that network 
> on the second 3550-12, as it seemed fine. 
> 
> However, once it began to come under load this morning, the CPU pegged 
> (80-99%, normally at 1-2%), causing packet drops and latency.
> 
> At that point I got involved, and for the life of me I can't figure out 
> why this happened. Clearly it's interrupts, as there were no processes in 
> the "sh proc cpu" that had more than 1% of CPU. However, cef was working 
> fine, everything looked normal in terms of the traditional interrupt-based 
> troubleshooting.
> 
> So, after scratching our heads for a bit, I had them move the connection 
> back to the original, newly-replaced switch. Note that these switches are 
> configured 100% identically with the exception of IP address and hostname. 
> Same IOS versions. I mean literally, if you diff the two in rancid, those 
> are the only config changes.
> 
> Zero problems from the point they moved the connection off of the switch 
> in question, both switches now have 1-2% CPU and things are humming along 
> fine.
> 
> So, my question is: What could be the possible causes of this? Could this 
> be a symptom of failing hardware, perhaps some bad memory requiring 
> constant CPU corrections?
> 
> Thanks,
> Andy
> 
> ---
> Andy Dills
> Xecunet, Inc.
> www.xecu.net
> 301-682-9972
> ---
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
> 

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---

