[c-nsp] 3550-12 interrupts out of control, possibly hardware?

Andy Dills andy at xecu.net
Thu Aug 16 18:23:19 EDT 2012


Thanks, I appreciate those suggestions. I verified both the SDM and VTP 
configs are identical. 

Did you see my followup from earlier? I identified that for some reason 
unknown to me, the traffic was hitting the vlan1 interface before exiting 
via the L3 interface facing that network, which was forcing all of the 
traffic to get process switched. I have no idea why, though, and would 
love suggestions.

My best guess is that because they configured the port for L3 mode before 
they enabled ip routing on the failover 3550-12, something didn't happen 
right and perhaps a reload would have fixed it. I do know that in the past 
when I have done "ip routing" on a live 3550, it goes unresponsive for 
about 10-15 seconds, so I have to assume a lot goes on behind the scenes. 
And I do know from the transcript of their changes that they configured 
the port for L3 mode before realizing ip routing had never been enabled on 
that switch. Given the "illogical" (in quotes because perhaps there 
is some logic that is escaping me) nature of the behavior observed, I have 
to assume it was some sort of quirk or bug like this. For what it's worth, 
they're both running c3550-ipservices-mz.122-44.SE6.
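
For reference, the interrupt-versus-process split discussed in this thread 
can be read off the first line of "sh proc cpu": the five-second figure is 
printed as total%/interrupt%. Below is a minimal Python sketch of pulling 
those numbers apart. The function name split_cpu and the sample line are 
made up for illustration; the numbers are not taken from these switches.

```python
import re

# Header line printed at the top of "show processes cpu" on IOS, e.g.
#   CPU utilization for five seconds: 92%/88%; one minute: 85%; five minutes: 60%
# The first five-second figure is total CPU; the figure after the slash is
# the portion spent at interrupt level.
HEADER_RE = re.compile(
    r"CPU utilization for five seconds:\s*(\d+)%/(\d+)%;"
    r"\s*one minute:\s*(\d+)%;\s*five minutes:\s*(\d+)%"
)


def split_cpu(header_line):
    """Return total, interrupt and process-level CPU from the header line."""
    m = HEADER_RE.search(header_line)
    if m is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt, one_min, five_min = (int(g) for g in m.groups())
    return {
        "total": total,
        "interrupt": interrupt,
        # Whatever is not interrupt time is attributable to scheduled processes.
        "process": total - interrupt,
        "one_minute": one_min,
        "five_minutes": five_min,
    }


# Hypothetical sample line (illustrative values only).
sample = "CPU utilization for five seconds: 92%/88%; one minute: 85%; five minutes: 60%"
stats = split_cpu(sample)
print(stats)
```

A large "interrupt" value next to a small "process" value matches the 
symptom described in the original message, where no single process showed 
more than 1% of CPU.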

Thanks,
Andy

On Fri, 17 Aug 2012, Tóth András wrote:

> Hi Andy,
> 
> One idea is different SDM templates being used. The SDM template is
> not showing up in running-config, and changing it requires a reload as
> well. I would compare them with 'sh sdm prefer' command. You might be
> running out of IPv4 routes, which causes rest of routes to be applied
> in software, so packets are software switched by the CPU which can
> cause high utilization.
> 
> http://www.cisco.com/en/US/products/hw/switches/ps646/products_tech_note09186a0080094bc6.shtml
> 
> http://www.cisco.com/en/US/docs/switches/lan/catalyst3550/software/release/12.2_44_se/configuration/guide/swadmin.html#wp1235565
> 
> Best regards,
> Andras
> 
> On Thu, Aug 16, 2012 at 6:36 PM, Andy Dills <andy at xecu.net> wrote:
> >
> > I've got a customer with a weird situation.
> >
> > They have a pretty straightforward setup, two 7200s fronting two cisco
> > 3550-12s, distributing to a series of 48 port 3550s. It's a bit dated, but
> > works very well for their needs.
> >
> > They have one special network attached to (only) one of the copper gige
> > ports on (one of) the 3550-12s which gets a decent amount of traffic
> > (~100mbps or so). It's a layer 3 connection.
> >
> > Well, one of their 3550-12s died, taking down that network. They moved the
> > IP configuration of the port and moved the cable immediately, restoring
> > service, and racked/configured a replacement switch, but left that network
> > on the second 3550-12, as it seemed fine.
> >
> > However, once it began to come under load this morning, the CPU pegged
> > (80-99%, normally at 1-2%), causing packet drops and latency.
> >
> > At that point I got involved, and for the life of me I can't figure out
> > why this happened. Clearly it's interrupts, as there were no processes in
> > the "sh proc cpu" that had more than 1% of CPU. However, cef was working
> > fine, everything looked normal in terms of the traditional interrupt-based
> > troubleshooting.
> >
> > So, after scratching our heads for a bit, I had them move the connection
> > back to the original, newly-replaced switch. Note that these switches are
> > configured 100% identically with the exception of IP address and hostname.
> > Same IOS versions. I mean literally, if you diff the two in rancid, those
> > are the only config changes.
> >
> > Zero problems from the point they moved the connection off of the switch
> > in question, both switches now have 1-2% CPU and things are humming along
> > fine.
> >
> > So, my question is: What could be the possible causes of this? Could this
> > be a symptom of failing hardware, perhaps some bad memory requiring
> > constant CPU corrections?
> >
> > Thanks,
> > Andy
> >
> > ---
> > Andy Dills
> > Xecunet, Inc.
> > www.xecu.net
> > 301-682-9972
> > ---
> 

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---

