[c-nsp] Brief CPU spikes on 6500 Sup 720

Wed Jul 14 11:01:41 EDT 2010

Typo. Should have been ECMP (equal-cost multi-path), i.e. equal-cost routes.

-Ben

On Jul 14, 2010, at 10:59 AM, Aaron Riemer wrote:

> Forgive my ignorance. What is ECPM??
> 
> Shouldn't all routed traffic be handled by the active HSRP node?
> 
> -----Original Message-----
> From: Benjamin Lovell [mailto:belovell at cisco.com] 
> Sent: Wednesday, 14 July 2010 10:38 PM
> To: Aaron Riemer
> Cc: 'JC Cockburn'; 'Phil Mayers'; cisco-nsp at puck.nether.net
> Subject: Re: [c-nsp] Brief CPU spikes on 6500 Sup 720
> 
> Most of the time we see problems like this it is caused by asymmetric
> routing. The ECPM return path leads to the standby switch which does not
> know the DMAC as all traffic was processed by the active switch. This is
> usually fixed by increasing the MAC table timers to match the ARP timers so
> that standby will keep MAC in table long enough until the next time ARP
> comes around. 
> 
> -Ben
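> 
> A minimal sketch of the timer change described above, assuming the default
> ARP timeout of 14400 seconds (4 hours) and the default MAC aging time of 300
> seconds are still in place, as the original post suggests. The value is
> illustrative, and on newer releases the command is spelled
> "mac address-table":
> 
> ```
> ! Global configuration: raise MAC aging to match the default ARP timeout
> mac-address-table aging-time 14400
> !
> ! Exec mode: verify both timers
> show mac-address-table aging-time
> show interfaces vlan 1 | include ARP
> ```
> 
> The reverse approach, lowering "arp timeout" on the SVI to just under the
> MAC aging time, also closes the gap, at the cost of more frequent ARP
> refreshes.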
> 
> On Jul 14, 2010, at 9:46 AM, Aaron Riemer wrote:
> 
>> Hi Phil,
>> 
>> Answers below:
>> 
>> 1) IOS - s72033-advipservicesk9_wan-mz.122-18.SXF17a.bin
>> 2) HSRP configured between two core 6509's. SVI is VLAN1 (I know don't ask)
>> trunked between the cores via 10G. Only ports in VLAN1 on one core switch
>> are impacted and seeing the flooding.
>> 3) Building floor switches connect to both cores (Routed and running EIGRP)
>> 4) Spanning Tree Below:
>> 
>> Core1:
>> spanning-tree mode pvst
>> spanning-tree vlan 1-199,336,503-930 priority 16384
>> 
>> Core2:
>> spanning-tree mode pvst
>> spanning-tree vlan 1-199,336,503-930 priority 0
>> 
>> 5) No rate limiting or CoPP configured. We are seeing drops even when the
>> CPU is not hitting 100% (most likely due to ASIC oversubscription).  
>> 6) Source of traffic is unknown at this stage. Will turn to wireshark
>> tomorrow.
>> 7) I don't believe there are any L2 loops. If spanning-tree was an issue I
>> would think the CPU would gradually hit 100% and stay there.
>> 
>> We are seeing output drops on interfaces and oversubscription of ASICs as a
>> result of this flooding which I think is the main culprit for the brief
>> connectivity outages. Is there a way similar to CoPP to protect the ASICs to
>> ensure they are never 100% utilised? Egress shaping on all suspect ports?
>> 
>> 
>> Thanks,
>> 
>> Aaron.
>> 
>> 
>> -----Original Message-----
>> From: cisco-nsp-bounces at puck.nether.net
>> [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of JC Cockburn
>> Sent: Wednesday, 14 July 2010 8:03 PM
>> To: 'Phil Mayers'
>> Cc: cisco-nsp at puck.nether.net
>> Subject: Re: [c-nsp] Brief CPU spikes on 6500 Sup 720
>> Importance: High
>> 
>> Hi Phil,
>> I had a problem like this last year on 6500's.
>> It was related to bug: CSCsk23521
>> Basically a server in our datacenter used multicast addresses in the range
>> allocated for BPDU's, and this just killed the SP (100% CPU...).
>> 
>> If you do a "remote command switch sh proc cpu" on the 6500 you can see if
>> the SP CPU is under fire...
>> 
>> Cheers
>> JC
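>> 
>> For reference, a sketch of checking both CPUs (the Sup 720 has a separate
>> route processor and switch processor); exact output varies by release:
>> 
>> ```
>> ! RP CPU, busiest processes first
>> show processes cpu sorted
>> ! SP CPU, same command run on the switch processor
>> remote command switch show processes cpu sorted
>> ! Utilisation history graphs, useful for catching brief spikes
>> show processes cpu history
>> remote command switch show processes cpu history
>> ```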
>> 
>> -----Original Message-----
>> From: cisco-nsp-bounces at puck.nether.net
>> [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Phil Mayers
>> Sent: Wednesday, July 14, 2010 1:41 PM
>> To: cisco-nsp at puck.nether.net
>> Subject: Re: [c-nsp] Brief CPU spikes on 6500 Sup 720
>> 
>> On 14/07/10 11:30, Aaron Riemer wrote:
>>> Hi Group,
>>> 
>>> 
>>> 
>>> We are having trouble with unicast flooding on a particular VLAN and
>>> associated ports and as a result brief spikes in CPU usage on one of our
>>> 6509 core switches.
>>> 
>>> 
>>> 
>>> ARP and MAC timeouts are set to default and we haven't had problems with
>>> this in the past. The problem is I believe this is causing brief 100% spikes
>>> within the SP or RP and as a result brief connectivity outages.
>> 
>> Which is it? SP or RP?
>> 
>>> 
>>> 
>>> 
>>> We have narrowed down the source of the unicast flooding but we need to know
>>> why it is occurring.
>> 
>> Rather more info required I think.
>> 
>> * IOS version
>> * Config of ports & SVIs in question
>> * Nature of downstream devices (if any)
>> * spanning tree config (if any)
>> * rough idea of the size of the ARP & MAC tables
>> * Any MLS rate-limit or CoPP config
>> * Nature of the source of the unicast-flooded traffic
>> * Any possibility of loops in the network?
>> 
>>> Has anyone experienced this in the past? Could unicast flooding over
>>> multiple interfaces account for this kind of behaviour?
>> 
>> Anything punted to the CPU at high rate could cause this kind of thing. 
>> That's why MLS limiters and CoPP are important on this platform, even 
>> with all their limitations.
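>> 
>> As a rough illustration of the MLS limiters mentioned above, a sketch with
>> placeholder rates (packets per second, then burst). These values are
>> assumptions, not recommendations, and should be tuned per network:
>> 
>> ```
>> ! Global configuration: hardware rate limiters for traffic punted to the CPU
>> mls rate-limit unicast cef glean 1000 10
>> mls rate-limit all ttl-failure 100 10
>> mls rate-limit all mtu-failure 100 10
>> !
>> ! Exec mode: show which limiters are active
>> show mls rate-limit
>> ```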
>> _______________________________________________
>> cisco-nsp mailing list  cisco-nsp at puck.nether.net
>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>> <RP CPU.txt><sh module.txt><SP CPU.txt>