[c-nsp] Brief CPU spikes on 6500 Sup 720

Wed Jul 14 10:59:18 EDT 2010

Forgive my ignorance. What is ECMP?

Shouldn't all routed traffic be handled by the active HSRP node?

-----Original Message-----
From: Benjamin Lovell [mailto:belovell at cisco.com] 
Sent: Wednesday, 14 July 2010 10:38 PM
To: Aaron Riemer
Cc: 'JC Cockburn'; 'Phil Mayers'; cisco-nsp at puck.nether.net
Subject: Re: [c-nsp] Brief CPU spikes on 6500 Sup 720

Most of the time we see problems like this, they are caused by asymmetric
routing. The ECMP return path leads to the standby switch, which does not
know the destination MAC (DMAC) because all outbound traffic was processed
by the active switch. This is usually fixed by increasing the MAC table
aging timer to match the ARP timeout, so that the standby keeps the MAC in
its table until the next time ARP comes around.
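
As a rough sketch of that fix, assuming stock defaults (ARP timeout of
14400 seconds, MAC aging time of 300 seconds) and that VLAN 1 is the
affected SVI as described in the quoted message below. Check the syntax
against your release, since newer IOS trains spell the command
"mac address-table":

```
! Sketch only: the values assume default timers; adjust to your environment.
! Raise MAC aging on the affected VLAN to match the default ARP timeout
! (4 hours = 14400 seconds).
mac-address-table aging-time 14400 vlan 1
!
! Afterwards, confirm the timer from exec mode with:
!   show mac-address-table aging-time vlan 1
```

This would need to be applied on both cores so that the standby does not
age out the entry before the next ARP refresh.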

-Ben

On Jul 14, 2010, at 9:46 AM, Aaron Riemer wrote:

> Hi Phil,
> 
> Answers below:
> 
> 1) IOS - s72033-advipservicesk9_wan-mz.122-18.SXF17a.bin
> 2) HSRP configured between two core 6509's. SVI is VLAN1 (I know don't
> ask) trunked between the cores via 10G. Only ports in VLAN1 on one core
> switch are impacted and seeing the flooding.
> 3) Building floor switches connect to both cores (Routed and running
> EIGRP)
> 4) Spanning Tree Below:
> 
> Core1:
> spanning-tree mode pvst
> spanning-tree vlan 1-199,336,503-930 priority 16384
> 
> Core2:
> spanning-tree mode pvst
> spanning-tree vlan 1-199,336,503-930 priority 0
> 
> 5) No rate limiting or CoPP configured. We are seeing drops even when the
> CPU is not hitting 100% (most likely due to ASIC oversubscription).  
> 6) Source of traffic is unknown at this stage. Will turn to wireshark
> tomorrow.
> 7) I don't believe there are any L2 loops. If spanning-tree was an issue I
> would think the CPU would gradually hit 100% and stay there.
> 
> We are seeing output drops on interfaces and oversubscription of ASICs
> as a result of this flooding which I think is the main culprit for the
> brief connectivity outages. Is there a way similar to CoPP to protect
> the ASICs to ensure they are never 100% utilised? Egress shaping on all
> suspect ports?
> 
> 
> Thanks,
> 
> Aaron.
> 
> 
> -----Original Message-----
> From: cisco-nsp-bounces at puck.nether.net
> [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of JC Cockburn
> Sent: Wednesday, 14 July 2010 8:03 PM
> To: 'Phil Mayers'
> Cc: cisco-nsp at puck.nether.net
> Subject: Re: [c-nsp] Brief CPU spikes on 6500 Sup 720
> Importance: High
> 
> Hi Phil,
> I had a problem like this last year on 6500's.
> It was related to bug: CSCsk23521
> Basically a server in our datacenter used multicast addresses in the range
> allocated for BPDU's, and this just killed the SP (100% CPU...).
> 
> If you do a "remote command switch sh proc cpu" on the 6500 you can see if
> the SP CPU is under fire...
> 
> Cheers
> JC
> 
> -----Original Message-----
> From: cisco-nsp-bounces at puck.nether.net
> [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Phil Mayers
> Sent: Wednesday, July 14, 2010 1:41 PM
> To: cisco-nsp at puck.nether.net
> Subject: Re: [c-nsp] Brief CPU spikes on 6500 Sup 720
> 
> On 14/07/10 11:30, Aaron Riemer wrote:
>> Hi Group,
>> 
>> 
>> 
>> We are having trouble with unicast flooding on a particular VLAN and
>> associated ports and as a result brief spikes in CPU usage on one of our
>> 6509 core switches.
>> 
>> 
>> 
>> ARP and MAC timeouts are set to default and we haven't had problems with
>> this in the past. The problem is I believe this is causing brief 100%
>> spikes within the SP or RP and as a result brief connectivity outages.
> 
> Which is it? SP or RP?
> 
>> 
>> 
>> 
>> We have narrowed down the source of the unicast flooding but we need to
>> know why it is occurring.
> 
> Rather more info required I think.
> 
>  * IOS version
>  * Config of ports & SVIs in question
>  * Nature of downstream devices (if any)
>  * spanning tree config (if any)
>  * rough idea of the size of the ARP & MAC tables
>  * Any MLS rate-limit or CoPP config
>  * Nature of the source of the unicast-flooded traffic
>  * Any possibility of loops in the network?
> 
>> Has anyone experienced this in the past? Could unicast flooding over
>> multiple interfaces account for this kind of behaviour?
> 
> Anything punted to the CPU at high rate could cause this kind of thing. 
> That's why MLS limiters and CoPP are important on this platform, even 
> with all their limitations.
> _______________________________________________
> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/cisco-nsp
> archive at http://puck.nether.net/pipermail/cisco-nsp/
> 
> <RP CPU.txt><sh module.txt><SP CPU.txt>