[c-nsp] CEF Scanner eating CPU in Supervisor 720
Peter Salanki
peter.salanki at bahnhof.net
Fri Jun 9 16:26:05 EDT 2006
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Nope, no HSRP on the box at all.
Interface configuration sample:
interface Vlan304
 description Akalla Husby - A157 - 3
 ip address 10.16.33.1 255.255.255.0 secondary
 ip address 81.170.132.1 255.255.254.0
 ip access-group AUTO-A157_VLAN304-IN in
 ip helper-address 213.136.32.49
 ip helper-address 213.136.32.48
 no ip redirects
 ipv6 address 2001:9B0:1:1015::1/64
 ipv6 enable
 ipv6 nd prefix 2001:9B0:1:1015::/64 3600 3600
 arp timeout 1200
end
On 9 Jun 2006, at 22:23, Ian Dickinson wrote:
> Is there HSRP on any of the interfaces where there were duplicate IPs?
> I've seen Sups die a 100% CPU death in this scenario, and not recover
> when the duplicate was removed.
>
> Ian
>
> Rodney Dunn wrote:
>> Peter,
>>
>> Someone mentioned to me (and they were watching this alias and
>> our thread) that they were seeing a similar issue.
>>
>> I tried to recreate it in the lab with this:
>>
>> R1 --
>> R2 -- UUT --- R3 -- R4
>>
>> I sent 1k routes from R2 to UUT as EBGP routes.
>> I originated 1k OSPF routes and 1k iBGP routes from R4.
>> I had MPLS enabled between UUT and R3.
>>
>> I then turned up R1 with a duplicate IP address.
>>
>> I do see the adj flap as the two devices argue about who should
>> own that ip address.
>>
>> And I see the adj update in 'sh ip cef ev' but I'm not able to
>> get the CEF scanner to run high.
>>
>> On Monday I'll see if I can get a 76xx with SXE5 on it and try.
>>
>> If you could try without MPLS, and/or contact me offline and get
>> me remote access, I'll look at it with you if you can recreate it.
>>
>> In newer code that is coming out the CEF scanner doesn't exist
>> anymore. But I'd still like to understand what's happening here
>> because even if one adj is flapping it doesn't seem normal that the
>> scanner would constantly run like that.
>>
>> Rodney
>>
>> On Fri, Jun 09, 2006 at 08:39:21AM -0400, Rodney Dunn wrote:
>>
>>> On Fri, Jun 09, 2006 at 08:56:47AM +0200, Peter Salanki wrote:
>>>
>> I think the problem is not at all MPLS related. I wrote a Perl hack that
>> parsed the output of sh ip cef event new and added static ARP entries for
>> the hosts which flapped rapidly; 7 IPs had abnormal activity. The load is
>> now down to a more acceptable level of 20% avg. I could remove the
>> statics and disable MPLS on the core-facing interfaces just to make
>> sure that MPLS has nothing to do with it, if you want.
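
A minimal sketch of the kind of tally described above, written in Python rather than the original Perl (which was not posted). It assumes the event log has already been captured as text and that adjacency updates look like the "ADJ (VlNNN) update" lines quoted later in this thread. The function names and the threshold of 2 are hypothetical choices for illustration. It only counts updates per host; it does not add static ARP entries.

```python
import re
from collections import Counter

# Matches captured event lines such as:
#   +00:00:00.000: 81.170.148.226/32 ADJ (Vl4001) update
# The pattern is an assumption based on the sample output in this thread.
ADJ_UPDATE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3})/32\s+ADJ\s+\((\S+)\)\s+update"
)


def count_adj_updates(text):
    """Count /32 adjacency updates per (ip, interface) in captured output."""
    counts = Counter()
    for line in text.splitlines():
        match = ADJ_UPDATE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def flapping_hosts(text, threshold=2):
    """Return (ip, interface, count) for hosts seen at least `threshold` times.

    `threshold` is an arbitrary cutoff chosen for this example only.
    """
    hits = [
        (ip, ifname, n)
        for (ip, ifname), n in count_adj_updates(text).items()
        if n >= threshold
    ]
    # Busiest hosts first.
    return sorted(hits, key=lambda item: -item[2])


# A few lines taken from the event output quoted in this thread.
SAMPLE = """\
+00:00:00.000: 81.170.148.226/32 ADJ (Vl4001) update
+00:00:00.304: 81.170.149.246/32 ADJ (Vl4001) update
+00:00:00.736: [Default] 199.3.108.0/24 NBD modified
+00:00:01.008: 81.170.149.246/32 ADJ (Vl4001) update
"""

if __name__ == "__main__":
    for ip, ifname, n in flapping_hosts(SAMPLE):
        print(f"{ip} on {ifname}: {n} ADJ updates")
```

Because the pattern is searched for anywhere in the line, mail quote prefixes in front of the timestamp do not matter. Running it over several captures taken a few seconds apart and comparing the counts is one way to see which hosts keep reappearing.
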
>>>>
>>>> I hate asking for things from production networks but if you could
>>>> do that without too much trouble that would be a good data point
>>>> to have.
>>>>
>>>> Also, are those MACs on the downstream interfaces (non-MPLS-enabled
>>>> interfaces)?
>>>>
>>>> Are there any routes resolving through those arp/mac's on those
>>>> interfaces?
>>>>
>>>>
>>>> How are the arps changing?
>>>>
>>>>
>> Do you have any case about this, and/or any plans of "fixing" it?
>>>>
>>>> I need to understand a little more about what the root problem
>>>> is first.
>>>>
>> I don't like the thought of directly connected kiddies being able to
>> drain all CPU on my (IMO not cheap) Sup720-3BXL by just stealing each
>> other's IP addresses.
>>>>
>>>> Help me understand a bit more about what is actually going on to
>>>> trigger it and we'll see what we can do.
>>>>
>>>>
>>>> I'm not an L2 person, but isn't there stuff about securing MACs on
>>>> ports, etc.?
>>>>
>>>>
>> On 9 Jun 2006, at 00:32, Rodney Dunn wrote:
>>
>>
>>> One trick is you can do a 'sh ip cef ev new' and do it over and
>>> over. See which ones are flapping.
>>
>>> How many routes do you have?
>>
>>> Can you turn off logging to the console: no logg con
>>
>>> and run a couple of mpls debugs and let's see what that says:
>>
>>> debug mpls lfib cef
>>> debug mpls lfib enc
>>
>>> Set the log to a couple of meg.
>>
>>> Rodney
>>
>>> On Thu, Jun 08, 2006 at 11:13:02PM +0200, Peter Salanki wrote:
>>
>>>> The CEF Scanner is now eating almost all CPU :/
>>>>
>>>> The events table doesn't look particularly unusual to me:
>>>> --SNAP--
>>>>
>>>> +00:00:00.000: 81.170.148.226/32 ADJ (Vl4001) update [OK]
>>>> +00:00:00.024: 195.178.160.138/32 ADJ (Vl19) update [OK]
>>>> +00:00:00.052: 81.170.138.13/32 ADJ (Vl604) update [OK]
>>>> +00:00:00.232: 81.170.152.129/32 ADJ (Vl4003) update [OK]
>>>> +00:00:00.240: 81.170.148.118/32 ADJ (Vl4001) update [OK]
>>>> +00:00:00.304: 81.170.149.246/32 ADJ (Vl4001) update [OK]
>>>> +00:00:00.320: 81.170.152.50/32 ADJ (Vl4003) update [OK]
>>>> +00:00:00.380: 81.170.154.117/32 ADJ (Vl4004) update [OK]
>>>> +00:00:00.388: 213.136.56.90/32 ADJ (Vl39) update [OK]
>>>> +00:00:00.400: 81.170.136.79/32 ADJ (Vl504) update [OK]
>>>> +00:00:00.416: 195.178.160.173/32 ADJ (Vl19) update [OK]
>>>> +00:00:00.512: 81.170.164.163/32 ADJ (Vl4009) update [OK]
>>>> +00:00:00.728: 81.170.130.75/32 ADJ (Vl204) update [OK]
>>>> +00:00:00.736: [Default] 199.3.108.0/24 NBD modified [OK]
>>>> +00:00:00.736: [Default] 199.3.109.0/24 NBD modified [OK]
>>>> +00:00:00.820: 195.178.186.24/32 ADJ (Vl666) update [OK]
>>>> +00:00:00.832: 81.170.160.3/32 ADJ (Vl4007) update [OK]
>>>> +00:00:00.868: 81.170.164.33/32 ADJ (Vl4009) update [OK]
>>>> +00:00:00.944: 81.170.132.159/32 ADJ (Vl304) update [OK]
>>>> +00:00:00.952: 81.170.128.77/32 ADJ (Vl104) update [OK]
>>>> +00:00:01.008: 81.170.149.246/32 ADJ (Vl4001) update [OK]
>>>> +00:00:01.128: 194.68.123.141/32 ADJ (Vl15) update [OK]
>>>> --More--
>>>>
>>>>
>>>> On 8 Jun 2006, at 19:40, Rodney Dunn wrote:
>>>>
>>>>
>>>>> Are you running MPLS on the box?
>>
>>>>> Check the sh ip cef event output and see if you have a /32 ADJ
>>>>> for a MAC constantly changing. That's the most common trigger
>>>>> I've seen for the scanner running high.
>>
>>>>> You are forcing CEF to constantly reresolve prefixes.
>>
>>>>> Rodney
>>
>>>>> On Thu, Jun 08, 2006 at 02:23:22PM +0200, Peter Salanki wrote:
>>
>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>> Hash: SHA1
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Process "CEF Scanner" is eating an average of 60% of the CPU on one
>>>>>> of my Sup720-3BXLs. This leads to SNMP responses being delayed and
>>>>>> full BGP updates taking a long time. I have not seen this on any of
>>>>>> my other Sup720s. What sets this box apart from the rest is that it
>>>>>> has a lot of directly connected hosts: ~10 SVIs with 300 hosts each
>>>>>> (on /23 subnets). I have tried setting arp timeout to 1200 on those
>>>>>> SVIs, which resulted in a small decrease in CPU utilization. What
>>>>>> can I do to calm down the CEF Scanner? I'm running 12.2(18)SXF4.
>>>>>>
>>>>>> CPU utilization for five seconds: 44%/4%; one minute: 38%; five minutes: 38%
>>>>>>  PID Runtime(ms)   Invoked   uSecs    5Sec   1Min   5Min TTY Process
>>>>>>  119   103495040    719635  143819  35.40% 23.87% 21.54%   0 CEF Scanner
>>>>>>
>>>>>> Sincerely
>>>>>>
>>>>>> Peter Salanki
>>>>>> Chief Network Engineer
>>>>>> Bahnhof AB (AS8473)
>>>>>> www.bahnhof.se
>>>>>> Office: +46855577132
>>>>>> Cell: +46709174932
>>>>>>
>>>>>>
>>>>>>
Best regards
Peter Salanki
Network Manager
Bahnhof AB (AS8473)
www.bahnhof.se
Office: +46855577132
Cell: +46709174932
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
iD8DBQFEidldiQKhdiFGiogRAmQhAJ9k3TfYSoBDPH6FsJEcfBfafbeP9QCfcqHs
O3zsjrmVg7gNAM8RsYR9KUc=
=hcDi
-----END PGP SIGNATURE-----