[c-nsp] CEF Scanner eating CPU in Supervisor 720

Peter Salanki peter.salanki at bahnhof.net
Fri Jun 9 16:26:05 EDT 2006



Nope, no HSRP on the box at all.

Interface configuration sample:

interface Vlan304
   description Akalla Husby - A157 - 3
   ip address 10.16.33.1 255.255.255.0 secondary
   ip address 81.170.132.1 255.255.254.0
   ip access-group AUTO-A157_VLAN304-IN in
   ip helper-address 213.136.32.49
   ip helper-address 213.136.32.48
   no ip redirects
   ipv6 address 2001:9B0:1:1015::1/64
   ipv6 enable
   ipv6 nd prefix 2001:9B0:1:1015::/64 3600 3600
   arp timeout 1200
end

On 9 Jun 2006, at 22:23, Ian Dickinson wrote:

> Is there HSRP on any of the interfaces where there were duplicate IPs?
> I've seen Sups die a 100% CPU death in this scenario, and not recover
> when the duplicate was removed.
>
> Ian
>
> Rodney Dunn wrote:
>> Peter,
>>
>> Someone mentioned to me (and they were watching this alias and
>> our thread) that they were seeing a similar issue.
>>
>> I tried to recreate it in the lab with this:
>>
>> R1 --
>> R2 --  UUT --- R3 -- R4
>>
>> I sent 1k routes from R2 to UUT as EBGP routes.
>> I originated 1k OSPF routes and 1k iBGP routes from R4.
>> I had MPLS on between UUT and R3.
>>
>> I then turned up R1 with duplicate ip address.
>>
>> I do see the adj flap as the two devices argue about who should
>> own that ip address.
>>
>> And I see the adj update in 'sh ip cef ev' but I'm not able to
>> get the CEF scanner to run high.
>>
>> On Monday I'll see if I can get a 76xx with SXE5 on it and try.
>>
>> If you could try without MPLS and/or contact me offline and get
>> me remote access, I'll look at it with you if you can recreate it.
>>
>> In newer code that is coming out, the CEF scanner doesn't exist
>> anymore. But I'd still like to understand what's happening here,
>> because even if one adj is flapping it doesn't seem normal that the
>> scanner would constantly run like that.
>>
>> Rodney
>>
>> On Fri, Jun 09, 2006 at 08:39:21AM -0400, Rodney Dunn wrote:
>>
>>> On Fri, Jun 09, 2006 at 08:56:47AM +0200, Peter Salanki wrote:
>>>
>>>> I think the problem is not at all MPLS related. I did a Perl hack
>>>> that parsed the output of 'sh ip cef event new' and added static ARP
>>>> entries for the hosts that flapped rapidly; 7 IPs had abnormal
>>>> activity. The load is now down to a more acceptable 20% average. If
>>>> you want, I could remove the statics and disable MPLS on the
>>>> core-facing interfaces, just to make sure that MPLS has nothing to
>>>> do with it.
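Peter's Perl script itself is not included in the thread. A rough Python equivalent of the workaround he describes might look like the sketch below. It assumes two captures pasted in as text: the `show ip cef events` lines in the layout quoted further down in this thread, and `show ip arp` rows of the assumed form `Internet <ip> <age> <mac> ARPA <interface>`. The function name `static_arp_config` and the default threshold of two updates are hypothetical choices, not taken from the original script. It emits IOS `arp <ip> <mac> arpa` global configuration lines that pin each flapping host to whatever MAC the router currently has for it; the MAC addresses in the example are made up.

```python
import re
from collections import Counter

# Host-route adjacency updates in 'show ip cef events' output, e.g.
# "+00:00:01.008:           81.170.149.246/32     ADJ (Vl4001) update [OK]"
ADJ_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})/32\s+ADJ \(\S+\) update")

# Rows of 'show ip arp' (assumed layout), e.g.
# "Internet  81.170.149.246   3   0011.2233.4455  ARPA   Vlan4001"
ARP_RE = re.compile(
    r"^Internet\s+(\d{1,3}(?:\.\d{1,3}){3})\s+\S+\s+"
    r"([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})\s+ARPA",
    re.IGNORECASE | re.MULTILINE,
)


def static_arp_config(cef_events: str, arp_table: str, threshold: int = 2) -> list[str]:
    """Return IOS 'arp <ip> <mac> arpa' lines for hosts whose /32 adjacency
    was updated at least `threshold` times in the captured events."""
    updates = Counter(m.group(1) for m in ADJ_RE.finditer(cef_events))
    macs = {ip: mac.lower() for ip, mac in ARP_RE.findall(arp_table)}
    config = []
    for ip, count in sorted(updates.items()):
        # Skip hosts that did not flap often enough or have no ARP entry.
        if count >= threshold and ip in macs:
            config.append(f"arp {ip} {macs[ip]} arpa")
    return config


if __name__ == "__main__":
    events = "\n".join([
        "+00:00:00.304:           81.170.149.246/32     ADJ (Vl4001) update [OK]",
        "+00:00:00.944:           81.170.132.159/32     ADJ (Vl304) update [OK]",
        "+00:00:01.008:           81.170.149.246/32     ADJ (Vl4001) update [OK]",
    ])
    # Made-up MAC addresses, for illustration only.
    arp = "\n".join([
        "Internet  81.170.149.246   3   0011.2233.4455  ARPA   Vlan4001",
        "Internet  81.170.132.159   7   0011.2233.6677  ARPA   Vlan304",
    ])
    for line in static_arp_config(events, arp):
        print(line)  # arp 81.170.149.246 0011.2233.4455 arpa
```

Pinning the current MAC only helps if that MAC belongs to the legitimate owner of the address, so it is worth checking which of the competing hosts is correct before applying the generated lines.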
>>>
>>> I hate asking for things from production networks, but if you could
>>> do that without too much trouble it would be a good data point
>>> to have.
>>>
>>> Also, those MACs: are they on the downstream interfaces (the
>>> non-MPLS-enabled interfaces)?
>>>
>>> Are there any routes resolving through those ARP/MAC entries on
>>> those interfaces?
>>>
>>> How are the ARPs changing?
>>>
>>>> Do you have any case about this, and/or any plans of "fixing" it?
>>>
>>> I need to understand a little more about what the root problem
>>> is first.
>>>
>>>> I don't like the thought of directly connected kiddies being able to
>>>> drain all the CPU on my (IMO not cheap) Sup720-3BXL just by stealing
>>>> each other's IP addresses.
>>>
>>> Help me understand a bit more about what is actually going on to
>>> trigger it and we'll see what we can do.
>>>
>>> I'm not an L2 person, but isn't there stuff about securing MACs on
>>> ports, etc.?
>>>
>>>>
>> On 9 Jun 2006, at 00:32, Rodney Dunn wrote:
>>
>>
>>> One trick is to run 'sh ip cef ev new' over and over and
>>> see which ones are flapping.
>>
>>> How many routes do you have?
>>
>>> Can you turn off logging to the console: no logg con
>>
>>> and run a couple of mpls debugs and let's see what that says:
>>
>>> debug mpls lfib cef
>>> debug mpls lfib enc
>>
>>> Set the log buffer to a couple of megs.
>>
>>> Rodney
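Rodney's trick of repeating 'sh ip cef ev new' and watching for repeats can be automated by saving the output of each run and counting in how many runs each /32 adjacency shows up. The sketch below assumes the captures have already been collected as strings (how you collect them is left out); `flapping_hosts` and `min_captures` are hypothetical names, and the second capture in the example is invented.

```python
import re
from collections import Counter

# A /32 adjacency update line from 'show ip cef events new', e.g.
# "+00:00:00.304:           81.170.149.246/32     ADJ (Vl4001) update [OK]"
ADJ_UPDATE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})/32\s+ADJ \(\S+\) update")


def flapping_hosts(captures: list[str], min_captures: int = 2) -> dict[str, int]:
    """Given the text of several successive captures, return
    {host_ip: number_of_captures_it_appeared_in} for every host that
    appeared in at least `min_captures` of them."""
    seen_in = Counter()
    for text in captures:
        # Use a set so a host counts once per capture, however many
        # times it was updated inside that capture.
        seen_in.update({m.group(1) for m in ADJ_UPDATE.finditer(text)})
    return {ip: n for ip, n in seen_in.items() if n >= min_captures}


if __name__ == "__main__":
    run1 = "\n".join([
        "+00:00:00.304:           81.170.149.246/32     ADJ (Vl4001) update [OK]",
        "+00:00:00.944:           81.170.132.159/32     ADJ (Vl304) update [OK]",
    ])
    # run2 is an invented second capture, for illustration only.
    run2 = "+00:00:00.120:           81.170.149.246/32     ADJ (Vl4001) update [OK]"
    print(flapping_hosts([run1, run2]))  # {'81.170.149.246': 2}
```

Counting captures rather than raw lines means a host has to keep reappearing across runs to be flagged, which is closer to "see which ones are flapping" than a single burst of updates.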
>>
>>> On Thu, Jun 08, 2006 at 11:13:02PM +0200, Peter Salanki wrote:
>>
>>>> The CEF Scanner is now eating almost all CPU :/
>>>>
>>>> The events table doesn't look particularly unusual to me:
>>>> --SNAP--
>>>>
>>>> +00:00:00.000:           81.170.148.226/32     ADJ (Vl4001) update [OK]
>>>> +00:00:00.024:           195.178.160.138/32    ADJ (Vl19) update [OK]
>>>> +00:00:00.052:           81.170.138.13/32      ADJ (Vl604) update [OK]
>>>> +00:00:00.232:           81.170.152.129/32     ADJ (Vl4003) update [OK]
>>>> +00:00:00.240:           81.170.148.118/32     ADJ (Vl4001) update [OK]
>>>> +00:00:00.304:           81.170.149.246/32     ADJ (Vl4001) update [OK]
>>>> +00:00:00.320:           81.170.152.50/32      ADJ (Vl4003) update [OK]
>>>> +00:00:00.380:           81.170.154.117/32     ADJ (Vl4004) update [OK]
>>>> +00:00:00.388:           213.136.56.90/32      ADJ (Vl39) update [OK]
>>>> +00:00:00.400:           81.170.136.79/32      ADJ (Vl504) update [OK]
>>>> +00:00:00.416:           195.178.160.173/32    ADJ (Vl19) update [OK]
>>>> +00:00:00.512:           81.170.164.163/32     ADJ (Vl4009) update [OK]
>>>> +00:00:00.728:           81.170.130.75/32      ADJ (Vl204) update [OK]
>>>> +00:00:00.736: [Default] 199.3.108.0/24        NBD modified [OK]
>>>> +00:00:00.736: [Default] 199.3.109.0/24        NBD modified [OK]
>>>> +00:00:00.820:           195.178.186.24/32     ADJ (Vl666) update [OK]
>>>> +00:00:00.832:           81.170.160.3/32       ADJ (Vl4007) update [OK]
>>>> +00:00:00.868:           81.170.164.33/32      ADJ (Vl4009) update [OK]
>>>> +00:00:00.944:           81.170.132.159/32     ADJ (Vl304) update [OK]
>>>> +00:00:00.952:           81.170.128.77/32      ADJ (Vl104) update [OK]
>>>> +00:00:01.008:           81.170.149.246/32     ADJ (Vl4001) update [OK]
>>>> +00:00:01.128:           194.68.123.141/32     ADJ (Vl15) update [OK]
>>>> --More--
>>>>
>>>>
>>>> On 8 Jun 2006, at 19:40, Rodney Dunn wrote:
>>>>
>>>>
>>>>> Are you running MPLS on the box?
>>
>>>>> Check the sh ip cef event output and see if you have a /32 ADJ
>>>>> whose MAC is constantly changing. That's the most common trigger
>>>>> I've seen for the scanner running high.
>>
>>>>> You are forcing CEF to constantly re-resolve prefixes.
>>
>>>>> Rodney
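To spot a /32 adjacency that keeps changing within a single capture, one can group the ADJ update lines by host and look at how close together the updates are. This is a minimal sketch assuming the line layout of the capture quoted earlier in this thread, and assuming the leading `+HH:MM:SS.mmm` field is a relative timestamp in hours, minutes and seconds. The function name `repeat_adjacency_updates` is hypothetical. Run on three lines from that capture, it flags 81.170.149.246 on Vl4001, updated twice about 0.7 s apart.

```python
import re
from collections import defaultdict

# "+00:00:01.008:           81.170.149.246/32     ADJ (Vl4001) update [OK]"
# Assumes the leading +HH:MM:SS.mmm is a relative timestamp.
EVENT = re.compile(
    r"^\+(\d+):(\d+):(\d+(?:\.\d+)?):\s+(?:\[\S+\]\s+)?"
    r"(\d{1,3}(?:\.\d{1,3}){3})/32\s+ADJ \((\S+)\) update",
    re.MULTILINE,
)


def repeat_adjacency_updates(events: str) -> dict[str, tuple[str, int, float]]:
    """Map host IP -> (interface, update count, shortest gap in seconds)
    for /32 adjacencies updated more than once in one capture."""
    times: dict[str, list[float]] = defaultdict(list)
    interface: dict[str, str] = {}
    for hh, mm, ss, ip, intf in EVENT.findall(events):
        times[ip].append(int(hh) * 3600 + int(mm) * 60 + float(ss))
        interface[ip] = intf
    result = {}
    for ip, stamps in times.items():
        if len(stamps) > 1:
            stamps.sort()
            # Smallest interval between two consecutive updates of this host.
            gap = min(b - a for a, b in zip(stamps, stamps[1:]))
            result[ip] = (interface[ip], len(stamps), round(gap, 3))
    return result


if __name__ == "__main__":
    # Three lines taken from the capture quoted in this thread.
    capture = "\n".join([
        "+00:00:00.304:           81.170.149.246/32     ADJ (Vl4001) update [OK]",
        "+00:00:00.944:           81.170.132.159/32     ADJ (Vl304) update [OK]",
        "+00:00:01.008:           81.170.149.246/32     ADJ (Vl4001) update [OK]",
    ])
    print(repeat_adjacency_updates(capture))
    # {'81.170.149.246': ('Vl4001', 2, 0.704)}
```

The NBD lines for the /24 prefixes do not match the `/32 ... ADJ` pattern, so only host adjacencies are considered.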
>>
>>>>> On Thu, Jun 08, 2006 at 02:23:22PM +0200, Peter Salanki wrote:
>>
>>>>>> Hello,
>>>>>>
>>>>>> Process "CEF Scanner" is eating on average 60% of the CPU on one of
>>>>>> my Sup720-3BXLs. This leads to delayed SNMP responses and to full
>>>>>> BGP updates taking a long time. I have not seen this on any of my
>>>>>> other Sup720s. What differs between this box and the rest is that
>>>>>> this one has a lot of directly connected hosts: ~10 SVIs with 300
>>>>>> hosts each (on /23 subnets). I have tried setting the ARP timeout to
>>>>>> 1200 on those SVIs, which resulted in a small decrease in CPU
>>>>>> utilization. What can I do to calm down the CEF Scanner? I'm running
>>>>>> 12.2(18)SXF4.
>>>>>>
>>>>>> CPU utilization for five seconds: 44%/4%; one minute: 38%; five minutes: 38%
>>>>>>  PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
>>>>>>  119   103495040    719635     143819 35.40% 23.87% 21.54%   0 CEF Scanner
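The uSecs column of `show processes cpu` is the average CPU time per invocation in microseconds, i.e. Runtime(ms) * 1000 / Invoked. Recomputing it from the CEF Scanner row above gives about 143,816 µs, within a few microseconds of the displayed 143819, so each scanner pass costs roughly 0.14 s of CPU on average. A trivial check (the helper name is hypothetical):

```python
def avg_usecs_per_run(runtime_ms: int, invoked: int) -> float:
    """Average CPU time per invocation in microseconds (the uSecs column):
    total runtime in ms converted to microseconds, divided by invocations."""
    return runtime_ms * 1000 / invoked


if __name__ == "__main__":
    # Figures from the CEF Scanner row above (PID 119).
    usecs = avg_usecs_per_run(103495040, 719635)
    print(round(usecs))  # 143816
```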
>>>>>>
>>>>>> Sincerely
>>>>>>
>>>>>> Peter Salanki
>>>>>> Chief Network Engineer
>>>>>> Bahnhof AB (AS8473)
>>>>>> www.bahnhof.se
>>>>>> Office: +46855577132
>>>>>> Cell: +46709174932
>>>>>>
>>>>>>
>>>>>>

Best regards,

Peter Salanki
Network Manager
Bahnhof AB (AS8473)
www.bahnhof.se
Office: +46855577132
Cell: +46709174932


