[c-nsp] CEF Scanner eating CPU in Supervisor 720

Fri Jun 9 15:50:46 EDT 2006

Peter,

Someone mentioned to me (and they were watching this alias and
our thread) that they were seeing a similar issue.

I tried to recreate it in the lab with this:

R1 --
R2 --  UUT --- R3 -- R4

I sent 1k routes from R2 to UUT as EBGP routes.
I originated 1k OSPF routes and 1k iBGP routes from R4.
I had MPLS on between UUT and R3.

I then turned up R1 with a duplicate IP address.

I do see the adj flap as the two devices argue about who should
own that IP address.

And I see the adj update in 'sh ip cef ev' but I'm not able to
get the CEF scanner to run high.
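
For anyone following along, the "run 'sh ip cef ev' over and over and
see which entries keep coming back" check can be scripted. What follows
is a minimal Python sketch, not the Perl script mentioned later in this
thread. The regex is an assumption based on the event lines quoted
further down, and the names count_adj_updates, flapping and the
threshold of 2 are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the output quoted in this thread, e.g.
# "+00:00:00.240:           81.170.148.118/32     ADJ (Vl4001) update"
ADJ_UPDATE = re.compile(
    r"^\+\d+:\d+:\d+\.\d+:\s+(?P<prefix>\d+\.\d+\.\d+\.\d+/32)"
    r"\s+ADJ\s+\((?P<intf>[^)]+)\)\s+update"
)


def count_adj_updates(text):
    """Count ADJ update events per (prefix, interface) pair."""
    counts = Counter()
    for line in text.splitlines():
        m = ADJ_UPDATE.match(line.strip())
        if m:
            counts[(m.group("prefix"), m.group("intf"))] += 1
    return counts


def flapping(counts, threshold=2):
    """Return the pairs seen at least `threshold` times, sorted."""
    return sorted(key for key, n in counts.items() if n >= threshold)


# A few lines copied from the output quoted below in this thread.
SAMPLE = """\
+00:00:00.240:           81.170.148.118/32     ADJ (Vl4001) update [OK]
+00:00:00.304:           81.170.149.246/32     ADJ (Vl4001) update [OK]
+00:00:00.736: [Default] 199.3.108.0/24        NBD modified [OK]
+00:00:00.952:           81.170.128.77/32      ADJ (Vl104) update [OK]
+00:00:01.008:           81.170.149.246/32     ADJ (Vl4001) update [OK]
"""

if __name__ == "__main__":
    for prefix, intf in flapping(count_adj_updates(SAMPLE)):
        print(prefix, intf)
```

On that sample only 81.170.149.246/32 on Vl4001 shows up more than
once. In practice you would feed it several captures taken a few
seconds apart and pick a threshold that suits the capture window.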

On Monday I'll see if I can get a 76xx with SXE5 on it and try.

If you could try without MPLS and/or contact me offline and get
me remote access, I'll look at it with you if you can recreate it.

In newer code that is coming out the CEF scanner doesn't exist
anymore. But I'd still like to understand what's happening here
because even if one adj is flapping it doesn't seem normal that the
scanner would constantly run like that.

Rodney

On Fri, Jun 09, 2006 at 08:39:21AM -0400, Rodney Dunn wrote:
> On Fri, Jun 09, 2006 at 08:56:47AM +0200, Peter Salanki wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> > 
> > I think the problem is not at all MPLS related. I did a perl hack that
> > parsed the output of sh ip cef event new and added static arp on the
> > hosts which flapped rapidly; 7 IPs had abnormal activity. The load is
> > now down to a more acceptable level of 20% avg. I could remove the
> > statics and disable MPLS on the core facing interfaces just to make
> > sure that MPLS has nothing to do with it if you want.
> 
> I hate asking for things from production networks but if you could
> do that without too much trouble that would be a good data point
> to have.
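
The static ARP workaround described above could be sketched roughly
like this in Python. This is an illustration only, not the actual Perl
hack: the 'show ip arp' row layout is assumed, the MAC addresses in the
sample are made up, and parse_arp_table and static_arp_commands are
hypothetical helper names. Note that pinning an entry only helps if the
MAC currently in the ARP table is the legitimate owner, so the output
should be checked by hand before it goes anywhere near a router.

```python
import re

# Assumed row shape of 'show ip arp' output, e.g.
# "Internet  81.170.149.246   3   0000.5e00.5301  ARPA   Vlan4001"
ARP_ROW = re.compile(
    r"^Internet\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<age>\S+)\s+"
    r"(?P<mac>[0-9a-fA-F]{4}\.[0-9a-fA-F]{4}\.[0-9a-fA-F]{4})\s+ARPA"
)


def parse_arp_table(text):
    """Map IP address -> MAC for every complete ARP row."""
    table = {}
    for line in text.splitlines():
        m = ARP_ROW.match(line.strip())
        if m:
            table[m.group("ip")] = m.group("mac").lower()
    return table


def static_arp_commands(flapping_ips, arp_table):
    """Build 'arp <ip> <mac> arpa' config lines for IPs with a known MAC."""
    commands = []
    for ip in flapping_ips:
        mac = arp_table.get(ip)
        if mac is not None:  # skip hosts with no complete ARP entry
            commands.append(f"arp {ip} {mac} arpa")
    return commands


# Made-up sample: the MAC addresses are placeholders, not real hosts.
ARP_SAMPLE = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  81.170.149.246          3   0000.5e00.5301  ARPA   Vlan4001
Internet  81.170.148.118          0   Incomplete      ARPA
Internet  81.170.128.77          12   0000.5e00.5302  ARPA   Vlan104
"""

if __name__ == "__main__":
    table = parse_arp_table(ARP_SAMPLE)
    for cmd in static_arp_commands(["81.170.149.246", "81.170.148.118"], table):
        print(cmd)
```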
> 
> Also, those macs are they on the downstream interfaces (non mpls
> enabled interfaces)?
> 
> Are there any routes resolving through those arp/mac's on those interfaces?
> 
> 
> How are the arps changing?
> 
> 
>  Do you have any  
> > case about this, and/or any plans of "fixing" it? 
> 
> I need to understand a little more about what the root problem is first.
> 
> I don't like the  
> > thought of directly connected kiddies being able to drain all CPU on
> > my (imo. not cheap) sup720-3bxl by just stealing each other's IP
> > addresses.
> 
> Help me understand a bit more about what is actually going on to trigger
> it and we'll see what we can do.
> 
> 
> I'm not a l2 person but isn't there stuff about securing macs on ports, etc.?
> 
> > 
> > On 9 Jun 2006, at 00:32, Rodney Dunn wrote:
> > 
> > > One trick is you can do a 'sh ip cef ev new' and do it over and
> > > over. See which ones are flapping.
> > >
> > > How many routes do you have?
> > >
> > > Can you turn off logging to the console: no logg con
> > >
> > > and run a couple of mpls debugs and let's see what that says:
> > >
> > > debug mpls lfib cef
> > > debug mpls lfib enc
> > >
> > > Set the log to a couple of meg.
> > >
> > > Rodney
> > >
> > > On Thu, Jun 08, 2006 at 11:13:02PM +0200, Peter Salanki wrote:
> > >> The CEF Scanner is now eating almost all CPU :/
> > >>
> > >> The events table doesn't look particularly unusual to me,
> > >> --SNAP--
> > >>
> > >> +00:00:00.000:           81.170.148.226/32     ADJ (Vl4001) update [OK]
> > >> +00:00:00.024:           195.178.160.138/32    ADJ (Vl19) update [OK]
> > >> +00:00:00.052:           81.170.138.13/32      ADJ (Vl604) update [OK]
> > >> +00:00:00.232:           81.170.152.129/32     ADJ (Vl4003) update [OK]
> > >> +00:00:00.240:           81.170.148.118/32     ADJ (Vl4001) update [OK]
> > >> +00:00:00.304:           81.170.149.246/32     ADJ (Vl4001) update [OK]
> > >> +00:00:00.320:           81.170.152.50/32      ADJ (Vl4003) update [OK]
> > >> +00:00:00.380:           81.170.154.117/32     ADJ (Vl4004) update [OK]
> > >> +00:00:00.388:           213.136.56.90/32      ADJ (Vl39) update [OK]
> > >> +00:00:00.400:           81.170.136.79/32      ADJ (Vl504) update [OK]
> > >> +00:00:00.416:           195.178.160.173/32    ADJ (Vl19) update [OK]
> > >> +00:00:00.512:           81.170.164.163/32     ADJ (Vl4009) update [OK]
> > >> +00:00:00.728:           81.170.130.75/32      ADJ (Vl204) update [OK]
> > >> +00:00:00.736: [Default] 199.3.108.0/24        NBD modified [OK]
> > >> +00:00:00.736: [Default] 199.3.109.0/24        NBD modified [OK]
> > >> +00:00:00.820:           195.178.186.24/32     ADJ (Vl666) update [OK]
> > >> +00:00:00.832:           81.170.160.3/32       ADJ (Vl4007) update [OK]
> > >> +00:00:00.868:           81.170.164.33/32      ADJ (Vl4009) update [OK]
> > >> +00:00:00.944:           81.170.132.159/32     ADJ (Vl304) update [OK]
> > >> +00:00:00.952:           81.170.128.77/32      ADJ (Vl104) update [OK]
> > >> +00:00:01.008:           81.170.149.246/32     ADJ (Vl4001) update [OK]
> > >> +00:00:01.128:           194.68.123.141/32     ADJ (Vl15) update [OK]
> > >> --More--
> > >>
> > >>
> > >> On 8 Jun 2006, at 19:40, Rodney Dunn wrote:
> > >>
> > >>> Are you running MPLS on the box?
> > >>>
> > >>> Check the sh ip cef event output and see if you have a /32 ADJ
> > >>> for a mac constantly changing. That's the most common trigger
> > >>> I've seen for the scanner running high.
> > >>>
> > >>> You are forcing CEF to constantly reresolve prefixes.
> > >>>
> > >>> Rodney
> > >>>
> > >>> On Thu, Jun 08, 2006 at 02:23:22PM +0200, Peter Salanki wrote:
> > >>>> -----BEGIN PGP SIGNED MESSAGE-----
> > >>>> Hash: SHA1
> > >>>>
> > >>>> Hello,
> > >>>>
> > >>>> Process "CEF Scanner" is eating an average of 60% of the CPU on one of my
> > >>>> Sup720-3BXLs. This leads to SNMP responses being delayed and full BGP
> > >>>> updates taking a long time. I have not seen this on any of my other
> > >>>> Sup720s. What sets this box apart from the rest is that it has a
> > >>>> lot of directly connected hosts: ~10 SVIs with 300 hosts each (on /23
> > >>>> subnets). I have tried setting the ARP timeout to 1200 on those SVIs,
> > >>>> which resulted in a small decrease in CPU utilization. What can I do to
> > >>>> calm down the CEF Scanner? I'm running 12.2(18)SXF4.
> > >>>>
> > >>>> CPU utilization for five seconds: 44%/4%; one minute: 38%; five minutes: 38%
> > >>>> PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
> > >>>> 119   103495040    719635     143819 35.40% 23.87% 21.54%   0 CEF Scanner
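
To track a process row like the one above over time, it can be split
into fields with a few lines of Python. This is only a sketch: it
assumes the row arrives on a single line in the column order shown in
the header (PID, Runtime(ms), Invoked, uSecs, 5Sec, 1Min, 5Min, TTY,
Process), and parse_process_row and avg_usecs_per_invocation are
hypothetical helper names. As far as I understand the uSecs column, it
is the average CPU time per invocation in microseconds, which the
second helper recomputes from Runtime(ms) and Invoked as a sanity
check.

```python
def parse_process_row(line):
    """Split one 'show processes cpu' process row into named fields."""
    fields = line.split()
    return {
        "pid": int(fields[0]),
        "runtime_ms": int(fields[1]),
        "invoked": int(fields[2]),
        "usecs": int(fields[3]),
        "five_sec": float(fields[4].rstrip("%")),
        "one_min": float(fields[5].rstrip("%")),
        "five_min": float(fields[6].rstrip("%")),
        "tty": int(fields[7]),
        # The process name may contain spaces, e.g. "CEF Scanner".
        "process": " ".join(fields[8:]),
    }


def avg_usecs_per_invocation(row):
    """Average CPU microseconds per invocation, from runtime and count."""
    if row["invoked"] == 0:
        return 0.0
    return row["runtime_ms"] * 1000 / row["invoked"]


# The row quoted in this thread, joined onto one line.
ROW = "119   103495040    719635     143819 35.40% 23.87% 21.54%   0 CEF Scanner"

if __name__ == "__main__":
    row = parse_process_row(ROW)
    print(row["process"], row["five_sec"], round(avg_usecs_per_invocation(row)))
```

For the row in this thread the recomputed average lands within a few
microseconds of the reported 143819, i.e. roughly 144 ms of CPU per
scanner invocation.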
> > >>>>
> > >>>> Sincerely
> > >>>>
> > >>>> Peter Salanki
> > >>>> Chief Network Engineer
> > >>>> Bahnhof AB (AS8473)
> > >>>> www.bahnhof.se
> > >>>> Office: +46855577132
> > >>>> Cell: +46709174932
> > >>>>
> > >>>>
> > >>>> -----BEGIN PGP SIGNATURE-----
> > >>>> Version: GnuPG v1.4.2.2 (Darwin)
> > >>>>
> > >>>> iD8DBQFEiBa7iQKhdiFGiogRAr9aAJ9W+rryMPcg5qnAYrYTU9jbRg8PFgCdHDA3
> > >>>> QjIpm/Yk7kuf4VjZN5MqDq8=
> > >>>> =O029
> > >>>> -----END PGP SIGNATURE-----
> > >>
> > >> Kind regards
> > >>
> > >> Peter Salanki
> > >> Network Manager
> > >> Bahnhof AB (AS8473)
> > >> www.bahnhof.se
> > >> Office: +46855577132
> > >> Cell: +46709174932
> > >>
> > 
> > Sincerely
> > 
> > Peter Salanki
> > Chief Network Engineer
> > Bahnhof AB (AS8473)
> > www.bahnhof.se
> > Office: +46855577132
> > Cell: +46709174932
> > 
> > 
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.2.2 (Darwin)
> > 
> > iD8DBQFEiRuviQKhdiFGiogRApBTAJ9wZqTm+iAVcO4AgccM7OUvfCjlyACgltZ3
> > Mzw94W8HEF7+RGBjmObwqXc=
> > =EAim
> > -----END PGP SIGNATURE-----