[c-nsp] High CPU load from loose TE tunnels

Rodney Dunn rodunn at cisco.com
Wed Jan 24 08:34:04 EST 2007


On Wed, Jan 24, 2007 at 10:26:29AM +0000, Anton Smith wrote:
> Rodney Dunn wrote:
> > What code?
> 
> My bad - it is actually a 6500. Code s72033-pk9sv-mz.122-18.SXD7.bin 
> (12.2(18)SXD7).
> 
> The other end (exhibiting no problems) is 12.2(20)S6 (7300). It actually 
> has the CEF Scanner still though:
> 
> #show processes  | i CEF
>    66 Lwe 40F4B92C     10196276    1711321    5958 4840/6000   0 CEF Scanner
>    99 Lwe 40F4C260       780468   27792146      28 3704/6000   0 CEF process
> 
> I forgot to note that there is a difference in the paths between the two 
> routers, probably easiest just to draw a small diagram:
> 
> R1(6500)--(loosely found hops)---(loose)--(strict)-x-(strict)--R2(7300)
> 
> The physical link down is where the x is above.
> 
> 
> 
> > The CEF scanner is gone in later code. We moved it to an entirely event
> > driven architecture.
> 
> Could you perhaps tell me in which code version is the CEF scanner removed?

12.2(28)SRA and forward. 12.2(28)SB and forward on the 73xx. The changes
are coming over to the 12.4T branch just before the 12.5 mainline pull.
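
For what it's worth, a quick way to check whether a given image still has the
scanner is just to look for the process (a minimal check, nothing more; exact
process names vary a bit by platform and release):

  show processes cpu | include CEF

On the older code you'll see a "CEF Scanner" line like the one you pasted; on
the event driven code that process simply isn't there.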

I'd have to log in and take a look while the problem is happening, then.
I can't see how CEF could be driving the CPU that high if the event logs
or those debugs don't show anything at all.
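
If you can catch it in the act, a rough capture set would be the commands
already mentioned in this thread plus the standard CPU views (illustrative
only, adjust to taste):

  show processes cpu sorted
  show processes cpu history
  show ip cef events
  show mpls traffic-eng tunnels summary
  debug mpls lfib cef
  debug mpls lfib enc
  debug mpls lfib adj

Correlating the timestamps from those with the per-second CPU history should
at least show which process and which tunnel event line up with each spike.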


> 
> > 
> > A sh ip cef events might help explain what the scanner is trying to
> > do. It's a bit more complicated with MPLS enabled because some of
> > the MPLS code can make callbacks to the CEF scanner even though it's
> > not CEF at all.
> 
> sh ip cef events doesn't seem to show anything interesting w.r.t. the 
> tunnel in question.
> 
> > 
> > debug mpls lfib cef|enc|adj might tell the answer if that's what it
> > is.
> 
> The debugs don't seem to show anything useful either.
> 
> > 
> > Rodney
> > 
> 
> Cheers,
> Anton
> 
> 
> 
> >  
> > On Tue, Jan 23, 2007 at 05:01:53PM +0000, Anton Smith wrote:
> >> Hi all,
> >>
> >> I am having a problem with high CPU load on a 7600 because of a downed 
> >> TE tunnel that is configured to use a path that starts with a loose hop, 
> >> followed by strict hops. The tunnel is down because one of the strict 
> >> hops is not reachable (physical link down). The tunnel and path config 
> >> are as follows:
> >>
> >> interface Tunnel111
> >>   description Tunnel111
> >>   ip unnumbered Loopback1
> >>   tunnel destination x.x.x.4
> >>   tunnel mode mpls traffic-eng
> >>   tunnel mpls traffic-eng autoroute announce
> >>   tunnel mpls traffic-eng priority 2 2
> >>   tunnel mpls traffic-eng path-option 1 explicit name path1
> >>   tunnel mpls traffic-eng load-share 155
> >>
> >> ip explicit-path name path1 enable
> >>   index 2 next-address loose x.x.x.1
> >>   next-address x.x.x.2
> >>   next-address x.x.x.3
> >>   next-address x.x.x.4
> >>
> >>
> >> Other tunnels that are configured with paths that are very similar (and 
> >> are fully up) do not cause this kind of CPU load. But when they go down, 
> >> they also create the same kind of load.
> >>
> >> The CPU histogram looks as follows:
> >>
> >>        44444     44444     44444    444444444455555     44444
> >>       233333     66666     7777788886666655555444441111166666
> >> 100
> >>   90
> >>   80
> >>   70
> >>   60
> >>   50             *****     *****    ***************     *****
> >>   40   *****     *****     *****    ***************     *****
> >>   30   *****     *****     *****    ***************     *****
> >>   20   *****     *****     *****    ***************     *****
> >>   10   *****     *****     ************************     *****
> >>      0....5....1....1....2....2....3....3....4....4....5....5....
> >>                0    5    0    5    0    5    0    5    0    5
> >>
> >>                 CPU% per second (last 60 seconds)
> >>
> >> And the process that seems to be causing the load is CEF scanner:
> >>
> >> CPU utilization for five seconds: 44%/0%; one minute: 29%; five minutes: 18%
> >>   PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
> >>   114    43044320   1823442      23606 37.11% 21.28% 11.57%   0 CEF Scanner
> >> ....
> >>
> >> If I admin shut the tunnel interface in question, the CPU load drops 
> >> back to near zero.
> >>
> >> I notice that when I run show mpls traffic-eng tunnels summary, the 
> >> activations and deactivations numbers increment steadily every few 
> >> seconds, as though the box is continually trying to bring up the tunnel 
> >> (more often than it should?):
> >>
> >> R1#show mpls traffic-eng tunnels summary
> >> Signalling Summary:
> >>      LSP Tunnels Process:            running
> >>      Passive LSP Listener:           running
> >>      RSVP Process:                   running
> >>      Forwarding:                     enabled
> >>      Head: 15 interfaces, 13 active signalling attempts, 13 established
> >>            23356 activations, 23343 deactivations
> >>      Midpoints: 0, Tails: 13
> >>      Periodic reoptimization:        every 300 seconds, next in 213 seconds
> >>      Periodic FRR Promotion:         Not Running
> >>      Periodic auto-bw collection:    every 300 seconds, next in 269 seconds
> >>
> >> I have another router (the tail), which has return tunnels built (also 
> >> using a combination of loose and strict hops). This router also has a 
> >> downed tunnel interface (for the same reason that the first router does 
> >> - i.e. a downed physical link on a strict hop), but it does not exhibit 
> >> high CPU load, nor the same periodic spikes. In addition, the 
> >> 'activations' and 'deactivations' counters do not increment. However, 
> >> this other router is a 7301. The tunnels are not administratively shut 
> >> on the 7301.
> >>
> >> Does anybody have any ideas? How frequently does a 7600 attempt to bring 
> >> up a tunnel interface? I imagine that the CPU load is coming from the 
> >> CSPF calculation being run every few seconds in an attempt to find a 
> >> path to the first (loose) hop. Is it possible to change this frequency? 
> >> (I have tried changing the reoptimisation timers but I do not believe 
> >> this is the problem, since they are by default 300 seconds - and I see 
> >> high CPU load every few seconds).
> >>
> >> Any help much appreciated :).
> >>
> >> Regards,
> >> Anton
> >> _______________________________________________
> >> cisco-nsp mailing list  cisco-nsp at puck.nether.net
> >> https://puck.nether.net/mailman/listinfo/cisco-nsp
> >> archive at http://puck.nether.net/pipermail/cisco-nsp/
> > 

