[c-nsp] Sup720 CPU spikes, an academic question

Tue May 3 17:09:09 EDT 2011

Peter Rathlev <peter at rathlev.dk> wrote:
>
> I know a single 5 second interval of 100% CPU utilization now and then
> is rather irrelevant seen from an operational perspective. That's
> probably even more true when looking at a 600 MHz MIPS on a Sup720. This
> thing has me puzzled though. :-)
>
A burst of SNMPv3 with cryptographic operations can hurt a poor MIPS 
chip.  We run torrus[1] and it took me a while to realise the obvious: 
polling all our kit using SNMPv3 with 3DES/MD5 was probably a bad idea 
(it was brutal enough to the system doing the polling), so we went with 
just SNMPv2c.

> The following is the output from "show proc cpu" (slightly reformatted)
> from a device that exceeded a 90% warning threshold we've configured. 
>
You really want to be looking at the '5min' sorted view ('show processes 
cpu sorted 5min').

>  CPU utilization for five seconds: 100%/0%; one min: 10%; five min: 4%
>   PID Runtime(ms)   Invoked  uSecs  5Sec  1Min  5Min  Process
>     8   870373628  51977035  16745 1.27% 0.59% 0.64%  Check heaps
>   487    20306096  67521163    300 0.15% 0.04% 0.04%  Port manager per
>     2        9688   5187559      1 0.07% 0.00% 0.00%  Load Meter
>   358    18902200  40236967    469 0.07% 0.03% 0.02%  CEF: IPv4 proces
>    23    85574908 641372631    133 0.00% 0.12% 0.08%  IPC Seat Manager
>    51   111228136   4913752  22636 0.00% 0.07% 0.05%  Per-minute Jobs
>   272    28800268 228265577    126 0.00% 0.10% 0.07%  IP Input
>   561    55288392 590654988     93 0.00% 0.13% 0.09%  ISIS Adj
>   578    16540192 166947095     99 0.00% 0.05% 0.04%  HSRP IPv4
> 
> I've excluded processes with 0% utilization for all three periods. To me
> the above means that 0% time (?) was spent interrupt switching,
>
...in the previous 5sec interval.
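To put numbers on the puzzle: here is a minimal Python sketch (hypothetical helpers, not an IOS tool) that reads the header line quoted above as "total% / interrupt-level%" and sums the per-process 5Sec column. Processes at 0% in all three periods were left out of the paste, so they contribute nothing to the sum.

```python
import re

# Header line copied from the "show proc cpu" output above.
HEADER = "CPU utilization for five seconds: 100%/0%; one min: 10%; five min: 4%"

# PID -> 5Sec % for every process row listed above.
FIVE_SEC = {
    8: 1.27,    # Check heaps
    487: 0.15,  # Port manager per
    2: 0.07,    # Load Meter
    358: 0.07,  # CEF: IPv4 proces
    23: 0.00, 51: 0.00, 272: 0.00, 561: 0.00, 578: 0.00,
}


def parse_header(line: str) -> tuple:
    """Return (total %, interrupt-level %) for the five-second interval."""
    m = re.search(r"five seconds: (\d+)%/(\d+)%", line)
    if m is None:
        raise ValueError("unrecognised header: " + line)
    return int(m.group(1)), int(m.group(2))


total, interrupt = parse_header(HEADER)
process_level = total - interrupt               # time spent at process level
accounted = round(sum(FIVE_SEC.values()), 2)    # what the listed PIDs explain

print(f"process-level: {process_level}%, accounted for by listed PIDs: {accounted}%")
print(f"unattributed: {process_level - accounted:.2f}%")
```

The gap between the process-level figure and what the listed processes account for is exactly what makes the spike hard to pin on any one process.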

> The spikes do not seem to correlate with a lot of traffic, neither
> traffic for the RP nor traffic generally being forwarded by the box. It
> also does not correlate with IGP or BGP events or anything I'd consider
> relevant. Even the odd loop or ridiculous multicast flooding doesn't tax
> the CPU under normal circumstances.
>
Multicast arriving at the router from a directly connected VLAN, with the 
TTL of the packets set to 1, is one way you can end up with multicast 
'attacks' on routers.  It might be something occasionally firing up 
(Norton Ghost, say) and probing for a suitable TTL to put in its multicast 
payload... but I would expect this to appear in your ring buffer.
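If you do pull headers out of a capture, a filter along these lines could flag the TTL-1 multicast described above. It is a minimal sketch: the function and helper names are hypothetical, and it assumes you already have raw IPv4 header bytes. Groups in 224.0.0.0/24 (OSPF, HSRP and friends) legitimately use TTL 1, so they are skipped.

```python
import ipaddress
import struct

# Link-local control groups that are normally sent with TTL 1.
LOCAL_CONTROL = ipaddress.IPv4Network("224.0.0.0/24")


def is_ttl1_multicast(header: bytes) -> bool:
    """True if a raw IPv4 header is addressed to a non-link-local multicast
    group (224.0.0.0/4 minus 224.0.0.0/24) and carries a TTL of 1 or less."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    if header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    ttl = header[8]                              # byte 8 is the TTL field
    dst = ipaddress.IPv4Address(header[16:20])   # bytes 16-19: destination
    return dst.is_multicast and dst not in LOCAL_CONTROL and ttl <= 1


def build_header(ttl: int, dst: str, src: str = "192.0.2.10") -> bytes:
    """Hypothetical helper: a minimal 20-byte UDP/IPv4 header for testing."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20, 0, 0,   # version/IHL, TOS, total length, ID, flags/frag
        ttl, 17, 0,          # TTL, protocol (UDP), checksum (left as zero)
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


print(is_ttl1_multicast(build_header(1, "239.1.1.1")))
print(is_ttl1_multicast(build_header(16, "239.1.1.1")))
```

Keeping the 224.0.0.0/24 exclusion avoids drowning the interesting traffic in routine HSRP and IGP hellos.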

> What puzzles me is: What causes the RP to max out at 100% utilization in
> a case like this? Should I just ignore it altogether?
> 
The sysadmin in me says look at the *runtime*/*uSecs* columns.
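The uSecs column is the average CPU time per invocation: Runtime(ms) times 1000, divided by Invoked. A minimal sketch that recomputes it from the rows quoted above and ranks processes both by that figure and by the 5Min column. The truncating division is an observation that happens to reproduce every uSecs value above, not documented IOS behaviour.

```python
# Rows from the "show proc cpu" output above:
# (PID, Runtime(ms), Invoked, 5Min %, Process)
ROWS = [
    (8, 870373628, 51977035, 0.64, "Check heaps"),
    (487, 20306096, 67521163, 0.04, "Port manager per"),
    (2, 9688, 5187559, 0.00, "Load Meter"),
    (358, 18902200, 40236967, 0.02, "CEF: IPv4 proces"),
    (23, 85574908, 641372631, 0.08, "IPC Seat Manager"),
    (51, 111228136, 4913752, 0.05, "Per-minute Jobs"),
    (272, 28800268, 228265577, 0.07, "IP Input"),
    (561, 55288392, 590654988, 0.09, "ISIS Adj"),
    (578, 16540192, 166947095, 0.04, "HSRP IPv4"),
]


def usecs_per_invocation(runtime_ms: int, invoked: int) -> int:
    """Average CPU time per invocation in microseconds (truncated)."""
    return runtime_ms * 1000 // invoked


# Most expensive per call first: processes that run in long bursts.
by_usecs = sorted(ROWS, key=lambda r: usecs_per_invocation(r[1], r[2]), reverse=True)

# Highest 5Min load first, roughly the ordering of "show processes cpu sorted 5min".
by_5min = sorted(ROWS, key=lambda r: r[3], reverse=True)

for pid, runtime_ms, invoked, five_min, name in by_usecs:
    print(f"{pid:>4} {usecs_per_invocation(runtime_ms, invoked):>6} us/call  "
          f"{five_min:.2f}% (5min)  {name}")
```

Here Per-minute Jobs and Check heaps stand out at roughly 22.6 ms and 16.7 ms per run, which makes them the first candidates to watch when a spike recurs.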

Good Hunting.

[1] http://torrus.org/

-- 
Alexander Clouter
.sigmonster says: pain, n.:
                  	One thing, at least it proves that you're alive!