[c-nsp] Best practice - Core vs Access Router

Wed Feb 10 14:18:53 EST 2010

Andy,
By excluding 0.00 you're excluding any process that shows 0.00 anywhere in
its time columns. Just use sort and look at the top few, although the result
is most likely the same.
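
To illustrate the difference, here is a rough Python sketch (not anything
from IOS itself) that mimics "| ex 0.00" on saved "sh proc cpu sort" output.
The function name is made up, the first two sample rows are taken from the
output quoted below with the wrapped process names rejoined, and the third
row is hypothetical: a process that is busy right now but idle over five
minutes.

```python
import re

def exclude(lines, pattern=r"0.00"):
    """Drop every line matching the pattern, like the IOS '| ex' filter.

    IOS treats the pattern as a regular expression, so '.' matches any
    character; re.search is used here to mirror that.
    """
    return [ln for ln in lines if re.search(pattern, ln) is None]

sample = [
    " 123   823552748 891845755        923  1.35%  1.32%  1.24%   0 IP Input",
    " 286    95557652  68837887       1388  0.31%  4.77%  4.27%   0 BGP Router",
    # Hypothetical row: high 5Sec load, but 0.00% in the 5Min column.
    " 999        1234       567       2176  9.50%  0.40%  0.00%   0 Example Proc",
]

kept = exclude(sample)
for line in kept:
    print(line)
```

The hypothetical row is the busiest one over five seconds and still gets
dropped, which is why plain sort is the safer view.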

If you have a number of large Ethernet subnets with few systems on them,
then "sh ip arp" will contain a number of incompletes. If an entire subnet
is filled with incompletes, then someone is looking for all of your systems,
most likely with a ping sweep, and enabling "mls rate-limit unicast cef
glean" will be worthwhile. I believe this load shows up under both Adj
Manager and ARP Input.
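
If eyeballing the ARP table is impractical, something like the following
Python sketch could count the incompletes per subnet from a saved copy of
"sh ip arp". It is only a sketch: the function name, the /24 grouping and
the sample addresses (documentation ranges) are my assumptions, and the
parsing assumes the usual "Internet <address> <age> Incomplete ARPA" line
layout.

```python
import ipaddress
from collections import Counter

def incomplete_by_subnet(arp_output, prefixlen=24):
    """Count 'Incomplete' ARP entries per subnet in saved 'sh ip arp' text.

    prefixlen is an assumption; set it to match your real subnet sizes.
    """
    counts = Counter()
    for line in arp_output.splitlines():
        fields = line.split()
        # Expected shape: Internet <address> <age> Incomplete ARPA
        if len(fields) >= 4 and fields[0] == "Internet" and fields[3] == "Incomplete":
            net = ipaddress.ip_network(f"{fields[1]}/{prefixlen}", strict=False)
            counts[net] += 1
    return counts

# Made-up sample data using documentation address ranges.
sample_arp = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  192.0.2.1               -   0011.2233.4455  ARPA   Vlan10
Internet  192.0.2.7               0   Incomplete      ARPA
Internet  192.0.2.8               0   Incomplete      ARPA
Internet  198.51.100.5           12   0011.2233.4466  ARPA   Vlan20
"""

counts = incomplete_by_subnet(sample_arp)
for net, n in counts.most_common():
    print(net, n)
```

A subnet whose incomplete count approaches its address count is the
sweep pattern described above.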

The other possibility is that you've run out of TCAM space because you're
over the limit for the number of routes you have. I don't know if you're
running an XL or not.

CPU doesn't look out of order currently. You need to capture it on an
ongoing basis to see which process is pushing the one-minute figure to 24%,
and even then it should still be forwarding traffic.
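
For the ongoing capture, a rough Python sketch along these lines could pull
the numbers out of the header line of each saved "sh proc cpu" snapshot. In
the "X%/Y%" pair, X is total CPU over five seconds and Y is the part spent
at interrupt level, so the difference is process-level load. The function
name and dictionary keys are my own, not anything IOS provides.

```python
import re

# Matches the summary line at the top of 'sh proc cpu' output.
CPU_RE = re.compile(
    r"five seconds:\s*(\d+)%/(\d+)%;\s*one minute:\s*(\d+)%;"
    r"\s*five\s+minutes:\s*(\d+)%"
)

def parse_cpu_header(text):
    """Return the CPU figures from the summary line, or None if absent."""
    m = CPU_RE.search(text)
    if m is None:
        return None
    total, interrupt, one_min, five_min = map(int, m.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        "process_5s": total - interrupt,  # process-level share of the total
        "one_min": one_min,
        "five_min": five_min,
    }

header = ("CPU utilization for five seconds: 7%/3%; one minute: 24%; "
          "five minutes: 23%")
cpu = parse_cpu_header(header)
print(cpu)
```

Running this against snapshots taken every few seconds during an outage
would show whether the spike is interrupt-driven or a specific process.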

You might need to look at the DFCs as well, to see if one is having issues:
remote command module X sh proc cpu sort

David

--
http://dcp.dcptech.com

> -----Original Message-----
> From: cisco-nsp-bounces at puck.nether.net [mailto:cisco-nsp-
> bounces at puck.nether.net] On Behalf Of Andy B.
> Sent: Wednesday, February 10, 2010 1:44 PM
> To: Phil Mayers
> Cc: nsp-cisco
> Subject: Re: [c-nsp] Best practice - Core vs Access Router
> 
> I am currently facing this strange behaviour once again. Nothing
> suspicious in terms of CPU:
> 
> #sh proc cpu sort | ex 0.00
> CPU utilization for five seconds: 7%/3%; one minute: 24%; five minutes: 23%
>  PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
>  123   823552748 891845755        923  1.35%  1.32%  1.24%   0 IP Input
>  142    42990360 548209142         78  0.63%  0.15%  0.06%   0 IP SNMP
>  176    81597832 313530395        260  0.63%  0.20%  0.12%   0 SNMP ENGINE
>  286    95557652  68837887       1388  0.31%  4.77%  4.27%   0 BGP Router
>   46        8724      6895       1265  0.31%  0.33%  0.24%   2 SSH Process
>  169    98755140   5844411      16897  0.31%  0.31%  0.31%   0 Adj Manager
>    9    92740444 222352412        417  0.23%  0.40%  0.41%   0 ARP Input
>  320    20411156 140247526        145  0.15%  1.64%  1.57%   0 BGP I/O
>  180    64470940  51288798       1257  0.15%  0.58%  0.44%   0 CEF process
>  167    27190044 390437731         69  0.15%  0.12%  0.10%   0 IPv6 Input
> 
> #remote command switch sh proc cpu sort | ex 0.00
> CPU utilization for five seconds: 10%/0%; one minute: 14%; five minutes: 20%
>  PID Runtime(ms)   Invoked      uSecs   5Sec   1Min   5Min TTY Process
>  102   577414400  14603714      39539  5.19%  2.76%  2.58%   0 Vlan Statistics
>   42  11702922242664309865          0  3.91%  3.83%  3.87%   0 slcp process
>  257    79620728  46604862       1708  0.23%  1.31%  0.92%   0 CEF process
>  152    24224440  35123075        689  0.15%  0.08%  0.07%   0 CEF LC Stats
>   33    29231032 224654615        130  0.15%  0.08%  0.07%   0 SCP Download Lis
>  131    39865856   1338254      29789  0.07%  0.08%  0.11%   0 TCAM Manager pro
>  127    37865260 135955648        278  0.07%  0.07%  0.07%   0 Spanning Tree
>  187    12366092   3103775       3984  0.07%  0.04%  0.05%   0 v6fib stat colle
>  239    11888108   8600338       1382  0.07%  0.04%  0.03%   0 LTL MGR cc
> 
> Packet loss to the router (nothing behind it) is around 25%.
> And still losing random BGP and OSPF sessions. SNMP graphs are not
> being generated either.
> 
> Currently feeling quite desperate, because I have no clue where to look
> next...
> 
> Andy
> 
> On Tue, Feb 9, 2010 at 6:56 PM, Phil Mayers <p.mayers at imperial.ac.uk>
> wrote:
> > On 09/02/10 17:39, Church, Charles wrote:
> >>
> >> I was going by the 'show proc cpu hist' he gave for both the SP and RP.
> >> Both looked pretty bad across the board.
> >
> > His graphs don't look that dissimilar to mine, and we have no such
> > problems. The peak/avg CPU don't look so unreasonable to me given the load
> > and setup he's described.
> >
> > To summarise in this thread, it has been suggested:
> >
> >  1. Netflow is the problem - to which the OP said he's already tried
> > disabling it
> >
> >  2. CPU punts, specifically gleans, are the problem - in which case CoPP or
> > MLS rate limiters can be tried, but the OP really IMHO needs to confirm this
> > with a span of the CPU
> >
> >  3. The 6500 is just no good, buy a Juniper or ASR1k (!) - which I strongly
> > dispute. It may be awkward and have odd limits, but it OUGHT TO HANDLE the
> > load we've been told about; therefore something is wrong
> >
> > ...and lots more besides. I'm exhausted from following the thread, but my
> > advice to the OP is to determine what is hitting the CPU *during an outage*,
> > then proceed from there.
> >
> > I'm going to stop reading now.
> > _______________________________________________
> > cisco-nsp mailing list  cisco-nsp at puck.nether.net
> > https://puck.nether.net/mailman/listinfo/cisco-nsp
> > archive at http://puck.nether.net/pipermail/cisco-nsp/