[c-nsp] 3550 High CPU - nothing in proc cpu

Sun Nov 22 19:01:57 EST 2009

Hector,

It is interesting that the Cisco article tells you how to profile your CPU
but not how to interpret the results ;-)

There is only one way to interpret the results: contact Cisco and report the
abnormality. They will have to decode the addresses using the symbol files
for your device software, which will reveal the culprit functions. From there
it should be fairly straightforward to isolate the cause and fix it.
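
To illustrate what "decoding" means here, below is a minimal Python sketch of
an address-to-function lookup. The symbol table in it is entirely made up
(the real symbol files for your image are held by Cisco, and the names and
start addresses are placeholders I invented); only the three sampled
addresses come from your post. The idea is simply to find the last function
whose start address is at or below the sampled address.

```python
from bisect import bisect_right

# HYPOTHETICAL symbol table: (start address, function name), sorted by address.
# These entries are invented for illustration; they are not real IOS symbols.
SYMBOLS = [
    (0x475000, "example_func_a"),
    (0x4B8000, "example_func_b"),
    (0x4B8F00, "example_func_c"),
]
STARTS = [start for start, _ in SYMBOLS]


def resolve(addr):
    """Return (function name, offset into function) for addr, or None."""
    # Index of the last symbol whose start address is <= addr.
    i = bisect_right(STARTS, addr) - 1
    if i < 0:
        return None  # addr lies below the first known symbol
    start, name = SYMBOLS[i]
    return name, addr - start


if __name__ == "__main__":
    # The three hottest addresses from the profile output in the original post.
    for addr in (0x475F50, 0x4B82B8, 0x4B8F90):
        print(hex(addr), resolve(addr))
```

With a real symbol table in place of the invented one, the same lookup would
name the functions behind each hot address.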

FYI, CPU spikes to X% during high traffic are not abnormal on most
non-distributed platforms that are overloaded or groaning under an
inappropriate switching algorithm.

Out of curiosity, is 40% CPU utilization above your benchmarked baseline? If
not, ignore it. Also, are there any alignment corrections? Check with:
device#show alignment
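
As a rough sanity check on the numbers quoted below, here is a small Python
sketch that recomputes the interrupt share, the per-second rates, and how
much of the interrupt time the top three addresses account for. The counters
are taken from your post; the 600-second duration is my assumption based on
"10 mins later".

```python
# Counters copied from the "show profile terse" output quoted below.
SYSTEM_TOTAL = 141506
INTERRUPT_TOTAL = 56163
TOP_HITS = {0x475F50: 3281, 0x4B82B8: 1667, 0x4B8F90: 1456}

# ASSUMPTION: the profile ran for exactly 10 minutes.
DURATION_S = 10 * 60


def summarize():
    """Return (interrupt %, per-address rows, top-3 share of interrupt %)."""
    interrupt_pct = 100 * INTERRUPT_TOTAL / SYSTEM_TOTAL
    rows = [
        # (address, samples per second, percent of all interrupt samples)
        (addr, count / DURATION_S, 100 * count / INTERRUPT_TOTAL)
        for addr, count in TOP_HITS.items()
    ]
    top_share = 100 * sum(TOP_HITS.values()) / INTERRUPT_TOTAL
    return interrupt_pct, rows, top_share


if __name__ == "__main__":
    interrupt_pct, rows, top_share = summarize()
    print(f"interrupt share of samples: {interrupt_pct:.1f}%")
    for addr, rate, share in rows:
        print(f"{addr:#x}: {rate:.1f}/s, {share:.1f}% of interrupt samples")
    print(f"top 3 combined: {top_share:.1f}% of interrupt samples")
```

If I have the arithmetic right, the top three addresses together cover only
about 11 percent of the interrupt samples, so the remaining buckets are worth
including in whatever you send to Cisco.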

Eninja

PS. Note to the CPU profiler PM: help customers help themselves. Enhance the
CPU profiler to display decoded addresses and culprit functions in
*show profile terse* output so users can resolve these simple issues
themselves. Justification: fewer TAC calls.

On Sat, Nov 21, 2009 at 5:01 PM, Hector Herrera <mail4hh at pobox.com> wrote:

> I had another opportunity to debug the high cpu usage on the 3550-12t.
>
> show proc cpu indicated that cpu load was 39% interrupt, 40% total
>
> So it's definitely a high interrupt rate that is using up the cpu.
>
> I also debugged the switching mechanism, and although I have high
> amounts of TTL-expired events, they only occur at a rate of 2-3 per
> second.
>
> I proceeded to profile the cpu usage with:
>
> profile <start> <end> <granularity>
> profile start
> ... 10 mins later
> profile stop
> show profile terse
>
> Granularity was 8 due to the largest free block being about half the
> size of the main:text section.
>
> This gave me a listing of all the memory ranges and a count of how
> many times the cpu was found to be in that memory location.
>
> System Total     = 000141506
> Interrupt Total  = 000056163 (39 percent)
> Sched Total      = 000094547 (66 percent)
>
> Interrupt [00] = 000056163 (39 percent)
>
> The interrupt breakdown is (top 3):
>
> 0x475F50 with 3281 counts (~5.4 per sec)
> 0x4B82B8 with 1667 counts (~2.7 per sec)
> 0x4B8F90 with 1456 counts (~2.4 per sec)
>
> My question is:
>
> How do I convert those memory addresses into something that would tell
> me what interrupts are being triggered so much?
>
> Thank you,
>
> Hector