[c-nsp] 3550 high cpu & process switched traffic
Tassos Chatzithomaoglou
achatz at forthnet.gr
Sat Aug 19 06:19:15 EDT 2006
OK, I disabled all incoming OSPF routing updates through a distribute list and the problem seems to
be solved for the moment (to be sure, I'll have to wait until the traffic reaches its maximum
again). So the routing table size was most probably what caused such a high CPU load.
Now comes the interesting part:
I had opened a case with Cisco TAC some months ago about the same problem. We followed almost
the same steps as here and concluded that some special traffic traversing the router was probably
being processed by the CPU. So we needed to capture a sample of the traffic by sniffing and
examine it. Of course, capturing traffic isn't very easy when the hardware is located somewhere
far away. So the case ended there.
According to TAC, although they had seen both "sh ip cef sum" and "sh ip route sum", the following
output was the one that actually showed there was no problem with the routing table size:
3550#sh l3tcam shadow
L3 TCAM: total 72 bit entries = 9216, used entries = 8774
L3 TCAM: total 144 bit entries = 4608, used entries = 0
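
As a rough cross-check of that output, the sketch below parses the two lines and computes how full
the 72-bit region is. The function name parse_l3tcam is hypothetical, and the regular expression
only assumes the line format shown above. Comparing against the 11194 routes from "sh ip cef sum"
further assumes one 72-bit entry per route, which may not hold exactly.

```python
import re

# The two lines pasted above from "sh l3tcam shadow".
SAMPLE = """\
L3 TCAM: total 72 bit entries = 9216, used entries = 8774
L3 TCAM: total 144 bit entries = 4608, used entries = 0
"""

LINE_RE = re.compile(
    r"L3 TCAM: total (\d+) bit entries = (\d+), used entries = (\d+)"
)


def parse_l3tcam(text):
    """Return {entry_width_bits: (total_entries, used_entries)}."""
    table = {}
    for match in LINE_RE.finditer(text):
        width, total, used = (int(group) for group in match.groups())
        table[width] = (total, used)
    return table


tcam = parse_l3tcam(SAMPLE)
total72, used72 = tcam[72]
utilisation = used72 / total72

# Route count taken from "sh ip cef sum" in the quoted message below.
# Assumption: each route needs one 72-bit entry.
cef_routes = 11194

print(f"72-bit entries used: {used72}/{total72} ({utilisation:.1%})")
print(f"CEF routes exceed total 72-bit entries: {cef_routes > total72}")
```

Under those assumptions the 72-bit region is about 95% full, and the CEF route count is larger than
the total number of 72-bit entries.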
What was even stranger is that we had another 3550 with exactly the same routing table but only
5% CPU load. The main difference between these two 3550s was just the amount of traffic passing
through. This 3550 had around 170Mbps max (95% CPU) while the other had around 30Mbps max (5% CPU).
When this 3550's traffic decreased (~80Mbps), the CPU load fell too (40%). I guess that was
another reason TAC believed there was something "strange" in the traffic of this 3550.
I still wonder what the relationship between routing table size and amount of traffic might be,
and why the same routing table size with different amounts of traffic causes different CPU loads
on a 3550. According to another Cisco engineer, "generally" a 3550 should be able to forward in L3
whatever it's able to do in L2 with no apparent difference in CPU load.
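
To put the figures above side by side, here is a minimal sketch that divides CPU load by traffic
for the three data points mentioned in this message. The helper name cpu_per_mbps and the labels
are hypothetical; the numbers are the approximate ones quoted above, so the result is only
indicative.

```python
# (label, traffic in Mbps, CPU load in percent), as quoted in this message.
samples = [
    ("this 3550, peak", 170, 95),
    ("this 3550, off-peak", 80, 40),
    ("other 3550, peak", 30, 5),
]


def cpu_per_mbps(traffic_mbps, cpu_percent):
    """CPU percentage points spent per Mbps of traffic."""
    return cpu_percent / traffic_mbps


ratios = {label: cpu_per_mbps(mbps, cpu) for label, mbps, cpu in samples}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} % CPU per Mbps")
```

On this 3550 the ratio stays at roughly 0.5 at both traffic levels, so the CPU load appears to
scale with the traffic. That would fit a roughly constant share of packets reaching the CPU, but
this is only a reading of three approximate numbers.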
Yesterday, Clinton's remark "You will see some due to IP options that can't be processed in the TCAM
IP forwarding" made me search through "sh ip traffic", where you can see the number of packets with
options (I wonder why TAC didn't think of that; they were looking for redirects/unreachables only).
Since this number was very small, that kind of traffic probably isn't what is causing the high CPU
load. I'm not trying to belittle TAC's work, but sometimes I believe they provide "hard to implement"
solutions (remote sniffing being one of them).
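
For comparison, the "routing packets" counter samples Clinton posted (quoted below) can be turned
into per-sample deltas, which show how few packets reach the CPU on a healthy box. This is a
minimal sketch: the helper names are hypothetical, and the 2-second interval is an assumption,
since the samples are only described as "a couple of seconds apart".

```python
import re

# "routing packets" lines from Clinton's five samples, in order.
SAMPLES = [
    "routing packets : 297482820 retrieved, 0 dropped",
    "routing packets : 297482826 retrieved, 0 dropped",
    "routing packets : 297482831 retrieved, 0 dropped",
    "routing packets : 297482837 retrieved, 0 dropped",
    "routing packets : 297482845 retrieved, 0 dropped",
]

COUNTER_RE = re.compile(r"routing packets\s*:\s*(\d+) retrieved")

# Assumption: the exact sampling interval is not given in the message.
ASSUMED_INTERVAL_S = 2.0


def retrieved(line):
    """Extract the 'retrieved' counter from one output line."""
    match = COUNTER_RE.search(line)
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    return int(match.group(1))


def deltas(lines):
    """Differences between consecutive counter samples."""
    counts = [retrieved(line) for line in lines]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


punted = deltas(SAMPLES)
rates = [count / ASSUMED_INTERVAL_S for count in punted]

print("punted packets per sample:", punted)
print("approx. punted pps (assumed interval):", rates)
```

The deltas are 6, 5, 6 and 8 packets per sample, against the roughly 20,000 pps that box was
forwarding. Running the same comparison on a 3550 with high CPU load would show whether the
counter grows at a rate closer to the forwarded traffic.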
Thanks a lot guys for all your help.
Regards,
Tassos
Tassos Chatzithomaoglou wrote on 19/8/2006 12:11 AM:
> Thanks a lot Clinton and everyone else for your answers...
>
> Since I would first like to test a quick "solution" for the routing size
> problem before I find a permanent solution for decreasing its size, if I
> use an inbound distribute-list under OSPF which denies everything (so
> everything goes out through the default gateway), will that make the
> routing table shorter?
>
> Regards,
> Tassos
>
> Clinton Work wrote on 18/8/2006 10:03 PM:
>>
>> Your problem is too many routes for the default SDM template. The
>> default template is limited to 8K Unicast routes. If you change the
>> SDM template to routing you can probably fit the 11K routes into the
>> TCAM. The routing template is 24K for the 3550-12G and 16K routes for
>> the regular 3550s.
>>
>> Please check "show sdm prefer".
>>
>> 3550#show sdm prefer
>> The current template is the default template.
>> The selected template optimizes the resources in
>> the switch to support this level of features for
>> 16 routed interfaces and 1K VLANs.
>>
>> number of unicast mac addresses: 6K
>> number of igmp groups: 6K
>> number of qos aces: 2K
>> number of security aces: 2K
>> number of unicast routes: 12K
>> number of multicast routes: 6K
>>
>>
>> Note, seeing a high pps of routing packets under "show controller cpu"
>> is bad. It means that the IP packets are being punted to the CPU for
>> forwarding. You will see some due to IP options that can't be
>> processed in the TCAM IP forwarding.
>>
>> 3550#sh contr cpu
>> routing packets : 3323805100 retrieved, 0 dropped, 0 errors
>>
>> Example from a 3550 routing about 50Mbps of Internet traffic and the
>> CPU is at 2%. Each of these commands is a couple of seconds apart.
>> The router is forwarding about 20,000 pps of Internet traffic.
>>
>>
>> 3550#show controllers cpu-interface | inc routing
>> routing protocol packets : 21492556 retrieved, 0 dropped
>> routing packets : 297482820 retrieved, 0 dropped
>>
>> 3550#show controllers cpu-interface | inc routing
>> routing protocol packets : 21492557 retrieved, 0 dropped
>> routing packets : 297482826 retrieved, 0 dropped
>>
>> 3550#show controllers cpu-interface | inc routing
>> routing protocol packets : 21492557 retrieved, 0 dropped
>> routing packets : 297482831 retrieved, 0 dropped
>>
>> 3550#show controllers cpu-interface | inc routing
>> routing protocol packets : 21492557 retrieved, 0 dropped
>> routing packets : 297482837 retrieved, 0 dropped
>>
>> 3550#show controllers cpu-interface | inc routing
>> routing protocol packets : 21492557 retrieved, 0 dropped
>> routing packets : 297482845 retrieved, 0 dropped
>>
>>
>>
>>
>> Tassos Chatzithomaoglou wrote:
>>>
>>> 3550#sh ip cef sum
>>> IP CEF with switching (Table Version 2714880), flags=0x0
>>> 11194 routes, 0 reresolve, 0 unresolved (0 old, 0 new), peak 3
>>> 11197 leaves, 683 nodes, 2233168 bytes, 2714439 inserts, 2703242
>>> invalidations
>>> 1 load sharing elements, 336 bytes, 1 references
>>> universal per-destination load sharing algorithm, id BAF66A0D
>>> 2(0) CEF resets, 446 revisions of existing leaves
>>> Resolution Timer: Exponential (currently 1s, peak 1s)
>>> 444 in-place/0 aborted modifications
>>> refcounts: 196815 leaf, 175104 node
>>>
>>> Table epoch: 0 (11197 entries at this epoch)
>>
>>