[c-nsp] 6500 SUP720 High Latency and Jitter issues

Simon Leinen simon at limmat.switch.ch
Wed May 25 06:42:04 EDT 2005


Tim Stevenson writes:
> If you are measuring latency by pinging the RP itself, then that is
> not a good indicator of network performance at all. Are you saying
> that pings *through* the 6500 see the latency as well? or just pings
> to the RP? Your original mail suggests customers are being impacted,
> so either their traffic is getting s/w switched, or the CPU has
> little/nothing to do w/it.

The number behind the slash (the share of CPU time spent at interrupt
level) suggests that indeed much (possibly all) traffic through the
router is MSFC- (software-) switched.  Slightly edited extract from
Dan's message <429347B8.90201 at swingpad.com>:

DB> DCA-BV-RTR#sho processes cpu | exclude 0.00
DB> CPU utilization for five seconds: 70%/28%; 1': 40%; 5': 39%
DB> DCA-BV-RTR#sho processes cpu | exclude 0.00
DB> CPU utilization for five seconds: 35%/24%; 1': 39%; 5': 39%
DB> DCA-BV-RTR#sho processes cpu | exclude 0.00
DB> CPU utilization for five seconds: 30%/26%; 1': 37%; 5': 38%
DB> DCA-BV-RTR#sho processes cpu | exclude 0.00
DB> CPU utilization for five seconds: 32%/28%; 1': 36%; 5': 38%
DB> DCA-BV-RTR#sho processes cpu | exclude 0.00
DB> CPU utilization for five seconds: 32%/28%; 1': 36%; 5': 38%
DB> DCA-BV-RTR#sho processes cpu | exclude 0.00
DB> CPU utilization for five seconds: 35%/31%; 1': 36%; 5': 38%
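
For anyone who wants to watch that ratio over time rather than eyeball
it, here is a minimal Python sketch that pulls the two five-second
figures out of the summary line.  The function name parse_cpu_line and
the "process" figure (total minus interrupt) are my own illustration,
not anything IOS provides; the sample lines are taken from the extract
above.

```python
import re

# Matches the summary line of IOS "show processes cpu", e.g.
# "CPU utilization for five seconds: 70%/28%; 1': 40%; 5': 39%"
# First number: total CPU; second number: CPU spent at interrupt level.
CPU_LINE = re.compile(r"CPU utilization for five seconds:\s*(\d+)%/(\d+)%")


def parse_cpu_line(line):
    """Return (total, interrupt, process) percentages, or None if no match.

    parse_cpu_line is a hypothetical helper; "process" is simply
    total minus interrupt.
    """
    m = CPU_LINE.search(line)
    if m is None:
        return None
    total, interrupt = int(m.group(1)), int(m.group(2))
    return total, interrupt, total - interrupt


samples = [
    "CPU utilization for five seconds: 70%/28%; 1': 40%; 5': 39%",
    "CPU utilization for five seconds: 35%/24%; 1': 39%; 5': 39%",
    "CPU utilization for five seconds: 35%/31%; 1': 36%; 5': 38%",
]
for s in samples:
    total, interrupt, process = parse_cpu_line(s)
    print(f"total={total}% interrupt={interrupt}% process={process}%")
```

A consistently high interrupt share, as in these samples, is what
points at software switching rather than at a busy process.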

Since traffic is switched by the MSFC, it competes for resources with
other things such as BGP processing.  BGP processing is probably what
hurts in this case, since I believe it has table-walking jobs that
run every minute.

> As long as the h/w is programmed correctly, the CPU can be at 100%
> and not affect latency through the system.

Yes, but the problem seems to be that traffic IS being sw-switched, so
it would be good to find out why.  You are hinting at that in

> Are the customers that are seeing the problem the same ones that are
> being NATted?

- although NAT/PAT is supposed to be hardware-accelerated on the
PFC-3BXL that is used here - and Chuck Church suggested in
<B6621ED4D0AD394BBA73CA657DFD897642F427 at MSPEXBE01.wamnet.inc> to look
for "Any weird QOS configured on it" - although many forms of QoS are
also implemented in hardware on the PFC-2/3.  There are many other
features that lead to software switching when enabled.  My personal
favorites are complex ACLs such as Reflexive ACLs or CBAC.

As Ian Cox said on this list, turning a 7600 into a 7200 (by forcing
it to do software forwarding) is never a good idea.  But it's easy to
do so by accident, and hard to find out after the fact why it
happened.

So I'd be interested in how you would look at a router to find out
*why* hardware-based forwarding is not being used.

Is there any finer-grained accounting of what MSFC-based forwarding is
actually doing, e.g. executing complex ACLs, doing NAT, performing
encapsulation and decapsulation for tunnel protocols that aren't
hardware-supported?

Another useful tool would be a config checker that would take a config
file, a description of the platform (e.g. Catalyst6500/7600 OSR
with Sup720-3BXL), and the IOS version, and that would highlight which
parts of the configuration will cause packets to be software-switched,
and to what extent (e.g. will only those packets that use the
feature be software-switched, or all packets on the interface/router
where the feature is activated).
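
To make the idea concrete, here is a rough Python sketch of what such a
checker could look like.  Everything in it is an assumption for
illustration: the RULES table only covers the two features I named
above (reflexive ACLs and CBAC), matched by their usual config
keywords, and the "scope" field is left as "unknown" because a real
tool would need per-platform, per-IOS-version data that I don't have.

```python
import re

# Illustrative rules only: the two features named in this message.
# Which features are actually punted to the MSFC depends on platform,
# PFC and IOS version; "scope" stays "unknown" in this sketch.
RULES = [
    ("reflexive ACL", re.compile(r"\b(reflect|evaluate)\s+\S+"), "unknown"),
    ("CBAC", re.compile(r"^\s*ip inspect\b"), "unknown"),
]


def check_config(text):
    """Return (line number, feature, scope, config line) for each match.

    check_config is a hypothetical helper, not an existing tool.
    """
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for feature, pattern, scope in RULES:
            if pattern.search(line):
                findings.append((lineno, feature, scope, line.strip()))
    return findings


# Made-up config fragment, for demonstration only.
sample = """\
interface GigabitEthernet1/1
 ip access-group OUTBOUND out
 ip inspect FW in
ip access-list extended OUTBOUND
 permit tcp any any reflect TCPSESSIONS
ip access-list extended INBOUND
 evaluate TCPSESSIONS
"""
for lineno, feature, scope, line in check_config(sample):
    print(f"line {lineno}: {feature} (scope: {scope}): {line}")
```

The hard part is of course not the matching but filling in the rule
table correctly for each supervisor/PFC/IOS combination.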
-- 
Simon.


