[c-nsp] Help with output drops
Randy McAnally
rsm at fast-serv.com
Mon Jul 13 00:30:29 EDT 2009
Hi all,
I just finished installing and configuring a new 6509 with dual Sup720-3BXLs
(12.2(18)SXF15a) and a 6724 linecard. It serves a simple purpose: maintaining a
single BGP session and handling layer 3 (VLANs) for various access switches.
No end devices are connected.
The problem is that I am getting constant output drops whenever the aggregation
uplink goes above ~400 Mbps, which is nowhere near the interface speed. See below;
note the massive 'Total output drops' counter with no other errors (on either end):
rtr1.ash#sh int g1/1
GigabitEthernet1/1 is up, line protocol is up (connected)
Hardware is C6k 1000Mb 802.3, address is 00d0.01ff.5800 (bia 00d0.01ff.5800)
Description: PTP-UPLINK
Internet address is 209.9.224.68/29
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 118/255, rxload 12/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is T
input flow-control is off, output flow-control is off
Clock mode is auto
ARP type: ARPA, ARP Timeout 04:00:00
Last input 00:00:00, output 00:00:01, output hang never
Last clearing of "show interface" counters 05:01:25
Input queue: 0/1000/0/0 (size/max/drops/flushes); Total output drops: 718023
Queueing strategy: fifo
Output queue: 0/100 (size/max)
30 second input rate 47789000 bits/sec, 30797 packets/sec
30 second output rate 465362000 bits/sec, 48729 packets/sec
L2 Switched: ucast: 27775 pkt, 2136621 bytes - mcast: 24590 pkt, 1574763 bytes
L3 in Switched: ucast: 592150327 pkt, 95608889548 bytes - mcast: 0 pkt, 0 bytes mcast
L3 out Switched: ucast: 991372425 pkt, 1214882993007 bytes mcast: 0 pkt, 0 bytes
592554441 packets input, 95674494492 bytes, 0 no buffer
Received 33643 broadcasts (17872 IP multicasts)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
991006394 packets output, 1214377864373 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
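For scale, that's roughly 718,000 drops against 991 million packets output in the
five hours since the counters were last cleared, i.e. on the order of 0.07% of
output traffic overall, but it only piles up during the periods above ~400 Mbps.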
The CPU usage is nil:
rtr1.ash#sh proc cpu sort
CPU utilization for five seconds: 1%/0%; one minute: 0%; five minutes: 0%
PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
6 3036624 252272 12037 0.47% 0.19% 0.18% 0 Check heaps
316 195004 99543 1958 0.15% 0.01% 0.00% 0 BGP Scanner
119 267568 2962884 90 0.15% 0.03% 0.02% 0 IP Input
172 413528 2134933 193 0.07% 0.03% 0.02% 0 CEF process
4 16 48214 0 0.00% 0.00% 0.00% 0 cpf_process_ipcQ
3 0 2 0 0.00% 0.00% 0.00% 0 cpf_process_msg_
5 0 1 0 0.00% 0.00% 0.00% 0 PF Redun ICC Req
2 772 298376 2 0.00% 0.00% 0.00% 0 Load Meter
9 23964 157684 151 0.00% 0.01% 0.00% 0 ARP Input
7 0 1 0 0.00% 0.00% 0.00% 0 Pool Manager
8 0 2 0 0.00% 0.00% 0.00% 0 Timers
<<<snip>>>
I THINK I have determined the drops are caused by buffer congestion on the port:
rtr1.ash#sh queueing interface gigabitEthernet 1/1
Interface GigabitEthernet1/1 queueing strategy: Weighted Round-Robin
Port QoS is enabled
Port is untrusted
Extend trust state: not trusted [COS = 0]
Default COS is 0
Queueing Mode In Tx direction: mode-cos
Transmit queues [type = 1p3q8t]:
Queue Id Scheduling Num of thresholds
-----------------------------------------
01 WRR 08
02 WRR 08
03 WRR 08
04 Priority 01
WRR bandwidth ratios: 100[queue 1] 150[queue 2] 200[queue 3]
queue-limit ratios: 50[queue 1] 20[queue 2] 15[queue 3] 15[Pri Queue]
<<<snip>>>
Packets dropped on Transmit:
queue dropped [cos-map]
---------------------------------------------
1 719527 [0 1 ]
2 0 [2 3 4 ]
3 0 [6 7 ]
4 0 [5 ]
So it would appear all of my traffic goes into queue 1, which makes sense since the
port is untrusted and everything defaults to CoS 0, and CoS 0/1 map to queue 1. It
would also seem that the default 50% queue-limit for queue 1 isn't enough? These are
the default settings, by the way.
I'm pretty sure that wrr-queue queue-limit and wrr-queue bandwidth should help
mitigate this frustrating packet loss, but I have no experience with these features
(I come from a Foundry/Brocade background) and would like some insight before I
start making changes; for example, is something along the lines of the sketch below
the right direction? I don't want to try anything that could risk downtime or
further issues in a production environment.
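For reference, the sort of change I had in mind looks something like this. The
percentages and weights are just placeholders to illustrate the syntax, not a
recommendation, and I'd want to verify the exact argument counts for a 1p3q8t port
against the config guide before touching anything:

interface GigabitEthernet1/1
 ! grow queue 1's share of the transmit buffer (values for WRR queues 1, 2, 3)
 wrr-queue queue-limit 70 15 15
 ! adjust the WRR scheduling weights for queues 1, 2, 3
 wrr-queue bandwidth 100 150 200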
And lastly, what should I look out for when modifying the buffers? Network
blips, more congestion, etc.? This is a production switch and the last thing I
need to do is make matters worse.
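My rough plan for verifying whether a change actually helps (assuming nothing
better is suggested) is just to watch the per-queue transmit drop counters under
load before and after:

rtr1.ash#sh queueing interface gigabitEthernet 1/1 | begin Packets dropped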
Thank you!
--
Randy