[c-nsp] low throughput of 10GbE line

Nick Hilliard nick at inex.ie
Mon Mar 15 17:17:24 EDT 2010


On 15/03/2010 16:26, Jirí Procházka wrote:
> When traffic on this link reaches approximately 6Gbps, latency to servers
> gets rapidly worse (about 100-150ms, about 2ms before)

the 6708 card has 200 megs of buffers per port.  doing the sums, this works
out at about 160ms of latency, assuming you're seeing a 10Gb microburst.
So at a superficial level, it looks like you're seeing packet drops because
of full buffers.
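
As a rough check on that figure, the sums can be sketched in Python. The
200 MB buffer and the 10Gb/s rate are the numbers quoted above; treating a
"meg" as 10^6 bytes and assuming the buffer drains at exactly line rate are
simplifications, so take the result as an estimate only.

```python
# Worst-case queueing delay of a full port buffer draining at line rate.
# Assumptions: 1 MB = 10**6 bytes, and the port drains at exactly 10 Gb/s.

def buffer_drain_ms(buffer_bytes: float, line_rate_bps: float) -> float:
    """Time in milliseconds to drain a full buffer at the given line rate."""
    return buffer_bytes * 8 / line_rate_bps * 1000

BUFFER_BYTES = 200 * 10**6      # 200 MB per port, as stated above
LINE_RATE_BPS = 10 * 10**9      # 10 Gb/s

delay_ms = buffer_drain_ms(BUFFER_BYTES, LINE_RATE_BPS)
print(f"{delay_ms:.0f} ms")     # prints: 160 ms
```

That is in the same range as the 100-150ms you report once the link gets
busy.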

Also, you're running the card in oversubscription mode.  How much traffic
is te4/7 pushing?  I'd hazard a guess that you're running into
over-subscription problems on the blade.
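
To put a number on the oversubscription, here is a minimal sketch. It
assumes 8 front-panel ports at 10Gb/s each and the two 20G fabric channels
that your "show platform hardware capacity fabric" output lists for module
4. The function and variable names are illustrative only.

```python
# Oversubscription ratio of a line card: front-panel capacity vs fabric capacity.
# Assumed figures: 8 ports at 10 Gb/s, 2 fabric channels at 20 Gb/s (module 4).

def oversubscription_ratio(ports: int, port_gbps: float,
                           channels: int, channel_gbps: float) -> float:
    """Front-panel bandwidth divided by fabric bandwidth."""
    return (ports * port_gbps) / (channels * channel_gbps)

ratio = oversubscription_ratio(ports=8, port_gbps=10, channels=2, channel_gbps=20)
print(f"oversubscription {ratio:.0f}:1")

# The reported ingress peak on module 4, channel 1 was 79% of a 20G channel.
peak_gbps = 0.79 * 20
print(f"peak ingress on channel 1: {peak_gbps:.1f} Gb/s")
```

If that peak figure is right, a single fabric channel has already carried
more than one port's worth of traffic, which is why the per-group totals
matter.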

I can't find the more detailed guide to the 6708 architecture on the cisco
web site, but there's a brief overview here:

> http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/prod_white_paper0900aecd80673385.html#wp9000681

while it's not going to give you exact details on tiny microbursts, I'd
consider installing RTG or YRTG with a 30 second poll interval, and monitor
all the ports on blade 4, along with the following aggregates.

te4/1 + te4/2
te4/3 + te4/4
te4/5 + te4/6
te4/7 + te4/8

te4/1 + te4/2 + te4/3 + te4/4
te4/5 + te4/6 + te4/7 + te4/8
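
If your poller only records per-port rates, the aggregates above can be
summed afterwards. This is a minimal sketch, assuming a hypothetical dict of
per-port input rates in bits/sec taken from a single poll; it is not RTG's
own interface. Only the te4/8 value comes from your "show int" output, the
other values are placeholders made up for illustration.

```python
# Sum per-port rates into the port-pair and four-port aggregates listed above.

PAIRS = [("te4/1", "te4/2"), ("te4/3", "te4/4"),
         ("te4/5", "te4/6"), ("te4/7", "te4/8")]
HALVES = [("te4/1", "te4/2", "te4/3", "te4/4"),
          ("te4/5", "te4/6", "te4/7", "te4/8")]

def aggregate(rates_bps, groups):
    """Return {"a + b": total_bps} for each group of port names."""
    return {" + ".join(g): sum(rates_bps.get(p, 0) for p in g) for g in groups}

# Hypothetical sample: 1 Gb/s placeholders, except te4/8 (30 second input rate).
rates = {f"te4/{n}": 1_000_000_000 for n in range(1, 8)}
rates["te4/8"] = 4_942_942_000

for name, bps in {**aggregate(rates, PAIRS), **aggregate(rates, HALVES)}.items():
    print(f"{name}: {bps / 1e9:.2f} Gb/s")
```

Graphing these totals next to the individual ports makes it easier to see
whether drops on one port line up with load on its neighbours.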

Do you have a second 6708 blade?  You may need to consider running these
ports in non-oversubscribed mode.

Nick

> and speed drops unpredictably. Servers are able to generate much more
> than 10Gbps. I have tried assigning an IP from this VLAN directly to a
> vlan-interface on the 3750 and latency is bad as well.
> 
> 
> The two problems I can see on the 7606 are the following:
> 
> 1) Input queue drops on the interface. They appear at the same time as
> the high latency. I tried setting a lower hold-queue, but it made no
> difference. No QoS or other bandwidth-limiting methods are applied.
> 
> sitel-edge-new#show int te4/8
> TenGigabitEthernet4/8 is up, line protocol is up (connected)
>  Hardware is C7600 10Gb 802.3, address is 001e.f7f7.bd5f (bia 001e.f7f7.bd5f)
>  Description: SITEL-TTC-New10GbE
>  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
>     reliability 255/255, txload 3/255, rxload 126/255
>  Encapsulation ARPA, loopback not set
>  Keepalive set (10 sec)
>  Full-duplex, 10Gb/s
>  Transport mode LAN (10GBASE-R, 10.3125Gb/s)
>  input flow-control is off, output flow-control is off
>  ARP type: ARPA, ARP Timeout 04:00:00
>  Last input 00:00:07, output 00:00:41, output hang never
>  Last clearing of "show interface" counters 18:56:25
>  Input queue: 0/4096/96151289/0 (size/max/drops/flushes); Total output drops: 0
>  Queueing strategy: fifo
>  Output queue: 0/4096 (size/max)
>  30 second input rate 4942942000 bits/sec, 410813 packets/sec
>  30 second output rate 143433000 bits/sec, 241176 packets/sec
>     34434311308 packets input, 51018973201607 bytes, 0 no buffer
>     Received 21472 broadcasts (17607 multicasts)
>     0 runts, 0 giants, 0 throttles
>     0 input errors, 0 CRC, 0 frame, 96151289 overrun, 0 ignored
>     0 watchdog, 0 multicast, 0 pause input
>     0 input packets with dribble condition detected
>     19623750753 packets output, 3094225873410 bytes, 0 underruns
>     0 output errors, 0 collisions, 0 interface resets
>     0 babbles, 0 late collision, 0 deferred
>     0 lost carrier, 0 no carrier, 0 pause output
>     0 output buffer failures, 0 output buffers swapped out
> 
> 
> The other end of the link looks OK:
> 
> TTC-3750-MAIN#show int te2/0/2
> TenGigabitEthernet2/0/2 is up, line protocol is up (connected)
>  Hardware is Ten Gigabit Ethernet, address is 001e.7a4f.fb9e (bia 001e.7a4f.fb9e)
>  Description: TTC-SITEL-New10GbE
>  MTU 1600 bytes, BW 10000000 Kbit, DLY 10 usec,
>     reliability 255/255, txload 129/255, rxload 3/255
>  Encapsulation ARPA, loopback not set
>  Keepalive not set
>  Full-duplex, 10Gb/s, link type is auto, media type is 10GBase-SR
>  Media-type configured as  connector
>  input flow-control is off, output flow-control is unsupported
>  ARP type: ARPA, ARP Timeout 04:00:00
>  Last input 00:00:51, output 00:00:16, output hang never
>  Last clearing of "show interface" counters 01:36:19
>  Input queue: 0/4096/0/0 (size/max/drops/flushes); Total output drops: 0
>  Queueing strategy: fifo
>  Output queue: 0/4096 (size/max)
>  30 second input rate 149330000 bits/sec, 249506 packets/sec
>  30 second output rate 5062720000 bits/sec, 420886 packets/sec
>     1400794396 packets input, 104446053346 bytes, 0 no buffer
>     Received 0 broadcasts (1250 multicasts)
>     0 runts, 0 giants, 0 throttles
>     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
>     0 watchdog, 1250 multicast, 0 pause input
>     0 input packets with dribble condition detected
>     2427526833 packets output, 3649915537744 bytes, 0 underruns
>     0 output errors, 0 collisions, 0 interface resets
>     0 babbles, 0 late collision, 0 deferred
>     0 lost carrier, 0 no carrier, 0 PAUSE output
>     0 output buffer failures, 0 output buffers swapped out
> 
> 
> 
> 
> 2) It looks like the transceiver concerned is running a little hot, but I
> don't trust these sensors much.
> 
> sitel-edge-new#show interfaces transceiver
> Transceiver monitoring is disabled for all interfaces.
> 
>                                           Optical   Optical
>           Temperature  Voltage  Current   Tx Power  Rx Power
> Port       (Celsius)    (Volts)  (mA)      (dBm)     (dBm)
> ---------  -----------  -------  --------  --------  --------
> Te4/1        39.3       0.00      36.8      -2.8      -1.0
> Te4/2        34.6       0.00      43.1      -2.6      -3.3
> Te4/3        36.8       0.00      29.7      -2.5      -1.0
> Te4/4        35.3       0.00       5.9      -1.9      -1.7
> Te4/5        44.1       0.00      45.4      -2.0      -7.7
> Te4/6        40.0       0.00      36.0      -3.4      -2.2
> Te4/7        40.8       0.00      34.0      -3.3      -0.5 +
> Te4/8        71.9 +     0.00       6.0      -3.2      -3.4
> 
> 
> 
> 
> some more debug info:
> 
> 7606 ->
> 
> sitel-edge-new#show platform hardware capacity fabric
> Switch Fabric Resources
>  Bus utilization: current: 35%, peak was 47% at 19:53:03 CET Thu Mar 11 2010
>  Fabric utilization:     Ingress                    Egress
>    Module  Chanl  Speed  rate  peak                 rate  peak
>    1       0        20G   14%   21% @18:26 11Mar10   19%   31% @18:20 12Mar10
>    2       0        20G   25%   39% @17:50 11Mar10    2%   10% @02:07 12Mar10
>    2       1        20G   10%   26% @19:41 11Mar10   33%   49% @18:06 11Mar10
>    4       0        20G   25%   63% @13:00 14Mar10    5%   20% @18:15 11Mar10
>    4       1        20G   45%   79% @18:07 11Mar10   50%   73% @09:43 13Mar10
>    5       0        20G    2%    5% @14:22 12Mar10   12%   19% @19:08 11Mar10
>  Switching mode: Module                                        Switching mode
>                  1                                             truncated
>                  2                                             truncated
>                  4                                             compact
>                  5                                             flow through
> 
> 
> 
> 
> I'm going to replace the "overheated" transceiver in the 7606 tonight and
> hope that's the solution, but I don't have much faith in it.
> 
> 
> Any advice would be really appreciated!
> 
> 
> Best regards,
> 
> 
> Jiri Prochazka


-- 
Network Ability Ltd. | Head of Operations      | Tel: +353 1 6169698
3 Westland Square    | INEX - Internet Neutral | Fax: +353 1 6041981
Dublin 2, Ireland    | Exchange Association    | Email: nick at inex.ie

