[c-nsp] Interface drops

Chris Knipe savage at savage.za.org
Sat Jan 9 11:10:28 EST 2016


Hi All,

I have a pair of C3750Gs in a stack (WS-C3750G-48TS, 12.2(40)SE,
IPBASE).  Numerous EtherChannels are configured spanning the two switches.

I am seeing output drops, and the counters decrement as well as increment.
This leads me to believe that I may be hitting CSCtq86186.  What I do not
understand is that 'show platform port-asic stats drop' doesn't show any
drops at all (granted, mls qos is disabled, and I think having it enabled
may be a requirement here).

The switch cluster SHOULD be doing a fair amount of traffic, but I am not
getting even a fraction of the traffic that I am expecting.  Is it safe
to assume that the output drops are my issue (lack of buffers) and it is
time to upgrade, or is there any other reliable method to determine just how
many packets are being dropped?  I don't think there's a way for me to
determine (from the switch at least) exactly how many are being dropped.
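
One rough way to get a number would be to sample the interface counters
twice and compare the deltas, discarding any interval in which a counter
went backwards (which is the symptom described above).  A minimal Python
sketch follows; the function name and the sample values are hypothetical,
and collecting the counters (CLI scrape, SNMP, etc.) is left out entirely:

```python
def interval_drop_percent(before, after):
    """Drop percentage between two counter samples.

    Each sample is a dict with 'drops' and 'packets' (packets output).
    Returns None when a counter went backwards or no packets were sent,
    since such an interval cannot be trusted.
    """
    d_drops = after["drops"] - before["drops"]
    d_packets = after["packets"] - before["packets"]
    if d_drops < 0 or d_packets <= 0:
        return None
    return 100.0 * d_drops / d_packets


# Hypothetical samples, NOT taken from the switch output in this post.
t0 = {"drops": 100, "packets": 1000}
t1 = {"drops": 150, "packets": 2000}
print(interval_drop_percent(t0, t1))   # 50 drops over 1000 packets

# A decrementing drop counter is reported as unusable rather than as a rate.
t2 = {"drops": 120, "packets": 3000}
print(interval_drop_percent(t1, t2))
```

This only shows what the counters claim per interval; if the counters
themselves are wrong, the result is wrong too.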

Just an FYI:
# sh int po1
Port-channel1 is up, line protocol is up (connected) 
  Hardware is EtherChannel, address is 0026.52e8.f984 (bia 0026.52e8.f984)
  Members in this channel: Gi1/0/1 Gi1/0/2 Gi1/0/3 Gi1/0/4 Gi2/0/1 Gi2/0/2 Gi2/0/3 Gi2/0/4 
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 14482944
  30 second input rate 801422000 bits/sec, 88307 packets/sec
  30 second output rate 494159000 bits/sec, 64104 packets/sec
     548164331323 packets output, 556661107062424 bytes, 0 underruns

Now, 14482944 of 548164331323 packets is a mere 0.0026%.  Should this be
a cause for concern?


#sh int po7
Port-channel7 is up, line protocol is up (connected) 
  Hardware is EtherChannel, address is 001c.b1e8.9627 (bia 001c.b1e8.9627)
  Members in this channel: Gi2/0/38 Gi2/0/39 Gi2/0/40 
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 780850784
  30 second input rate 155029000 bits/sec, 17659 packets/sec
  30 second output rate 204147000 bits/sec, 19389 packets/sec
     17002461535 packets output, 23299760423613 bytes, 0 underruns

Here we have a good deal more: 780850784 of 17002461535 packets, or
about 4.59%.
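
For reference, the two ratios can be reproduced from the counters shown
above.  This is just the arithmetic in Python; the function name is made up
for illustration, and dividing drops by "packets output" (rather than by
output plus drops) is an assumption about how the ratio should be read:

```python
def drop_percent(drops, packets_output):
    """Total output drops as a percentage of packets output."""
    return 100.0 * drops / packets_output


# Counters copied from the 'sh int' output for Po1 and Po7.
po1 = drop_percent(14482944, 548164331323)
po7 = drop_percent(780850784, 17002461535)

print(f"Po1: {po1:.4f}%")
print(f"Po7: {po7:.4f}%")
```

Both figures are averages over the lifetime of the counters, so they say
nothing about how bursty the drops are.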

I'm trying to establish here IF I do indeed have a networking issue, or
whether the issue is elsewhere (such as servers and a lack of IOPS for
example).   We run NFS over these ports, and frequently get NFS timeouts and
what not (latency sub 1ms)...   The servers are physically not under a lot
of load, except for disk I/O.  The network, well, it's idling based on the
above stats (barring the output drops).

Naturally, if the switch's stats are wrong (which I think they are), we may well
be dropping significantly more than what the switch indicates, which would
explain the slow throughput / NFS issues.  But it's not set in stone, as it
could very well also be the disks in the servers which can't cope with the
concurrent read/write requests...

So yes - let's forget about the fact that it's 3750's.  Do I buy new
(upgrade) switches, or do I buy new (upgrade) servers?  Given bug CSCtq86186,
how could I establish which of the two is the more severe issue?

I did bench the servers, and locally on the servers (without using the
network), I do get SIGNIFICANTLY better performance (like 10 x increase)...
My gut is telling me that I should get better performance with the existing
server hardware by upgrading the network, but I don't want to go down that
route if it's not pretty much guaranteed to solve my issues.  We're talking
a lot of money here at the end of the day.



Many thanks,
Chris.



