[c-nsp] Dynamic output buffer allocation on Cisco 4948

Thu Sep 26 23:12:01 EDT 2013

I would try host to host on the same VLAN, then consider the impact of flow control.

Thanks

~mike

On Sep 26, 2013, at 8:18 AM, John Neiberger <jneiberger at gmail.com> wrote:

> It was host to host, so it was really Host A to Host B and vice versa. The
> expected RTT was less than a millisecond, which is what they often got, but
> the latency would spike regularly up to as high as 24 ms. I initially
> thought it was a problem on one of the hosts but they can ping to and from
> devices on the same vlan with no variable latency. The latency only occurs
> in one direction when going from one vlan to the other. We manipulated the
> HSRP configs to shift traffic to different routers and switches but the
> behavior didn't change. From Host A to Host B we saw variable latency, but
> never ever did we see it if the ping originated from Host B even though,
> depending on the HSRP configuration, the packets were traversing exactly
> the same path. It has me completely stumped.
> 
> 
> On Thu, Sep 26, 2013 at 9:04 AM, Blake Dunlap <ikiris at gmail.com> wrote:
> 
>> This may seem like a stupid question, but when you were pinging, were you
>> pinging from hosts, or from the routers?
>> 
>> -Blake
>> 
>> 
>> On Thu, Sep 26, 2013 at 9:38 AM, John Neiberger <jneiberger at gmail.com> wrote:
>> 
>>> Thanks! I talked to our Cisco NCE about this and he gave me these
>>> commands:
>>> 
>>> show qos interface gigabitEthernet x/y - will show you the 4 queues and
>>> also whether QoS is disabled or not
>>> 
>>> sh int gi x/y counters detail - you will see the packet counters for
>>> queues #1-4 incrementing
>>> 
>>> sh platform hardware interface g x/y stat | in TxB
>>> 
>>> 
>>> I'm nearly certain that this big buffer issue is the answer to my high
>>> variable latency problem, but there is still one mystery about this that
>>> has me completely perplexed. The high variable latency was only occurring
>>> in one direction (from VLAN A to VLAN B) but not in the other (from VLAN B
>>> to VLAN A). What really makes that weird is that because of some hsrp
>>> differences, we really had a circular topology for a bit. The path was
>>> *exactly* the same no matter which direction you were pinging. The ICMP
>>> packets had to travel around the same circle through the same devices and
>>> interfaces. So if we have big buffers on congested interfaces that are
>>> introducing variable latency, why would we only see it in one direction?
>>> 
>>> 
>>> When VLAN A pings VLAN B, it is the initial ICMP packet that would have
>>> been delayed, while the response would come in on a different interface
>>> that wasn't congested. When VLAN B pings VLAN A, the initial ping would
>>> not hit congested interfaces but the ping reply would. The total round trip
>>> time should have been similar, but it never was. I'm completely stumped by
>>> that. I even had Cisco HTTS on this for a couple of days and they couldn't
>>> figure it out.
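The symmetry argument can be checked against a simple static model: if the request and the reply together traverse every hop of the loop exactly once, the RTT is the sum of all per-hop delays, no matter which host starts the ping. This is a hypothetical Python sketch, assuming fixed per-hop delays (real queue depths fluctuate from moment to moment, which this model ignores) and a loop that traffic travels in one direction only; the delay values and function names are illustrative.

```python
from typing import Sequence


def one_way_ms(hop_delays_ms: Sequence[float], src: int, dst: int) -> float:
    """Sum the per-hop delays from node src to node dst, moving around the
    loop in one direction. hop_delays_ms[i] is the (assumed fixed) delay a
    packet sees when leaving node i toward node (i + 1) % n."""
    n = len(hop_delays_ms)
    total = 0.0
    node = src
    while node != dst:
        total += hop_delays_ms[node]
        node = (node + 1) % n
    return total


def ping_rtt_ms(hop_delays_ms: Sequence[float], src: int, dst: int) -> float:
    """RTT of an echo started at src: request src -> dst, then reply
    dst -> src, both following the same one-directional loop."""
    return one_way_ms(hop_delays_ms, src, dst) + one_way_ms(hop_delays_ms, dst, src)


if __name__ == "__main__":
    # Hypothetical 4-hop loop; hop 1 stands in for the congested egress port.
    delays = [0.1, 20.0, 0.1, 0.1]
    host_a, host_b = 0, 2
    print(ping_rtt_ms(delays, host_a, host_b))  # Host A pings Host B
    print(ping_rtt_ms(delays, host_b, host_a))  # Host B pings Host A
```

Under this model both directions always come out equal, which is the expectation described above; the measurements did not match it.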
>>> 
>>> 
>>> Thanks,
>>> 
>>> John
>>> 
>>> 
>>> On Thu, Sep 26, 2013 at 1:50 AM, Terebizh, Evgeny <eterebizh at amt.ru>
>>> wrote:
>>> 
>>>> Try also
>>>> "show platform hardware interface gigabitEthernet 1/1 tx-queue".
>>>> I guess it's gonna show the actual values for queue utilisation.
>>>> Please let me know if this helps.
>>>> 
>>>> /ET
>>>> 
>>>> 
>>>> On 9/24/13 11:17 PM, "John Neiberger" <jneiberger at gmail.com> wrote:
>>>> 
>>>>> I've been helping to troubleshoot an interesting problem with variable
>>>>> latency through a 4948. I haven't run into this before. I usually have
>>>>> seen really low latency through 4948s, but this particular application
>>>>> requires consistent low latency, and they've been noticing that latency
>>>>> goes up on average as load goes up. It didn't seem to be a problem on
>>>>> their servers, but communication through busy interfaces seemed to
>>>>> dramatically increase the latency. They were used to <1 ms latency and
>>>>> it was bouncing up to 20+ ms at times. I'm starting to think this is due
>>>>> to the shared output buffer in the 4948 causing the output buffer on the
>>>>> uplink to dynamically get bigger.
>>>>> 
>>>>> I've been trying to find more details on how the 4948 handles its shared
>>>>> output queue space, but I haven't been able to find anything. Do any of
>>>>> you know more about how this works and what commands I could use to
>>>>> troubleshoot? I can't find anything that might show how much buffer
>>>>> space a particular interface is using at any given time, or if it even
>>>>> makes sense to think of it that way. If I knew the size of the buffer at
>>>>> any given moment, I could calculate the expected latency and prove
>>>>> whether or not that was the problem.
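The calculation John describes is straightforward once a queue depth is known: a packet arriving behind B bytes on a port draining at R bits/s waits about B * 8 / R seconds. This is a minimal Python sketch, assuming a single FIFO on a 1 Gbps egress port and ignoring the packet's own serialization time and the scheduling between the four transmit queues; the function names are illustrative, not taken from any Cisco tool or CLI.

```python
def queueing_delay_ms(buffered_bytes: float, line_rate_bps: float) -> float:
    """Milliseconds a newly arriving packet waits behind `buffered_bytes`
    already queued on an egress port that drains at `line_rate_bps`."""
    return buffered_bytes * 8 * 1000 / line_rate_bps


def bytes_queued_for_delay(delay_ms: float, line_rate_bps: float) -> float:
    """Inverse: how many bytes must be queued ahead of a packet to add
    `delay_ms` of waiting time at `line_rate_bps`."""
    return delay_ms * line_rate_bps / (8 * 1000)


if __name__ == "__main__":
    GIG_E = 1_000_000_000  # assumed 1 Gbps egress port
    # About 24 ms was the worst spike reported in this thread.
    print(bytes_queued_for_delay(24, GIG_E))  # 3000000.0
    print(queueing_delay_ms(125_000, GIG_E))  # 1.0
```

In other words, a 24 ms spike on a gigabit port corresponds to roughly 3 MB sitting in the queue ahead of the probe, and every 125 KB queued adds about 1 ms.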
>>>>> 
>>>>> Thanks!
>>>>> John
>>>>> _______________________________________________
>>>>> cisco-nsp mailing list  cisco-nsp at puck.nether.net
>>>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>>>>> archive at http://puck.nether.net/pipermail/cisco-nsp/