[c-nsp] Dynamic output buffer allocation on Cisco 4948

Blake Dunlap ikiris at gmail.com
Thu Sep 26 23:55:31 EDT 2013


It's hard to make any inferences about your voodoo one-way round-trip latency
without more detail (diagrams, for example), so I'll take a step back and ask:
is this overly delay-sensitive app the main load on the switch, or just a
rounding error as far as total traffic goes?

If it's the first, honestly I don't really know what you can do besides
upgrading your uplinks to the next step up in speed, using more active
channels/paths, lowering your oversubscription ratio with more hardware, or
just giving up and choosing between delaying the microbursts or dropping them.
If it's the second, have you tried setting up LLQ and treating your app's
traffic as EF?
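
A very rough sketch of that second option, going from memory of classic
4500/4948 QoS, so check the config guide for your release. This assumes your
app's traffic is already marked DSCP EF, and GigabitEthernet1/49 is just a
stand-in for the congested uplink:

    qos
    ! map EF (DSCP 46) into tx-queue 3
    qos map dscp 46 to tx-queue 3
    !
    interface GigabitEthernet1/49
     tx-queue 3
      priority high

IIRC, tx-queue 3 is the one queue on that platform that can be made strict
priority, so EF traffic mapped into it should jump ahead of whatever bulk
traffic is filling the shared buffer.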

-Blake


On Thu, Sep 26, 2013 at 10:34 PM, John Neiberger <jneiberger at gmail.com> wrote:

> Host to host on the same VLAN was always far less than 1 ms RTT. I never
> once saw it go over that; it was usually much lower. We only saw the problem
> when going from a host in VLAN A to a host in VLAN B, never the other way
> around. I thought this was a problem with the host in VLAN B, but any other
> server in the same VLAN could ping it with no latency problems at all.
>
>
> On Thu, Sep 26, 2013 at 9:12 PM, Fwissue <fwissue at gmail.com> wrote:
>
>> I would try host to host on the same VLAN, then consider flow-control
>> impact.
>>
>> Thanks
>>
>> ~mike
>>
>> On Sep 26, 2013, at 8:18 AM, John Neiberger <jneiberger at gmail.com> wrote:
>>
>> > It was host to host, so it was really Host A to Host B and vice versa.
>> > The expected RTT was less than a millisecond, which is what they often
>> > got, but the latency would spike regularly up to as high as 24 ms. I
>> > initially thought it was a problem on one of the hosts, but they can
>> > ping to and from devices on the same VLAN with no variable latency.
>> > The latency only occurs in one direction when going from one VLAN to
>> > the other. We manipulated the HSRP configs to shift traffic to
>> > different routers and switches, but the behavior didn't change. From
>> > Host A to Host B we saw variable latency, but never ever did we see it
>> > if the ping originated from Host B, even though, depending on the HSRP
>> > configuration, the packets were traversing exactly the same path. It
>> > has me completely stumped.
>> >
>> >
>> > On Thu, Sep 26, 2013 at 9:04 AM, Blake Dunlap <ikiris at gmail.com> wrote:
>> >
>> >> This may seem like a stupid question, but when you were pinging, were
>> >> you pinging from hosts, or from the routers?
>> >>
>> >> -Blake
>> >>
>> >>
>> >> On Thu, Sep 26, 2013 at 9:38 AM, John Neiberger <jneiberger at gmail.com> wrote:
>> >>
>> >>> Thanks! I talked to our Cisco NCE about this and he gave me these
>> >>> commands:
>> >>>
>> >>> show qos interface gigabitEthernet x/y -- shows the four tx queues
>> >>> and whether QoS is enabled or disabled
>> >>>
>> >>> show interface gi x/y counters detail -- you will see packet
>> >>> counters in queues 1-4 incrementing
>> >>>
>> >>> show platform hardware interface gi x/y stat | in TxB
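>> >>>
>> >>> To catch one of these spikes in the act, I may try polling the
>> >>> tx-queue state every second with an EEM applet. An untested sketch
>> >>> (assumes this IOS release supports EEM applets; gi1/49 is just a
>> >>> stand-in for our congested uplink):
>> >>>
>> >>> event manager applet TXQ-POLL
>> >>>  event timer watchdog time 1
>> >>>  action 1.0 cli command "enable"
>> >>>  action 2.0 cli command "show platform hardware interface gi1/49 tx-queue"
>> >>>  action 3.0 syslog msg "$_cli_result"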
>> >>>
>> >>>
>> >>> I'm nearly certain that this big-buffer issue is the answer to my
>> >>> high variable latency problem, but there is still one mystery about
>> >>> this that has me completely perplexed. The high variable latency was
>> >>> only occurring in one direction (from VLAN A to VLAN B) but not in
>> >>> the other (from VLAN B to VLAN A). What really makes that weird is
>> >>> that, because of some HSRP differences, we really had a circular
>> >>> topology for a bit. The path was *exactly* the same no matter which
>> >>> direction you were pinging. The ICMP packets had to travel around
>> >>> the same circle through the same devices and interfaces. So if we
>> >>> have big buffers on congested interfaces that are introducing
>> >>> variable latency, why would we only see it in one direction?
>> >>>
>> >>>
>> >>> When VLAN A pings VLAN B, it is the initial ICMP packet that would
>> >>> have been delayed, while the response would come in on a different
>> >>> interface that wasn't congested. When VLAN B pings VLAN A, the
>> >>> initial ping would not hit congested interfaces, but the ping reply
>> >>> would. The total round-trip time should have been similar, but it
>> >>> never was. I'm completely stumped by that. I even had Cisco HTTS on
>> >>> this for a couple of days and they couldn't figure it out.
>> >>>
>> >>>
>> >>> Thanks,
>> >>>
>> >>> John
>> >>>
>> >>>
>> >>> On Thu, Sep 26, 2013 at 1:50 AM, Terebizh, Evgeny <eterebizh at amt.ru>
>> >>> wrote:
>> >>>
>> >>>> Try also
>> >>>> "show platform hardware interface gigabitEthernet 1/1 tx-queue".
>> >>>> I guess it's gonna show the actual values for queue utilisation.
>> >>>> Please let me know if this helps.
>> >>>>
>> >>>> /ET
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 9/24/13 11:17 PM, "John Neiberger" <jneiberger at gmail.com> wrote:
>> >>>>
>> >>>>> I've been helping to troubleshoot an interesting problem with
>> >>>>> variable latency through a 4948. I haven't run into this before;
>> >>>>> I have usually seen really low latency through 4948s. But this
>> >>>>> particular application requires consistent low latency, and
>> >>>>> they've been noticing that latency goes up on average as load
>> >>>>> goes up. It didn't seem to be a problem on their servers, but
>> >>>>> communication through busy interfaces seemed to dramatically
>> >>>>> increase the latency. They were used to <1 ms latency, and it was
>> >>>>> bouncing up to 20+ ms at times. I'm starting to think this is due
>> >>>>> to the shared output buffer in the 4948 causing the output buffer
>> >>>>> on the uplink to dynamically get bigger.
>> >>>>>
>> >>>>> I've been trying to find more details on how the 4948 handles its
>> >>>>> shared output queue space, but I haven't been able to find
>> >>>>> anything. Do any of you know more about how this works and what
>> >>>>> commands I could use to troubleshoot? I can't find anything that
>> >>>>> might show how much buffer space a particular interface is using
>> >>>>> at any given time, or if it even makes sense to think of it that
>> >>>>> way. If I knew the size of the buffer at any given moment, I could
>> >>>>> calculate the expected latency and prove whether or not that was
>> >>>>> the problem.
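>> >>>>>
>> >>>>> Back of the envelope: queuing delay = queued bytes * 8 / link rate,
>> >>>>> so a 24 ms spike through a 1 Gbps uplink would correspond to about
>> >>>>> 0.024 * 1,000,000,000 / 8 = 3,000,000 bytes (~3 MB) sitting in the
>> >>>>> queue ahead of the ping. If the shared pool lets a single congested
>> >>>>> port grab that much, it would explain the numbers we're seeing.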
>> >>>>>
>> >>>>> Thanks!
>> >>>>> John
>> >>>>> _______________________________________________
>> >>>>> cisco-nsp mailing list  cisco-nsp at puck.nether.net
>> >>>>> https://puck.nether.net/mailman/listinfo/cisco-nsp
>> >>>>> archive at http://puck.nether.net/pipermail/cisco-nsp/
>>
>
>


More information about the cisco-nsp mailing list