[c-nsp] WRR Confusion on 6748 blades

Wed Jun 27 13:05:52 EDT 2012

> queue-limit and bandwidth values (ratios/weights) are *different* things.
>
> The queue-limit physically sizes the queue. It says how much of the total
> physical buffer on the port is set aside exclusively for each class (where
> class is based on DSCP or COS). Traffic from other classes can NEVER get
> access to the buffer set aside for another class, ie, there could be plenty
> of available buffer in other queues even as you're dropping traffic in one
> of the queues.
>
> The bandwidth ratios, on the other hand, determine how frequently each of
> those queues is serviced, ie, how often the scheduler will dequeue/transmit
> a frame from the queue. If there is nothing sitting in one queue, other
> queues can get access to that bandwidth, ie, "bandwidth" is not a hard
> limit, you can think of it as a minimum guarantee when there is
> congestion/contention.
>

That part I think I understand. Mostly.  :)  When I say bandwidth in
this context, I'm referring to the bandwidth ratio weight.
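
For what it's worth, here is a rough Python sketch of how I read your
description of the ratio weights. It is an idealized weighted-round-robin
model, not the actual 6748 scheduler, and the queue numbers and weights
are made up: each queue with frames waiting gets its weight divided by
the sum of the weights of the queues that are active at that moment.

```python
def wrr_shares(weights, active):
    """Idealized share of link bandwidth per queue under weighted round robin.

    weights: dict of queue id -> bandwidth ratio weight (hypothetical values)
    active:  set of queue ids that currently have frames waiting
    """
    total = sum(w for q, w in weights.items() if q in active)
    return {
        q: (weights[q] / total if q in active and total else 0.0)
        for q in weights
    }


# Hypothetical weights for three queues; not taken from any real config.
weights = {1: 100, 2: 150, 3: 200}

# Only queue 1 has traffic: it is free to use the whole link.
print(wrr_shares(weights, {1}))

# All three queues have traffic: queue 1 is held to 100/450 of the link.
print(wrr_shares(weights, {1, 2, 3}))
```

In this model the weight only matters when more than one queue is
backlogged, which is the "minimum guarantee, not a hard limit" behavior
you describe.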

>> are fairly hard limits. That is in line with what we
>> were experiencing because we were seeing output queue drops when the
>> interface was not fully utilized. Increasing the queue bandwidth got
>> rid of the output queue drops.
>
>
>
> What this should be doing is just causing us to service the queue more
> frequently. That could certainly reduce/eliminate drops in the event of
> congestion, but only if there is traffic in the other queues that is also
> contending for the bandwidth.
>
> In other words, if there is only one active queue (ie only one queue has
> traffic in it), then it can & should get full unrestricted access to the
> entire link bandwidth. Can you confirm whether there's traffic in the other
> queues?
>

I'm not certain whether or not we have traffic in the other queues. In
nearly all cases, the output drops are all in one queue with zero in
the other queues. That seems to indicate that either all of our
traffic is in one queue or there just isn't a lot of traffic in the
other queues.
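
That pattern would at least be consistent with the per-queue buffer
carving you described. A toy sketch of it, assuming a made-up 100-frame
port buffer split three ways (the numbers and the enqueue() helper are
purely illustrative):

```python
def enqueue(depths, limits, queue_id, frames):
    """Offer `frames` frames to one queue; return how many are dropped.

    depths: dict of queue id -> frames currently buffered (updated in place)
    limits: dict of queue id -> frames of buffer reserved for that queue
    """
    free = limits[queue_id] - depths[queue_id]
    accepted = min(frames, free)
    depths[queue_id] += accepted
    return frames - accepted


# Hypothetical queue-limit split of a 100-frame port buffer.
limits = {1: 50, 2: 20, 3: 30}
depths = {1: 0, 2: 0, 3: 0}

# A 35-frame burst lands in queue 2, which only owns 20 frames of buffer.
dropped = enqueue(depths, limits, 2, 35)
unused = sum(limits.values()) - sum(depths.values())
print(dropped, unused)
```

Queue 2 drops frames while the buffer reserved for queues 1 and 3 sits
empty, because a queue can never borrow another queue's share.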

>
>
>> For one particular application
>> traversing this link, that resulted in a file transfer rate increase
>> from 2.5 MB/s to 25 MB/s. That's a really huge difference and all we
>> did was increase the allocated queue bandwidth. At no point was that
>> link overutilized.
>
>
>
> We frequently see 'microburst' situations where the avg rate measured over
> 30sec etc is well under rate, but at some instantaneous moment there is a
> burst that exceeds line rate and can cause drops if the queue is not deep
> enough. Having a low bandwidth ratio, with traffic present in other queues,
> is another form of the queue not being deep enough, ie, the queue may have a
> lot of space but if packets are not dequeued frequently enough that queue
> can still fill & drop.
>
>
>
>> In fact, during our testing of that particular
>> application, the link output never went above 350 Mbps. We used very
>> large files so that the transfer would take a while and we'd get a
>> good feel for what was happening. Doing nothing but increasing the
>> queue bandwidth fixed the problem there and has fixed the same sort of
>> issue elsewhere.
>
>
> This suggests to me that there is traffic in other queues contending for the
> available bandwidth, and that there's periodically instantaneous congestion.
> Alternatively you could try sizing this queue bigger and using the original
> bandwidth ratio. Or a combination of those two (tweaking both bandwidth &
> queue-limit).
>
> Is there some issue with changing the bandwidth ratio on this queue (ie, are
> you seeing collateral damage)? Else, seems like you've solved the problem
> already ;)

Nope, we don't have a problem with it. That's what we've been doing.
We haven't really been adjusting the queue limit ratios, though. In
most cases, we were just changing the bandwidth ratio weights. I'm
looking at an interface right now where the 30-second weighted traffic
rate has never gone above around 150 Mbps but I'm still seeing OQDs in
one of the queues only. How do you think we should be interpreting
that?
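
To check my own understanding of the microburst explanation, I tried a
toy model in Python. Everything in it is an assumption for illustration
(abstract "units" per tick, a 500-unit queue, a 20-tick burst), not a
measurement from our switches. The link can send 100 units per tick; the
queue is drained either at the full 100 (no other queue contending) or
at 10 (a small weight while other queues are busy).

```python
def simulate(arrivals, drain_per_tick, queue_limit):
    """Run one finite queue; return (units dropped, average arrivals per tick)."""
    depth = 0
    dropped = 0
    for arriving in arrivals:
        depth += arriving
        if depth > queue_limit:
            # Tail drop: anything beyond the queue limit is lost.
            dropped += depth - queue_limit
            depth = queue_limit
        # The scheduler dequeues up to drain_per_tick units this tick.
        depth = max(0, depth - drain_per_tick)
    return dropped, sum(arrivals) / len(arrivals)


# 20 ticks of burst at 80 units/tick (below the 100-unit line rate),
# then 980 quiet ticks at 5 units/tick.
arrivals = [80] * 20 + [5] * 980

drops_full_link, avg = simulate(arrivals, drain_per_tick=100, queue_limit=500)
drops_low_weight, _ = simulate(arrivals, drain_per_tick=10, queue_limit=500)
print(avg, drops_full_link, drops_low_weight)
```

The long-term average here is well under both the line rate and the
queue's weighted share, yet the queue still overflows during the burst
when it is only drained at the smaller rate. If that is the right
picture, it would explain OQDs at a 150 Mbps average.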

>
> Hope that helps,
> Tim

It helps a lot! Thanks!

John