[c-nsp] Hierarchical QoS Policies

Tue Apr 12 04:25:43 EDT 2005

Hi Luan,

Thanks for rebuilding my lab test.

On Fri, Apr 08, 2005 at 04:06:28PM -0400, Luan Nguyen wrote:
> I looked at your set up with 2691 running 12.3.13 and 1841 running 12.3.8T6
> as well as 12.3.14T and they all have the same results as yours.

So this clearly is the normal behavior.

> Basically, like what you stated:  jumped from 10ms to ~120 or so = the
> serialization delay of 1500 bytes data over the 100kbps link.
> So it does apply to the shaper as well.  From what I understand of shaper,
> it just smooth out bursty traffic.

Yep. One could implement a shaper using a virtual serialization that is
slower than the native one of a given interface but otherwise directly using
that interface's queues and semantics. But it was my impression that the
IOS MQC shaper is an entity of its own which - at least theoretically -
would allow LLQ packets to be dealt with in a special way. But if it is not
implemented that way, I'll have to live with it. At the real world
bandwidths I'm currently deploying at (2320kbps raw SDSL/ATM cell
bitrate), there is not much of a problem with RTT and jitter as RTT
averages around 10ms and jitter is only seeing some stray RTTs of
40ms or such. The customer has, on the other hand, sites connected
with 262kbps SDSL and there, I'm getting into an RTT and jitter area
that is not really nice for voice. Seemingly the only way around that
is upgrading them, which is a good idea anyway ;)
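
To put numbers on the serialization delays discussed above, here is a small
sketch. It assumes a 1500 byte packet and treats the quoted rates as plain
bit rates (ATM cell overhead is ignored, so real figures on the SDSL lines
would be somewhat higher). The function name is only illustrative.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time in milliseconds to clock packet_bytes onto a link of link_bps."""
    return packet_bytes * 8 / link_bps * 1000.0


# Rates mentioned in this thread: the 100kbps lab shaper, the 262kbps SDSL
# sites and the 2320kbps SDSL sites (all taken as raw bit rates).
for kbps in (100, 262, 2320):
    delay = serialization_delay_ms(1500, kbps * 1000)
    print(f"{kbps:>5} kbps: {delay:6.1f} ms per 1500 byte packet")
```

This gives roughly 120 ms at 100kbps, about 46 ms at 262kbps and about 5 ms
at 2320kbps, which fits the observation that the slow sites are the ones
where voice suffers.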

> "A shaper typically delays excess traffic
> using a buffer, or mechanism, to hold packets and shape the flow when the
> data rate of the source is higher than expected. Traffic shaping smoothes
> traffic by storing traffic above the configured rate in a queue. Therefore,
> shaping increases buffer utilization on a router, but causes
> non-deterministic packet delays.

That's the point: Non-deterministic delays. When I'm able to combine
a shaper (which in best effort mode will introduce non-deterministic
delays, no question) with priority queuing aka low latency queueing,
I'd expect to get back deterministic delays for that traffic class.
As applying or not applying LLQ doesn't significantly change RTTs and
jitter, it means the LLQ doesn't make them more deterministic. At the
moment we seem to have to live with that, but maybe Cisco enhances that
code path one day.

> the token bucket mechanism used for traffic shaping has both a
> token bucket and a queue used to delay packets. If the token bucket did not
> have a data buffer, it would be a policer. For traffic shaping, packets that
> arrive that cannot be sent immediately (because there are not enough tokens
> in the bucket) are delayed in the data buffer.

Ok, if that is the way it is implemented, it explains the virtual head
of line blocking that takes place: If a 1500 byte data packet just
emptied the token bucket while being given to the real queues, any
following packet (be it LLQ or not) will have to wait for the TB to
refill to the amount required for that packet to pass, which will look
essentially like a serialization delay. You could queue an LLQ packet
in front of a data packet that is currently being delayed, but you
cannot work around the "just-emptied-the-TB" case this way. Maybe that
explains it all.
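
The reasoning above can be played through with a toy model. This is only a
sketch of a generic token bucket, not of the actual IOS code path: it
assumes the bucket has just been emptied at t=0, that a packet may leave
only once the bucket holds as many bits as the packet is long, and that all
packets are already waiting. The 64 byte "voice" packet size is made up for
illustration.

```python
def release_times(packet_bits, rate_bps, start_tokens_bits=0.0):
    """Return the departure time (s) of each packet, served in list order.

    Toy token bucket: tokens refill at rate_bps, and a packet leaves as
    soon as the bucket holds at least its own size in bits.
    """
    now = 0.0
    tokens = start_tokens_bits
    times = []
    for size in packet_bits:
        if tokens < size:
            # Wait until the bucket has refilled to the packet's size.
            now += (size - tokens) / rate_bps
            tokens = size
        tokens -= size
        times.append(now)
    return times


RATE = 100_000        # shaper rate in bps, as in the lab test
DATA = 1500 * 8       # 1500 byte data packet, in bits
VOICE = 64 * 8        # hypothetical small priority packet, in bits

# Priority packet stuck behind a delayed data packet (plain FIFO).
fifo = release_times([DATA, VOICE], RATE)
# Priority packet queued in front of the delayed data packet.
prio = release_times([VOICE, DATA], RATE)

print(f"FIFO:     voice leaves after {fifo[1] * 1000:.2f} ms")
print(f"priority: voice leaves after {prio[0] * 1000:.2f} ms")
```

In this model, moving the small packet to the front cuts its wait from a
full 1500 byte refill down to its own refill time, but it can never be zero
right after the bucket has been drained. Whether IOS behaves like this
model is exactly the open question.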

> Packet flow is implemented using three queues. The first, the shaping queue,
> is WFQ-based and shapes traffic according to the specified rate using a
> token bucket model.  This WFQ-based scheduling on the entry of the shaper
> provides fair scheduling within a traffic class (the match all default class
> you have set up). This queue dispatches packets to the software queue, which
> may be configured with other queuing mechanisms (PQ, CQ, WFQ or FIFO). If

In our example, it made absolutely no difference whether the interface
is in WFQ or FIFO mode. I haven't tested anything else. The mention
of PQ is interesting, though: Should I try PQ on the Ethernet and
give ICMP absolute priority there? Worth a try I think, but I expect it
will not change anything as packets are delayed by the shaper, not by
the interface.

> the software queue is empty, traffic is forwarded directly to the output
> hardware queue."
> Like someone else said, with serial link, one could use LFI to help out a
> bit, but in the ethernet (third party modem DSL), can't do anything about
> it.  Ip TCP adjust-mss doesn't consistently help with what I tested.  I used
> ftp to push traffic through the shaper.  On the router itself, are there any
> commands/ways/debugs to look at how ip tcp adjust-mss work and work
> properly?

I don't know - easiest way to test would be a sniffer to see whether the
TCP segments are really at the expected size. If they are and it doesn't
really help, maybe things are more complicated than we currently assume.

> P.S  If I change the shaper config to:
> policy-map TS
>  class shape
>   bandwidth 100
>   shape peak 100000
>   service-policy icmp-priority
>  
> I get a better RTT out of the pings.

Me too, but:

a) This required me to remove WFQ from the interface where the policy is
   applied in egress direction
b) It gives better RTTs, but it no longer shapes as expected.
   The measured download rate of the HTTP GET increased to 23kByte/s
   which is approximately twice the rate I was supposed to shape to.
c) When using "shape average" instead, it doesn't make a difference
   whether the "bandwidth" statement is applied or not (I used
   class-default, though, which might be a special case).

I assume (but have not tested) that if "shape peak" were tuned so that it
comes out with a total limit similar to "shape average", the RTTs would go
up the same way as they do with shape average directly. Just increasing
the shape average to 200000bps essentially comes out with similar RTTs
and throughput.
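
A possible explanation for the doubled rate, as a sketch: as far as I
recall the Cisco documentation, "shape peak" allows a peak rate of
CIR * (1 + Be/Bc), and Be defaults to the same value as Bc. Both points are
assumptions here and should be checked against the command reference for
the release in use. The Bc/Be values below are placeholders; only their
ratio matters.

```python
def shape_peak_rate_bps(cir_bps, bc_bits, be_bits):
    """Peak rate under the assumed formula CIR * (1 + Be/Bc)."""
    return cir_bps * (1 + be_bits / bc_bits)


CIR = 100_000   # configured "shape peak 100000"
BC = 8_000      # placeholder committed burst, in bits
BE = BC         # assumed default: Be equal to Bc

peak = shape_peak_rate_bps(CIR, BC, BE)
print(f"peak rate: {peak:.0f} bps = {peak / 8 / 1000:.1f} kByte/s")
```

Under these assumptions the peak comes out at 200000bps, i.e. 25 kByte/s
on the wire, which would be consistent with the measured 23 kByte/s HTTP
download rate and with "shape average 200000" behaving the same way.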

> As for the bitrate, I use CPE behind third party modem to do vpn, so I just
> do like Cisco suggested..."rate limit the traffic transmitted towards the
> DSL modem to the rate that we are guaranteed from the Service Provider on
> the DSL link"

I have a chain of three boxes now:

                                  +-I Voice VLAN
                                 /
ATM/SDSL----CPE--I--PIX--I--C1760
                                 \
                                  +-I Data VLAN

The CPE is Carrier proprietary but a Router (we are running routed IP in
SNAP encapsulation and centrally terminate with ATM), the PIX is doing the
VPN stuff (it was there already so no reason to do that on the 1760 even
if that would be possible). So the C1760 is VLAN trunking to the Cat3550s
for the Voice, Data and PIX Transit broadcast domains and finally builds
a GRE tunnel with OSPF routing to a central 3745.

I have to do the same here (and of course in mirror image mode on the
3745): Limit the traffic that goes out via the carrier's lines to what
they guarantee. I'm not using rate-limit though, as this would badly
break voice - shaping behaves much more nicely when it comes to
leaving packets alive.

-- 
                  The _S_anta _C_laus _O_peration
  or "how to turn a complete illusion into a neverending money source"

-> Andre Beck    +++ ABP-RIPE +++    IBH Prof. Dr. Horn GmbH, Dresden <-