[c-nsp] Out of order queuing

Sat Oct 24 12:04:06 EDT 2009

> We have a customer with a load-balanced path to us.  TCP throughput is
> affected by some out-of-order packets, and we were looking for a way
> to queue the interface in order to try and mitigate this.  Is it
> possible to use any queueing mechanism to re-order packets received
> from this customer before transmitting them, even at the cost of
> latency?

> I experimented with CBWFQ, with little to no success.  Any
> tips?

You mean you are receiving packets out of order from your customer, and
want to put them back in the right order? I don't think you'll find
this in any standard router.  Maybe there are specialized "WAN
accelerator" devices that can do it.  The problem is that to restore the
correct order, the device would need to track state; in particular,
per-flow state, such as which sequence number each flow expects next.
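
To illustrate what "per-flow state" means here, a minimal sketch in
Python.  This is not how any router or WAN accelerator I know of is
implemented; the class name FlowResequencer is made up, and I assume for
simplicity that every flow numbers its packets 0, 1, 2, ... (real TCP
sequence numbers count bytes and start at a random value).

```python
from collections import defaultdict


class FlowResequencer:
    """Hypothetical per-flow reorder buffer (illustration only)."""

    def __init__(self):
        # flow key -> next sequence number we are allowed to forward
        self.next_seq = {}
        # flow key -> {seq: payload} for packets that arrived too early
        self.pending = defaultdict(dict)

    def receive(self, flow, seq, payload):
        """Accept one packet; return the payloads that may now be forwarded, in order."""
        # Assumption: each flow starts numbering at 0.
        expected = self.next_seq.get(flow, 0)
        if seq < expected:
            return []  # duplicate, or already forwarded
        self.pending[flow][seq] = payload
        released = []
        # Drain every packet that is now contiguous with what was forwarded.
        while expected in self.pending[flow]:
            released.append(self.pending[flow].pop(expected))
            expected += 1
        self.next_seq[flow] = expected
        return released
```

Note that the sketch holds early packets forever if the missing one never
shows up.  A real device would need a timeout per flow, and that timeout
is where the extra latency comes from.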

Many years ago, we used "inverse multiplexers" to bundle multiple (very
expensive transatlantic) T1/E1 links.  This worked very well, and did
exactly what you describe: make the bundle behave exactly like a single
faster link, but at the cost of additional latency, mostly because the
links had different delays.  You do need them on both ends of the
bundle, not just on the receiving end.  We bought the inverse
multiplexers when the built-in router load balancing (per-destination at
the time; that was pre-CEF) failed to achieve full utilization across
three or four parallel E1s.  The inverse mux allowed us to fill six
links, after which we moved to a different technology.  That was highly
aggregated traffic;
throughput of individual flows wasn't much of an issue then, although it
was probably also very good with that solution.
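
A rough sketch of why both ends are needed, again in Python.  This is
only one way such a scheme could work, not a description of the actual
hardware we used; the function names stripe and resequence are made up.
The sending side tags each frame with a bundle-wide sequence number and
deals the frames out round-robin; the receiving side holds frames back
until the next expected number has arrived.

```python
import heapq


def stripe(frames, n_links):
    """Sender side: number the frames and deal them round-robin onto the links."""
    links = [[] for _ in range(n_links)]
    for seq, frame in enumerate(frames):
        links[seq % n_links].append((seq, frame))
    return links


def resequence(arrivals):
    """Receiver side: take (seq, frame) pairs in arrival order, yield frames in sequence order."""
    heap = []      # frames that arrived ahead of their turn, ordered by seq
    expected = 0   # next sequence number to deliver
    for seq, frame in arrivals:
        heapq.heappush(heap, (seq, frame))
        # Deliver as long as the earliest buffered frame is the one we are waiting for.
        while heap and heap[0][0] == expected:
            yield heapq.heappop(heap)[1]
            expected += 1
```

If one link is much slower than the others, every frame sent on the fast
links waits in the heap for its slow predecessor.  That waiting is the
additional latency mentioned above.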

What kind of a bundle (bandwidth, underlying technology) is your
customer connection?

Is per-flow throughput really so important? For many use cases, there
are now solutions that can use multiple flows; e.g. download managers or
parallel SSH variants for bulk file transfers.  If those can be used,
then you can use load-balancing techniques based on L4 connection hashes
(at least the 7600/Cat6500 can do this in hardware) and get good utilization
without per-flow reordering.
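
The idea behind L4 hash-based load balancing, sketched in Python.  The
function pick_link and the use of CRC32 are my own choices for
illustration; this is not the hash the 7600/Cat6500 hardware computes.
The point is only that all packets of one connection map to the same
link, so they cannot overtake each other, while different connections
spread over the bundle.

```python
import zlib


def pick_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Map a connection's addresses and ports to a link index in range(n_links)."""
    # Illustrative key format and hash; real hardware uses its own function.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


# Four parallel connections between the same two hosts differ only in
# source port, which is enough to let them land on different links.
for port in (40001, 40002, 40003, 40004):
    print(port, pick_link("192.0.2.10", "198.51.100.20", port, 22, 4))
```

A single connection still gets at most one link's worth of bandwidth,
which is why this only helps when the application opens several flows.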
-- 
Simon.