[c-nsp] Troubleshoot UDP out-of-sequence

Lamar Owen lowen at pari.edu
Mon Sep 12 16:08:16 EDT 2011


On Monday, September 12, 2011 07:38:38 AM Persio Pucci wrote:
> I am having some problems trying to figure out what could be causing UDP
> packets get out-of-sequence on some multicast streams (market data) between
> Sao Paulo and New York.

You may know all of what I'm going to say below... but just in case, I'll say it anyway.

Quoting RFC 768 (User Datagram Protocol):
"This protocol provides a procedure for application programs to send messages to other programs with a minimum of protocol mechanism.  The protocol is transaction oriented, and delivery and duplicate protection are not guaranteed.  Applications requiring ordered reliable delivery of streams of data should use the Transmission Control Protocol (TCP) [2]."

By its very nature UDP does not guarantee in-order delivery, so unless the routers/bridges/transport are specifically set up to preserve packet order, the network equipment doesn't take pains to guarantee in-order forwarding.  The problem is that in-order UDP works most of the time (your 0.001% out-of-order rate is very good indeed, in my experience), and so we tend to rely on something that was never intended to be reliable.  (All of the above is true of the packets carrying TCP, too; it's just that TCP contains sequence-number fields that track the order of the data, and mechanisms inside the end hosts' protocol stacks that resequence it.)

The transport, in your case OC12 PoS, may very well be implemented as a ring or a set of rings with your OC12 portion add/drop multiplexed at multiple locations, and packet re-ordering could in corner cases occur within the SDH framer ASIC on the transport cards; it may depend on where the packet falls within the frame, and maybe even which line of the frame it's on, or even perhaps if it spans frames.  Then again, are PoS links even guaranteed ordered-delivery like IEEE 802 links are?

You didn't say what kind of routers were in use, nor did you say what other sort of traffic is crossing this WAN link, as those may be factors in the UDP stream getting out of order.  It could be a bug causing the packets to become out-of-order, but no router vendor is going to consider out-of-order UDP packets as a bug.

You may want to review RFC3048 ( https://tools.ietf.org/html/rfc3048 ) and see how other protocols deal with the issues of multicast and reliability.

If the application relies on 100% in-order delivery, it shouldn't be using straight UDP.  The PANA protocol, for instance, adds sequence numbers for EAP packets due to its transport being UDP, and due to EAP being packet-order sensitive.
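To make the sequence-number idea concrete, here is a minimal Python sketch of a receive-side resequencing buffer.  It is only an illustration, not how PANA or any particular market data feed does it: the names ReorderBuffer, push, first_seq and max_held are hypothetical, it assumes every datagram carries an integer sequence number that increases by one and never wraps, and it assumes that once more than max_held packets are waiting, the missing one is lost and should be skipped.

```python
import heapq


class ReorderBuffer:
    """Hold early datagrams and release payloads in sequence order.

    Hypothetical helper: assumes integer sequence numbers that increase
    by one per datagram and never wrap around.
    """

    def __init__(self, first_seq=0, max_held=64):
        self.next_seq = first_seq   # next sequence number to deliver
        self.max_held = max_held    # give up on a gap past this many held
        self._heap = []             # min-heap of (seq, payload)
        self._held = set()          # sequence numbers currently buffered

    def push(self, seq, payload):
        """Accept one datagram; return the payloads now deliverable."""
        if seq < self.next_seq or seq in self._held:
            return []               # already delivered, or a duplicate
        heapq.heappush(self._heap, (seq, payload))
        self._held.add(seq)

        # Too many packets waiting: treat the gap as loss and skip it.
        if len(self._heap) > self.max_held:
            self.next_seq = self._heap[0][0]

        ready = []
        while self._heap and self._heap[0][0] == self.next_seq:
            s, p = heapq.heappop(self._heap)
            self._held.discard(s)
            ready.append(p)
            self.next_seq += 1
        return ready


if __name__ == "__main__":
    buf = ReorderBuffer(first_seq=1)
    for seq, data in [(1, "a"), (3, "c"), (2, "b"), (4, "d")]:
        print(seq, buf.push(seq, data))
```

In this sketch packet 3 is held until packet 2 shows up, at which point both are released together.  The price is added latency for every packet stuck behind a gap, which is the trade-off a latency-sensitive application has to weigh.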

See also:
http://msdn.microsoft.com/en-us/magazine/cc163648.aspx which says: "With UDP, no connection is maintained—each packet sent is treated independently. UDP makes no attempt to control transmission speeds based on congestion. If a packet is lost, the application must detect and remedy the situation. If a packet arrives out of order, you're on your own again."  Source: Microsoft.

Also see: http://blogs.oracle.com/lmukadam/entry/tcp_udp_unicast_multicast_i_th which says "You might think that UDP is unreliable, because, you know, TCP is supposed to be the reliable one of the siblings. But in fact, over the same network segment, or over LANs with good quality gear and not excessive traffic, UDP is in practice very reliable. If there's no packet loss and packets arrive in order (which is almost always the case on a short LAN link), there's no need for any retransmissions of packets, so all the acknowledgements and waiting around of TCP is just a bunch of wasted overhead, creating latency."  Almost always, on a short LAN link, you can get 100% UDP packet order, but not always.  Source: Oracle.

And last, from the page of a package providing reliable, guaranteed in-order multicast packet delivery (not using UDP), http://www.jgroups.org/overview.html , we have: "UDP is unreliable, packets may get lost, duplicated, may arrive out of order, and there is a maximum packet size restriction."

Getting 0.001% out-of-order is a fantastic result for a protocol not designed to guarantee packet order.

It may be difficult to determine just exactly what is causing a particular packet to 'leap' ahead or drop behind other packets; much may depend upon the exact traffic mix and the exact vendor OS version and hardware type.  
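One thing that may help narrow it down is to measure the reordering at the receiver and compare the rate across time windows or traffic conditions.  The Python sketch below counts late arrivals in a list of received sequence numbers.  It assumes the feed carries a per-stream sequence number (which your out-of-sequence reports suggest), and the function name reorder_stats is made up for illustration.  A packet is counted as out of order when its sequence number is lower than one that has already arrived.

```python
def reorder_stats(seqs):
    """Return (late, total, rate) for sequence numbers in arrival order.

    Hypothetical helper: a packet counts as late when it arrives after
    a packet with a higher sequence number.  Duplicates also count as
    late under this simple rule.
    """
    next_expected = None
    late = 0
    total = 0
    for s in seqs:
        total += 1
        if next_expected is None or s >= next_expected:
            next_expected = s + 1   # in order (gaps are allowed)
        else:
            late += 1               # arrived behind a later packet
    rate = late / total if total else 0.0
    return late, total, rate


if __name__ == "__main__":
    # 100,000 packets with one adjacent pair swapped in transit.
    arrivals = list(range(100000))
    arrivals[500], arrivals[501] = arrivals[501], arrivals[500]
    late, total, rate = reorder_stats(arrivals)
    print(f"{late} of {total} late ({rate:.3%})")
```

A single swapped pair in 100,000 packets is the 0.001% figure you quoted.  Running the same count on captures taken at both ends of the WAN link, if you can get them, would at least show whether the reordering happens on the link or elsewhere.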


