[j-nsp] ATM PIC and congestion
John Kristoff
jtk@aharp.is-net.depaul.edu
Wed, 30 Oct 2002 17:06:17 -0600
Those of you on the NANOG list may remember an inquiry from me a short
while back asking whether anyone was seeing congestion at the Chicago
SBC/AADS NAP.
After much hair pulling by me, my team, and our vendors, we've finally
tracked down a latency problem that turned out to be due to the way the
Juniper ATM1 PIC handles high traffic loads. The following is a summary
of the problem and some information about ATM buffer management that is
not yet publicly available from Juniper (I was told it was OK to share).
We had an M5 with an OC3c at the Chicago SBC/AADS NAP using the Juniper
ATM1 PIC. Utilization on the link was high overall, and the outbound
direction toward the Internet was maxed out, with the largest share of
the outbound traffic going to our primary upstream. Recently we began
experiencing latency on the order of hundreds of milliseconds. At first
it looked like a latency problem on multiple PVCs, but we eventually
realized it was concentrated on the PVC carrying the most outbound
traffic.
After checking for latency in the ATM switch network and on the far end,
and ruling out problems with our own gear, Juniper support determined
that the cause appeared to be how the ATM interface buffers operate. So
we tweaked those queue lengths and the latency problem went away. Now we
get packet drops instead, but that is more normal and we'll handle it in
other ways (the CoS and ATM2 PIC thread today is very relevant for us,
as you might guess). Below is some information some of you may find
useful. It was apparently written by a Juniper escalation engineer and
hasn't made it into any publicly available documentation yet.
ATM1 PICs contain a transmit buffer pool of 16,382 buffers, which
are shared by all PVCs currently configured on the PIC. Even on
multi-PHY ATM PICs, there is still a single buffer pool shared
by all the PHYs.
By default, the PIC allows PVCs to consume as many buffers as they
require. If the sustained traffic rate for a PVC exceeds its shaped
rate, buffers will be consumed. Eventually, all buffers on the PIC
will be used, which starves the other PVCs. This results in
head-of-line blocking.
The queue-length parameter can be set (on a per-PVC basis) to prevent
this situation. It sets a limit on the number of transmit packets
(and ultimately buffers) that can be queued for a PVC. New packets
that would exceed this limit get dropped (i.e., tail drop).
queue-length, configured under the shaping hierarchy, represents the
maximum number of packets that can be queued for the PVC using the
global buffers. It should be configured for all PVCs whenever more than
one PVC is configured on an ATM1 PIC. It performs two functions (a
rough sketch follows the list):
1) It stops head-of-line blocking from occurring, since it limits the
number of packets, and hence buffers, that can be consumed by each
configured PVC.
2) It bounds the maximum lifetime (queueing time) of packets on the
PVC when traffic has oversubscribed the configured shaping
contract.
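To make both effects concrete, here is a toy Python sketch of the shared
pool. It is purely illustrative, not how the PIC is actually implemented;
the PVC names are made up, and the 10-buffers-per-packet figure assumes
MTU-sized packets at the default 4,482-byte MTU with the 480-byte buffer
size implied by the formula further down.

    class Pic:
        def __init__(self, pool=16382):
            self.free = pool          # transmit buffers shared by all PVCs
            self.queued = {}          # PVC name -> packets currently queued

        def enqueue(self, pvc, buffers, queue_length=None):
            """Try to queue one packet for a PVC; return True if accepted."""
            q = self.queued.get(pvc, 0)
            if queue_length is not None and q >= queue_length:
                return False          # per-PVC cap reached: tail drop
            if self.free < buffers:
                return False          # shared pool exhausted: HOL blocking
            self.free -= buffers
            self.queued[pvc] = q + 1
            return True

    # Default behavior: an oversubscribed PVC with no queue-length drains
    # the pool, and then even one MTU-sized packet for a well-behaved PVC
    # is refused.
    pic = Pic()
    while pic.enqueue("pvc-busy", buffers=10):
        pass
    print(pic.enqueue("pvc-quiet", buffers=10))     # False

    # With queue-length 25 on the busy PVC, it tail-drops at its own cap
    # and the quiet PVC still gets buffers.
    pic = Pic()
    while pic.enqueue("pvc-busy", buffers=10, queue_length=25):
        pass
    print(pic.enqueue("pvc-quiet", buffers=10))     # True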
The total of all the queue-length settings must not be greater than
the total number of packets that can be held in the buffer space
available on the PIC.
The total number of packets that can be held by the buffers depends
on the MTU setting for the interfaces on the PIC. The MTU used should
include all encapsulation overhead and hence is the physical interface
MTU. The following formula gives the total number of packets the
buffer space can hold:
16,382 / ( Round Up ( MTU / 480 ) )
For example, with the default MTU settings for ATM1 PIC interfaces,
the total number of packets that can be held is:
16,382 / ( Round Up ( 4,482 / 480 ) ) = 16,382 / 10 = 1,638 packets.
Thus, when configuring the queue-lengths for each of the PVCs
configured on an ATM1 PIC using the default MTU settings, they must not
total more than 1,638. They can total less.
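If you'd rather script the arithmetic than do it by hand, the formula
above is easy to express in Python (the PVC names and queue-length
values below are made-up examples, not recommendations):

    import math

    ATM1_BUFFERS = 16382      # shared transmit buffers on the ATM1 PIC
    BUFFER_BYTES = 480        # bytes per buffer, per the formula above

    def pic_packet_capacity(mtu):
        """Worst-case number of MTU-sized packets the buffer pool holds."""
        return ATM1_BUFFERS // math.ceil(mtu / BUFFER_BYTES)

    capacity = pic_packet_capacity(4482)          # default MTU -> 1638

    # Hypothetical per-PVC queue-lengths; the sum must not exceed capacity.
    queue_lengths = {"upstream": 800, "peer-a": 400, "peer-b": 400}
    assert sum(queue_lengths.values()) <= capacity
    print(capacity, sum(queue_lengths.values()))  # 1638 1600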
Setting a queue-length to a very low value is possible, but doing so
risks not being able to buffer short bursts of packets transiting
the PVC.
The maximum lifetime that packets transiting a PVC could sustain can
be calculated from the shaping rate configured for the PVC, the
queue-length setting, and the MTU. The following formula can be
used:
( PVC queue-length in packets x MTU in bytes ) /
( PVC shaping rate in bits per second / 8 )
For example, say a PVC is configured on an ATM1 PIC interface with the
default MTU and a CBR shaping rate of 3,840,000bps (10,000 cells per
second). The queue-length has been set to 25 packets. The maximum
lifetime is:
( 25 x 4,482 ) / ( 3,840,000 / 8 ) = 112,050 / 480,000 = 0.233s = 233ms.
This is the worst-case lifetime, assuming all packets in the queue are
MTU-sized and the traffic using the PVC is oversubscribing its
configured shaping contract.
In general it's good design practice to keep this maximum lifetime
under 500ms.
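The delay formula scripts just as easily; here is a small Python check
using the example numbers above and the suggested 500ms ceiling:

    def worst_case_delay_ms(queue_length, mtu, shaping_bps):
        """Longest time an MTU-sized packet can wait on an oversubscribed PVC."""
        return (queue_length * mtu) / (shaping_bps / 8) * 1000

    delay = worst_case_delay_ms(queue_length=25, mtu=4482, shaping_bps=3840000)
    print(round(delay))       # 233 (ms), matching the example above
    assert delay < 500        # the suggested design ceiling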
So, if you've got high load and high latency over your ATM1 PIC, you may
need to tweak your queue-lengths using the info above.
John