[c-nsp] The dreaded microburst - definition and troubleshooting

Fri Apr 24 12:56:21 EDT 2009

The definition I generally use is this:

[snip]
A microburst is when packet drops occur even though there is no
sustained or noticeable congestion on a link or device.

Example: The 1 minute utilization of a link is 20% and packet drops are
occurring.

Microbursts happen in every packet-based network where flow control is
not extended end to end, and in all types of switches, both blocking
and non-blocking.
[end snip]
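
To make the 20% example concrete, here is a rough Python sketch of how
the same traffic looks at two measurement windows. Everything in it is
made up for illustration: the 1 Gb/s link, the 1250-byte packets, and
the synthetic pattern of a 20 ms line-rate burst every 100 ms. The
function name utilization_by_window is hypothetical, not from any tool.

```python
from collections import defaultdict

def utilization_by_window(packets, link_bps, window_us):
    """Return {window_index: utilization fraction}.

    packets is an iterable of (timestamp_us, size_bytes) tuples.
    """
    bits = defaultdict(int)
    for ts_us, size_bytes in packets:
        bits[ts_us // window_us] += size_bytes * 8
    capacity_bits = link_bps * window_us / 1_000_000
    return {w: b / capacity_bits for w, b in bits.items()}

LINK_BPS = 1_000_000_000   # assumed 1 Gb/s link
PKT_BYTES = 1250           # 10,000 bits, i.e. 10 us on the wire at 1 Gb/s

# Synthetic 1-second trace: every 100 ms, a 20 ms burst at line rate.
packets = []
for burst_start_us in range(0, 1_000_000, 100_000):
    for i in range(2000):
        packets.append((burst_start_us + i * 10, PKT_BYTES))

coarse = utilization_by_window(packets, LINK_BPS, 1_000_000)  # 1 s window
fine = utilization_by_window(packets, LINK_BPS, 10_000)       # 10 ms windows

print(f"1 s average utilization:  {coarse[0]:.0%}")
print(f"peak 10 ms utilization:   {max(fine.values()):.0%}")
```

The coarse window reports 20% while the busiest 10 ms windows are at
line rate, which is the gap the definition above is describing. The
same binning could be applied to timestamps and lengths exported from a
capture, with the window size chosen to suit the platform.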

I principally work on high-end switching, and these are the 4 main ways
we see it occur:
- Speed mismatch (10G into 1G, 10G into 10M). The more extreme the
speed mismatch, the more dramatic the issue.
- Network oversubscription, e.g. 20 1G hosts sharing 1x10G uplink.
- L2 unicast flood: synchronization of floods from multiple hosts.
- L2 multicast: synchronization of multicast from multiple hosts in
large any-to-any multicast environments.
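
The speed mismatch case can be sketched with a toy tail-drop FIFO in
Python. This is a byte-granular model under assumed numbers, not a
description of any real platform: a 1 ms burst arriving at 10G, an
egress port draining at 1G, and a hypothetical 512 KiB egress buffer.
The name simulate_fifo is my own.

```python
def simulate_fifo(arrivals, drain_per_tick, buffer_bytes):
    """Tail-drop FIFO model: return (dropped_bytes, peak_queue_bytes).

    arrivals holds the number of bytes arriving in each tick.
    """
    queue = dropped = peak = 0
    for arriving in arrivals:
        queue += arriving
        if queue > buffer_bytes:             # buffer full: drop the excess
            dropped += queue - buffer_bytes
            queue = buffer_bytes
        peak = max(peak, queue)
        queue -= min(queue, drain_per_tick)  # egress drains at its own rate
    return dropped, peak

TICKS = 100_000        # 1 tick = 1 us, so the run covers 100 ms
IN_10G = 1250          # bytes per us at 10 Gb/s
OUT_1G = 125           # bytes per us at 1 Gb/s
BUFFER = 512 * 1024    # hypothetical 512 KiB egress buffer

# One 1 ms burst at 10G, then silence for the remaining 99 ms.
arrivals = [IN_10G] * 1000 + [0] * (TICKS - 1000)

offered_load = sum(arrivals) / (OUT_1G * TICKS)
dropped, peak = simulate_fifo(arrivals, OUT_1G, BUFFER)

print(f"offered load over 100 ms: {offered_load:.0%}")
print(f"dropped bytes: {dropped}, peak queue: {peak}")
```

Averaged over the 100 ms the egress link is only offered 10% of its
capacity, yet the buffer fills partway through the burst and the rest
is dropped. Oversubscription and synchronized floods can be tried in
the same model by summing several sources into arrivals.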

Microbursts are simply how packet networks behave when they do not have
end-to-end flow control. (End-to-end flow control is no panacea; you
then have to create ways to prevent deadlock situations.) You can use
larger buffers to mask the issue, but that increases latency and causes
jitter. In extreme cases you can end up with packets arriving tens of
seconds later. Dropping packets is not the end of the world; it puts a
limit on how large the latency and jitter can grow.
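
The buffer versus latency trade-off is just arithmetic: a full FIFO
buffer takes buffer size divided by drain rate to empty, and that is
the worst-case queueing delay a packet at the tail can see. A small
Python sketch, with buffer sizes and link speeds that are purely
illustrative assumptions:

```python
def max_queue_delay_s(buffer_bytes, drain_bps):
    """Time in seconds for a full FIFO buffer to drain at the egress rate."""
    return buffer_bytes * 8 / drain_bps

# Hypothetical buffer sizes and egress rates, chosen only to show the scale.
examples = [
    ("512 KiB draining at 1 Gb/s", 512 * 1024, 1_000_000_000),
    ("32 MiB draining at 10 Mb/s", 32 * 1024 * 1024, 10_000_000),
]

for label, buffer_bytes, drain_bps in examples:
    delay_ms = max_queue_delay_s(buffer_bytes, drain_bps) * 1000
    print(f"{label}: {delay_ms:.1f} ms")
```

The first case works out to about 4 ms, the second to roughly 27
seconds, which is how a large buffer in front of a slow link gets you
into tens-of-seconds territory.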

Ian

Dale Shaw wrote:
> Hi all,
> 
> Is there a universally agreed upon definition for a 'microburst'?
> 
> Is there a defined time measurement - i.e. 5ms, 10ms, 50ms, 100ms,
> 1000ms - during which a certain bps or pps threshold must be
> met/exceeded?
> 
> Does anyone have any tips for troubleshooting microbursts,
> particularly in relation to the c7200 platform exhibiting "no buff"
> drops? We're going to capture some data (w/SPAN on an adjacent switch)
> but it would be nice to be able to look at the data and somehow marry
> it up with incrementing drop counters on the affected c7200 interface.
> 
> It would be nice to be able to explain such drops like "within the
> measurement window, we saw traffic at bps/pps rate x, and we know that
> anything beyond bps/pps rate y will result in drops".
> 
> I suppose it's platform-specific, but how does one come up with an
> accurate benchmark? Is such precision just wishful thinking in the
> murky world of microbursts? :-)
> 
> cheers,
> Dale