[c-nsp] Weird Multicast microburst amplification issue

Jeff Bacon bacon at walleyesoftware.com
Tue Dec 13 09:30:24 EST 2011

> It definitely looks like a classic microburst output buffer
> overflow problem, but with a Sup720 and a 6748 module, I
> haven't seen this at this volume. Ticker volume has peaked
> recently, and that might contribute to it. It appears to start
> happening with more than 120Mbps and/or 12,000 pps output on
> the port. Other than moving to 10GB, I don't see any solutions.
> Given the 6748 buffer size, I'm surprised it's overrunning it
> at this volume.

It could very well be. The port buffers on a 6748 are only
on the order of a megabyte per port, after all.

The amplification factor may come from the simple fact that
you have a 10G pipe between the switches. "But the input is
only coming in at 1G!" you say. Yes, but it's then being
intermingled on the 10G pipe. Probably on a 6708/6716 with
200 MB port buffers. After going through the replication engine.
(Ingress mode? Egress? Shouldn't matter though.) So while in
theory the traffic should be coming through the 10G at 1G
rates, it isn't necessarily, and you have to consider the
possibility that you are, yes, facing the ol' 10G->1G
neck-down problem. 1,000 packets @ 1500 bytes == 1.5 MB ==
buffer go boom.
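
To put rough numbers on the neck-down, here is a minimal sketch. It
assumes a burst arrives back-to-back at 10G line rate and has to leave
through a single 1G port whose egress buffer starts empty. The 1.3 MB
buffer figure is my assumption for illustration, not a measured value
(check the datasheet for your card), and the function names are made up.

```python
def burst_bytes_to_overflow(buffer_bytes, in_bps, out_bps):
    """Bytes a burst can carry before an initially empty buffer overflows.

    While the burst arrives at in_bps the port drains at out_bps, so the
    queue only grows at the difference between the two rates.
    """
    if in_bps <= out_bps:
        return float("inf")  # the port keeps up; the queue never builds
    # Time to fill the buffer at the net fill rate, in seconds.
    fill_seconds = buffer_bytes * 8 / (in_bps - out_bps)
    # Everything that arrived during that time is the burst size.
    return in_bps * fill_seconds / 8


def packets_to_overflow(buffer_bytes, packet_bytes, in_bps, out_bps):
    """Same thing, expressed as a count of equal-sized packets."""
    return burst_bytes_to_overflow(buffer_bytes, in_bps, out_bps) / packet_bytes


if __name__ == "__main__":
    BUFFER = 1_300_000  # ASSUMED egress buffer in bytes (hypothetical value)
    n = packets_to_overflow(BUFFER, 1500, 10e9, 1e9)
    print(f"roughly {n:.0f} back-to-back 1500-byte packets before tail drop")
```

The point is that the average rate on the port (120 Mbps or whatever)
never enters into it. Only the burst size, the two line rates and the
buffer do.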

If the packets are large, you also have serialization delay to
consider. A 1500-byte packet that takes about 12 micros to get
out the 1G pipe takes only about 1.2 micros to come in the 10G
pipe. Multiply.
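
Serialization delay is just packet size over line rate, so it is easy
to tabulate. A minimal sketch follows. It ignores preamble and
inter-frame gap, so real wire times are slightly longer, and the
function name is made up for illustration.

```python
def serialization_delay_us(packet_bytes, link_bps):
    """Microseconds needed to clock one packet onto a link of link_bps."""
    return packet_bytes * 8 / link_bps * 1e6


if __name__ == "__main__":
    for size in (64, 512, 1500):
        t_1g = serialization_delay_us(size, 1e9)
        t_10g = serialization_delay_us(size, 10e9)
        print(f"{size:5d} bytes: {t_1g:6.2f} us at 1G, {t_10g:6.2f} us at 10G")
```

Whatever the packet size, the 1G side takes ten times as long per
packet as the 10G side, which is why a burst that arrives packed
together at 10G has to sit in the 1G port's buffer while it drains.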

I'm not going to point at any of these and say "that's it" - 
but I can see where it can happen, as annoying as crap as it
might be. Someone suggested running a 1G pipe between the
switches to see whether the problem went away - I suspect
that is what they were pointing at.

I've been moving hosts off the 6500s and onto 10G off Aristas
fed off the 6500s. Let the 6500s drive the WAN, the Aristas
handle fan-out. I am actually sitting here debating swapping
out a pair of VS720s for Sup2T kit. That's not because the
hardware is working particularly hard as-is (by the stats,
they've got life downright good) but because sledgehammer
overkill seems to be about the only safe option in dealing with
these kinds of flows. I know it will take me 3 months to swap
the sups and cards out, so it might be better to start now,
however little thrilled I am at forking over $20k more per
switch than I had originally intended. (The VS720 parts swapped
out would be used to populate some new chassis in a DR/test
facility. The original idea was just to buy VS720 parts, but my
vendor came up with better prices on Sup2T kit than I'd seen
even a couple of months ago, so now it's just in the range of
"argh...maybe..." instead of "no, way too much".)

Or to quote one of my employees, "who knows what MOAR will 
be asked for next..." - or, when will OPRA blow the cap again?

I suspect you just helped me decide.

