[c-nsp] Making SUP720 cope better under BGP load

Nick Hilliard nick at foobar.org
Sun Dec 9 16:00:46 EST 2012


On 07/12/2012 12:36, Simon Lockhart wrote:
> its knees. The "BGP Router" process takes all the available CPU while it tries
> to re-establish the BGP sessions. While this is happening, the SUP720 seems to
> give up processing other stuff in a timely manner - and I see MPLS LDP drop,
> OSPF neighbours drop, and then BGP sessions drop due to hold timer expires.
> With all these drops, it causes even more CPU load, and the cycle continues.
> 
> I've been talking to other SUP720 using ISPs, and it seems that some see this
> same effect, and others don't.

There are two problems here: IOS and a slow cpu; one exacerbates the effect
of the other.  IOS is a non-preemptive multitasking system, and if a single
process on it decides to suck up all available CPU - particularly a high
priority process like the bgp router - then other processes will suffer.
This is why ldp and ospf sessions drop.  The BGP sessions also drop, but
that's an internal "BGP Router" process scheduling thing which causes the
code which generates bgp keepalives not to be run as often as necessary.
When keepalives are not sent, bgp sessions are torn down, which causes
churn, which causes more cpu load which causes keepalives not to be sent, etc.
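
As a rough illustration of the starvation described above, here is a minimal sketch of a run-to-completion scheduler. It is not IOS code. The function name, the 60s keepalive and 180s hold timer values, and the idea of a fixed "chunk" of uninterrupted update processing are all assumptions made for the example.

```python
def first_hold_timer_expiry(chunk_ms, keepalive_ms=60_000,
                            hold_ms=180_000, total_ms=900_000):
    """Return the simulated time (ms) at which the peer's hold timer
    expires, or None if the session survives the whole run.

    chunk_ms is how long the (hypothetical) update-processing task runs
    before it yields; nothing can interrupt it in the meantime.
    """
    now = 0
    last_sent = 0                  # peer last heard from us at t=0
    next_keepalive = keepalive_ms
    while now < total_ms:
        # Update processing runs to completion; keepalives cannot be sent.
        now += chunk_ms
        # The peer tears the session down once the gap exceeds the hold time.
        if now - last_sent > hold_ms:
            return last_sent + hold_ms
        # Only now does the keepalive code get a turn on the CPU.
        if now >= next_keepalive:
            last_sent = now
            next_keepalive = now + keepalive_ms
    return None


print(first_hold_timer_expiry(1_000))    # short chunks: session survives
print(first_hold_timer_expiry(200_000))  # long chunk: hold timer expires
```

With short chunks the keepalive code runs often enough and nothing drops. Once a single chunk outlasts the hold time, the session is lost however healthy the box otherwise is.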

This is a classic performance knee problem brought on by insufficient cpu
resources and a poor quality scheduler, and there's no way of fixing it.
Some versions of the SX train appear to cope slightly better than others,
but as I haven't run a sup720 on a large IXP, I'm not going to give advice
about which ios versions to try.  You could play around with the "scheduler
interval" command, but I doubt it would make any difference.

fwiw, ixp route server operators running quagga bgpd ran into almost
exactly the same performance knee.  The fix for this was to run bgp
keepalives in a separate thread in the quagga bgp daemon, but you can only
do that on an operating system with pre-emptive threading support.  Maybe
one day if Cisco split the bgp router out of the iosd process on XE, that
might help things as a long term approach towards dealing with this
problem, but I don't think that we'll ever see XE on the sup720.
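
The separate-thread approach mentioned above can be sketched as follows. This is not the quagga code (which is C); it is only a small Python illustration of the idea, and the function name, the timings and the busy loop standing in for update processing are assumptions.

```python
import threading
import time


def run_with_keepalive_thread(busy_seconds=0.3, keepalive_interval=0.05):
    """Run a CPU-bound loop in the main thread while a second thread
    keeps 'sending' keepalives. Returns how many keepalives went out."""
    sent = []
    stop = threading.Event()

    def keepalive_loop():
        # Event.wait() returns False on timeout and True once stop is set,
        # so this records one keepalive per interval until told to stop.
        while not stop.wait(keepalive_interval):
            sent.append(time.monotonic())

    worker = threading.Thread(target=keepalive_loop, daemon=True)
    worker.start()

    # Stand-in for expensive update processing that never yields voluntarily.
    deadline = time.monotonic() + busy_seconds
    counter = 0
    while time.monotonic() < deadline:
        counter += 1

    stop.set()
    worker.join()
    return len(sent)


print(run_with_keepalive_thread())
```

The busy loop never gives up the CPU of its own accord, yet keepalives still go out because the runtime pre-empts it. That is the property a non-preemptive scheduler cannot offer.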

We hit 300k prefixes in the dfz in July 2009 and 400k in Feb 2012 which
works out as 33% growth in 2.5 years.  However badly a sup720 is handling
large IXP operation now, it's not going to get any better.  Unfortunately,
the sup720 is not suitable for DFZ operation these days, for this reason
among a variety of others: poor ipv6 support, difficulties with control
plane policing, a bad netflow implementation and several other things.
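
For anyone who wants to check the growth figure, the arithmetic is below. The prefix counts and the 2.5 year span are the ones quoted above; the compound annual rate is an extra derived number, and the function name is made up for the example.

```python
def growth(start, end, years):
    """Return (total growth, compound annual growth) as fractions."""
    total = end / start - 1
    annual = (end / start) ** (1 / years) - 1
    return total, annual


total, annual = growth(300_000, 400_000, 2.5)
print(f"total growth: {total:.1%}")        # about 33%
print(f"per-year equivalent: {annual:.1%}")  # about 12%
```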

> And, as a follow-on question, given that the SUP720 is so under-powered for
> BGP, what other options do I have which would cope better? SUP-2T? Or, if
> I need to move away from the 6500, what's good for BGP routing with about 
> 20-40G of throughput (i.e. 4-8 * 10GE ports)? How does the ASR9k or ASR1k
> range fare for BGP performance?

The ASR1k doesn't look to me like a good choice for raw packet forwarding
at 10G+ due to high cost and limited performance (although it can do quite
smart stuff at lower speeds if you need that instead).  If you can live
with less than 12 x 10GE ports over the expected operational lifetime of
the unit, the ASR9001 is ravishing.

The SUP2T also looks good and may be a cheaper option than either while
providing an overall greater port density.  Be aware that if you're
upgrading from an older supervisor, you're either going to be stuck with the
limitations of the 6704-10ge line cards or if you're using 6708-10ge cards,
you will need to replace the lot of them with 6908-10G line cards - the
6708-10ge cards will not work at all with the sup2t.  Inexplicably, Cisco
have continued with their lovefest for X2 optics on the newer sup2t line
cards rather than conforming with the industry standard of sfp+ or xfp.

If your budget stretches to it / you're doing green-field stuff, the chassis
based asr9000 is the platform of choice for larger installations these
days, but this can work out quite expensive per port.

Nick


