[c-nsp] realistic max full bgp peers on a sup720?

Fri May 28 01:10:33 EDT 2010

On May 27, 2010, at 11:37 PM, matthew zeier wrote:

> Running into performance issue with a couple 6503/Sup720-3BXL routers with about 8 or more peers.  Each peer's sending a full BGP table.

Anemic ppc cpu blues? performance issues? say it ain't so!

(fwiw, rsp720 is not much of an improvement. maybe the sup2t will be, but i suspect it will not be faster than other platforms available from C today.)

> If a couple peers flap, the box typically stays at 100% long enough to either drop more peers or drop OSPF.  

Yup!

> Cisco's site is vague, only mentioning 1m v4 routes.

in your case, it's not a tcam/pfc programming issue, it's a bgp rib update & cef fib update process interaction (understatement?) which is likely causing cpu soak-age, and for you, trouble.

I'd suggest a simple tweak, starting with:

process-max-time 20

then, within the ospf router config:

process-min-time percent 20
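
Put together, those two tweaks might sit in a running config roughly like this. It's only a sketch: the OSPF process ID (1) is a placeholder and not from anyone's real setup, and it's worth confirming your IOS release accepts both commands before trusting it.

```
! global config: max time (ms) a process runs before it should yield
process-max-time 20
!
! OSPF process ID 1 is a placeholder
router ospf 1
 ! min percent of cpu process time ospf takes before yielding
 process-min-time percent 20
```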

if you already have these configured, then I guess it's time for an upgrade to mx80 or 240 (or asr, or crs, I guess).

in the unlikely event that your msfc is somehow getting 'slammed' with punts or other let-throughs from the pfc during said flaps (i.e. actually forwarding some traffic, versus handling only protocol chatter), you may want to also add this (or something like it) to your config:

scheduler allocate 8000 4000

FWIW, a test box running 12.2(18)SXF17a on a sup2/msfc2/pfc2 has six active peers, each sending a full table (filtered upon reception to ~200k each). With peer overlap/uniques, the fib ends up holding ~240k (just under pfc2 max). The timers for all six neighbors are:

 neighbor transit-in peer-group
 neighbor transit-in timers 15 45

I did observe 'cascading bgp flapp-age' when I had bgp session timers set to 2 second keepalives and a 10 second hold time.

With 15 keepalive/45 hold, administratively flapping any/all/subset of neighbors doesn't affect anything on the lab box. LDP and OSPF are tuned to short-ish values (1 sec hellos, 3 to 4 sec dead for each), and none flap when bgp does 'stuff.'
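
For context, the peer-group timers above would live under the bgp process, something like the sketch below. The AS numbers and neighbor address are documentation placeholders I made up for illustration, not the lab box's real values.

```
! AS numbers and the neighbor address are placeholders
router bgp 64500
 neighbor transit-in peer-group
 ! keepalive 15 sec, hold time 45 sec
 neighbor transit-in timers 15 45
 neighbor 192.0.2.1 remote-as 64501
 neighbor 192.0.2.1 peer-group transit-in
```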

Even worse, a few sessions of BFD are running alongside this mix, and also seem fine. An example of the otherwise cool-runningness on this particular low-end platform is:

MinTxInt: 50000, MinRxInt: 100000, Multiplier: 4
Received MinRxInt: 100000, Received Multiplier: 4
Holdown (hits): 400(0), Hello (hits): 100(1981407)
Rx Count: 2032111, Rx Interval (ms) min/max/avg: 80/104/90 last: 76 ms ago
Tx Count: 1981412, Tx Interval (ms) min/max/avg: 80/112/92 last: 20 ms ago
Registered protocols: OSPF
Uptime: 2d03h
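
Reading that output: MinTxInt/MinRxInt are in microseconds, so this end offers 50 ms tx and 100 ms rx, and the peer asks for at least 100 ms between packets. That gives the 100 ms hello, and with a multiplier of 4 the holddown works out to 4 x 100 = 400 ms. A config that would produce those local values might look like the sketch below. The interface name and OSPF process ID are placeholders, and whether the box uses exactly these lines is a guess on my part.

```
! interface name is a placeholder
interface GigabitEthernet1/1
 ! 50 ms min tx, 100 ms min rx, declare down after 4 missed
 bfd interval 50 min_rx 100 multiplier 4
!
! OSPF process ID 1 is a placeholder
router ospf 1
 bfd all-interfaces
```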

Best to lab this up, in the hopes you're able to wring out said demons in a somewhat more controlled environment.

-Tk