[c-nsp] 7600 RSP720 SRD4 bgp bounce triggers cpu exhaustion

Wed Nov 2 15:56:32 EDT 2011

I am having the same problem and have a case open with Cisco.
We use the 7600s as route reflectors to a number of neighbors (20+).
BGP is a high priority process and the number of packets exchanged is huge (100K?).
Of course they all have to get assembled in the input queue before they get processed.
We are seeing the same types of issues and have done some packet captures and event logs.
We have also seen this on the GSR platform running 12.0(33) code.
We aren't seeing the IP Input process spike, but we are seeing the BGP process eat the CPU's lunch.

LR Mack McBride
Network Architect

-----Original Message-----
From: cisco-nsp-bounces at puck.nether.net [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Steven Raymond
Sent: Tuesday, November 01, 2011 11:40 AM
To: Cisco Network Service Providers
Subject: Re: [c-nsp] 7600 RSP720 SRD4 bgp bounce triggers cpu exhaustion

On Nov 1, 2011, at 4:57 AM, Nick Hilliard wrote:
> this can be caused by routing loops which happen because the RIB gets out
> of date with respect to the rest of the network.  This will result in lots
> of icmp ttl-expired packets being generated by the RP, which will cause IP
> Input to get excited.  Try configuring the following:
> 
> mls rate-limit all ttl-failure <xxx>
> mls rate-limit all mtu-failure <xxx>
> mls rate-limit unicast ip icmp redirect <xxx>
> mls rate-limit unicast ip icmp unreachable <xxx>
> 
> If you already have lots of other mls rate limiters configured, you may
> need to be careful when configuring these, because there are more mls rate
> limiter options than the hardware is capable of supporting at the same time.

Thanks for the commands.  Have you seen this shortcoming particularly with the RSP720?  My experience is that the SUP720s are getting pretty tired CPU-wise, but it seemed that after upgrading to RSPs they had somewhat more "legs", so to speak.  Is it simply too much to expect these processors to handle a fairly large BGP reset without falling apart?  As I mentioned before, I am pretty certain this wasn't happening with SRC code, and about the only thing feature-wise I've added since then is some HSRP groups.

One thing I did notice since starting this thread is that my bgp fall-over command needed some work.  While the core router was correctly dropping the lost BGP routes immediately on link down, the other IBGP routers were not, presumably because they still had a route to the now-down neighbor via a less specific backbone aggregate.  So they were waiting on BGP keepalives to fail, and I presume still forwarding packets down the invalid path for a while.  Using "neighbor .. fall-over route-map meh" and matching on the IGP source of the IBGP neighbor /32 address should fix it, and get the rest of the IBGP neighbors to start re-converging sooner.  Should that help the IP Input settle down?
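
For illustration, the fall-over change described above might look roughly like the sketch below. Only the route-map name "meh" comes from this thread. The AS number, neighbor address, prefix-list name, and OSPF process ID are placeholders, and the sketch assumes the IBGP peer loopbacks are carried as /32s in OSPF.

```
! Hypothetical values: AS 65000, neighbor 192.0.2.1, OSPF process 1
router bgp 65000
 neighbor 192.0.2.1 remote-as 65000
 neighbor 192.0.2.1 update-source Loopback0
 neighbor 192.0.2.1 fall-over route-map meh
!
! Accept only host routes as valid tracking routes for the peer
ip prefix-list HOST-ROUTES seq 5 permit 0.0.0.0/0 ge 32
!
route-map meh permit 10
 match ip address prefix-list HOST-ROUTES
 match source-protocol ospf 1
```

With a route-map like this, the less specific backbone aggregate should no longer count as a valid route to the peer, so the session would be torn down when the OSPF-learned /32 disappears rather than when the hold timer expires.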

Does anyone have recommendations for a decent current IOS version in the SR train?

Thanks again
