[c-nsp] Quick question regarding BGP route churn & PRP-2

Thu Mar 7 16:35:52 EST 2013

There is a bug in some PRP-2 code releases where the BGP process leaks memory and uses more and more CPU over time.
There are two things you can do about it: upgrade the code, and fully remove inactive BGP sessions.
It appears that the router process 'saves' updates for the inactive BGP sessions.
I encountered this problem on our PRP-2s.
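To find candidate inactive sessions, one option is to scan `show ip bgp summary` output. On IOS, the State/PfxRcd column shows a prefix count for Established sessions and the session state otherwise (for example Idle, Active, or Idle (Admin)). Presumably "fully remove" means deleting the neighbor from the configuration (`no neighbor ...` under `router bgp`) rather than only shutting it down, which leaves it listed as Idle (Admin). The helper below is hypothetical, not a Cisco tool. It assumes the classic 10-column layout, plain numeric AS numbers, and one neighbor per line, so long IPv6 neighbor addresses that wrap onto a second line are not handled.

```python
def inactive_neighbors(summary_text):
    """Hypothetical helper: return (neighbor, state) pairs for BGP sessions
    that are not Established, parsed from IOS 'show ip bgp summary' text.

    Assumes the layout:
    Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
    """
    inactive = []
    for line in summary_text.splitlines():
        fields = line.split()
        # Neighbor rows have at least 10 fields with numeric V and AS columns;
        # this skips the header row and the router-id/table-version lines.
        if len(fields) < 10 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        # The last column can span several words, e.g. "Idle (Admin)".
        state = " ".join(fields[9:])
        # Established sessions show a prefix count; anything else is a state.
        if not state.isdigit():
            inactive.append((fields[0], state))
    return inactive


# Made-up sample: the second neighbor uses documentation addresses/ASNs.
sample = """\
Neighbor        V    AS   MsgRcvd MsgSent    TblVer  InQ OutQ Up/Down  State/PfxRcd
x.x.x.13        4  3356 164983700 1610725 334466843    0    0 4w2d           434674
192.0.2.1       4 64500         0       0         0    0    0 never    Idle (Admin)
"""
print(inactive_neighbors(sample))
```

With the sample above only 192.0.2.1 is reported, because the x.x.x.13 row ends in a prefix count.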

LR Mack McBride
Network Architect

-----Original Message-----
From: cisco-nsp-bounces at puck.nether.net [mailto:cisco-nsp-bounces at puck.nether.net] On Behalf Of Drew Weaver
Sent: Thursday, March 07, 2013 6:21 AM
To: cisco-nsp at puck.nether.net
Subject: [c-nsp] Quick question regarding BGP route churn & PRP-2

Howdy,

One of our routers in a smaller facility is still rocking a pair of PRP-2s and we've been getting notices lately that it has been failing to respond to SNMP queries.

That makes sense, since my cell phone probably has a better CPU than the PRP-2, but I wanted to see if there was any way to extend the life of this thing just a little longer.

I tracked the CPU usage and it appears that the BGP ROUTER process is what is eating all of the CPU time when this issue happens.

There seems to be a constant deadly drip of routing updates coming in from one of the upstream providers attached to this router:

The snapshots below were taken one second apart:

Neighbor        V    AS   MsgRcvd MsgSent    TblVer  InQ OutQ Up/Down  State/PfxRcd
x.x.x.13        4  3356 164983466 1610722 334466344   13    0 4w2d           434626

Neighbor        V    AS   MsgRcvd MsgSent    TblVer  InQ OutQ Up/Down  State/PfxRcd
x.x.x.13        4  3356 164983671 1610725 334466843    0    0 4w2d           434664

Neighbor        V    AS   MsgRcvd MsgSent    TblVer  InQ OutQ Up/Down  State/PfxRcd
x.x.x.13        4  3356 164983700 1610725 334466843    0    0 4w2d           434674
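Differencing the counters between consecutive snapshots puts a number on the churn. This is a minimal sketch that assumes the samples were exactly one second apart, as stated. The counter values are copied from the output above, and the function name is made up for illustration.

```python
# Counters copied from the three 'show ip bgp summary' snapshots above.
SNAPSHOTS = [
    {"MsgRcvd": 164983466, "MsgSent": 1610722, "TblVer": 334466344, "PfxRcd": 434626},
    {"MsgRcvd": 164983671, "MsgSent": 1610725, "TblVer": 334466843, "PfxRcd": 434664},
    {"MsgRcvd": 164983700, "MsgSent": 1610725, "TblVer": 334466843, "PfxRcd": 434674},
]


def per_second_deltas(samples, interval_s=1.0):
    """Return one {counter: change per second} dict per consecutive pair."""
    return [
        {name: (cur[name] - prev[name]) / interval_s for name in cur}
        for prev, cur in zip(samples, samples[1:])
    ]


for delta in per_second_deltas(SNAPSHOTS):
    print(delta)
```

In the first second the neighbor sent 205 messages, the table version advanced by 499, and the prefix count rose by 38. In the next second it sent 29 messages with no table version change. MsgRcvd counts every BGP message, including keepalives, and one UPDATE can carry many prefixes, so this is a rough gauge of churn rather than a per-prefix count.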

We only get the SNMP-failure notices 2-3 times a day out of 480 polls, but that is enough to trigger alerts, create operational events, etc.
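For scale, 480 polls a day works out to one poll every three minutes, so 2 to 3 failures a day is a failure rate of roughly 0.4 to 0.6 percent. A quick check of that arithmetic (the helper name is illustrative):

```python
MINUTES_PER_DAY = 24 * 60


def polling_stats(polls_per_day, failures_per_day):
    """Return (poll interval in minutes, failure rate in percent)."""
    interval_min = MINUTES_PER_DAY / polls_per_day
    failure_pct = 100.0 * failures_per_day / polls_per_day
    return interval_min, failure_pct


for failures in (2, 3):
    interval, pct = polling_stats(480, failures)
    print(f"{failures} failures/day: one poll every {interval:g} min, {pct:.3f}% failed")
```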

A while back we had a problem with another upstream where it would sometimes take an hour to download the full table from them, so we enabled PMTUD, which helped in that case; the max data segment size on this particular neighbor appears to be 1460 bytes.
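The 1460-byte figure is what a standard 1500-byte Ethernet MTU leaves after a 20-byte IPv4 header and a 20-byte TCP header, assuming no options. Without path MTU discovery, IOS BGP sessions to non-directly-connected peers are generally reported to fall back to a 536-byte MSS. That would explain why PMTUD sped up the full-table download: the same data goes out in fewer, larger segments. The sketch below only does that arithmetic. The 10 MB payload is a hypothetical round number, not a measured table size.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def mss_for_mtu(mtu):
    """Largest TCP payload per segment for a given IPv4 MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def segments_needed(payload_bytes, mss):
    """Number of TCP segments needed to carry payload_bytes."""
    return (payload_bytes + mss - 1) // mss  # integer ceiling division


payload = 10_000_000  # hypothetical amount of BGP UPDATE data, in bytes
for mss in (536, mss_for_mtu(1500)):
    print(f"MSS {mss}: {segments_needed(payload, mss)} segments")
```

Under these assumptions that is 18,657 segments at an MSS of 536 versus 6,850 at 1460, about 2.7 times fewer.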

Does anyone have any tips or tricks for lessening the impact of the constant route churn, other than "Don't use a PRP-2 because it sucks" or "Don't import a full table"?

We're already scheduled to upgrade to ASR9000s with RSP440s fairly soon, but as I said, I do need to squeeze just a few more months out of this beastie.

This particular router is still running IOS and hasn't been upgraded to XR.

Thanks!
-Drew

_______________________________________________
cisco-nsp mailing list
cisco-nsp at puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/