[j-nsp] route BGP stall bug

Jared Mauch jared at puck.nether.net
Tue Jul 17 19:39:55 EDT 2012


Try the hidden "show krt queue" command when this happens. It should give you an idea of what is going on.
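
To know when "this happens", it can help to watch the FIB prefix count over time and flag the moment it stops growing while still far below the expected total (in the report quoted below, roughly 11K installed out of 414K). The following is only a minimal sketch under stated assumptions: find_stall, window and min_growth are hypothetical names and thresholds, not anything provided by Junos, and collecting the (time, count) samples, for example from "show route forwarding-table summary", is left to the caller.

```python
from typing import Iterable, Optional, Tuple


def find_stall(
    samples: Iterable[Tuple[float, int]],
    expected: int,
    window: float = 120.0,   # assumed: seconds of no progress that count as a stall
    min_growth: int = 1000,  # assumed: minimum prefixes the FIB should gain per window
) -> Optional[float]:
    """Return the timestamp at which a FIB stall is first seen, else None.

    `samples` is a time-ordered sequence of (seconds, fib_prefix_count).
    `expected` is the prefix count the FIB should eventually reach.
    """
    history = []
    for t, count in samples:
        if count >= expected:
            return None  # FIB converged before any stall was seen
        history.append((t, count))
        # Baseline: the most recent sample that is at least `window` seconds old.
        baseline = None
        for old_t, old_count in history:
            if t - old_t >= window:
                baseline = old_count
            else:
                break
        if baseline is not None and count - baseline < min_growth:
            return t
    return None


# Illustrative samples only: the FIB hangs at 11K of an expected 414K prefixes.
stalled = [(0, 2000), (60, 11000), (120, 11000), (180, 11000)]
# Illustrative samples only: steady growth until the FIB is full.
healthy = [(0, 0), (60, 100000), (120, 200000), (180, 300000), (240, 414000)]

print(find_stall(stalled, expected=414000))
print(find_stall(healthy, expected=414000))
```

When the function returns a timestamp, that is the moment to capture "show krt queue" output; the thresholds are guesses and would need tuning per box.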

Jared Mauch

On Jul 17, 2012, at 6:03 PM, Tim Vollebregt <tim at interworx.nl> wrote:

> Hi All,
> 
> This morning during a maintenance window I ran into the route stall bug that Richard has mentioned a few times already on j-nsp.
> 
> Hardware kit:
> -MX480 with SCB (non-e)
> -2 x RE-S-1800x4
> -4 x MPC 3D 16x 10GE
> Software version: 10.4R8.5
> 
> During this maintenance I was installing 2 new routing engines in the router, replacing the 'old' RE-S-2000. This router pushes a lot of traffic and has 14 full BGP tables from eBGP peers, 1 RR session to its 'mate', and several iBGP peers with partial tables.
> 
> After replacing the REs, the FPCs initialized and the BGP sessions were established, but it took quite some time before the RIB was completely filled. After checking some hosts I concluded that some destinations were unreachable, even though the RIB looked fine.
> 
> When I checked the FIB with "show route forwarding-table summary", I saw that only 11K prefixes had been pushed to the FIB and that it was hanging.
> As I was aware of the bug, I waited for some time, and it eventually took about 30 minutes to fill the FIB with 414K prefixes. During these 30 minutes a lot of destinations were unreachable and traffic was being blackholed, because the RIB exchange with peers was working fine while the FIB lagged behind.
> 
> As there was still some time left in the maintenance window, and I really wanted a workaround for this bug, I did the following.
> I deactivated all eBGP peer groups and did a switchover to the other routing engine. Once the FPCs were initialized, the router started building its iBGP sessions towards the core routers, and its RR session (full table).
> 
> This worked out quite well: the FIB was filled with the full table within 5 minutes. Afterwards I activated all eBGP peer groups again and monitored the FIB. It still took about 30 minutes to fill the FIB with the correct next-hops, but this time the blackholing lasted only a limited amount of time.
> 
> It seems this bug has been present since release 10.0 (MPC), and there doesn't seem to be a fix yet. Does anyone have more information about it, such as a PR number?
> 
> IMHO this is a really bad one, and can be a showstopper in some cases.
> 
> Thanks for your time.
> 
> BR, Tim
> 
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp


