[j-nsp] route BGP stall bug
Jared Mauch
jared at puck.nether.net
Tue Jul 17 19:39:55 EDT 2012
Try the hidden "show krt queue" command when this happens. It should give you an idea of what is going on.
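
To catch the stall described below while it is happening, one option is to sample the FIB route count periodically and compare it with the RIB. The following is only a minimal sketch, not a tested tool: the function name, the thresholds, and the idea of feeding it counts collected by hand or by a script are all assumptions, and it does not parse any real Junos output. The figures in the usage note (11K and 414K) are the ones reported in the quoted message.

```python
from typing import Sequence, Tuple


def fib_stalled(
    samples: Sequence[Tuple[float, int]],
    rib_routes: int,
    min_growth: int = 1000,   # hypothetical threshold, tune per platform
    tolerance: float = 0.01,  # hypothetical: FIB within 1% of RIB counts as done
) -> bool:
    """Return True if the FIB looks stalled.

    samples    -- (seconds, fib_route_count) pairs, oldest first
    rib_routes -- number of routes the RIB expects to install

    The FIB is treated as stalled when it is still well short of the RIB
    and grew by fewer than min_growth routes across the sampled window.
    """
    if len(samples) < 2:
        return False  # not enough data to judge growth
    first_count = samples[0][1]
    last_count = samples[-1][1]
    behind = last_count < rib_routes * (1 - tolerance)
    growth = last_count - first_count
    return behind and growth < min_growth
```

For example, fib_stalled([(0, 11000), (60, 11000), (120, 11200)], 414000) reports a stall, while a FIB that grows from 11000 to 200000 in a minute does not.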
Jared Mauch
On Jul 17, 2012, at 6:03 PM, Tim Vollebregt <tim at interworx.nl> wrote:
> Hi All,
>
> This morning, during a maintenance window, I experienced the route stall bug that Richard has mentioned a few times already on j-nsp.
>
> Hardware kit:
> -MX480 with SCB (non-e)
> -2 x RE-S-1800x4
> -4 x MPC 3D 16x 10GE
> Software version: 10.4R8.5
> During this maintenance I was installing 2 new routing engines in the router, replacing the 'old' RE-S-2000. This router pushes a lot of traffic. It receives 14 full BGP tables from eBGP peers and also has 1 RR session to its 'mate' and several iBGP peers with partial tables.
>
> After I replaced the REs, the FPCs initialized and the BGP sessions began to establish, but it took quite some time before the RIB was completely filled. After checking some hosts I concluded that some destinations were unreachable, even though the RIB looked fine.
>
> When I checked the FIB with the command "show route forwarding-table summary", I saw that only 11K prefixes had been pushed to the FIB and that it was hanging.
> As I was aware of the bug, I waited for some time, and it eventually took about 30 minutes to fill the FIB with 414K prefixes. During these 30 minutes a lot of destinations were unreachable and traffic was being blackholed, because route exchange with peers was working normally while the FIB was still incomplete.
>
> As there was still some time left in the maintenance window and I really wanted a workaround for this bug, I did the following.
> I deactivated all eBGP peer groups and did a switchover to the other routing engine. When the FPCs were initialized, the router started building its iBGP sessions towards the core routers, and its RR session (full table).
>
> This worked out quite well: the FIB was filled with the full table within 5 minutes. Afterwards I activated all eBGP peer groups again and monitored the FIB. It eventually took about 30 minutes to fill the FIB with the correct next-hops, but this time the blackholing lasted only a limited amount of time.
>
> It seems this bug has been present since release 10.0 (MPC), and there doesn't seem to be a fix yet. Does anyone have more information about it, such as a PR number?
>
> IMHO this is a really bad one, and can be a showstopper in some cases.
>
> Thanks for your time.
>
> BR, Tim
>
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp