[j-nsp] 8.2R2.4 -> 8.4R2.4 route installation delay

Richard A Steenbergen ras at e-gerbil.net
Thu Dec 13 01:30:30 EST 2007


On Wed, Dec 12, 2007 at 08:50:37PM -0600, Kevin Day wrote:
> 
> Tonight we upgraded from 8.2R2.4 to 8.4R2.4 on a production M20.  
> Everything seemed to go well except for one problem that I'm not sure  
> I can explain. I did a full reboot of the router after the upgrade.  
> BGP sessions started coming up fine, but the router was sending  
> "network unreachable" messages for routes that "show route" was  
> displaying. Doing a "show route extensive" showed that many routes  
> were in the state "<Record Pending>". The header of "show bgp sum"  
> said that there were 150,000+ routes stuck in the "pending" column.

Hehe, welcome to my hell. I've been dealing with this issue across many 
platforms (ranging from old M160s w/RE-2.0 to new MX960s w/2GHz REs) for 
quite some time now. I actually posted about this behavior on this list a 
while back. At the time I thought it was strictly an RE-2.0 issue, but 
tests on newer platforms and REs seem to indicate that it isn't. It 
definitely started somewhere in the mid 7.x's at any rate; I never saw 
this issue in earlier code.

It looks like the actual issue is with the installation of the routes to 
the PFE. BGP has no problem selecting the new paths quickly, but something 
causes it to block the installation of the new paths to hardware (for 
anywhere from a few minutes to MANY minutes) until eventually it seems to 
go pop and install all the pending updates. If you look at a specific 
route (show route) when this is happening, you'll see a + entry for the 
newly selected path, and a - entry on the old path it's trying to remove. 
As long as it's in this state, the hardware is still forwarding on the old 
path (which is a really bad thing if that old path is now down: the router 
WILL sit there and blackhole your bits for extended periods of time).
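
For anyone who wants to look for this state across many routes at once, 
here is a rough sketch in Python that scans saved "show route" text for 
prefixes carrying both a + and a - entry. The function name, the regexes, 
and the sample text are all made up for illustration, and the layout it 
assumes (prefix at the start of a line, marker right before the 
[Protocol/preference] field, further paths indented) may not match every 
release, so treat it as a starting point only.

```python
import re

# Assumed layout: "<prefix>   <marker>[Proto/pref] ..." starts a destination,
# and additional paths appear as indented "<marker>[Proto/pref] ..." lines.
PREFIX_RE = re.compile(r"^(\S+/\d+)\s+([*+-]?)\[")
ENTRY_RE = re.compile(r"^\s+([*+-]?)\[")

# Hypothetical sample text, not captured from a real router.
SAMPLE = """\
192.0.2.0/24       +[BGP/170] 00:00:12, localpref 100
                      AS path: 64500 I
                    > to 198.51.100.1 via ge-0/0/0.0
                   -[BGP/170] 01:12:40, localpref 100
                      AS path: 64501 64500 I
                    > to 198.51.100.5 via ge-0/0/1.0
203.0.113.0/24     *[BGP/170] 02:00:00, localpref 100
                      AS path: 64502 I
                    > to 198.51.100.9 via ge-0/0/2.0
"""


def find_pending_prefixes(show_route_text):
    """Return prefixes that have both a '+' and a '-' path entry."""
    markers = {}      # prefix -> set of markers seen on its path entries
    current = None    # prefix whose entries are currently being read
    for line in show_route_text.splitlines():
        m = PREFIX_RE.match(line)
        if m:
            current = m.group(1)
            markers.setdefault(current, set()).add(m.group(2))
            continue
        m = ENTRY_RE.match(line)
        if m and current is not None:
            markers[current].add(m.group(1))
    # A prefix showing '+' and '-' together is still waiting to be swapped.
    return [p for p, seen in markers.items() if {"+", "-"} <= seen]


if __name__ == "__main__":
    for prefix in find_pending_prefixes(SAMPLE):
        print(prefix)
```

Run against the sample above, only the first prefix is reported, since the 
second has a single * entry (selected and installed).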

I've been beating my head (and Juniper's :P) on this one for well over a 
year now, and despite a few attempted fixes, so far nobody seems to have a 
clue what the real issue is. I know I'm not the only one seeing this: I've 
had a dozen other people tell me privately about seeing the exact same 
behavior, but I can't seem to find anything which we do or don't have in 
common that would be causing it. It's also difficult to reproduce on 
demand. Sometimes there is no issue at all, and 5 minutes later you can do 
the same routing change and see major impact on a few or a large number of 
routes. I've seen it block on installation of anything from a full table 
after a major policy change, reboot, RE swap, etc, to 50 routes blocking 
for 10 minutes after clearing a small BGP session on an otherwise unloaded 
router. I've even seen it happen when I added a static route. :)

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

