[j-nsp] 8.2R2.4 -> 8.4R2.4 route installation delay
Richard A Steenbergen
ras at e-gerbil.net
Thu Dec 13 01:30:30 EST 2007
On Wed, Dec 12, 2007 at 08:50:37PM -0600, Kevin Day wrote:
>
> Tonight we upgraded from 8.2R2.4 to 8.4R2.4 on a production M20.
> Everything seemed to go well except for one problem that I'm not sure
> I can explain. I did a full reboot of the router after the upgrade.
> BGP sessions started coming up fine, but the router was sending
> "network unreachable" messages for routes that "show route" was
> displaying. Doing a "show route extensive" showed that many routes
> were in the state "<Record Pending>". The header of "show bgp sum"
> said that there were 150,000+ routes stuck in the "pending" column.
Hehe, welcome to my hell. I've been dealing with this issue across many
platforms (ranging from old M160s w/RE-2.0 to new MX960s w/2GHz REs) for
quite some time now. I actually posted about this behavior on this list a
while back. At the time I thought it was strictly an RE-2.0 issue, but
tests on newer platforms and REs seem to indicate that it isn't. It
definitely started somewhere in the mid 7.x's at any rate; I never saw
this issue in earlier code.
It looks like the actual issue is with the installation of the routes to
the PFE. BGP has no problem selecting the new paths quickly, but something
causes it to block the installation of the new paths to hardware (for
anywhere from a few minutes to MANY minutes) until eventually it seems to
go pop and install all the pending updates. If you look at a specific
route (show route) while this is happening, you'll see a + entry for the
newly selected path and a - entry on the old path it's trying to remove.
As long as it's in this state, the hardware is still forwarding on the old
path (which is a really bad thing if that old path is now down: the router
WILL sit there and blackhole your bits for extended periods of time).
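For anyone who wants to spot the split + / - state in bulk rather than one route at a time, here is a minimal sketch in Python. It assumes you have already saved "show route" text output to a string, and that entries follow the usual legend ("+ = Active Route, - = Last Active, * = Both"). The function name, the regex, and the sample text are hypothetical: the sample is hand-written for illustration and was not captured from a router, so real output may need a looser pattern.

```python
import re

# One route entry: optional destination prefix, optional flag, then "[Proto/pref]".
# Assumed legend: '+' = active route, '-' = last active, '*' = both.
ENTRY = re.compile(
    r'^\s*(?:(?P<dest>[0-9A-Fa-f:.]+/\d+)\s+)?(?P<flag>[+*-])?\[[\w-]+/\d+\]'
)

def pending_destinations(text):
    """Return destinations that show both a '+' and a '-' entry."""
    pending = []
    dest, flags = None, set()

    def flush():
        # A destination is "pending" if new and old paths are listed separately.
        if dest is not None and {'+', '-'} <= flags:
            pending.append(dest)

    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # next-hop lines, headers, blank lines
        if m.group('dest'):
            flush()  # a new destination starts; close out the previous one
            dest, flags = m.group('dest'), set()
        if m.group('flag'):
            flags.add(m.group('flag'))
    flush()
    return pending

# Hand-written illustrative input, NOT real router output.
SAMPLE = """\
10.0.0.0/8         +[BGP/170] 00:00:12, localpref 100
                    > to 192.0.2.1 via ge-0/0/0.0
                   -[BGP/170] 00:10:00, localpref 100
                    > to 192.0.2.9 via ge-0/0/1.0
172.16.0.0/12      *[BGP/170] 01:00:00, localpref 100
                    > to 192.0.2.1 via ge-0/0/0.0
"""

if __name__ == "__main__":
    print(pending_destinations(SAMPLE))
```

Running this repeatedly against fresh captures would give a rough view of how long routes stay in the split state; it only reads text, so it says nothing about what the PFE itself has installed.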
I've been beating my head (and Juniper's :P) on this one for well over a
year now, and despite a few attempted fixes so far nobody seems to have a
clue what the real issue is. I know I'm not the only one seeing this; I've
had a dozen other people tell me privately about seeing the exact same
behavior, but I can't seem to find anything which we do or don't have in
common that would be causing it. It's also difficult to reproduce on
demand: sometimes there is no issue at all, and 5 minutes later you can do
the same routing change and see major impact on a few or a large number of
routes. I've seen it block on installation of anything from a full table
after a major policy change, reboot, RE swap, etc, to 50 routes blocking
for 10 minutes after clearing a small bgp session on an otherwise unloaded
router. I've even seen it happen when I added a static route. :)
--
Richard A Steenbergen <ras at e-gerbil.net> http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)