[j-nsp] iBGP convergence time

Richard A Steenbergen ras at e-gerbil.net
Fri Feb 16 21:30:14 EST 2007


On Fri, Feb 02, 2007 at 04:13:27AM -0500, Richard A Steenbergen wrote:
> You know I haven't had any free time to run real tests or anything, but I 
> noticed a significant increase in BGP convergence time when upgrading from 
> JUNOS 7.2/7.3 (and I think some 7.4) to 7.6. When you make a policy 
> change, the routes take several minutes (from 2 to 7) to install. If you 
> do a show route you can see the new routes sitting in + and the old routes 
> sitting in - for minutes, RPD is spinning its heels at 100% cpu, and the 
> packets continue to forward over the old path while it is processing.

Ok so, after about a dozen people contacted me privately to confirm that 
they were seeing similar issues that hadn't been fully acknowledged, I ran 
off and did a little testing to replicate the issue. The one thing I can 
definitely confirm right now is that it appears to affect only the M160 
(or at least, it does not affect the M5-M40).

On an RE-2.0 on a single switch board platform, performance is about what 
you would expect from a 7+ year old routing engine running modern code on 
a modern routing table. It syncs a full table inbound (from empty) on an 
otherwise unused RE-2.0 in just under 7 minutes, and a switch of best 
path and reinstall of routes across a full table takes just barely under 
2 minutes (policy-statement with then local-preference only, no other 
processing). However, on an M160 the switch of best path which leads to 
installing new routes in the HW takes between 8 and 15 minutes in my 
tests. I haven't yet had time to go through every version one at a time 
to find exactly where this starts, but the behavior is definitely 
evident.

The following is based on absolutely nothing except my random guess as to 
what is happening, so someone please let me know if I'm warm or cold. It 
seems that the easiest way to replicate the state of the new route 
showing up as "+" and the old route showing up as "-" is to intentionally 
break connectivity between the RE and PFE (easy with an M40 :P). My guess 
is that this is a kind of transactional FIB installation system, where 
the RE doesn't update its RIB to reflect that the new route has been 
installed until the switch board processes and confirms it (allowing it 
to retry the install if necessary), presumably to prevent a Customer 
Enragement Feature with Juniper's move towards distributed forwarding 
tables on the Gibson architecture. Whatever is going on with the M160 on 
RE-2.0, however, it is significantly slower. Maybe this just wasn't 
sufficiently regression tested on the M160 platform, or maybe it is just 
a natural effect of having 4 switch boards which all need to be updated, 
but it is very noticeable. The official Juniper line seems to be "just 
upgrade your REs", but it would be nice if we had an alternate option.

So, two things. The most obvious question is: is there a way to turn this 
behavior off, or revert to the previous behavior (if in fact my guess as 
to the cause is correct :P)? The next question is, I noticed in the 
release notes for 8.2 that there is a new option to support indirect 
next-hops which may significantly reduce the number of FIB updates. My 
take on this feature is that the FIB changes from installing BGP route -> 
physical next-hop mappings to BGP route -> BGP next-hop plus BGP next-hop 
-> physical next-hop, so that when the path to a BGP next-hop changes you 
only have to update one entry instead of the potentially thousands of 
entries for the BGP routes themselves (which I kinda thought Juniper had 
done since forever :P). Am I correct in that interpretation, or is there 
something else going on there?
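For anyone who wants to experiment, the 8.2 release notes point at a 
forwarding-table knob for this; as best I can tell (hedging here, since I 
haven't run 8.2 myself), it is enabled as:

```
routing-options {
    forwarding-table {
        indirect-next-hop;
    }
}
```

If the interpretation above is right, enabling it trades one extra level 
of lookup indirection for far fewer FIB writes on a next-hop change.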

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)

