[j-nsp] iBGP convergence time

Tue Feb 20 04:17:16 EST 2007

On Fri, 16 Feb 2007, Richard A Steenbergen wrote:
> On Fri, Feb 02, 2007 at 04:13:27AM -0500, Richard A Steenbergen wrote:
>> You know I haven't had any free time to run real tests or anything, but I
>> noticed a significant increase in BGP convergence time when upgrading from
>> JUNOS 7.2/7.3 (and I think some 7.4) to 7.6. When you make a policy
>> change, the routes take several minutes (from 2 to 7) to install. If you
>> do a show route you can see the new routes sitting in + and the old routes
>> sitting in - for minutes, RPD is spinning its heels at 100% cpu, and the
>> packets continue to forward over the old path while it is processing.
>
> Ok so, after about a dozen people contacted me privately to confirm that
> they were seeing similar issues that hadn't been fully acknowledged, I ran
> off and did a little testing to replicate the issue. The one thing I can
> definitely confirm right now is that it only appears to affect M160 (or at
> least, not M5-M40).
...

As I said before, I have seen this (or at least something that looks
similar) on a T320 as well, just today when I upgraded from 7.5SR to
8.2R1, so upgrading to faster REs doesn't seem like a real solution
here. I also took a full BGP traceoptions log in case there is
something specific to look at.

Convergence time here seems to be around 7 minutes after the BGP
sessions are up, though writing 300MB of traceoptions may have been a
factor in this.  This includes receiving a full table from an eBGP
peer (about 39000 'BGP RECV' lines in the log), sending a partial
table to 5 peers (about 5*3500 'BGP SEND' lines), and receiving a full
table from an iBGP peer (about 39000 'BGP RECV' lines in the log).

Receiving each full table seems to be spread roughly evenly over time
(each one takes over 5 minutes, at about 50-300 updates/sec).  The
weird thing is that in both 'BGP RECV' and 'BGP SEND', the Update
length is almost always less than 100 bytes, i.e. lots of small
messages.  No wonder it takes a while to process them.  Maybe the BGP
update packing (for sends) or processing (for receives) algorithm has
changed?  Sending the partial tables occurs after the full tables are
synced (peaking at 1000+ updates/sec, mostly less than 100).  7
minutes seems rather long given that the links between the peers are
2.5G to 10G.
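
As a rough sanity check of the figures above, here is a
back-of-envelope sketch.  It assumes the round numbers quoted in this
mail (39000 updates, exactly 5 minutes, 100 bytes per Update as an
upper bound, 2.5G as the slowest link); the function and constant
names are made up for illustration, and TCP/IP overhead is ignored.

```python
# Back-of-envelope check of the figures quoted in the mail.
# All inputs are round numbers from the post; names are illustrative only.

def updates_per_second(update_count: int, duration_s: float) -> float:
    """Average update rate over the whole transfer."""
    return update_count / duration_s

def payload_rate_bps(rate_per_s: float, update_bytes: int) -> float:
    """BGP payload rate in bits per second, ignoring TCP/IP overhead."""
    return rate_per_s * update_bytes * 8

FULL_TABLE_UPDATES = 39_000   # 'BGP RECV' lines for one full table
TRANSFER_SECONDS = 5 * 60     # "over 5 minutes", taken as exactly 5
UPDATE_BYTES = 100            # upper bound on the typical Update length
SLOWEST_LINK_BPS = 2.5e9      # slowest link mentioned (2.5G)

rate = updates_per_second(FULL_TABLE_UPDATES, TRANSFER_SECONDS)
bps = payload_rate_bps(rate, UPDATE_BYTES)
link_fraction = bps / SLOWEST_LINK_BPS

print(f"{rate:.0f} updates/sec, {bps / 1000:.0f} kbit/s, "
      f"{link_fraction:.4%} of the slowest link")
```

With these assumptions the average falls inside the observed 50-300
updates/sec range, and the payload rate is a tiny fraction of even the
slowest link, so link bandwidth is unlikely to be the bottleneck.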

Between the various '+', '-', and '*' states, some of the transit IP
traffic seemed to be dropped even though a route to the destination
existed on the RE (also according to the BGP logs).  So it seemed as
if updating the forwarding table is delayed or somehow fails while
more updates are on the way.

One thing I was left wondering about is whether this has also been
seen with 'mtu-discovery' disabled.  At least in my case,
mtu-discovery is enabled, and the BGP process sets a higher (max?) MSS
(8192) than the path MTU (usually around 4484), which would cause
major problems if PMTUD didn't work as expected.
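
To make the MSS concern concrete, a small sketch of the arithmetic.
It assumes IPv4 and TCP headers without options (20 bytes each), which
I have not verified for these sessions; the helper names are made up
for illustration.

```python
# Sketch: does a full-size TCP segment fit the path MTU unfragmented?
# Assumes IPv4 and TCP headers without options (20 bytes each).

IPV4_HEADER = 20
TCP_HEADER = 20

def max_mss_for_mtu(path_mtu: int) -> int:
    """Largest TCP payload that fits in one packet of size path_mtu."""
    return path_mtu - IPV4_HEADER - TCP_HEADER

def segment_fits(mss: int, path_mtu: int) -> bool:
    """True if a segment carrying mss bytes fits within path_mtu."""
    return mss <= max_mss_for_mtu(path_mtu)

OBSERVED_MSS = 8192   # MSS seen on the BGP session
PATH_MTU = 4484       # typical path MTU quoted in the mail

print("largest MSS that fits:", max_mss_for_mtu(PATH_MTU))
print("full-size segment fits:", segment_fits(OBSERVED_MSS, PATH_MTU))
```

Under these assumptions a full 8192-byte segment does not fit in a
4484-byte path MTU, so such segments depend on PMTUD (or
fragmentation) working.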

Btw, I wonder what the 'RELEASE' messages related to a prefix in the
BGP logs mean.

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings