[j-nsp] iBGP convergence time
pekkas at netcore.fi
Tue Feb 20 04:17:16 EST 2007
On Fri, 16 Feb 2007, Richard A Steenbergen wrote:
> On Fri, Feb 02, 2007 at 04:13:27AM -0500, Richard A Steenbergen wrote:
>> You know I haven't had any free time to run real tests or anything, but I
>> noticed a significant increase in BGP convergence time when upgrading from
>> JUNOS 7.2/7.3 (and I think some 7.4) to 7.6. When you make a policy
>> change, the routes take several minutes (from 2 to 7) to install. If you
>> do a show route you can see the new routes sitting in + and the old routes
>> sitting in - for minutes, RPD is spinning its heels at 100% cpu, and the
>> packets continue to forward over the old path while it is processing.
> Ok so, after about a dozen people contacted me privately to confirm that
> they were seeing similar issues that hadn't been fully acknowledged, I ran
> off and did a little testing to replicate the issue. The one thing I can
> definitely confirm right now is that it only appears to affect M160 (or at
> least, not M5-M40).
As said before, I have seen this (or at least what looks similar) on
T320 as well, just today when I upgraded a 7.5SR to 8.2R1... so,
upgrading to faster REs doesn't seem like a real solution here. I also
took a full BGP traceoptions log if there is something specific to
Convergence time here seems to be around 7 minutes after BGP sessions
are up, though writing 300MB of traceoptions may have been a factor in
this. This includes receiving a full table from an eBGP peer (about
39000 'BGP RECV' lines in log), sending a partial table to 5 peers
(about 5*3500 'BGP SEND' lines), and receiving a full table from an
iBGP peer (about 39000 'BGP RECV' lines in log).
Receiving the full table seems to be spread evenly over time (each
one takes over 5 minutes! -- about 50-300 updates/sec).
The weird thing is that in 'BGP RECV' and 'BGP SEND', the Update
length is almost always less than 100 bytes. No wonder it takes a
while to process messages. Maybe the BGP update packing (for sends)
or processing (receives) algorithm has changed? Sending the partial
tables occurs after the full tables are synced (topping 1000+
updates/sec, mostly less than 100 bytes each). 7 minutes seems rather
long, as the links between peers are between 2.5G and 10G.
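
As a rough sanity check on the figures above, here is a small Python
sketch. It assumes (this is not taken from the logs) that each 'BGP RECV'
line corresponds to one update, that every update is 100 bytes, and that
the slowest 2.5G link is fully available. Under those assumptions the
link itself would account for only a tiny fraction of the time, which
points at processing rather than transport.

```python
# Back-of-the-envelope check using the figures quoted in the message.
# Assumptions (hypothetical, for illustration only): one 'BGP RECV' line
# equals one update, each update is 100 bytes, and the whole 2.5 Gbit/s
# link is available to the BGP session.

FULL_TABLE_UPDATES = 39_000   # approximate 'BGP RECV' lines for one full table
RECEIVE_SECONDS = 5 * 60      # "over 5 minutes each"
UPDATE_BYTES = 100            # upper bound on the typical Update length
LINK_BITS_PER_SEC = 2.5e9     # slowest link speed mentioned


def average_update_rate(updates: int, seconds: float) -> float:
    """Average number of updates handled per second."""
    return updates / seconds


def wire_time_seconds(updates: int, update_bytes: int, link_bps: float) -> float:
    """Time needed just to push the update payload over the link."""
    return updates * update_bytes * 8 / link_bps


if __name__ == "__main__":
    rate = average_update_rate(FULL_TABLE_UPDATES, RECEIVE_SECONDS)
    wire = wire_time_seconds(FULL_TABLE_UPDATES, UPDATE_BYTES, LINK_BITS_PER_SEC)
    print(f"average receive rate: {rate:.0f} updates/sec")
    print(f"payload time on the wire: {wire * 1000:.2f} ms")
```

The average of 130 updates/sec falls inside the observed 50-300 range,
while the payload itself would need only about 12 ms of link time.
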
Between various '+', '-', and '*' states, some of the transiting IP
traffic seemed to be dropped even though a route to the destination
exists on RE (also according to the BGP logs). So it seemed as if
updating the forwarding table is delayed or somehow fails if more
updates are on the way.
One thing I was left wondering about is whether this has also been
seen if 'mtu-discovery' is disabled? At least in my case,
mtu-discovery is enabled, and the BGP process sets a higher (max?) MSS
(8192) than path MTU (usually around 4484), which would cause major
problems if PMTUD didn't work as expected.
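
To make the MSS concern concrete, a minimal sketch of the arithmetic,
assuming IPv4 and TCP headers without options (20 bytes each; the
sessions in question may differ):

```python
# Compare the advertised MSS with what the path MTU can carry.
# Assumption (not from the original message): IPv4 with no IP or TCP
# options, so 20 bytes of IP header plus 20 bytes of TCP header.

IPV4_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20

BGP_MSS = 8192    # MSS set by the BGP process, per the message
PATH_MTU = 4484   # typical path MTU, per the message


def packet_size(mss: int) -> int:
    """Size of a full-sized IP packet carrying one MSS of TCP payload."""
    return mss + IPV4_HEADER_BYTES + TCP_HEADER_BYTES


def max_mss_for(path_mtu: int) -> int:
    """Largest MSS whose full-sized packet still fits in the path MTU."""
    return path_mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def fits(mss: int, path_mtu: int) -> bool:
    """True if a full-sized segment needs no fragmentation on this path."""
    return packet_size(mss) <= path_mtu


if __name__ == "__main__":
    print(f"full-sized packet at MSS {BGP_MSS}: {packet_size(BGP_MSS)} bytes")
    print(f"largest MSS for path MTU {PATH_MTU}: {max_mss_for(PATH_MTU)} bytes")
    print(f"fits without PMTUD or fragmentation: {fits(BGP_MSS, PATH_MTU)}")
```

With these numbers a full-sized segment would be 8232 bytes, well over
the 4484-byte path MTU, so it only gets through if PMTUD lowers the
segment size or the packet is fragmented.
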
Btw, I wonder what 'RELEASE' messages related to a prefix in BGP logs
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings