[j-nsp] rib-sharding and NSR update

Andrey Kostin ankost at podolsk.ru
Tue Jul 23 17:15:09 EDT 2024


I tried enabling rib-sharding on several routers over the last few weeks 
and ran into a bunch of problems.
First, a PE router with rib-sharding enabled was losing connectivity to 
indirect routes after every MPLS LSP autobandwidth adjustment. Let's say 
PE-A has a static route for X.X.X.X/29 pointing to IP Y.Y.Y.1, reachable 
via a connected interface with IP Y.Y.Y.0/31. PE-A advertises X.X.X.X/29 
with next-hop Y.Y.Y.1, and Y.Y.Y.0/31 with next-hop Z.Z.Z.Z/32, the lo0 
address used as the iBGP session source. PE-B resolves Z.Z.Z.Z/32 via an 
RSVP LSP with label L0, and X.X.X.X/29 is resolved via Y.Y.Y.1 via 
Z.Z.Z.Z/32 to the same label L0. When a regular autobandwidth adjustment 
happens, PE-B calculates and signals the new path with label L1 using 
make-before-break, and then switches traffic to the new path by updating 
the label from L0 to L1 for the prefixes that use it. It turns out that 
the label is updated for Z.Z.Z.Z/32 and Y.Y.Y.0/31, but not for 
X.X.X.X/29. After the hold-down timer expires, PE-B signals deletion of 
the path with label L0 but still uses L0 for X.X.X.X/29, so traffic is 
blackholed because the downstream router has already deleted the label. 
Disabling rib-sharding on PE-B solved this issue right away.
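
To make the resolution chain above concrete, here is a minimal Python 
sketch of it. It is only a toy model and says nothing about how rpd 
actually implements resolution or sharding. The documentation prefixes 
(192.0.2.8/29, 198.51.100.0/31, 203.0.113.1/32) are placeholders for 
X.X.X.X/29, Y.Y.Y.0/31 and Z.Z.Z.Z/32, and the LSP name pe-b-to-pe-a and 
all function names are made up for illustration.

```python
import ipaddress

# Toy view of PE-B's routes. Prefixes are placeholders (assumptions),
# standing in for X.X.X.X/29, Y.Y.Y.0/31 and Z.Z.Z.Z/32.
ROUTES = {
    "192.0.2.8/29": {"next_hop": "198.51.100.1"},    # X.X.X.X/29 via Y.Y.Y.1
    "198.51.100.0/31": {"next_hop": "203.0.113.1"},  # Y.Y.Y.0/31 via Z.Z.Z.Z
    "203.0.113.1/32": {"lsp": "pe-b-to-pe-a"},       # Z.Z.Z.Z/32 via RSVP LSP
}

# Label currently in use for each LSP (hypothetical LSP name).
LSP_LABEL = {"pe-b-to-pe-a": "L0"}


def lookup(address):
    """Return the longest-matching prefix in ROUTES for an address."""
    addr = ipaddress.ip_address(address)
    matches = [ipaddress.ip_network(p) for p in ROUTES
               if addr in ipaddress.ip_network(p)]
    if not matches:
        raise LookupError(f"no route to {address}")
    return str(max(matches, key=lambda net: net.prefixlen))


def resolve_label(prefix, depth=0):
    """Follow next-hops recursively until an LSP is reached."""
    if depth > 8:
        raise RuntimeError(f"resolution loop at {prefix}")
    entry = ROUTES[prefix]
    if "lsp" in entry:
        return LSP_LABEL[entry["lsp"]]
    return resolve_label(lookup(entry["next_hop"]), depth + 1)


def stale_prefixes(installed):
    """List prefixes whose installed label differs from a fresh resolution."""
    return [p for p, label in installed.items() if label != resolve_label(p)]


# Before the autobandwidth adjustment every prefix resolves to L0.
before = {p: resolve_label(p) for p in ROUTES}

# Make-before-break: the new path is signalled with label L1.
LSP_LABEL["pe-b-to-pe-a"] = "L1"
after = {p: resolve_label(p) for p in ROUTES}

# The failure described above: the two-level indirect route keeps L0.
observed = dict(after)
observed["192.0.2.8/29"] = "L0"
print(stale_prefixes(observed))
```

In this model all three prefixes follow the LSP to L1 once the label 
changes, which is the expected behaviour. The observed state keeps L0 
only on the prefix that resolves through two levels of indirection, 
which is the one that ends up blackholed once L0 is torn down.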
Next, a memory leak showed up on a non-RR router: memory usage grew from 
17% to 95% in three weeks. After disabling rib-sharding, memory usage has 
stayed at 14% so far.
And finally, two regional route reflectors without rib-sharding, peered 
with central RRs that have sharding enabled, hit 100% CPU utilization 
right after the BGP sessions were established. This caused very slow 
route updates with intermittent connectivity, even for routes that 
hadn't changed. The changes were reverted on one of these routers, and 
the other one kept running at 100% RE CPU until rib-sharding was 
disabled on one of the central RRs. After that, CPU on the peered 
regional RR dropped to 30-40% but was still higher than usual. Only when 
rib-sharding was disabled on the second central RR did CPU utilization 
return to the normal 20-25%.

YMMV, but I don't think we're going to try this feature again in the 
foreseeable future.

Kind regards,
Andrey

Luca Salvatore wrote on 2024-06-26 15:18:
> For what it's worth, we're happily running rib-sharding on many MX10K
> devices on 22.2R3-S2.
> NSR is fine and we haven't had any issues
> 
> On Sun, Jun 2, 2024 at 10:26 PM Gustavo Santos via juniper-nsp
> <juniper-nsp at puck.nether.net> wrote:
> 
>> I tried it again on JUNOS 21.4R3-S3.4, hit some bugs that crashed the
>> rpd daemon, and gave up.
>> 
>> We will try it again later this year. If update threading /
>> rib-sharding works as expected, it will be better than having
>> non-stop routing running.
>> 
>> Last time we had an issue caused by a BGP routing update, it took
>> about 50 minutes to advertise all needed routes to one of the transit
>> providers, because of the time it takes to send full routing table
>> feeds to remote peers.
>> 
>> On Fri, May 10, 2024 at 16:45, Andrey Kostin via juniper-nsp
>> <juniper-nsp at puck.nether.net> wrote:
>> 
>>> Hi juniper-nsp,
>>> 
>>> Just hit exactly the same issue as described in the message found
>>> in the list archives:
>>> 
>>> Gustavo Santos
>>> Mon Jan 4 15:13:18 EST 2021
>>> 
>>> Hi,
>>> 
>>> We got another MX10003 and we are updating it before it goes into
>>> production. Reading the 19.4R3 release notes, we noticed two
>>> features, update-threading and rib-sharding, and I really liked what
>>> they "promise": faster BGP updates.
>>> 
>>> But there is a catch. We can't use this new feature with non-stop
>>> routing enabled.
>>> 
>>> The question is: are these features worth the loss of non-stop
>>> routing?
>>> 
>>> Regards
>>> "
>>> bgp {
>>>     ##
>>>     ## Warning: Can't be configured together with routing-options nonstop-routing
>>>     ##
>>>     rib-sharding;
>>>     ##
>>>     ## Warning: Update threading can't be configured together with routing-options nonstop-routing
>>>     ##
>>>     update-threading;
>>> }
>>> "
>>> 
>>> That message doesn't seem to have received any response.
>>> However, I found an explanation at the bottom of this page:
>>> 
>>> https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/rib-sharding-edit-protocols-bgp.html
>>> Support for NSR with sharding introduced in Junos OS Release 22.2.
>>> BGP sharding supports IPv4, IPv6, L3VPN and BGP-LU from Junos OS
>>> Release 20.4R1.
>>> 
>>> Still need to test and confirm on this platform, but on another
>>> router it already works.
>>> 
>>> --
>>> Kind regards,
>>> Andrey


