[j-nsp] rib-sharding and NSR update

Mileto Tales miletotalesgter at gmail.com
Tue Jul 23 18:07:31 EDT 2024


Hi Andrey,

Can you share which Junos version you had these issues on?



> On 23 Jul 2024, at 18:15, Andrey Kostin via juniper-nsp <juniper-nsp at puck.nether.net> wrote:
> 
> Tried to enable rib-sharding on several routers in the last few weeks and got a bunch of problems.
> First, a PE router with rib-sharding was losing connectivity to indirect routes after every MPLS LSP autobandwidth adjustment. Say PE-A has a static route for X.X.X.X/29 pointing to IP Y.Y.Y.1, reachable via a connected interface with IP Y.Y.Y.0/31. PE-A advertises X.X.X.X/29 with next-hop Y.Y.Y.1, and Y.Y.Y.0/31 with next-hop Z.Z.Z.Z/32, the lo0 address used as the iBGP session source. PE-B resolves Z.Z.Z.Z/32 via an RSVP LSP with label L0, and X.X.X.X/29 is resolved via Y.Y.Y.1, then via Z.Z.Z.Z/32, to the same label L0. When a regular autobandwidth adjustment happens, PE-B calculates and signals the new path with label L1 using make-before-break, and then switches traffic to the new path by updating the label from L0 to L1 for the prefixes that use it. It turns out that the label is updated for Z.Z.Z.Z/32 and Y.Y.Y.0/31, but not for X.X.X.X/29. After the hold-down timer expires, PE-B signals deletion of the path with label L0 but still uses L0 for X.X.X.X/29, and traffic is blackholed because the downstream router has already deleted the label. Disabling rib-sharding on PE-B solved this issue right away.
> Next, a memory leak happened on a non-RR router, with memory usage growing from 17% to 95% in three weeks. After disabling rib-sharding, memory usage is at 14% so far.
> And finally, two regional route reflectors without rib-sharding, peered with central RRs that had sharding enabled, went to 100% CPU utilization right after the BGP sessions were established. This caused very slow route updates, with intermittent connectivity even for routes that hadn't changed. The changes were reverted on one of these routers, and the other kept running at 100% RE CPU until rib-sharding was disabled on one of the central RRs. After that, CPU on the peered regional RR dropped to 30-40%, still higher than usual. Only when rib-sharding was disabled on the second central RR did CPU utilization return to the normal 20-25%.
> 
> YMMV, but I don't think we're going to try this feature again in the foreseeable future.
> 
> Kind regards,
> Andrey
> 
> Luca Salvatore wrote on 2024-06-26 15:18:
>> For what it's worth, we're happily running rib-sharding on many MX10K
>> devices on 22.2R3-S2.
>> NSR is fine and we haven't had any issues
>>> On Sun, Jun 2, 2024 at 10:26 PM Gustavo Santos via juniper-nsp
>>> <juniper-nsp at puck.nether.net> wrote:
>>> I tried it again on Junos 21.4R3-S3.4, hit some bugs that crashed the
>>> rpd daemon, and gave up.
>>> We will try it again later this year. If update threading /
>>> rib-sharding works as expected, it will be better than having
>>> non-stop routing running.
>>> Last time we had an issue caused by a BGP routing update, it took
>>> about 50 minutes to advertise all the needed routes to one of the
>>> transit providers, because of the time it takes to send full routing
>>> table feeds to remote peers.
>>> On Fri, 10 May 2024 at 16:45, Andrey Kostin via juniper-nsp
>>> <juniper-nsp at puck.nether.net> wrote:
>>>> Hi juniper-nsp,
>>>> Just hit exactly the same issue as described in this message found
>>>> in the list archives:
>>>> Gustavo Santos
>>>> Mon Jan 4 15:13:18 EST 2021
>>>> Hi,
>>>> We got another MX10003 and we are updating it before it goes into
>>>> production.
>>>> Reading the 19.4R3 release notes, we noticed two features,
>>>> update-threading and rib-sharding, and I really liked what they
>>>> "promise": faster BGP updates.
>>>> But there is a catch. We can't use these new features with non-stop
>>>> routing enabled.
>>>> The question is: are these features worth the loss of non-stop
>>>> routing?
>>>> Regards
>>>> "
>>>> bgp {
>>>>     ##
>>>>     ## Warning: Can't be configured together with routing-options nonstop-routing
>>>>     ##
>>>>     rib-sharding;
>>>>     ##
>>>>     ## Warning: Update threading can't be configured together with routing-options nonstop-routing
>>>>     ##
>>>>     update-threading;
>>>> }
>>>> "
>>>> That message doesn't seem to have received any response.
>>>> However, I found an explanation at the bottom of the page:
>>>> https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/rib-sharding-edit-protocols-bgp.html
>>>> Support for NSR with sharding introduced in Junos OS Release 22.2.
>>>> BGP sharding supports IPv4, IPv6, L3VPN and BGP-LU from Junos OS
>>>> Release 20.4R1.
>>>> Still need to test and confirm on this platform, but on another
>>>> router it already works.
>>>> --
>>>> Kind regards,
>>>> Andrey
> 
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
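
For anyone trying to picture the first failure in the quoted message (label moved from L0 to L1 for Z.Z.Z.Z/32 and Y.Y.Y.0/31 but not for X.X.X.X/29), here is a toy Python model of it. This is only a sketch and not Junos code: the names (PARENT, relabel, max_depth, blackholed) are made up for illustration, and the idea that the update stops at a fixed resolution depth is purely an assumption used to reproduce the symptom. The thread does not say what actually goes wrong inside rpd.

```python
# Toy model of the symptom described in the quoted message; not Junos code.
# All names here are hypothetical and exist only for illustration.

LSP = "lsp-to-PE-A"

# Resolution chain on PE-B as described: each prefix resolves via its parent.
PARENT = {
    "Z.Z.Z.Z/32": LSP,            # resolved directly via the RSVP LSP
    "Y.Y.Y.0/31": "Z.Z.Z.Z/32",   # BGP next hop is PE-A's lo0 address
    "X.X.X.X/29": "Y.Y.Y.0/31",   # next hop Y.Y.Y.1 lies inside Y.Y.Y.0/31
}


def depth(prefix):
    """Count the resolution steps from a prefix down to the LSP."""
    steps = 0
    node = prefix
    while node != LSP:
        node = PARENT[node]
        steps += 1
    return steps


def relabel(installed, new_label, max_depth=None):
    """Return a copy of the installed labels after the LSP is re-signalled.

    max_depth=None updates every dependent prefix (the expected behaviour).
    An integer limits the update to prefixes within that many resolution
    steps; this is an ASSUMED mechanism, chosen only because it reproduces
    the reported symptom.
    """
    updated = dict(installed)
    for prefix in PARENT:
        if max_depth is None or depth(prefix) <= max_depth:
            updated[prefix] = new_label
    return updated


def blackholed(installed, live_labels):
    """List prefixes whose installed label the downstream router no longer has."""
    return sorted(p for p, label in installed.items() if label not in live_labels)


installed = {prefix: "L0" for prefix in PARENT}
expected = relabel(installed, "L1")               # every prefix moves to L1
observed = relabel(installed, "L1", max_depth=2)  # X.X.X.X/29 is left on L0

# During make-before-break both labels are live, so nothing is dropped yet.
print(blackholed(observed, {"L0", "L1"}))
# After the hold-down timer expires only L1 is live.
print(blackholed(expected, {"L1"}))
print(blackholed(observed, {"L1"}))
```

In this model nothing breaks while both L0 and L1 are still signalled, which would match the report that traffic is only lost once the hold-down timer expires and the old path is torn down.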

