[j-nsp] rib-sharding and NSR update
Andrey Kostin
ankost at podolsk.ru
Tue Jul 23 17:15:09 EDT 2024
I tried to enable rib-sharding on several routers over the last few
weeks and ran into a bunch of problems.
First, a PE router with rib-sharding was losing connectivity to indirect
routes after every MPLS LSP autobandwidth adjustment. Say PE-A has a
static route for X.X.X.X/29 pointing to IP Y.Y.Y.1, reachable via a
connected interface with IP Y.Y.Y.0/31. PE-A advertises X.X.X.X/29 with
next-hop Y.Y.Y.1, and Y.Y.Y.0/31 with next-hop Z.Z.Z.Z, the lo0
address used as the iBGP session source. PE-B resolves Z.Z.Z.Z/32 via an
RSVP LSP with label L0, so X.X.X.X/29 is resolved via Y.Y.Y.1, then via
Z.Z.Z.Z/32, to the same label L0. When a regular autobandwidth
adjustment happens, PE-B calculates and signals the new path with label
L1 using make-before-break, and then switches traffic to the new path by
updating the label from L0 to L1 for the prefixes that use it. It turns
out that the label is updated for Z.Z.Z.Z/32 and Y.Y.Y.0/31, but not for
X.X.X.X/29. After the hold-down timer expires, PE-B signals deletion of
the path with label L0 but still uses L0 for X.X.X.X/29, and traffic is
blackholed because the downstream router has already deleted the label.
Disabling rib-sharding on PE-B solved this issue right away.
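
To make the failure easier to picture, here is a toy Python model of the
resolution chain on PE-B. It is only a sketch: the Route class, the
function names and the max_depth knob are all made up for illustration,
and the depth limit is just one way to reproduce the observed symptom.
It says nothing about what actually goes wrong inside rpd.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Route:
    prefix: str
    via: Optional[str]  # parent prefix used for resolution; None = resolved by the LSP itself
    label: str          # label currently programmed for forwarding


def depth(routes: Dict[str, Route], prefix: str) -> int:
    """Count the recursion steps between a prefix and the LSP-resolved route."""
    steps = 0
    parent = routes[prefix].via
    while parent is not None:
        steps += 1
        parent = routes[parent].via
    return steps


def switch_label(routes: Dict[str, Route], new_label: str,
                 max_depth: Optional[int] = None) -> None:
    """Re-program routes after make-before-break.

    max_depth is a hypothetical knob: None updates every dependent route
    (expected behaviour), 1 mimics the symptom from the report, where only
    the LSP route and its direct dependents get the new label.
    """
    for route in routes.values():
        if max_depth is None or depth(routes, route.prefix) <= max_depth:
            route.label = new_label


def blackholed(routes: Dict[str, Route], live_labels: Set[str]) -> List[str]:
    """Prefixes still pointing at a label the downstream router has deleted."""
    return sorted(p for p, r in routes.items() if r.label not in live_labels)


def build_table() -> Dict[str, Route]:
    """PE-B's view before the autobandwidth adjustment: everything uses L0."""
    return {
        "Z.Z.Z.Z/32": Route("Z.Z.Z.Z/32", None, "L0"),
        "Y.Y.Y.0/31": Route("Y.Y.Y.0/31", "Z.Z.Z.Z/32", "L0"),
        "X.X.X.X/29": Route("X.X.X.X/29", "Y.Y.Y.0/31", "L0"),
    }


if __name__ == "__main__":
    table = build_table()
    switch_label(table, "L1", max_depth=1)  # symptom seen with rib-sharding
    # After hold-down, only L1 is still valid downstream.
    print(blackholed(table, {"L1"}))        # ['X.X.X.X/29']
```

With max_depth left at None all three prefixes move to L1 and nothing is
blackholed, which matches what we saw once rib-sharding was disabled.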
Next, a memory leak appeared on a non-RR router, with memory usage
growing from 17% to 95% in three weeks. Since disabling rib-sharding,
memory usage has stayed at 14% so far.
And finally, two regional route reflectors without rib-sharding, peered
with central RRs that had sharding enabled, went to 100% CPU utilization
right after the BGP sessions were established. This caused very slow
route updates with intermittent connectivity, even for routes that
hadn't changed. Changes were reverted on one of these routers, and the
other one kept running at 100% RE CPU until rib-sharding was disabled on
one of the central RRs. After disabling rib-sharding on one central RR,
CPU on the peered regional RR dropped to 30-40%, which was still higher
than usual. Only when rib-sharding was disabled on the second central RR
did CPU utilization return to the normal 20-25%.
YMMV, but I don't think we're going to try this feature again in the
foreseeable future.
Kind regards,
Andrey
Luca Salvatore wrote on 2024-06-26 15:18:
> For what it's worth, we're happily running rib-sharding on many MX10K
> devices on 22.2R3-S2.
> NSR is fine and we haven't had any issues
>
> On Sun, Jun 2, 2024 at 10:26 PM Gustavo Santos via juniper-nsp
> <juniper-nsp at puck.nether.net> wrote:
>
>> I tried it again on JUNOS 21.4R3-S3.4, hit some bugs that crashed the
>> rpd daemon, and gave up.
>>
>> We will try it again later this year. If update threading /
>> rib-sharding works as expected, it will be better than having
>> nonstop routing running.
>>
>> The last time we had an issue caused by a BGP routing update, it took
>> about 50 minutes to advertise all needed routes to one of the transit
>> providers, because of the time it takes to send a full routing table
>> feed to remote peers.
>>
>> On Fri, May 10, 2024 at 16:45, Andrey Kostin via juniper-nsp
>> <juniper-nsp at puck.nether.net> wrote:
>>
>>> Hi juniper-nsp,
>>>
>>> Just hit exactly the same issue as described in the message found
>>> in the list archives:
>>>
>>> Gustavo Santos
>>> Mon Jan 4 15:13:18 EST 2021
>>>
>>> Hi,
>>>
>>> We got another MX10003 and we are updating it before it goes into
>>> production. Reading the 19.4R3 release notes, we noticed two
>>> features, update-threading and rib-sharding, and I really liked
>>> what they "promise": faster BGP updates.
>>>
>>> But there is a catch. We can't use these new features with nonstop
>>> routing enabled.
>>>
>>> The question is, are these features worth the loss of nonstop
>>> routing?
>>>
>>> Regards
>>> "
>>> bgp {
>>>     ##
>>>     ## Warning: Can't be configured together with routing-options nonstop-routing
>>>     ##
>>>     rib-sharding;
>>>     ##
>>>     ## Warning: Update threading can't be configured together with routing-options nonstop-routing
>>>     ##
>>>     update-threading;
>>> }
>>> "
>>>
>>> That message doesn't seem to have received any response.
>>> However, I found an explanation at the bottom of the page:
>>>
>>> https://www.juniper.net/documentation/us/en/software/junos/cli-reference/topics/ref/statement/rib-sharding-edit-protocols-bgp.html
>>>
>>> Support for NSR with sharding introduced in Junos OS Release 22.2.
>>> BGP sharding supports IPv4, IPv6, L3VPN and BGP-LU from Junos OS
>>> Release 20.4R1.
>>>
>>> Still need to test and confirm on this platform, but on another
>>> router it already works.
>>>
>>> --
>>> Kind regards,
>>> Andrey