[j-nsp] MX204-IR RIB->FIB sync?

Wed Dec 12 20:47:48 EST 2018

Hi all,

I’ve been playing around with rLFA in a small lab using a pair each of MX204-IR, ASR920, ME3600s in a ring:

MX1-et0/0-MX2-xe0/1-ASR2-ge-ME2-ge-ME1-ge-ASR1-te-MX1

They're all running BFD (150ms x 3), LDP, ISIS, LDP-IGP sync (infinite holddown), LDP session protection and LDP GR (not that GR really applies here, but…)
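
For reference, the BFD timers above work out to a 450 ms detection budget. A minimal Python sketch of the arithmetic, assuming both ends negotiate the same 150 ms interval and a multiplier of 3 (the function name is made up for illustration):

```python
def bfd_detection_time_ms(interval_ms: float, multiplier: int) -> float:
    """Time for BFD to declare a session down after packets stop arriving.

    Assumes the negotiated interval is the same in both directions.
    """
    return interval_ms * multiplier


# 150 ms x 3, as configured in this lab
print(bfd_detection_time_ms(150, 3))
```

This budget only matters when the failure is not signalled by the interface itself; a pulled cable is usually seen as link-down well before BFD times out.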

MX1, MX2, ASR1 and ASR2 each have a single-armed host hanging off them. The test consists of H-ASR1 pinging H-MX2, and H-ASR2 pinging H-MX1. In both cases, the command is: sudo ping -fi 0.001 ${host}
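
To turn a ping summary into an outage figure, lost packets times the send interval gives a rough estimate. A minimal Python sketch, assuming an iputils-style summary line ("N packets transmitted, M received") and that the sender really kept to the 1 ms interval; the function name and the sample numbers are hypothetical, not results from this lab:

```python
import re


def estimate_outage_ms(summary: str, interval_s: float = 0.001) -> float:
    """Estimate outage duration from a ping summary line.

    Assumes packets were sent at a fixed interval, so each lost
    packet stands for roughly one interval of dropped traffic.
    """
    m = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", summary)
    if m is None:
        raise ValueError("unrecognised ping summary line")
    sent, received = int(m.group(1)), int(m.group(2))
    return (sent - received) * interval_s * 1000.0


# Hypothetical summary line, for illustration only
sample = "60000 packets transmitted, 59500 received, 0.833333% packet loss, time 60012ms"
print(estimate_outage_ms(sample))
```

The estimate lumps all loss into one window, so it overstates a single outage if packets were also dropped elsewhere during the run.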

ASR1 has rLFA for H-MX2 subnet, ASR2 has rLFA for H-MX1 subnet, MX1 has rLFA for H-ASR2 subnet, and MX2 has rLFA for H-ASR1 subnet.

With this topology, when I pull the plug between MX1 and MX2, I lose a ping or two, but when I connect MX1 and MX2 again, there’s about 500ms of loss on all tests.
If I re-jig the topology and move the MX1-MX2 link from et0/0 so that it is also on xe0/1, the failover and failback both lose about 500ms worth of traffic.
If I re-jig the topology and remove the MX1-MX2 links entirely and instead create a link between ASR1 and ASR2 to close the ring, there’s zero loss on the failover of that ASR1-ASR2 link, and only about 10ms of loss on the failback.

This is all really surprising to me.  Surprising that when the MX1-MX2 link is on et0/0 it behaves one way, and another way when the MX1-MX2 link is on xe0/1 (PIC0 vs PIC1).  Also, surprising that it didn’t seem to perform as well as when the ASRs were closing the ring instead of the MXs.

I really thought this testing would be much more uneventful than it has been. If I think through this as rationally as I understand it, the only explanation that seems to make sense is that on the MX the FIB lags a bit behind the RIB, and that how far it lags depends on the PIC.

I’m pretty new to Junos and Juniper HW architecture in general, so there’s no doubt much that I don’t know.

Thoughts?