[c-nsp] LDPv6 Census Check

Fri Jun 12 16:22:34 EDT 2020

> Rewrites on MPLS are horrible from a memory perspective, as maintaining the state and label transitions needed to explore all possible discrete paths across the overall end-to-end path you are trying to take is hugely inefficient. Applying circuit switching to a packet network was bad from the start. SR doesn't resolve that, as you are stuck with a global label problem and the associated lack of being able to engineer your paths, or a label stack problem on ingress that means you need massive ASICs and memories there.
> 
> I don't think rewrites are horrible, just very flexible, and this *can* come at a certain price. With regard to your memory argument that path engineering in vanilla TE takes a lot of forwarding slots, we should remind ourselves that this is not a design principle of MPLS. Discrete paths could also be signalled in MPLS with shared link-labels, so that you end up with the same big instructional headend packet as in SR. There are even implementations offering this.

Except that is actually the problem if you look at it in hardware. And to be very specific, I'm talking about commodity hardware, not flexible pipelines like you find in the MX and a number of the ASRs. I'm also talking about the more recent approach of using Clos in PoPs instead of "big iron" or chassis-based systems. On those boxes, it's actually better not to do shared labels, as this pushes the ECMP decision to the ingress node. That does mean you have to enumerate every possible path (or some approximation) through the network; however, the action on the commodity gear is greatly reduced. It's a pure label swap, so you don't run into any egress next-hop problems. You definitely do on the ingress nodes. Very, very badly actually.
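To put rough numbers on the ingress cost, here is a back-of-the-envelope sketch in Python. The fan-outs and destination count are made up for illustration, and it assumes one forwarding entry per (discrete path, destination) pair with no approximation or pruning.

```python
from math import prod

def discrete_paths(fanout_per_stage):
    """Number of distinct end-to-end paths, given the ECMP fan-out at each stage."""
    return prod(fanout_per_stage)

def ingress_entries(fanout_per_stage, destinations):
    """Entries the ingress node holds if it enumerates every path to every destination."""
    return discrete_paths(fanout_per_stage) * destinations

# Hypothetical Clos PoP: 4-way fan-out at the first stage, 8-way at the second.
paths = discrete_paths([4, 8])
entries = ingress_entries([4, 8], destinations=500)
print(paths, entries)
```

The transit boxes only ever see one swap per label, but the ingress state grows with the product of the fan-outs, which is where it hurts.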

So you can move to a shared label mode. Now the commodity boxes have to perform ECMP. That means they also have to have a unique ECMP group for every site/anycast label passing through them, as every label is being swapped differently. You get no reuse for two labels that are on identical paths because the "swaps" are not identical. So you run up against ECMP next-hop group starvation, forcing you to lower radix and limiting the total anycast/site-cast count.
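A toy model of that group exhaustion, in Python. It assumes an ECMP group can only be shared when its members carry identical rewrites, and that every label is swapped to a label-specific value; the names (`ecmp_groups_needed`, `spine1`, `site0`, ...) are made up for the example and real ASICs will differ in the details.

```python
def ecmp_groups_needed(destinations, rewrite):
    """Count distinct ECMP groups when groups with identical members are shared."""
    groups = set()
    for dest, next_hops in destinations.items():
        # A group is defined by the exact rewrite applied on each member.
        members = frozenset(rewrite(dest, nh) for nh in next_hops)
        groups.add(members)
    return len(groups)

def ip_rewrite(dest, next_hop):
    # IP: the rewrite (egress port, DMAC) depends only on the next hop.
    return (next_hop,)

def mpls_swap_rewrite(dest, next_hop):
    # MPLS swap: the outgoing label is specific to the label being swapped
    # (assumption for this sketch).
    return (next_hop, "swap:" + dest)

uplinks = ("spine1", "spine2", "spine3", "spine4")
destinations = {f"site{i}": uplinks for i in range(1000)}

ip_groups = ecmp_groups_needed(destinations, ip_rewrite)
mpls_groups = ecmp_groups_needed(destinations, mpls_swap_rewrite)
print(ip_groups, mpls_groups)
```

Same topology, same paths, but the label case burns one group per site/anycast label while the IP case collapses to a single shared group.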

> IP at least gives you rewrite sharing, so in a lite-core you have a way better trade-off on resources, especially in a heavily ECMP'ed network, such as one built of a massive number of open small boxes vs. a small number of huge opaque ones. Pick your poison, but saying one is inherently better than another in all cases is just plain false.
> 
> If I understand this argument correctly, then it shouldn't be one, because "rewrite sharing" is irrelevant for the addressability of single nodes in a BGP network. Why a header lookup depth of 4B per label in engineered and non-engineered paths should be a bad requirement for h/w designers of modern networks is beyond me. In most MPLS networks (unengineered L3VPN) you need to read fewer header bytes than in e.g. a VXLAN fabric to make ECMP work (24B vs. 20B).

What I'm getting at is that IP allows rewrite sharing: whatever needs to change on two IP frames that take the same path but ultimately reach different destinations (e.g. DMAC, egress port) is rewritten identically. And, at least with IPIP, you are able to look at the inner frame for ECMP calculations. Depending on your MPLS design, that may not be the case. If your label stack is too deep (3-5 labels, depending on the ASIC), you can't look at the payload and you end up with polarization.
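A small Python sketch of the polarization case. The parse depth of 3, the label values, and the use of CRC32 as the hash are all assumptions for illustration (no particular ASIC behaves exactly like this), and it ignores entropy labels.

```python
import zlib

def ecmp_hash_key(labels, inner_flow, max_parse_depth):
    """Return the fields a (hypothetical) ASIC can feed into its ECMP hash."""
    if len(labels) <= max_parse_depth:
        # The parser reaches the payload, so the inner flow adds entropy.
        return (tuple(labels), inner_flow)
    # Stack too deep: only the labels within reach are hashed.
    return (tuple(labels[:max_parse_depth]),)

def pick_member(key, n_members):
    # CRC32 stands in for the vendor hash function.
    return zlib.crc32(repr(key).encode()) % n_members

# 64 flows between the same hosts, differing only in source port.
flows = [("10.0.0.1", "10.0.1.1", 6, 40000 + i, 443) for i in range(64)]
shallow = [16001, 24005]                       # 2 labels: payload visible
deep = [16001, 16002, 16003, 16004, 24005]     # 5 labels: payload out of reach

shallow_keys = {ecmp_hash_key(shallow, f, 3) for f in flows}
deep_members = {pick_member(ecmp_hash_key(deep, f, 3), 4) for f in flows}
print(len(shallow_keys), len(deep_members))
```

With the shallow stack every flow produces its own hash key, so the hash has something to spread on. With the deep stack all 64 flows produce the same key and land on a single member of the 4-way group.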

David