[c-nsp] Weird MPLS/VPLS issue

Saku Ytti saku at ytti.fi
Thu Nov 24 05:43:51 EST 2016


On 23 November 2016 at 18:30, Alexandre Snarskii <snar at snar.spb.ru> wrote:

Hey,

>> > https://www.nanog.org/meetings/nanog57/presentations/Tuesday/tues.general.SnijdersWheeler.MACaddresses.14.pdf
>
> Small correction: Juniper does check whether a packet starting with '4x:' is
> indeed an IPv4 packet, and falls back to the 'outer' Ethernet + MPLS hash if
> this check fails. As a result, all EoMPLS traffic to such an address will be
> forwarded to a single link of the aggregate interface and may congest that
> link despite there being plenty of capacity on the other links, but at least
> this does not lead to heavy reordering.

MPLS gives transit nodes no way to determine what they are carrying.
Juniper (Trio) seeks to the IPv4 'total length' field, and if it
contains the same value as the number of packet bytes read from the
wire, it concludes that it actually saw IPv4. This does not mean there
won't be reordering. It means that you need ^4 in the first nibble
(i.e. the DMAC needs to start with 4) _AND_ the last 8 bits of the OUI
plus the first 8 bits of the local part of the MAC must match your
packet length by accident; when both hold, it'll reorder.
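To make the byte overlap concrete: in an EoMPLS payload without a control word, DMAC bytes 0-5 sit exactly where an IPv4 header keeps version/IHL (byte 0), TOS (byte 1) and total length (bytes 2-3). Below is a simplified model of the check as described above, assuming the transit LSR looks only at the first nibble and the total length field; the function names and the MAC address are hypothetical, and real Trio microcode may check more or less than this.

```python
# Simplified model of the IPv4 guess described above. Assumption: the
# transit LSR checks only the first nibble and 'total length' (bytes 2-3);
# this is NOT Juniper's actual implementation.

def looks_like_ipv4(payload: bytes) -> bool:
    """payload = bytes following the bottom-of-stack MPLS label."""
    if len(payload) < 20:           # shorter than a minimal IPv4 header
        return False
    if payload[0] >> 4 != 4:        # first nibble must be 4
        return False
    total_length = int.from_bytes(payload[2:4], "big")
    return total_length == len(payload)   # must match bytes read from the wire


def ethernet_frame(dmac: str, frame_len: int) -> bytes:
    """Dummy Ethernet frame of frame_len bytes starting with dmac, zero-padded."""
    mac = bytes.fromhex(dmac.replace(":", ""))
    return mac + bytes(frame_len - len(mac))


if __name__ == "__main__":
    # Hypothetical DMAC: first nibble 4; last OUI byte 0x00 + first local
    # byte 0x40 read together as total length 0x0040 = 64.
    dmac = "40:00:00:40:aa:bb"
    print(looks_like_ipv4(ethernet_frame(dmac, 64)))  # True: misread as IPv4
    print(looks_like_ipv4(ethernet_frame(dmac, 65)))  # False: falls back to outer hash
```

Only frames of exactly the "right" length to that DMAC get hashed on bogus inner fields, which is why the failure is both rare and hard to spot.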

One could argue that such an added heuristic is even more harmful: while
it will fail less often, when it _DOES_ fail, what is your probability
of getting the problem solved? How will the customer know to blame you,
how will you know to blame your core, how will your vendor know to
blame balancing? I can only imagine how strange the trouble case would
be.

I know that Huawei (Solar) additionally checks the IPv4 header checksum,
but again, is that useful? Cisco checks nothing but the first nibble.
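For comparison, a stricter guess in the spirit of the Huawei check might also validate IHL and the RFC 791 header checksum (a valid header, checksum field included, sums to 0xFFFF in one's-complement arithmetic). This is a sketch under the assumption that those are the extra checks; we don't know exactly what Solar tests, and all names here are hypothetical.

```python
# Stricter IPv4 guess: first nibble + IHL + total length + header checksum.
# Assumption: loosely modeled on the Huawei check mentioned above.

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of big-endian words (RFC 1071)."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def stricter_ipv4_guess(payload: bytes) -> bool:
    if len(payload) < 20 or payload[0] >> 4 != 4:
        return False
    ihl = (payload[0] & 0x0F) * 4            # header length in bytes
    if ihl < 20 or ihl > len(payload):
        return False
    if int.from_bytes(payload[2:4], "big") != len(payload):
        return False
    return ones_complement_sum(payload[:ihl]) == 0xFFFF


def make_ipv4_packet(total_len: int) -> bytes:
    """Build a genuine IPv4 packet with a correct header checksum."""
    hdr = bytearray(20)
    hdr[0] = 0x45                             # version 4, IHL 5
    hdr[2:4] = total_len.to_bytes(2, "big")
    hdr[8], hdr[9] = 64, 17                   # TTL, protocol (UDP)
    hdr[12:16] = bytes([192, 0, 2, 1])        # documentation addresses
    hdr[16:20] = bytes([198, 51, 100, 1])
    csum = 0xFFFF - ones_complement_sum(bytes(hdr))  # checksum field still 0
    hdr[10:12] = csum.to_bytes(2, "big")
    return bytes(hdr) + bytes(total_len - 20)


if __name__ == "__main__":
    print(stricter_ipv4_guess(make_ipv4_packet(64)))   # True
    # Hypothetical DMAC 45:00:00:40:aa:bb passes nibble, IHL and length...
    frame = bytes.fromhex("45000040aabb") + bytes(58)
    print(stricter_ipv4_guess(frame))                   # False: checksum fails
```

The checksum makes a false positive far less likely, but not impossible, which is exactly the "rarer but harder to debug" trade-off questioned above.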



Now some people on the list propose the control word. This does not fix
the problem on Juniper, but it does fix it on Cisco (EZChip),
because Cisco has no platforms that inspect inside pseudowires.
Juniper inspects inside pseudowires by default. Granted, you need the
'zero-control-word' toggle, available in 16.1, to allow it to skip the
control word (in some cases) and find the actual IP payload inside the
pseudowire. But even without 'zero-control-word', you can still
mistakenly think you're seeing a XEROX DMAC, and if eType, ipVer and
ipLen happen to be 'correct', you're again balancing incorrectly. This
time the problem will be even more confusing, as it will affect an even
smaller subset of clients, since some of these fields leak into the IP
SADDR area. There is an extremely low probability that you'll ever see
this, but if you do see it, I don't think you're ever going to be able
to solve it.
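Why the control word helps a first-nibble-only parser: the generic PW control word (RFC 4385) always starts with nibble 0, so the payload can never look like IPv4 or IPv6. It also shows where the "XEROX DMAC" comes from: a parser that ignores the control word reads its leading zero bytes as a DMAC beginning 00:00:00, an OUI assigned to Xerox. In this sketch we assume flags, FRG and length are left at zero and only the sequence number is set; the helper names and the MAC are hypothetical.

```python
# Effect of an RFC 4385 control word on a first-nibble classifier.
# Assumption: flags/FRG/length are zero; only the 16-bit sequence is set.

def first_nibble_class(payload: bytes) -> str:
    nibble = payload[0] >> 4
    if nibble == 4:
        return "ipv4"
    if nibble == 6:
        return "ipv6"
    return "other"            # e.g. a PW control word, whose first nibble is 0


def pw_payload(frame: bytes, control_word: bool, seq: int = 0) -> bytes:
    if not control_word:
        return frame
    cw = (seq & 0xFFFF).to_bytes(4, "big")   # 0000|flags|FRG|length|sequence
    return cw + frame


def misread_dmac(payload: bytes) -> str:
    """DMAC a parser would report if it assumed the payload starts with Ethernet."""
    return ":".join(f"{b:02x}" for b in payload[:6])


if __name__ == "__main__":
    frame = bytes.fromhex("40000040aabb") + bytes(58)   # DMAC 40:00:00:40:aa:bb
    print(first_nibble_class(pw_payload(frame, False)))  # ipv4 (misclassified)
    print(first_nibble_class(pw_payload(frame, True)))   # other (correctly not IP)
    print(misread_dmac(pw_payload(frame, True)))         # 00:00:00:00:40:00
```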

To have deterministic behaviour _AND_ balancing, you'll need to:

a) be dumb: don't try to find IP inside pseudowires
set forwarding-options enhanced-hash-key family mpls no-ether-pseudowire

b) on the ingress PE, look for hash keys and put the hash result in an MPLS label
set groups DEFAULT protocols l2circuit neighbor <*> interface <*> flow-label-receive
set groups DEFAULT protocols l2circuit neighbor <*> interface <*> flow-label-transmit

c) run 100% of pseudowires with a control word, so that ^4 and ^6 tell
us it's IP and we can dig inside the packet in transit for hash keys
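A toy model of (a) and (b) together: the ingress PE hashes whatever flow keys it picks into a flow label pushed at the bottom of the stack (the flow-aware pseudowire scheme of RFC 6391, which is what the flow-label-transmit/flow-label-receive knobs above enable), and a transit LSR that never looks past the labels still spreads flows while keeping each flow on one member link. CRC32 as the hash, the label values, the TC/TTL values and the flow keys are all hypothetical placeholders; this is not how Junos computes its hash.

```python
import zlib

MIN_LABEL = 16               # labels 0-15 are reserved
MAX_LABEL = (1 << 20) - 1    # labels are 20 bits wide


def flow_label(flow_key: bytes) -> int:
    """Ingress PE: hash the flow key into the unreserved 20-bit label range."""
    return MIN_LABEL + zlib.crc32(flow_key) % (MAX_LABEL - MIN_LABEL + 1)


def lse(label: int, bos: bool = False, tc: int = 0, ttl: int = 0) -> bytes:
    """One MPLS label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    return ((label << 12) | (tc << 9) | (int(bos) << 8) | ttl).to_bytes(4, "big")


def fat_pw_stack(transport: int, pw: int, flow_key: bytes) -> bytes:
    """Transport label, PW label, flow label at bottom of stack.
    TTL values are placeholders; check RFC 6391 for what the flow entry must carry."""
    return (lse(transport, ttl=255) + lse(pw, ttl=255)
            + lse(flow_label(flow_key), bos=True))


def transit_member(stack: bytes, n_links: int) -> int:
    """'Dumb' transit LSR (option a): hash the label stack only, never the payload."""
    return zlib.crc32(stack) % n_links


if __name__ == "__main__":
    # Hypothetical flow keys, e.g. inner SMAC+DMAC or an IP 5-tuple.
    for key in [b"flow-a", b"flow-b", b"flow-c", b"flow-d"]:
        stack = fat_pw_stack(transport=300001, pw=299776, flow_key=key)
        print(key, transit_member(stack, n_links=4))
```

Because the transit hash only ever sees labels, its choice is deterministic per flow; all of the spreading comes from the ingress PE varying the flow label.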



-- 
  ++ytti

