[c-nsp] Weird MPLS/VPLS issue

Saku Ytti saku at ytti.fi
Thu Jan 12 19:36:07 EST 2017


On 13 January 2017 at 02:09, James Bensley <jwbensley at gmail.com> wrote:

>> The always-correct behaviour is entropy or FAT label + control-word +
>> no peeking inside pseudowires, and balancing on labels.
>
> Hmm, me too it seems. I was not suggesting that the control-word will
> fix all your problems; as you said, it's a flawed approach from the
> get-go. +1 for the above method instead.
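A minimal sketch of "balance on labels only", assuming a simplified transit LSR that hashes the whole label stack (including a per-flow FAT label, RFC 6391, or entropy label, RFC 6790, pushed by the ingress PE) and never reads past bottom-of-stack. The helpers `lse`, `parse_label_stack` and `pick_member`, the label values, and the use of CRC32 as the hash are all illustrative assumptions, not any vendor's implementation.

```python
import struct
import zlib

def lse(label: int, bottom: bool = False, ttl: int = 64) -> bytes:
    """Encode one label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    return struct.pack("!I", (label << 12) | (int(bottom) << 8) | ttl)

def parse_label_stack(packet: bytes) -> list[int]:
    """Return label values from the top of the stack down to the entry with S=1."""
    labels = []
    offset = 0
    while True:
        (entry,) = struct.unpack_from("!I", packet, offset)
        labels.append(entry >> 12)      # top 20 bits are the label value
        offset += 4
        if entry & 0x100:               # S bit set: bottom of stack reached
            return labels

def pick_member(packet: bytes, n_links: int) -> int:
    """Choose an ECMP/LAG member from the labels alone; the payload is never read."""
    key = b"".join(struct.pack("!I", label) for label in parse_label_stack(packet))
    return zlib.crc32(key) % n_links

# Hypothetical labels: same transport and pseudowire label, different FAT label per flow.
transport, pw = 16001, 24005
flow_a_pkt1 = lse(transport) + lse(pw) + lse(70001, bottom=True) + b"\x4c customer frame 1"
flow_a_pkt2 = lse(transport) + lse(pw) + lse(70001, bottom=True) + b"\x60 customer frame 2"
flow_b_pkt1 = lse(transport) + lse(pw) + lse(70002, bottom=True) + b"\x4c customer frame 3"
```

Both packets of flow A land on the same member regardless of what their payload starts with, so a payload that looks like ^4 or ^6 cannot cause reordering; whether flows A and B land on different members depends only on the hash of their FAT labels.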

I mentioned that it fixes reordering on Cisco EZchip but not on Juniper
Trio (because Trio peeks inside the pseudowire by default), and that on
Trio it may be actively harmful, because it opens up rarer reordering
issues that are harder to detect and troubleshoot than the simple ^4/^6
issue (an Ethernet payload whose first nibble happens to be 4 or 6
being mistaken for IPv4/IPv6).
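A rough sketch of the ^4/^6 issue, assuming a transit LSR that guesses the payload type from the first nibble after bottom-of-stack and, on an "ipv4" guess, hashes bytes 12..19 as the IPv4 source and destination addresses. `first_nibble_guess`, `ipv4_address_bytes` and `ethernet_frame` are hypothetical helpers, and the MAC addresses are made up.

```python
def first_nibble_guess(payload: bytes) -> str:
    """Hypothetical guess based only on the first nibble after bottom-of-stack."""
    nibble = payload[0] >> 4
    if nibble == 4:
        return "ipv4"
    if nibble == 6:
        return "ipv6"
    return "unknown"   # e.g. 0 = control word (RFC 4385): hash on labels only

def ipv4_address_bytes(payload: bytes) -> bytes:
    """Bytes a router would treat as IPv4 source + destination if it trusted the guess."""
    return payload[12:20]

def ethernet_frame(dmac_hex: str, total_len: int, ip_id: int) -> bytes:
    """Ethernet II frame carrying a (mostly zeroed) 20-byte IPv4 header, no FCS."""
    dmac = bytes.fromhex(dmac_hex)
    smac = bytes.fromhex("020000000001")
    inner_ip = bytes([0x45, 0x00]) + total_len.to_bytes(2, "big") + ip_id.to_bytes(2, "big")
    inner_ip += bytes(14)  # rest of the IPv4 header, zeroed for brevity
    return dmac + smac + b"\x08\x00" + inner_ip

# Two packets of the same customer TCP flow in an Ethernet pseudowire, no control word.
# The DMAC's first octet is 0x4c, so the frame "looks like" IPv4.
pkt1 = ethernet_frame("4c0011223344", total_len=1500, ip_id=1)
pkt2 = ethernet_frame("4c0011223344", total_len=52, ip_id=2)
```

For such a frame, bytes 12..19 are really the EtherType, the first two bytes of the inner IP header, its total length and its IP ID. The last two change per packet, so a single flow gets sprayed across links. Prepending a zeroed control-word makes the first nibble 0 and the guess falls back to labels, which is why it helps on EZchip but not on Trio, which peeks inside the pseudowire by default.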

Consider that on the wire we cannot tell:
a) whether we saw a control-word or a XEROX DMAC (OUI 00:00:00, whose
first nibble is also 0). If we guess wrong, we subsequently peek 32
bits off the right position inside the frame, and if by bad/good luck
we happen to find valid IP keys there, we have very hard to debug
problems.
b) Even if we always use the control-word, so that the operator's
config ensures the platform always makes the right guess (i.e. if it's
not ^4 or ^6, it's a control-word), and we start to peek at the correct
offset for Ethernet, there is no guarantee that the pseudowire is
carrying Ethernet II. By bad/good luck it might have bits at the right
offsets which satisfy the heuristics that it is indeed Ethernet II
carrying IP.
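To make case a) concrete, here is a sketch of a hypothetical "deep peek" heuristic (`peek_key`, loosely modelled on the behaviour described above, not on any documented Trio algorithm): first nibble 4 means raw IPv4, first nibble 0 means "assume control-word, then Ethernet II". Only the IPv4 branch is modelled. `eth_ipv4_frame`, the addresses and the total length of 2048 are assumptions, the latter contrived so the coincidence is visible.

```python
from typing import Optional

CW_LEN = 4        # RFC 4385 control word: 32 bits, first nibble 0
ETH_HDR_LEN = 14  # Ethernet II: DMAC(6) + SMAC(6) + EtherType(2)

def peek_key(payload: bytes) -> Optional[bytes]:
    """Hypothetical heuristic: bytes to hash, or None to hash on labels only."""
    nibble = payload[0] >> 4
    if nibble == 4:                          # looks like raw IPv4
        return payload[12:20]
    if nibble == 0:                          # guess: control word followed by Ethernet II
        eth = payload[CW_LEN:]
        if (len(eth) >= ETH_HDR_LEN + 20 and eth[12:14] == b"\x08\x00"
                and eth[ETH_HDR_LEN] >> 4 == 4):
            ip = eth[ETH_HDR_LEN:]
            return ip[12:20]                 # what it believes are IPv4 src + dst
    return None

def eth_ipv4_frame(dmac_hex: str, total_len: int, ip_id: int) -> bytes:
    """Ethernet II frame carrying the first bytes of an IPv4/TCP packet (no FCS)."""
    smac = bytes.fromhex("020000000001")
    ip = bytes([0x45, 0x00]) + total_len.to_bytes(2, "big") + ip_id.to_bytes(2, "big")
    ip += bytes([0x40, 0x00, 64, 6, 0x00, 0x00])                 # DF, TTL 64, TCP, checksum 0
    ip += bytes([192, 0, 2, 1]) + bytes([198, 51, 100, 7])       # src, dst (documentation ranges)
    ip += (49152).to_bytes(2, "big") + (443).to_bytes(2, "big")  # TCP source/destination port
    return bytes.fromhex(dmac_hex) + smac + b"\x08\x00" + ip

XEROX_DMAC = "000000123456"   # Xerox OUI 00:00:00, so the first nibble is 0

# Case a): no control word, same flow, only the per-packet IP ID differs.
no_cw_1 = eth_ipv4_frame(XEROX_DMAC, total_len=2048, ip_id=0x4ABC)
no_cw_2 = eth_ipv4_frame(XEROX_DMAC, total_len=2048, ip_id=0x1234)

# With a control word, the same frames are parsed at the right offset.
with_cw_1 = bytes(CW_LEN) + no_cw_1
with_cw_2 = bytes(CW_LEN) + no_cw_2
```

Without the control-word, the parser wrongly skips 32 bits, reads the total length (0x0800) as an EtherType and the high byte of the IP ID as the IP version. The first packet is therefore hashed on the destination address plus TCP ports, the second on labels only, so they can take different links: rare reordering that depends on a per-packet field. Case b) is the same function with a correctly placed control-word: it would equally return a key for any non-Ethernet II payload that happens to have 0x0800 at offset 12 and a 4 nibble at offset 14 after the control-word.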

The bottom line is that MPLS, by design, does not know what it is
carrying. Adding heuristics will just make the problem occur less
often, but when it does occur, it will be increasingly hard to
troubleshoot.

Would the customer know to blame the operator? Suppose their network
over the operator's pseudowire works perfectly; then they also
implement GRE tunneling (over that pseudowire), and after doing so one
host starts to experience bad TCP performance.
If the customer did know to blame the operator, would the operator,
after such an explanation, believe the customer that the problem is in
the operator's network? After all, the customer has admitted that
everything works just fine for all machines without GRE, and even with
GRE on all but one machine.
If we did believe it's our fault, would our vendor's TAC be able to
help us?


-- 
  ++ytti

