[j-nsp] LAG/ECMP hash performance

Saku Ytti saku at ytti.fi
Wed Aug 28 03:20:54 EDT 2019


On Wed, 28 Aug 2019 at 09:54, James Bensley
<jwbensley+juniper-nsp at gmail.com> wrote:

> No. Out of curiosity, have you, which is what led you to post this?
> If yes, what platform?

I've had two issues where I cannot explain the imbalance, one on
MX2020 and another on PTX. I can't find any elephant flows in netflow,
but I can find traffic clustered with a modest amount of IP address
entropy (like 20-32 SADDR + 20-32 DADDR + 1 SPORT + RND DPORT). My
understanding is that the RND DPORT alone should guarantee fair
balancing, in the absence of elephant flows and when the flow count is
sufficient.

I did briefly talk to some people, and one person mentioned they saw
this problem on NOK in their VOD distribution: again the flows are
similarly grouped together, but with ostensibly enough entropy.
Curiously, the NOK case was fixed by adding static bits to the hash
input, for every single hash calculation (host IP). I think the fix
supports the hash weakness theory: moving the input bits around moved
the changing bits from more 'vulnerable' bit locations to less
vulnerable ones.
Another person mentioned seeing this on Jericho.

I did a trivial lab test on MX2020, which I'll post at the end of this
email, and which appears (it is not controlled enough to say for sure)
to support that the hashing is less than ideal.

> That is my understanding of CRC32 also, although I didn't know it was
> being widely used for load-balancing so I had never thought of it as an
> actual practical issue. One thing to consider is that not all CRC32
> sums are the same, what kind of polynomial is used varies and so $box1
> doing CRC32 for load-balancing might produce different results to
> $box2, if they use different polynomials. I have recorded some common
> ones here: https://null.53bits.co.uk/index.php?page=crc-and-checksum-error-detection#polynomial

Yes, I'm sure vendors have put some thought into this and have tried
to fix what seems to be a fundamental property of CRC: it is not a
hash function with particularly good diffusion quality.

> It looks like the standard IEEE 802.3 value 0x04C11DB7 is being used
> for these tests, here
> https://github.com/jwbensley/Ethernet-CRC32/blob/master/crc32.c
>
> Other polys are used though, e.g. for larger packets. When using jumbo
> frames and stretching the amount of data the CRC has to protect
> against with the same sized sum (32 bits) other polynomials can be
> more effective. It's probably a safe bet that most implementations
> that use CRC32 for hashing use the same standard poly value but I'm
> keen to hear more about this.

Do you think that with other parameters it would achieve better
diffusion quality? Statistically you should see half of the output
bits change when a single input bit changes, and it may be that CRC
fundamentally does not satisfy this. I think that makes sense,
because the goal of CRC is to catch as many _small_ changes as
possible. Ethernet FCS, for example, will catch all single-bit flips,
and I think maybe even all double-bit flips, and then perhaps all even
or odd counts of flips, I forget which. If you are spending the output
range on this goal, which is fundamentally important in that
application, then I don't think you're going to achieve good diffusion
with the same algorithm. Testing whether a hash function is good for
ECMP/LAG should be fairly trivial, as you can analyse a large segment
of the practical input space and see whether there is statistical bias
in the diffusion.
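
A minimal sketch of such a diffusion test, in Python. It uses
zlib.crc32 (the IEEE 802.3 polynomial) purely as a stand-in, since the
hash any given forwarding ASIC really uses is not known here; the
12-byte key length and the helper name bit_flip_diffs are assumptions
made for illustration. It flips each input bit in many random keys and
records which output bits change.

```python
import random
import zlib

def bit_flip_diffs(msg_len=12, trials=200, seed=1):
    """For each input bit, collect the set of output XOR differences
    seen when that bit is flipped in random keys of msg_len bytes."""
    rng = random.Random(seed)
    diffs = {}
    for _ in range(trials):
        msg = bytearray(rng.getrandbits(8) for _ in range(msg_len))
        base = zlib.crc32(bytes(msg))
        for bit in range(msg_len * 8):
            flipped = bytearray(msg)
            flipped[bit // 8] ^= 1 << (bit % 8)
            delta = base ^ zlib.crc32(bytes(flipped))
            diffs.setdefault(bit, set()).add(delta)
    return diffs

diffs = bit_flip_diffs()
# How many different output differences did any single input bit produce?
distinct = max(len(s) for s in diffs.values())
# How many output bits does each input bit toggle?
weights = [bin(next(iter(s))).count("1") for s in diffs.values()]
print("distinct output differences per input bit:", distinct)
print("output bits toggled per input bit: min", min(weights),
      "max", max(weights))
```

Because CRC is linear over GF(2), flipping a given input bit of a
fixed-length key always toggles the same set of output bits, whatever
the rest of the key is. A hash with good diffusion would instead show
many different output differences for the same input bit.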

SRC: (single 100GE interface, single unit 0)
  23.A.B.20 .. 23.A.B.46
  TCP/80
DST: (N*10GE LACP)
  157.C.D.20 .. 157.C.D.35
  TCP 2074..65470 (RANDOM; this alone, with everything else static,
  should have guaranteed fair balancing)
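
For comparison, the same flow set can be pushed through a software
hash. This is only a sketch under stated assumptions: the prefix
octets are placeholders for the redacted A.B and C.D, every flow is
taken to carry the same packet rate, and the link is chosen as
zlib.crc32 of a packed 5-tuple modulo the member count, which is not
claimed to be what the MX does.

```python
import random
import struct
import zlib
from collections import Counter

# Placeholder octets: the real A.B and C.D values are not given above.
SRC_PREFIX = (23, 0, 0)
DST_PREFIX = (157, 0, 0)

def flows(per_pair=50, seed=1):
    """Yield (src, dst, sport, dport) for the address ranges in the test."""
    rng = random.Random(seed)
    for s in range(20, 47):          # 23.A.B.20 .. 23.A.B.46
        for d in range(20, 36):      # 157.C.D.20 .. 157.C.D.35
            for _ in range(per_pair):
                dport = rng.randint(2074, 65470)   # random DPORT
                yield SRC_PREFIX + (s,), DST_PREFIX + (d,), 80, dport

def pick_link(flow, n_links):
    src, dst, sport, dport = flow
    # Pack the 5-tuple in network byte order; 6 is the TCP protocol number.
    key = struct.pack("!4B4BHHB", *src, *dst, sport, dport, 6)
    return zlib.crc32(key) % n_links   # assumed link-selection rule

def distribution(n_links):
    """Count flows per LAG member for the whole flow set."""
    return Counter(pick_link(f, n_links) for f in flows())

for n in (3, 4):
    print(n, "links:", sorted(distribution(n).items()))
```

Swapping pick_link for another hash, or changing which fields go into
the key, gives a quick way to compare candidates against this exact
traffic pattern.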

I'm running this through IXIA and my results are:

3*10GE Egress:
  port1 10766516pps
  port2 10766543pps
  port3  7536578pps
after (set forwarding-options enhanced-hash-key family inet
incoming-interface-index)
  port1 9689881pps
  port2 11791986pps
  port3 5383270pps
after removing incoming-interface-index and setting adaptive:
  port1 9689889pps
  port2 9689892pps
  port3 9689884pps

I think this supports that the hash function diffuses poorly. It
should be noted that the 2nd step adds entirely _static_ bits to the
input of the hash, as the source interface does not change, and the
result is perfectly repeatable. This is to be expected: the bit
positions most affected by the weakness shift, making the problem
either worse or better.
I.e. the flows are 100% perfectly hashable, yet the hash results are
biased. There aren't any elephants.


4*10GE Egress:
  port1 4306757pps
  port2 8612807pps
  port3 9689893pps
  port4 6459931pps
after adding incoming-interface-index:
  port1 6459922pps
  port2 8613236pps
  port3 9691485pps
  port4 4306620pps
after removing incoming-interface-index and adding adaptive:
  port1 7536562pps
  port2 7536593pps
  port3 6459928pps
  port4 7536566pps
after removing adaptive and adding no-destination-port + no-source-port
  port1: 5383279pps
  port2: 9689886pps
  port3: 7536588pps
  port4: 6459922pps
after removing no-source-port (i.e. destination port is used for hash)
  port1: 8613235pps
  port2: 5383272pps
  port3: 5383274pps
  port4: 9689884pps

It is curious that it actually balances more fairly without using TCP
ports at all, even though there is _tons_ of entropy there due to the
random DPORT.
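
To put numbers on the imbalance, a small Python sketch that turns the
4*10GE packet rates above into per-port shares and a max/min ratio.
The figures are copied from the results above; the dictionary labels
and helper names are only illustrative.

```python
# Packet rates (pps) copied from the 4*10GE results above.
results = {
    "default": [4306757, 8612807, 9689893, 6459931],
    "incoming-interface-index": [6459922, 8613236, 9691485, 4306620],
    "adaptive": [7536562, 7536593, 6459928, 7536566],
    "no-destination-port + no-source-port":
        [5383279, 9689886, 7536588, 6459922],
    "no-source-port removed": [8613235, 5383272, 5383274, 9689884],
}

def shares(pps):
    """Fraction of the total packet rate carried by each port."""
    total = sum(pps)
    return [p / total for p in pps]

def max_min_ratio(pps):
    """Busiest port divided by least busy port; 1.0 is a perfect balance."""
    return max(pps) / min(pps)

for name, pps in results.items():
    pct = " ".join(f"{s * 100:5.1f}%" for s in shares(pps))
    print(f"{name:38s} {pct}  max/min {max_min_ratio(pps):.2f}")
```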


--
  ++ytti

