[j-nsp] LAG/ECMP hash performance

Tue Nov 26 11:26:44 EST 2019

On Wed, 28 Aug 2019 at 08:21, Saku Ytti <saku at ytti.fi> wrote:
> SRC: (single 100GE interface, single unit 0)
>   23.A.B.20 .. 23.A.B.46
>   TCP/80
> DST: (N*10GE LACP)
>   157.C.D.20 .. 157.C.D.35
>   TCP 2074..65470 (RANDOM, this alone, everything else static, should
> have guaranteed fair balancing)
>
> I'm running this through IXIA and my results are:
>
> 3*10GE Egress:
>   port1 10766516pps
>   port2 10766543pps
>   port3  7536578pps
> after (set forwarding-options enhanced-hash-key family inet
> incoming-interface-index)
>   port1 9689881pps
>   port2 11791986pps
>   port3 5383270pps
> after removing s-int-index and setting adaptive
>   port1 9689889pps
>   port2 9689892pps
>   port3 9689884pps
>
> I think this supports that the hash function diffuses poorly. It
> should be noted that 2nd step adds entirely _static_ bits to the input
> of the hash, source interface does not change. And it's perfectly
> repeatable. This is to be expected, the most affected weakness bits
> shift, either making the problem worse or better.
> I.e. flows are 100% perfectly hashable, but not without biasing the
> hash results. There aren't any elephants.
>
>
> 4*10GE Egress:
>   port1 4306757pps
>   port2 8612807pps
>   port3 9689893pps
>   port4 6459931pps
> after adding incoming-interface-index)
>   port1 6459922pps
>   port2 8613236pps
>   port3 9691485pps
>   port4 4306620pps
> after removing s-index and adding adaptive:
>   port1 7536562pps
>   port2 7536593pps
>   port3 6459928pps
>   port4 7536566pps
> after removing adaptive and adding no-destination-port + no-source-port
>   port1: 5383279pps
>   port2: 9689886pps
>   port3: 7536588pps
>   port4: 6459922pps
> after removing no-source-port (i.e. destination port is used for hash)
>   port1: 8613235pps
>   port2: 5383272pps
>   port3: 5383274pps
>   port4: 9689884pps
>
> It is curious that it actually balances more fairly, without using TCP
> ports at all! Even though there is _tons_ of entropy there due to
> random DPORT.

Better late than never....

100G link from Ixia to ASR9K Hu0/1/0/3, with a pseudowire attachment
interface configured on Hu0/1/0/3.4001, 3x100G core facing LAG links
(Hu0/0/0/0, Hu0/0/0/5, Hu0/0/0/6).

The packet stream sent from Ixia has an Ethernet header with random
dest MAC, random src MAC, VLAN ID 4001 to match into pseudowire AC,
IPv4 headers are next with random dest IP and random src IP, TCP
headers follow with random dest port and random src port. Payload is
random data. Frame size is 1522 bytes.

Everything is re-randomised every frame. Sending ~100Mbps of traffic...

The default load-balancing method on ASR9K for L2VPNs is
per-pseudowire so initially everything falls onto one core facing LAG
member:

ar0-ws.bllab         Monitor Time: 00:15:42          SysUptime: 312:06:24
                     Last Clear:   00:10:36
Protocol:General
Interface             In(bps)      Out(bps)     InBytes/Delta  OutBytes/Delta
Hu0/1/0/3             99.1M/  0%        0/  0%     2.7G/24.8M         0/0
Hu0/0/0/0             11000/  0%    15000/  0%   495110/3226     639642/4198
Hu0/0/0/5             12000/  0%   100.3M/  0%   467544/2958       2.7G/25.1M
Hu0/0/0/6             13000/  0%    12000/  0%   523510/3328     483334/3020

Switch to src+dst MAC load-balancing and we get a more or less perfect
distribution:
!
l2vpn
 load-balancing flow src-dst-mac
!

ar0-ws.bllab         Monitor Time: 00:20:56          SysUptime: 312:11:38
                     Last Clear:   00:17:02
Protocol:General
Interface             In(bps)      Out(bps)     InBytes/Delta  OutBytes/Delta
Hu0/1/0/3             99.7M/  0%        0/  0%     2.9G/24.9M         0/0
Hu0/0/0/0             12000/  0%    31.7M/  0%   371774/2972     993.0M/8.6M
Hu0/0/0/5             12000/  0%    33.4M/  0%   366524/2958     980.9M/8.1M
Hu0/0/0/6             12000/  0%    33.3M/  0%   373604/3442     979.3M/8.4M
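
To put a number on "more or less perfect", here is a quick sketch that
turns the Out(bps) column above into per-member shares. The helper name
member_shares is my own, and the rates are simply the values read off
the monitor output, so treat it as back-of-the-envelope arithmetic only.

```python
def member_shares(rates_mbps):
    """Return each member's share of the total egress rate, in percent."""
    total = sum(rates_mbps.values())
    return {name: 100.0 * rate / total for name, rate in rates_mbps.items()}


# Out(bps) per LAG member from the src-dst-mac run above, in Mbps.
src_dst_mac = {"Hu0/0/0/0": 31.7, "Hu0/0/0/5": 33.4, "Hu0/0/0/6": 33.3}

for name, share in member_shares(src_dst_mac).items():
    print(f"{name}: {share:.1f}%")
```

For these figures every member lands within about 1.2 percentage points
of an even third.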

When switching to src+dst IP load-balancing we get basically the same
distribution:
!
l2vpn
 load-balancing flow src-dst-ip
!

ar0-ws.bllab         Monitor Time: 00:23:22          SysUptime: 312:14:04
                     Last Clear:   00:21:58
Protocol:General
Interface             In(bps)      Out(bps)     InBytes/Delta  OutBytes/Delta
Hu0/1/0/3             99.7M/  0%        0/  0%     1.0G/24.9M         0/0
Hu0/0/0/0             11000/  0%    31.2M/  0%   135550/2888     355.8M/8.4M
Hu0/0/0/5             12000/  0%    33.6M/  0%   131840/3396     353.1M/8.4M
Hu0/0/0/6             12000/  0%    33.4M/  0%   134639/3091     351.1M/8.3M

Tomahawk NPU is using CRC32 for load-balancing so not sure why the
MX2020 box you tested was so uneven if also using CRC32. It could be
implementation specific as you mentioned with the Nokia owner who
added a 32b static value. Despite having TCP headers on top of IP
headers, if I remove TCP, or set the TCP ports to be static, random,
incrementing etc., it has no impact on the above, so the ASR9K isn't
feeding layer 4 keys into the CRC32 (which is exactly as the Cisco
documentation states).
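
For what it's worth, a plain CRC32 over random addresses does spread
evenly in software too. The sketch below is only an illustration under
my own assumptions: the key is src IP followed by dst IP, the hash is
the standard zlib CRC32, and the member is chosen by hash modulo the
number of links. I don't know the actual key layout or bucket selection
inside either NPU, and pick_member / simulate are made-up names.

```python
import random
import struct
import zlib
from collections import Counter


def pick_member(src_ip, dst_ip, n_members):
    # ASSUMPTION: key = src IP + dst IP (network byte order), plain CRC32,
    # member = hash mod n_members. Real hardware very likely differs.
    key = struct.pack("!II", src_ip, dst_ip)
    return zlib.crc32(key) % n_members


def simulate(n_flows, n_members, seed=1):
    """Count how many random src/dst IP pairs land on each LAG member."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_flows):
        src = rng.getrandbits(32)
        dst = rng.getrandbits(32)
        counts[pick_member(src, dst, n_members)] += 1
    return counts


counts = simulate(30000, 3)
for member in sorted(counts):
    print(f"member {member}: {counts[member]} flows")
```

With fully random addresses I'd expect each member to end up close to a
third of the flows, which matches what the ASR9K does above. It says
nothing about inputs where only a few bits vary, which is closer to your
test.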

This is not 100% apples to apples, because I'm interested in testing
how pseudowire traffic is load-balanced towards the core and I expect
you're looking at layer 3 ECMP. However, it is kind of the same: the
pseudowire ingress PE has access to the layer 2 / 3 / 4 headers of the
L2VPN payload traffic, so it has the same keys to feed into a CRC32.

Just a 2nd data point for you...

Cheers,
James.