[c-nsp] Troubleshooting ECMP/bundling issue (5-tuple black holing)

James Bensley jwbensley at gmail.com
Fri Mar 10 04:42:54 EST 2017


On 9 March 2017 at 11:29, Lukas Tribus <luky-37 at hotmail.com> wrote:
> Hey guys,
>
>
> troubleshooting routing issues on paths external to our network that lead to blackholing of specific 5-tuple combinations here, very likely due to ECMP/Bundling issues (we assume a link is up/up and used for load-balancing, but cannot actually transmit or receive traffic, therefore dropping those packets on the floor).
>
>
> Now, this happened a few times over the years, and I am wondering if you guys have any suggestions or tools that you use in those cases, other than tcpdump'ing at both ends, generating thousands of 5-tuple combinations and then analyzing them in wireshark.

Many people on this list are probably doing this already but with
hardware devices like Ixia testers. You can use the control software
for them to generate different flows in a programmatic fashion. We
haven’t the budget for them I’m afraid, but I think we can test all we
want with the pieces of software below.

You can check out Cisco’s TRex (https://trex-tgn.cisco.com/) or
MoonGen (https://github.com/emmericp/MoonGen). Both are built on DPDK
so you need to install that as a prerequisite. These will let you
generate large numbers of flows in a scriptable fashion and record the
results.

I haven’t had time (story of my life!) but ideally I want to set up a
new test server in our lab and get either/both of these installed, so
that each time we test a new device we can generate a range of
traffic/flows to check that the device forwards as desired, and to
test load balancing and hashing, ACLs, QoS etc.

At the minute we have a couple of low end devices and make do with the
following open source tools to generate single packets or single flows
for just basic speed testing or testing that traffic drops into a
specific queue, matches an ACL etc:

Generating single custom packets: http://ostinato.org/ and
http://packeth.sourceforge.net/packeth/Home.html

Layer 2 Ethernet/MPLS: https://github.com/jwbensley/Etherate

Layer 3/4 IP/TCP/UDP: https://github.com/esnet/iperf

Layer 2/3/4: http://pktgen.readthedocs.io/en/latest/

Specifically for testing ECMP have a look at this (I haven’t had a
chance to play with it personally yet):
https://github.com/facebook/UdpPinger
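
If you just want a quick first pass before setting up any of the
above, a rough sketch along these lines can sweep one field of the
5-tuple (the UDP source port) and note which values never get an
answer. This is not how UdpPinger works, just a minimal illustration.
It assumes you control a UDP echo responder at the far end; the
function name, the timeout and the echo responder itself are all my
own assumptions.

```python
import socket


def probe_source_ports(dst_ip, dst_port, src_ports, timeout=0.5):
    """Send one UDP probe per source port and record whether it was echoed.

    Assumes a UDP echo responder (hypothetical, something you run yourself)
    is listening on dst_ip:dst_port. Returns {src_port: True/False}.
    """
    results = {}
    for sport in src_ports:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            # Fix the source port so only this field of the 5-tuple varies.
            # Ports below 1024 normally need root.
            s.bind(("", sport))
            payload = str(sport).encode()
            try:
                s.sendto(payload, (dst_ip, dst_port))
                data, _ = s.recvfrom(2048)
                results[sport] = data == payload
            except OSError:
                # Timeout or an ICMP error: treat the probe as lost.
                results[sport] = False
    return results


def lost_ports(results):
    """Return the sorted source ports that never got a reply."""
    return sorted(p for p, ok in results.items() if not ok)
```

A single lost probe proves nothing on its own, so repeat the sweep a
few times and only trust source ports that fail consistently. If one
member out of N is dead and the hash spreads flows evenly, you would
expect roughly 1/N of the ports to fail, which gives a first guess at
the number of bundle members.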

> Also, after obtaining a list of affected and unaffected 5-tuples, any particular easy way to find out how this is getting hashed, so that we could find the likely number of bundle members (this could be very useful when multiple interconnections and parties are involved)?
>

You are probably going to need to dig into vendor specifics: which
vendors are in play and which configs are deployed (what load
balancing / hashing options/knobs have been configured). Then look at
the hardware documentation to see what the hardware actually supports;
does that match what is configured? If you test it empirically does it
add up? The vendor documentation should say how the load-balancing is
done.

You can roughly work out the hashing mechanism by, say, sending a fake
flow from 10.0.0.1 to 10.0.0.2, proto TCP, src port 1, dst port 1.
Then just increment one field by one: dst port 2, then dst port 3
etc. Watch as the traffic moves between links. If you keep
incrementing you can brute force your way through, and eventually you
might see the same pattern of hashing results emerge.
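
As a rough sketch of the bookkeeping for that brute force approach:
generate the 5-tuples with only the destination port changing, write
down which link each one landed on (from interface counters or
similar), then look for a repeating pattern. The function names are
made up for illustration, and a real hash (XOR or CRC based) may well
not produce a neat period, so treat the result as a hint only.

```python
from collections import Counter


def sweep_dst_port(src_ip, dst_ip, proto, src_port, start=1, count=64):
    """Yield 5-tuples that differ only in the destination port."""
    for dport in range(start, start + count):
        yield (src_ip, dst_ip, proto, src_port, dport)


def link_usage(links):
    """Count how many of the observed flows landed on each link."""
    return Counter(links)


def smallest_period(links):
    """Return the smallest p where links[i] == links[i + p] for every i.

    Only periods up to half the number of observations are considered,
    so the pattern must be seen at least twice. Returns None otherwise.
    """
    n = len(links)
    for p in range(1, n // 2 + 1):
        if all(links[i] == links[i + p] for i in range(n - p)):
            return p
    return None
```

Feed smallest_period() the list of links in the same order as the
tuples from sweep_dst_port(). The number of distinct links in
link_usage() is a lower bound on the number of bundle members; a
member that is black holing traffic will not show up in it if you
identify links from the receiving side.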

Also, some boxes have a command to test the hashing, for example on a Cisco 4500X:

#show platform software etherchannel port-channel 1 map l4-port 1.1.1.1 100 2.2.2.2 200 | i is Te
Map port for l4-port 1.1.1.1:100, 2.2.2.2:200 is Te1/1/16(Po1)
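
If you collect that output for a long list of tuples (pasted from a
session log, say), a small parser makes it easy to tabulate which
member each flow maps to. This is a sketch that assumes every line
looks exactly like the one above; I have not checked other platforms
or software releases, so the regex is an assumption based on that
single example.

```python
import re

# Pattern derived from the single 4500X output line quoted above.
MAP_LINE = re.compile(
    r"Map port for l4-port ([\d.]+):(\d+), ([\d.]+):(\d+) "
    r"is ([^\s(]+)\(([^)]+)\)"
)


def parse_map_line(line):
    """Return the flow and chosen member as a dict, or None if no match."""
    m = MAP_LINE.search(line)
    if m is None:
        return None
    src_ip, src_port, dst_ip, dst_port, member, bundle = m.groups()
    return {
        "src_ip": src_ip,
        "src_port": int(src_port),
        "dst_ip": dst_ip,
        "dst_port": int(dst_port),
        "member": member,
        "bundle": bundle,
    }
```

Lines that do not match return None, so prompts and blank lines in a
log can simply be skipped.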


Cheers,
James.

