[c-nsp] VMware teaming Nic's and multiple switches

Peter Rathlev peter at rathlev.dk
Thu Sep 20 04:56:00 EDT 2012


On Thu, 2012-09-20 at 10:20 +0200, Gert Doering wrote:
> It's easy if one of the physical links goes down ("do not use that!"), 
> but I'm thinking more about the uplink network getting partitioned, or
> one of the uplink switches failing in interesting ways (link still up,
> but no packets get forwarded anymore).  In Linux bonding, I can do that
> by having the bonding driver send out ARP requests & monitor incoming 
> responses...

VMware has something similar, although a little inferior, called
"beaconing". With beaconing enabled, every link sends out probes to every
other link 10 times per second (IIRC), and every link expects to see
these probes from all other links. If a link stops seeing the probes it
is considered bad and is pulled from the pool of active links. As one can
quickly see, this only works reliably with three or more links: with
only two, a failure leaves both links without probes and there is no
way to tell which one is bad.
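
To illustrate the three-link point, here is a rough Python sketch. It is
a simplified model and not VMware's actual algorithm: the function name,
the vmnic names and the rule "a link is flagged when it hears no beacons
from any peer" are my assumptions for illustration only.

```python
def flagged_links(forwarding):
    """Return the links that hear no beacons from any peer.

    `forwarding` maps a link name to True if frames on that link still
    reach the shared layer-2 segment (hypothetical, simplified model).
    """
    flagged = set()
    for link in forwarding:
        peers = [p for p in forwarding if p != link]
        # A beacon from a peer arrives only if both ends still forward.
        heard = [p for p in peers if forwarding[p] and forwarding[link]]
        if not heard:
            flagged.add(link)
    return flagged


# Two links, one broken: both go silent, so the bad one is ambiguous.
two = flagged_links({"vmnic0": True, "vmnic1": False})

# Three links, one broken: the two good links still hear each other.
three = flagged_links({"vmnic0": True, "vmnic1": False, "vmnic2": True})

print(sorted(two))
print(sorted(three))
```

In this model the two-link case flags both links, while the three-link
case singles out the broken one.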

It does not work as well as ARP probing since it doesn't actually test
reachability towards the gateway, only reachability between the physical
links. That means it cannot detect uplink failure in a scenario like
this:

    +-----------+       +-----------+
    | L3 agg #1 |-------| L3 agg #2 |
    +-----------+       +-----------+
          |                    |
          |                    |    <---- uplinks
          |                    |
    +-----------+       +-----------+
    | Switch #1 |-------| Switch #2 |
    +-----------+       +-----------+
             \             /
              \           /
               \         /
             +-------------+
             | VMware host |
             +-------------+

... since the connection between switches #1 and #2 forwards the probe
frames fine. We use link state tracking to catch simple uplink failures,
which mitigates this somewhat.
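
The blind spot can be sketched as a reachability check over the diagram
above. This is only an illustration under my own assumptions: the node
names are made up, both uplinks are failed at once to make the point
starkly, and a plain graph search stands in for real frame forwarding.

```python
from collections import deque


def reachable(edges, start, goal):
    """Breadth-first search over undirected links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Topology from the diagram; nic1/nic2 are the host's two physical links.
# The host does not bridge between its own NICs, so there is no
# nic1-nic2 edge.
links = [
    ("nic1", "sw1"), ("nic2", "sw2"), ("sw1", "sw2"),
    ("sw1", "agg1"), ("sw2", "agg2"), ("agg1", "agg2"),
]
uplinks = {("sw1", "agg1"), ("sw2", "agg2")}

# Fail both uplinks but leave the switch-to-switch link in place.
degraded = [e for e in links if e not in uplinks]

beacons_pass = reachable(degraded, "nic1", "nic2")  # probes still cross sw1-sw2
gateway_ok = reachable(degraded, "nic1", "agg1")    # the gateway is cut off

print(beacons_pass, gateway_ok)
```

In this model the beacons still get through while the gateway is
unreachable, so beaconing alone would keep both links active.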

Beware that, according to the documentation, beaconing precludes the use
of EtherChannels, though I don't know why. We don't use EtherChannels.

On the positive side it should put a lot less load on the gateway(s)
compared to ARP probes, since the RP no longer has to process anything.

VMware ESXi 5 is relatively new to us, so I'm not sure all of this is
still correct. But it should be easy to test with a SPAN session.

http://blogs.vmware.com/vsphere/2008/12/using-beaconing-to-detect-link-failures-or-beaconing-demystified.html

-- 
Peter



