[j-nsp] Network, trouble after customer created a loop *inside* a VM host

Jeff Meyers Jeff.Meyers at gmx.net
Fri Nov 7 09:18:35 EST 2014


Hello everybody,

I'm writing to this list because I can't find the reason for 
something we have now seen twice. Here is the setup:



    Juniper MX480 	no RSTP
          ||
          ae0
          ||
   Juniper EX4550 VC	RSTP bridge id 0
          ||
          ae0
          ||
   Juniper EX4200 VC	RSTP bridge id 16k
           |
     ProCurve 2824	RSTP bridge id 32k
           |
       Windows Host


So the router itself is not part of the spanning tree; everything below 
it is. On the Windows host, the customer is running ESXi with just one 
uplink towards the HP ProCurve switch, so there is not even a real danger 
of a physical loop. Now: two VMs are running on the host. Each of them 
has a virtual NIC which is bridged to the physical NIC of the host. 
By mistake, the customer bridged his two VMs together as well, which 
created a loop inside the host. So far, so good.

The trouble began at this point: we immediately saw partial network 
outages, along with router messages like this:

Nov  7 14:30:47  cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC Moves detected in the system


This message repeated over and over, and the ARP counter decreased 
continuously. Hosts flapped and vanished for seconds or minutes, and our 
internal smokeping measured a lot of loss.
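
To put a number on "over and over", a saved copy of the router log could 
be tallied per minute. What follows is only a rough sketch: the function 
name and the regex are made up for illustration, it assumes each message 
sits on a single line in the file, and the second and third sample lines 
are invented (only the first timestamp comes from the real log).

```python
import re
from collections import Counter

# Matches a classic syslog timestamp, the hostname and the l2ald MAC-move tag.
# Assumption: every message is on one line (no mail-style wrapping).
MAC_MOVE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hhmm>\d{2}:\d{2}):\d{2}\s+"
    r"(?P<host>\S+)\s+l2ald\[\d+\]:\s+L2ALD_MAC_MOVE_NOTIFICATION"
)


def count_mac_moves_per_minute(lines):
    """Return a Counter keyed by (host, 'Mon D HH:MM') for MAC-move messages."""
    counts = Counter()
    for line in lines:
        m = MAC_MOVE.match(line)
        if m:
            minute = f"{m['month']} {int(m['day'])} {m['hhmm']}"
            counts[(m["host"], minute)] += 1
    return counts


if __name__ == "__main__":
    # First line is the message quoted above; the other two are invented samples.
    sample = [
        "Nov  7 14:30:47  cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC Moves detected in the system",
        "Nov  7 14:30:52  cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC Moves detected in the system",
        "Nov  7 14:31:03  cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC Moves detected in the system",
    ]
    for (host, minute), n in sorted(count_mac_moves_per_minute(sample).items()):
        print(host, minute, n)
```

Lining up the per-minute counts with the smokeping loss graphs would 
show whether the MAC moves and the outages start at the same moment.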

The HP ProCurve only logged excessive broadcasts on the customer port 
and that's it. Spanning tree didn't detect anything. The same applies 
to the EX4200 VC and the EX4550 VC: nothing was detected by the 
loop-prevention protocol, and it was only luck that we knew where to 
look, because the customer phoned and told us what he had done.

The question is: how can that be and what can I do?

On the EX-series switches, each downlink port is configured with

set protocols rstp interface ge-0/0/0 no-root-port

storm-control is enabled on all ports at 85% (but no storm was detected). 
There is no special configuration on the ProCurve besides enabling 
spanning tree globally (the mode is set to RSTP, not STP).


So can anybody help with that? I am really stuck here.. :(


Thanks in advance,
Jeff

