[j-nsp] Network, trouble after customer created a loop *inside* a VM host

Alexander Arseniev arseniev at btinternet.com
Sat Nov 8 09:44:02 EST 2014


Hello,
I think we are missing some important details here.
AFAIK, MAC moves can only be detected on a port that belongs to a 
bridge-domain/VPLS instance.
So your MX480 ae0 must be an L2/"bridged" port, not an L3/routed one.
So the question is: are there any other ports on this MX480 in the 
same bridge-domain (BD)/VPLS instance?
If not, but you have an IRB interface in this BD, does it have "IS-IS 
passive" enabled by any chance? "IS-IS passive" does not actually stop 
ES-IS PDUs from being sent out, so these pesky ES-IS mcast frames could 
be the ones that looped.
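
To picture the reasoning above, here is a toy sketch in Python. It is not how l2ald works internally, just a minimal MAC table for a single bridge domain. The class name, the port labels and the MAC addresses are all made up for illustration, and it assumes that the IRB's own MAC turning up as a source address on ae0 is what gets counted as a move.

```python
from dataclasses import dataclass, field


@dataclass
class BridgeDomain:
    """Toy MAC table for one bridge domain (hypothetical model, not l2ald)."""
    mac_table: dict = field(default_factory=dict)  # MAC -> port it was last seen on
    moves: list = field(default_factory=list)      # (mac, old_port, new_port)

    def learn(self, src_mac: str, port: str) -> bool:
        """Record src_mac on port; return True if this counts as a MAC move."""
        old_port = self.mac_table.get(src_mac)
        self.mac_table[src_mac] = port
        if old_port is not None and old_port != port:
            self.moves.append((src_mac, old_port, port))
            return True
        return False


bd = BridgeDomain()
IRB_MAC = "00:00:5e:00:53:01"   # made-up address
HOST_MAC = "00:00:5e:00:53:02"  # made-up address

# Assumption: the router's own IRB MAC is tied to the logical port "irb".
bd.learn(IRB_MAC, "irb")

# A host behind ae0 is always seen on ae0, so it never moves.
first = bd.learn(HOST_MAC, "ae0")
again = bd.learn(HOST_MAC, "ae0")

# A frame sent out by the IRB comes back in on ae0 because of the loop downstream.
looped = bd.learn(IRB_MAC, "ae0")
print(first, again, looped, bd.moves)
```

With ae0 as the only bridged port, an ordinary host MAC can never change ports, so the only move this model can produce is a locally originated frame being reflected back in.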
Additionally, MAC move limiting is not supported on an EX4550 VC or in 
a mixed EX4200-4500/4550 VC, so if your EX4200 VC is actually a mixed 
EX4200-4500/4550 VC, there is no chance of getting it stopped on the EX.
https://www.juniper.net/techpubs/en_US/release-independent/junos/topics/concept/ex-series-software-features-overview-vc.html#port-security-features-by-platform-table 

Thanks
Alex

On 07/11/2014 14:18, Jeff Meyers wrote:
> Hello everybody,
>
> I'm writing to this list because I can't seem to find the reason for 
> what we have now seen twice. Here is the setup:
>
>
>
>    Juniper MX480     no RSTP
>          ||
>          ae0
>          ||
>   Juniper EX4550 VC    RSTP bridge id 0
>          ||
>          ae0
>          ||
>   Juniper EX4200 VC    RSTP bridge id 16k
>           |
>     ProCurve 2824    RSTP bridge id 32k
>           |
>       Windows Host
>
>
> So the router itself is not part of the Spanning-Tree, everything 
> below is. On the Windows host, the customer is running ESXi with just 
> one uplink towards the HP ProCurve switch so there is not even a real 
> danger of a physical loop. Now, two VMs are running on the host. Each 
> of them has a virtual NIC which is bridged to the physical one of the 
> host. Because of a mistake, the customer accidentally bridged his two 
> VMs together as well which caused a loop inside the Host. So far, so 
> good.
>
> The trouble begins at this point because immediately we saw partial 
> network outages resulting in router messages like this:
>
> Nov  7 14:30:47  cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC 
> Moves detected in the system
>
>
> This message repeated over and over and the ARP counter decreased 
> continuously. Hosts flapped and vanished for seconds or minutes and 
> internal smokeping measured a lot of loss.
>
> The HP ProCurve logged only excessive broadcast for the customer port 
> and that's it. Spanning-Tree didn't recognize anything. The same 
> applies to the EX4200 VC and the EX4550 VC: nothing was detected by 
> the loop-prevention protocol, and it was only by luck that we knew 
> where to look, because the customer called by phone and told us what 
> he did.
>
> The question is: how can that be and what can I do?
>
> On the EX-series switches, each downlink port is configured with
>
> set protocols rstp interface ge-0/0/0 no-root-port
>
> storm-control is enabled on all ports at 85% (but no storm was 
> detected). There is no special configuration on the ProCurve besides 
> the general RSTP activation (which is set to RSTP and not STP).
>
>
> So can anybody help with that? I am really stuck here.. :(
>
>
> Thanks in advance,
> Jeff


