[j-nsp] Network, trouble after customer created a loop *inside* a VM host
Alexander Arseniev
arseniev at btinternet.com
Sat Nov 8 09:44:02 EST 2014
Hello,
I think we are missing some important details here.
AFAIK, in order to detect MAC moves, the port must be in a
bridge-domain/VPLS instance.
So your MX480 ae0 must be an L2/"bridged" port, not an L3/routed one.
So the question would be - are there any other ports on this MX480 in
the same bridge-domain (BD)/VPLS instance?
If not, but you have an IRB interface in this BD, does it have "IS-IS
passive" enabled by any chance? "IS-IS passive" does not actually stop
ES-IS PDUs from being sent out, so these pesky ES-IS multicast frames
could be the ones that looped.
Additionally, MAC move limiting is not supported on an EX4550 VC or in a
mixed EX4200-4500/4550 VC, so if your EX4200 VC is actually a mixed
EX4200-4500/4550 VC, there is no chance of stopping the loop on the EX.
https://www.juniper.net/techpubs/en_US/release-independent/junos/topics/concept/ex-series-software-features-overview-vc.html#port-security-features-by-platform-table
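To illustrate the MAC-move point above, here is a minimal Python sketch
(a toy model, not Junos or l2ald code) of a learning bridge that counts
how often a source MAC changes port within one bridge domain. The class
name, the port names and the move limit of 3 are made up for the example.
It also shows why the question about other ports in the same BD matters:
with only one member port there is nothing for a MAC to move between.

```python
from collections import defaultdict


class LearningBridge:
    """Toy MAC table for a single bridge domain; counts moves per MAC."""

    def __init__(self, move_limit=3):
        self.move_limit = move_limit    # hypothetical threshold
        self.table = {}                 # MAC -> port it was last learned on
        self.moves = defaultdict(int)   # MAC -> number of port changes seen

    def learn(self, src_mac, port):
        """Learn src_mac on port; return True once the move limit is exceeded."""
        old_port = self.table.get(src_mac)
        self.table[src_mac] = port
        if old_port is not None and old_port != port:
            self.moves[src_mac] += 1
        return self.moves[src_mac] > self.move_limit


MAC = "00:11:22:33:44:55"  # hypothetical source MAC of a looped frame

# Case 1: the BD has two member ports and the looped frame keeps
# arriving alternately on both of them (hypothetical port names).
two_ports = LearningBridge(move_limit=3)
loop_alerts = [two_ports.learn(MAC, p) for p in ["ae0", "ge-1/0/0"] * 3]

# Case 2: the BD has a single member port, so the MAC never "moves".
one_port = LearningBridge(move_limit=3)
single_alerts = [one_port.learn(MAC, "ae0") for _ in range(6)]

print(loop_alerts)    # alerts start once more than 3 moves were counted
print(single_alerts)  # no alerts at all
```

In this toy model the two-port case counts five moves and raises an alert
on the fifth and sixth frame, while the single-port case never counts a
move, no matter how many copies of the frame arrive.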
Thanks
Alex
On 07/11/2014 14:18, Jeff Meyers wrote:
> Hello everybody,
>
> I'm writing to this list because I can't seem to find the reason for
> what we have now seen twice. Here is the setup:
>
>
>
> Juniper MX480 no RSTP
> ||
> ae0
> ||
> Juniper EX4550 VC RSTP bridge id 0
> ||
> ae0
> ||
> Juniper EX4200 VC RSTP bridge id 16k
> |
> ProCurve 2824 RSTP bridge id 32k
> |
> Windows Host
>
>
> So the router itself is not part of the Spanning-Tree, everything
> below is. On the Windows host, the customer is running ESXi with just
> one uplink towards the HP ProCurve switch, so there is not even a real
> danger of a physical loop. Now: two VMs are running on the host. Each
> of them has a virtual NIC which is bridged to the physical one of the
> host. Because of a mistake, the customer accidentally bridged his two
> VMs together as well which caused a loop inside the Host. So far, so
> good.
>
> The trouble begins at this point because immediately we saw partial
> network outages resulting in router messages like this:
>
> Nov 7 14:30:47 cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC
> Moves detected in the system
>
>
> This message repeated over and over and the ARP counter decreased
> continuously. Hosts flapped and vanished for seconds or minutes and
> internal smokeping measured a lot of loss.
>
> The HP ProCurve logged only excessive broadcast for the customer port
> and that's it. Spanning-Tree didn't recognize anything. The same
> applies to the EX4200 VC and the EX4550 VC: nothing was detected by
> the loop-prevention protocol and it was only a lucky shot that we
> knew where to look, because the customer called by phone and told us
> what he did.
>
> The question is: how can that be and what can I do?
>
> On the EX-series switches, each downlink port is configured with
>
> set protocols rstp interface ge-0/0/0 no-root-port
>
> storm-control is enabled on all ports at 85% (but no storm was
> detected). There is no special configuration on the ProCurve besides
> the general spanning-tree activation (which is set to RSTP and not STP).
>
>
> So can anybody help with that? I am really stuck here.. :(
>
>
> Thanks in advance,
> Jeff
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp