[j-nsp] Weird behaviour in network after customer created a bridge inside a Windows VM
Morgan McLean
wrx230 at gmail.com
Thu May 29 12:36:32 EDT 2014
I'm not aware of any loop protection inherent to a VC setup that could
substitute for STP.
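
For what it's worth, a single physical uplink doesn't rule out a loop: the
two vNICs are two ports on the host's bridge, and the bridge inside the guest
connects them to each other. Below is a toy Python sketch of that. It is not
a model of Proxmox or Windows internals; the port names (eth0, tap0, tap1)
and the simulate() function are made up for illustration, and it assumes that
neither the host bridge nor the guest bridge runs STP. It injects one
broadcast on the uplink and counts the copies flooded back out of the uplink.

```python
# Hypothetical topology: the host bridge has three ports, the physical
# uplink "eth0" and one tap per vNIC. The guest bridge joins the two vNICs.
HOST_PORTS = ["eth0", "tap0", "tap1"]
GUEST_PEER = {"tap0": "tap1", "tap1": "tap0"}


def simulate(rounds):
    """Inject one broadcast on eth0 and run `rounds` flooding passes.

    Returns (copies reflected back out of eth0, frames still circulating).
    Assumes no STP on either bridge, so nothing ever blocks a port.
    """
    in_flight = ["eth0"]  # ingress port of each frame at the host bridge
    reflected = 0
    for _ in range(rounds):
        next_flight = []
        for ingress in in_flight:
            # The host bridge floods a broadcast to every port but the ingress.
            for egress in HOST_PORTS:
                if egress == ingress:
                    continue
                if egress == "eth0":
                    # This copy goes back upstream with the original source MAC.
                    reflected += 1
                else:
                    # The guest bridge receives the frame on one vNIC and
                    # forwards it out the other, back into the host bridge.
                    next_flight.append(GUEST_PEER[egress])
        in_flight = next_flight
    return reflected, len(in_flight)


if __name__ == "__main__":
    print(simulate(5))
```

Under these assumptions simulate(5) returns (8, 2): two copies of the frame
keep circulating forever, and every pass sends two more back upstream, still
carrying the original source MAC. Each further broadcast adds to that, which
is at least consistent with the latency and the MAC move messages described
below.
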
Thanks,
Morgan
On Thu, May 29, 2014 at 7:35 AM, Jeff Meyers <Jeff.Meyers at gmx.net> wrote:
> Hi everybody,
>
> recently we saw strange behaviour in our network. A customer with a
> Proxmox server had a Windows 2k8 VM with 2 virtual NICs (both bridged to
> eth0 of the server which faces the internet) and bridged them together
> INSIDE the VM. This immediately caused high latency and partial packet loss
> across the whole network, with the following messages in the router log:
>
> May 29 04:06:33 cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC Moves
> detected in the system
>
>
> This is all I saw, no other device detected anything suspicious. This is
> the setup of the network:
>
>
> MX480 router with DPCE and irb-interfaces but no (R)STP or any other STP
> flavour. This router connects 2x 10G as ae0 to a virtual-chassis consisting
> of 2x EX4550. This is a pure Layer2 device and the RSTP root bridge with
> priority 0. Furthermore, each server room is equipped with 2x EX4200, also
> in a VC, with an RSTP priority of 16k. The ToR switches may or may not have
> RSTP enabled and are usually connected with 1x GE to the EX4200 stack.
> Here is a diagram that hopefully describes the setup well enough:
>
>
>
>               +-------+
>               | MX480 |           ------- L3 edge-router, no STP
>               +-------+
>                  ||               ------- ae0 with 2x XGE
>          +----------------+
>          |     EX4550     |       ------- L2 only
>          +----------------+       ------- RSTP priority 0
>          |     EX4550     |
>          +----------------+
>             ||        ||
>      +----------+  +----------+
>      |  EX4200  |  |  EX4200  |
>      +----------+  +----------+   ------- RSTP priority 16k
>      |  EX4200  |  |  EX4200  |
>      +----------+  +----------+
>        |       |        |         ------- 1GE links to ToR
>  +-------+ +-------+   ...
>  | HP SW | | HP SW |              ------- RSTP priority 32k
>  +-------+ +-------+
>   |||||||
>   |...
>  +--------------+
>  | Proxmox Host |
>  +--------------+
>
>
>
> In this particular scenario, the HP ProCurve switch had STP disabled and
> did not participate in the spanning-tree protocol. However, to my
> understanding that shouldn't be required anyway, because the VC of the
> EX4200 switches should identify a potential loop on its own as long as
> there is no BPDU filter present.
> The visible behaviour was high latency to several devices and sometimes up
> to 100% packet loss, not only within the customer's VLAN but globally in
> every L2 segment. The ARP count did not change on the router; I checked
> that. No TC event was recognized by any device. The questions now are:
>
> - How is it possible that a bridge on a host with just ONE physical uplink
> can cause such problems?
> - Am I correct that RSTP on the ToR switches is not required as long as
> they do not filter BPDUs?
>
> - Why would the MX router log such a message, indicating that MACs are
> moving, although there is only one interface (ae0) facing the L2 network?
> Even if a MAC moves on the EX4550 stack from one port to another, the MX
> would never see that.
>
>
> Please let me know if you need any further details. I am very curious to
> find out what went wrong here. Am I misunderstanding how STP behaves in
> that scenario? Are there any ways to dig deeper into the cause of that
> strange MX log message?
>
>
> Thanks a lot!
>
>
> Best regards,
> Jeff
>
> _______________________________________________
> juniper-nsp mailing list juniper-nsp at puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp
>