[j-nsp] Weird behaviour in network after customer created a bridge inside a Windows VM
Jeff Meyers
Jeff.Meyers at gmx.net
Thu May 29 10:35:36 EDT 2014
Hi everybody,
recently we saw some strange behaviour in our network. A customer with a
Proxmox server had a Windows 2k8 VM with 2 virtual NICs (both bridged to
eth0 of the server, which faces the internet) and bridged them together
INSIDE the VM. This immediately caused high latency and partial
packet loss across the whole network, with the following message in the
router log:
May 29 04:06:33 cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC Moves detected in the system
This is all I saw; no other device detected anything suspicious. This is
the setup of the network:
An MX480 router with DPCE and irb interfaces but no (R)STP or any other
STP flavour. This router connects via 2x 10G as ae0 to a virtual chassis
consisting of 2x EX4550. The virtual chassis is a pure Layer 2 device and
the RSTP root bridge with priority 0. Furthermore, each server room is
equipped with 2x EX4200, also in a VC, with an RSTP priority of 16k. The
ToR switches may or may not have RSTP enabled and are usually connected
with 1x GE to the EX4200 stack. Here is a diagram that hopefully
describes the setup well enough:
            +-------+
            | MX480 |                 ------- L3 edge-router, no STP
            +-------+
               ||                     ------- ae0 with 2x XGE
       +----------------+
       |     EX4550     |             ------- L2 only
       +----------------+             ------- RSTP priority 0
       |     EX4550     |
       +----------------+
          ||         ||
   +----------+   +----------+
   |  EX4200  |   |  EX4200  |
   +----------+   +----------+        ------- RSTP priority 16k
   |  EX4200  |   |  EX4200  |
   +----------+   +----------+
    |         |     |                 ------- 1GE links to ToR
+-------+ +-------+ ...
| HP SW | | HP SW |                   ------- RSTP priority 32k
+-------+ +-------+
 |||||||
 |...
+--------------+
| Proxmox Host |
+--------------+
In this particular scenario, the HP ProCurve switch had STP disabled and
did not participate in the spanning-tree protocol. However, as I
understand it, that shouldn't be required anyway, because the VC of
EX4200 switches should detect a potential loop on its own as long as
no BPDU filter is present.
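
For reference, the RSTP priorities described above imply the root
election sketched below. This is only a toy illustration in Python of
the "lowest bridge ID wins" rule, not output from any device. The
priorities are the ones from the setup (16k = 16384, 32k = 32768); the
device labels and MAC addresses are made up for the example. The MX480
and the HP switch with STP disabled are left out because they send no
BPDUs.

```python
# Toy RSTP root election: the bridge with the lowest bridge ID
# (priority first, then MAC address as tie-breaker) becomes root.
# Labels and MAC addresses are hypothetical (documentation range).


def bridge_id(bridge):
    """Return a comparable (priority, mac-as-integer) tuple."""
    _name, priority, mac = bridge
    return (priority, int(mac.replace(":", ""), 16))


bridges = [
    ("EX4200-VC-1", 16384, "00:00:5e:00:53:10"),
    ("EX4550-VC", 0, "00:00:5e:00:53:20"),
    ("EX4200-VC-2", 16384, "00:00:5e:00:53:30"),
    ("HP-SW-with-RSTP", 32768, "00:00:5e:00:53:40"),
]

root = min(bridges, key=bridge_id)
print(root[0])  # EX4550-VC
```

With priority 0 the EX4550 VC wins regardless of its MAC address; the
MAC only matters as a tie-breaker between the two EX4200 stacks.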
The visible behaviour was high latency to several devices and at times
up to 100% packet loss, not only within the customer's VLAN but globally
in every L2 segment. The ARP count on the router did not change; I
checked that. No TC (topology change) event was seen by any device. My
questions are:
- How is it possible that a bridge on a host with just ONE physical
uplink can cause such problems?
- Am I correct that RSTP on the ToR switches is not required as long as
they do not filter BPDUs?
- Why would the MX router log such a message and indicate that a MAC is
moving, although there is only one interface (ae0) facing the L2
network? Even if a MAC moves on the EX4550 stack from one port to
another, the MX would never see that.
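
To make the reasoning behind the third question concrete, here is a
minimal Python sketch of MAC learning as I understand it. It is a toy
model, not how l2ald is actually implemented: a move is only recorded
when a known MAC shows up on a different interface within the same
VLAN. The VLAN ID, the MAC address and the interface name "other-ifl"
are hypothetical.

```python
class MacTable:
    """Toy MAC-learning table keyed by (vlan, mac)."""

    def __init__(self):
        self.table = {}  # (vlan, mac) -> interface
        self.moves = []  # recorded (vlan, mac, old_if, new_if)

    def learn(self, vlan, mac, interface):
        key = (vlan, mac)
        old = self.table.get(key)
        # A "move" means: same MAC, same VLAN, different interface.
        if old is not None and old != interface:
            self.moves.append((vlan, mac, old, interface))
        self.table[key] = interface


mx = MacTable()
mac = "00:00:5e:00:53:01"  # hypothetical address

# The MAC is seen again and again on the single uplink: no move.
mx.learn(100, mac, "ae0")
mx.learn(100, mac, "ae0")
moves_on_single_uplink = list(mx.moves)

# Only a second interface for the same MAC produces a move.
mx.learn(100, mac, "other-ifl")
print(moves_on_single_uplink)  # []
print(len(mx.moves))           # 1
```

Under this model, a router with ae0 as its only L2-facing interface
should never record a move, which is exactly why the log message is
surprising.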
Please let me know if you need any further details. I am very curious to
find out what went wrong here. Am I misunderstanding how STP behaves in
that scenario? Is there any way to dig deeper into the cause of that
strange MX log message?
Thanks a lot!
Best regards,
Jeff