[j-nsp] Weird behaviour in network after customer created a bridge inside a Windows VM

Jeff Meyers Jeff.Meyers at gmx.net
Thu May 29 10:35:36 EDT 2014


Hi everybody,

recently we saw some strange behaviour in our network. A customer with a
Proxmox server had a Windows 2k8 VM with 2 virtual NICs (both bridged to
eth0 of the server, which faces the internet) and bridged them together
INSIDE the VM. This immediately caused high latency and partial
packet loss across the whole network, with the following message in the
router log:

May 29 04:06:33  cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC Moves detected in the system
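
To see how often this message fires and whether it lines up with the
start of the incident, the log can be bucketed per minute. This is only a
minimal sketch: it assumes the log is available as plain text with one
message per line in the syslog format shown above, and the function name
count_mac_moves is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# May 29 04:06:33  cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: MAC Moves ...
# The "ts" group keeps the timestamp down to the minute (seconds dropped).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}):\d{2}\s+(?P<host>\S+)\s+"
    r"l2ald\[\d+\]:\s+L2ALD_MAC_MOVE_NOTIFICATION"
)


def count_mac_moves(lines):
    """Count MAC move notifications per (host, minute)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("ts"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "May 29 04:06:33  cr0 l2ald[2545]: L2ALD_MAC_MOVE_NOTIFICATION: "
        "MAC Moves detected in the system",
    ]
    for (host, minute), n in sorted(count_mac_moves(sample).items()):
        print(host, minute, n)
```

The notification itself does not name the MAC or the interfaces involved,
so this only gives timing, not the cause.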


This is all I saw; no other device logged anything suspicious. This is
the setup of the network:


An MX480 router with DPCE and irb interfaces, running no (R)STP or any
other STP flavour. This router connects via 2x 10G as ae0 to a
virtual chassis consisting of 2x EX4550. That VC is a pure Layer 2 device
and the RSTP root bridge with priority 0. Furthermore, each server room
is equipped with 2x EX4200, also in a VC, with an RSTP priority of 16k.
The ToR switches may or may not have RSTP enabled and are usually
connected with 1x GE to the EX4200 stack. Here is a diagram which
hopefully describes the setup well enough:



                      +-------+
                      | MX480 |           ------- L3 edge-router, no STP
                      +-------+
                          ||              ------- ae0 with 2x XGE
                  +----------------+
                  |     EX4550     |      ------- L2 only
                  +----------------+      ------- RSTP priority 0
                  |     EX4550     |
                  +----------------+
                    ||          ||
               +----------+ +----------+
               |  EX4200  | |  EX4200  |
               +----------+ +----------+  ------- RSTP priority 16k
               |  EX4200  | |  EX4200  |
               +----------+ +----------+
                |    |            |       ------- 1GE links to ToR
        +-------+ +-------+ ...
        | HP SW | | HP SW |               ------- RSTP priority 32k
        +-------+ +-------+
         |||||||
         |...
    +--------------+
    | Proxmox Host |
    +--------------+



In this particular scenario, the HP ProCurve switch had STP disabled and
did not participate in the spanning-tree protocol. However, as far as I
understand, that shouldn't be required anyway, because the VC of the
EX4200 switches should identify a potential loop on its own as long as
there is no BPDU filter present.
The visible behaviour was high latency to several devices and sometimes
up to 100% packet loss, not only within the customer's VLAN but globally
in every L2 segment. The ARP count did not change on the router; I
checked that. No TC event was recognized by any device. My questions
are:

- How is it possible that a bridge on a host with just ONE physical 
uplink can cause such problems?
- Am I correct that RSTP on the ToR switches is not required as long as
they do not filter BPDUs?
- Why would the MX router log such a message and indicate that a MAC
is moving, although there is only one interface (ae0) facing the L2
network? Even if a MAC moves on the EX4550 stack from one port to
another, the MX would never see that.
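
To make the first question more concrete, here is a toy model of the two
bridges. It is a sketch under assumptions, not a diagnosis: it assumes the
Proxmox host runs an ordinary learning bridge with three ports (the names
eth0, tap0 and tap1 are hypothetical), that the VM bridge joins tap0 and
tap1, and that neither side runs STP. It ignores timing and rates and only
follows a single broadcast frame that enters via eth0.

```python
from collections import deque


class Bridge:
    """Minimal learning bridge: learns source MACs, floods broadcasts."""

    def __init__(self, name, ports):
        self.name = name
        self.ports = list(ports)
        self.table = {}  # source MAC -> port where it was last seen
        self.moves = 0   # how often a known MAC showed up on another port

    def receive(self, port, src):
        """Learn src on port and return the ports the frame is flooded to."""
        known = self.table.get(src)
        if known is not None and known != port:
            self.moves += 1
        self.table[src] = port
        return [p for p in self.ports if p != port]


def simulate(max_events=50):
    """Follow one broadcast from upstream for at most max_events hops."""
    host = Bridge("host", ["eth0", "tap0", "tap1"])
    vm = Bridge("vm", ["tap0", "tap1"])
    queue = deque([(host, "eth0")])  # (receiving bridge, ingress port)
    leaked = 0  # copies sent back out of the physical uplink
    events = 0
    while queue and events < max_events:
        bridge, port = queue.popleft()
        events += 1
        for out in bridge.receive(port, "upstream-mac"):
            if bridge is host and out == "eth0":
                leaked += 1                 # frame goes back towards the ToR
            elif bridge is host:
                queue.append((vm, out))     # host tap port delivers to the VM
            else:
                queue.append((host, out))   # VM NIC delivers to the host
    return host.moves, vm.moves, leaked, len(queue)


if __name__ == "__main__":
    host_moves, vm_moves, leaked, pending = simulate()
    print("host MAC moves:", host_moves)
    print("VM MAC moves:", vm_moves)
    print("copies sent back out of eth0:", leaked)
    print("frames still circulating:", pending)
```

In this model the frame never dies out, and every lap sends another copy
back out of eth0 carrying the upstream source MAC. If the real setup
behaves similarly, the upstream switches would see MACs that live behind
their uplinks appear on the host-facing port as well. Whether that is what
happened here, and how it would reach the MX, is exactly what I cannot
confirm.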


Please let me know if you need any further details. I am very curious to
find out what went wrong here. Am I misunderstanding how STP behaves in
that scenario? Are there any ways to dig deeper into the cause of that
strange MX log message?


Thanks a lot!


Best regards,
Jeff


