[j-nsp] Network design problem in a bridged setup with 2x Juniper MX and some Brocade SuperX

Wed Jan 30 17:26:19 EST 2013

Hello list,

I'm currently a little stuck and could use some help deciding how to 
improve the current setup. We are running a network where all customer 
VLANs are bridged because the same VLAN is usually required in 
different areas of the network. This is the setup:

Room A:      +--------+
              | SX1600 |--------> [ 2nd SuperX not installed yet ]
              +--------+
                   |                 |
                   |                 |
              +--------+        +--------+
              | MX480  |--------|  MX80  |
              +--------+        +--------+
                   |                 |
                   |                 |
Room B:      +--------+        +--------+
              | SX400  |--------| SX400  |
              +--------+        +--------+

The two MX routers are connected to each other by a 10G link with RSTP 
active, as are the two SuperXes in Room B. These are the priorities:

MX480: 0 (root bridge)
MX80: 4k (backup root)
SX400: both 16k
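
For readers who want to check how these priorities play out, here is a 
minimal Python sketch of the root election rule (lowest bridge ID wins, 
compared as priority first, then MAC address). It assumes "4k" and "16k" 
mean 4096 and 16384; the MAC addresses and the SX400-1/SX400-2 labels 
are hypothetical placeholders, not values from the real devices.

```python
# Sketch of spanning-tree root election: the lowest bridge ID wins.
# A bridge ID is compared as (priority, MAC address). The MAC addresses
# below are hypothetical placeholders from the documentation range.

def bridge_id(priority, mac):
    """Return a sortable bridge ID: priority first, then MAC as an integer."""
    return (priority, int(mac.replace(":", ""), 16))

bridges = {
    "MX480":   bridge_id(0,     "00:00:5e:00:53:01"),
    "MX80":    bridge_id(4096,  "00:00:5e:00:53:02"),
    "SX400-1": bridge_id(16384, "00:00:5e:00:53:03"),
    "SX400-2": bridge_id(16384, "00:00:5e:00:53:04"),
}

# Sort device names by bridge ID; the first is root, the second takes over
# if the root fails.
ranking = sorted(bridges, key=bridges.get)
root, backup = ranking[0], ranking[1]
print("root:", root, "backup:", backup)
```

Because both SX400s share the same priority, the tie between them is 
broken by MAC address alone.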

Because topology changes caused some minor packet loss in Room B, I 
installed the SX1600 with MSTP instead of RSTP to see if that performs 
better. During some tests before connecting customers to the SX1600, 
results looked fine. We then went ahead, replaced the old Cisco 
6509/sup32 with the SX1600, and converted all VLANs that had been 
routed on the Cisco into bridged VLANs.

I'm running just one instance of MSTP (CIST) on the SX1600 with the 
following configuration:

mstp scope all
mstp instance 0 vlan 1
mstp instance 0 vlan 19
...
mstp instance 0 priority 16384
mstp edge-port-auto-detect
mstp start
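
Since every bridged VLAN needs its own "mstp instance 0 vlan" line, the 
list can be generated rather than typed. The following is only a sketch 
in Python that reproduces the line format shown above; the function name 
mstp_config is hypothetical, and the example passes only VLANs 1 and 19 
because those are the only ones listed here.

```python
# Sketch: generate the MSTP lines in the same format as the config above.
# Only VLANs 1 and 19 come from the original configuration; the real VLAN
# list would be passed in instead.

def mstp_config(vlans, instance=0, priority=16384):
    """Return the MSTP config lines for the given VLAN IDs."""
    lines = ["mstp scope all"]
    # One membership line per VLAN, de-duplicated and in ascending order.
    lines += [f"mstp instance {instance} vlan {v}" for v in sorted(set(vlans))]
    lines.append(f"mstp instance {instance} priority {priority}")
    lines.append("mstp edge-port-auto-detect")
    lines.append("mstp start")
    return lines

config = mstp_config([1, 19])
print("\n".join(config))
```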

On this SX1600, most uplinks go to individual switches, usually HP 
ProCurve 2600 or 2800 series. Although we manage those switches, 
customers can install cables on their own. And this is where the problem 
actually starts: a rack with two ProCurve switches receives two uplinks 
from the same SX1600, and the two switches are also connected to each 
other, causing a loop. No matter what I did, the loop kept causing 
trouble for the whole network because the MX routers saw topology 
changes all the time (every few seconds to every 200 seconds or so) and 
flushed the whole ARP cache each time. With about 90,000 active ARP 
entries, this of course had a fairly heavy impact on the servers behind 
them. Although STP was active on both HP switches, the problem didn't go 
away, yet the topology change itself was apparently not visible on the 
SX1600. To resolve the issue, we had to remove the cable causing the 
loop, but of course this can't be the solution, since customers may 
create a new loop at any time, and what's the point of running STP if 
you have to worry about that?
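
To make the loop concrete, here is a small Python sketch that models 
the rack as a graph and uses union-find to report which link closes a 
loop. The names HP-1 and HP-2 are hypothetical labels for the two 
ProCurve switches. Note that this only shows that one of the three 
links is redundant; which port STP actually blocks depends on bridge 
IDs and port costs, not on the order of the list.

```python
# Sketch: find links that close a loop in a layer-2 topology using
# union-find. "HP-1" and "HP-2" are hypothetical labels for the two
# ProCurve switches in the customer rack.

def redundant_links(links):
    """Return the links that would close a loop if all links forwarded."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    loops = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            loops.append((a, b))      # both ends already connected: a loop
        else:
            parent[root_a] = root_b   # merge the two connected components
    return loops

links = [
    ("SX1600", "HP-1"),   # first uplink
    ("SX1600", "HP-2"),   # second uplink
    ("HP-1", "HP-2"),     # customer-installed cross-connect
]
print(redundant_links(links))
```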

The question is now how to proceed and how to improve the setup 
generally? Does it make sense to change RSTP to MSTP on the MX routers 
in the first place? Is there any configuration I should perform on any 
of those devices involved?
Since many of you are most likely from the Cisco world, here is a list 
of the available commands on the SuperX running in MST mode:

SSH@A.cs0 (config)#mst
   admin-edge-port         Define this port to be an edge port
   admin-pt2pt-mac         Define this port to be a point-to-point link
   disable                 Disable MSTP on this interface
   edge-port-auto-detect   Enable/Disable auto-detect edge port
   force-migration-check   Trigger port's migration state machine check
   force-version           Configure MSTP force version
   forward-delay           Configure bridge parameter forward-delay
   hello-time              Configure bridge parameter hello-time
   instance                Configure MSTP instance VLAN membership
   max-age                 Configure bridge parameter max-age
   max-hops                Configure MSTP max-hops
   name                    Configure MSTP configuration name
   revision                Configure MSTP revision level
   scope                   Configure MSTP scope
   start                   Start/stop MSTP operation

Inside the interface configuration, there is no way to configure e.g. 
bpdu-protect on a port, but root-protect is configured on every port 
towards customer switches.

I would be very grateful for any hints. I am aware that some of you 
might declare the setup broken, but for colocation services where the 
same VLAN might be required campus-wide, it's hard to improve on that 
without installing tons of cables.
Furthermore, we want to eliminate the dependency on a single big core 
switch. Both rooms are equally important. In the past, we had a big 
core in Room A with downlinks going to smaller core switches in Room B, 
but whenever the big core had a problem, everything went down.

Thanks for reading this far; hopefully some great ideas will follow. 
Any help will be rewarded with a cold beer in Frankfurt, Germany, 
anytime! ;-)

Best regards,
Jeff