[j-nsp] Network design problem in a bridged setup with 2x Juniper MX and some Brocade SuperX
Jeff Meyers
Jeff.Meyers at gmx.net
Wed Jan 30 17:26:19 EST 2013
Hello list,
I'm currently a little stuck and might need some help in order to decide
how to improve the current setup. We are running a network where all
customer VLANs are bridged because the same VLAN is usually required in
different areas of the network. This is the setup:
Room A:            +--------+
                   | SX1600 |--------> [ 2nd SuperX not installed yet ]
                   +--------+
                   |        |
                   |        |
          +--------+        +--------+
          | MX480  |--------|  MX80  |
          +--------+        +--------+
              |                  |
              |                  |
Room B:   +--------+        +--------+
          | SX400  |--------| SX400  |
          +--------+        +--------+
Both MX routers have a 10G link between each other with RSTP active, as
do the two SuperXes in Room B. These are the priorities:
MX480: 0 (root bridge)
MX80: 4k (backup root)
SX400: both 16k
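To make the effect of these priorities concrete, here is a minimal Python sketch of how the root election orders the bridges. It assumes the usual rule that the lowest bridge ID (priority first, then MAC address) wins. The priorities are the ones listed above; the MAC addresses and the names SX400-1 and SX400-2 are made up for illustration.

```python
# Priorities are taken from the post; the MAC addresses are invented placeholders.
bridges = {
    "MX480":   (0,     "00:00:5e:00:53:01"),
    "MX80":    (4096,  "00:00:5e:00:53:02"),
    "SX400-1": (16384, "00:00:5e:00:53:03"),
    "SX400-2": (16384, "00:00:5e:00:53:04"),
}


def bridge_id(name):
    """Return a sortable bridge ID: priority first, MAC address as tie-breaker."""
    priority, mac = bridges[name]
    return (priority, int(mac.replace(":", ""), 16))


def election_order(names):
    """Sort bridges from best (root) to worst candidate."""
    return sorted(names, key=bridge_id)


order = election_order(bridges)
print("root bridge:", order[0])

# If the MX480 fails, the remaining bridges elect the next-best candidate.
fallback = election_order([n for n in bridges if n != "MX480"])
print("backup root:", fallback[0])
```

With equal priorities on both SX400s, only the MAC address decides which of them would come next, which is why the two MX routers get explicit, distinct priorities.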
Because topology changes caused some minor packet loss in Room B, I
installed the SX1600 with MSTP instead of RSTP to see if that performs
better. During some tests before connecting customers to the SX1600,
results looked fine. We proceeded with the setup and replaced the old
Cisco 6509/sup32 with the SX1600, converting all VLANs that had been
routed on the Cisco into bridged VLANs.
I'm running just one instance of MSTP (CIST) on the SX1600 with the
following configuration:
mstp scope all
mstp instance 0 vlan 1
mstp instance 0 vlan 19
...
mstp instance 0 priority 16384
mstp edge-port-auto-detect
mstp start
On this SX1600, most uplinks go to switches on their own, usually HP
ProCurve 2600 or 2800 series. Although we manage those switches,
customers can install cables on their own. And here is where the problem
actually starts: a rack with two ProCurve switches installed receives
two uplinks from the same SX1600 and those switches are connected with
each other, causing a loop. No matter what I did, the loop continued to
cause trouble for the whole network because the MX routers saw topology
changes all the time (at intervals between a few and 200 seconds or so)
and flushed the whole ARP cache each time. With about 90,000 active ARP
entries, this of course had a fairly heavy impact on the servers behind
them. Although STP was active on both HP switches, the problem didn't
vanish, yet the topology change itself apparently was not visible on the
SX1600. In
order to solve the issue, we had to remove the cable causing the loop
but of course this can't be the solution since customers may install a
new loop anytime and what's the point in running STP if you need to care
about that?
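The rack scenario described above can be sketched as a small graph problem. The following Python snippet is only an illustration, not what any of the switches actually run: it uses a union-find pass to list the links that close a loop, i.e. the number of links spanning tree would have to block. The names HP-1 and HP-2 are placeholders for the two ProCurve switches.

```python
def redundant_links(links):
    """Return the links that close a loop, in the order they are listed."""
    parent = {}

    def find(node):
        # Find the representative of the node's connected component.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    extra = []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            extra.append((a, b))  # both ends already connected: this link forms a loop
        else:
            parent[root_a] = root_b
    return extra


# Two uplinks from the SX1600 plus the customer-installed cross cable.
rack = [("SX1600", "HP-1"), ("SX1600", "HP-2"), ("HP-1", "HP-2")]
loops = redundant_links(rack)
print(loops)

# Without the cross cable the rack is loop-free.
no_loops = redundant_links(rack[:2])
print(no_loops)
```

Which physical port ends up blocking in a real network depends on bridge IDs and path costs, not on list order; the sketch only shows that exactly one link in this rack is redundant.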
The question is now how to proceed and how to improve the setup
generally? Does it make sense to change RSTP to MSTP on the MX routers
in the first place? Is there any configuration I should perform on any
of those devices involved?
Since many of you are most likely from the Cisco world, here is a list
of the available commands on the SuperX running in MST mode:
SSH@A.cs0 (config)#mst
  admin-edge-port         Define this port to be an edge port
  admin-pt2pt-mac         Define this port to be a point-to-point link
  disable                 Disable MSTP on this interface
  edge-port-auto-detect   Enable/Disable auto-detect edge port
  force-migration-check   Trigger port's migration state machine check
  force-version           Configure MSTP force version
  forward-delay           Configure bridge parameter forward-delay
  hello-time              Configure bridge parameter hello-time
  instance                Configure MSTP instance VLAN membership
  max-age                 Configure bridge parameter max-age
  max-hops                Configure MSTP max-hops
  name                    Configure MSTP configuration name
  revision                Configure MSTP revision level
  scope                   Configure MSTP scope
  start                   Start/stop MSTP operation
Inside the interface configuration, there is no way to configure, for
example, bpdu-protect on the port, but root-protect is configured on
every port towards customer switches.
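To illustrate the difference between the two features mentioned here, the following Python sketch models them in a deliberately simplified way. It assumes the commonly described behaviour: root protection only reacts when a BPDU advertises a better root than the current one, while BPDU protection reacts to any BPDU at all. Real implementations compare full priority vectors and vendors differ in the details. All MAC values, the HP priority of 32768 and the rogue switch are assumptions for illustration.

```python
# Bridge IDs are modelled as (priority, MAC-as-integer); lower is better.
# All MAC values and the HP priority of 32768 are assumptions for illustration.
MX480_ROOT = (0, 0x00005E005301)
HP_SWITCH = (32768, 0x00005E005310)
ROGUE_SWITCH = (0, 0x00005E005300)  # hypothetical device claiming to be a better root


def root_protect_action(current_root, advertised_root):
    """Block the port only when the BPDU advertises a better root than ours."""
    return "block" if advertised_root < current_root else "forward"


def bpdu_protect_action(bpdu_received):
    """Shut the port down as soon as any BPDU arrives on it."""
    return "disable" if bpdu_received else "forward"


# A customer switch that accepts the MX480 as root, or that announces itself
# with a worse bridge ID, never triggers root protection.
print(root_protect_action(MX480_ROOT, MX480_ROOT))
print(root_protect_action(MX480_ROOT, HP_SWITCH))

# Only a BPDU claiming a better root blocks the port.
print(root_protect_action(MX480_ROOT, ROGUE_SWITCH))

# BPDU protection would react to any of these BPDUs.
print(bpdu_protect_action(True))
```

Under these assumptions, root protection alone does not react to a loop between two well-behaved customer switches, because neither of them ever claims to be a better root.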
I would be very grateful for any hints. I am aware that some of you
might declare the setup to be broken, but for colocation services where
the same VLAN might be required campus-wide, it's hard to improve on
that without installing tons of cables.
Furthermore, we want to eliminate the dependency on a single big
core switch. Both rooms are equally important. In the past, we had a
big core in Room A with downlinks going to smaller core switches in Room
B, but whenever the big core had a problem, everything went down.
Thanks so far for reading this and hopefully some great ideas will
follow. Any help will be rewarded with a cold beer in Frankfurt, Germany
anytime! ;-)
Best regards,
Jeff