[j-nsp] Network design problem in a bridged setup with 2x Juniper MX and some Brocade SuperX

Wed Jan 30 22:57:10 EST 2013

Hi Jeff, 

> The question is now how to proceed and how to improve the setup generally?

From what you've described, it sounds like there is a misconfiguration or bug *somewhere* amongst your 3 vendors.  As painful as it will be to locate, that is probably the best place to start.

- Since you're only using a CIST, ensure that *every* VLAN is configured on every switch.
- Make sure they are all configured as members of the CIST too, otherwise your MSTP configuration digest (the hash) won't match and you'll end up with weird results not unlike what you are seeing.
- Also make sure the MSTP revision level and configuration name are identical on every switch, otherwise the region configuration won't match again.
- Check all up/downlinks to make sure that there are no boundary ports; a boundary port indicates a problem with one of the above items.
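
To make the checklist above concrete, here is a rough Python sketch of the comparison the switches are effectively doing: region name, revision level, and a digest over the VLAN-to-instance table all have to agree. The hostnames, the region name "DC1" and the helper names are made up for illustration, and the signature key is the fixed one as I remember it from IEEE 802.1Q, so check it against the standard before trusting any digest this prints.

```python
import hashlib
import hmac
import struct

# Fixed signature key for the MSTP configuration digest, quoted from memory
# of IEEE 802.1Q; verify against the standard before relying on it.
DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def config_digest(vlan_to_msti):
    """HMAC-MD5 over a 4096-entry table of 2-byte instance IDs, indexed by VLAN ID."""
    table = [0] * 4096  # 0 means CIST; entries for VID 0 and 4095 stay 0
    for vid, msti in vlan_to_msti.items():
        if not 1 <= vid <= 4094:
            raise ValueError(f"invalid VLAN ID {vid}")
        table[vid] = msti
    packed = struct.pack(">4096H", *table)
    return hmac.new(DIGEST_KEY, packed, hashlib.md5).hexdigest()


def region_id(name, revision, vlan_to_msti):
    """The three items that must be identical for two switches to share a region."""
    return (name, revision, config_digest(vlan_to_msti))


def find_mismatches(switches):
    """Compare every switch against the first one; return {host: [fields that differ]}."""
    hosts = list(switches)
    reference = region_id(**switches[hosts[0]])
    fields = ("name", "revision", "digest")
    report = {}
    for host in hosts[1:]:
        candidate = region_id(**switches[host])
        diff = [f for f, a, b in zip(fields, reference, candidate) if a != b]
        if diff:
            report[host] = diff
    return report


# Hypothetical inventory: an empty mapping means every VLAN is in the CIST.
switches = {
    "mx1": {"name": "DC1", "revision": 1, "vlan_to_msti": {}},
    "mx2": {"name": "DC1", "revision": 1, "vlan_to_msti": {}},
    "sx1": {"name": "DC1", "revision": 0, "vlan_to_msti": {}},
    "hp1": {"name": "DC1", "revision": 1, "vlan_to_msti": {100: 1}},
}
print(find_mismatches(switches))
```

Any switch that shows up in the report would see its neighbours as a different region, which is exactly when boundary ports start appearing.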

All that said, almost every vendor's implementation has its peculiarities. In EXOS (Extreme Networks) for example, if you don't configure edge-safeguard on your edge ports, then whenever an *edge* port changes state (up or down), a TCN will trigger.  Great when everyone shuts down their PCs in the evening at close intervals.

For your customer-facing ports, you want BPDU Protect/Edge Guard (or whatever HP calls it) configured.  If they loop a port, you shut it down and leave it down.  I've seen ESX vSwitches do this on plenty of occasions during reboots, even though they shouldn't (e.g. a loop is briefly formed inside the customer's hypervisor across a supposedly bonded link).

If you're downlinking to customer switches the only real option you have is Root-Protection/Root-Guard.  This will block any port that receives a BPDU advertising a superior root bridge.  A lot of people make the mistake of either disabling STP on links to "untrusted" switches, or filtering BPDUs altogether so that the customer can run their own xSTP domain beside yours.  Bad move.  When someone down the track loops that port, you'll remember why.  You only want one root bridge in any L2 domain.
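
As a simplified illustration of what Root-Guard decides on each received BPDU, here is a small Python sketch. It only models the comparison (lower priority wins, then lower MAC), not the timers or port states a real implementation has, and the bridge IDs are made-up examples.

```python
def bridge_id(priority, mac):
    """Build a comparable bridge ID: lower priority wins, then lower MAC address."""
    return (priority, bytes.fromhex(mac.replace(":", "")))


def root_guard_action(current_root, received_root):
    """Return 'block' if the BPDU advertises a superior (lower) root ID, else 'forward'."""
    return "block" if received_root < current_root else "forward"


# Hypothetical IDs: our root bridge, a customer switch left at the default
# priority, and a customer switch that has been set to priority 0.
our_root = bridge_id(4096, "00:11:22:33:44:55")
customer_default = bridge_id(32768, "00:00:00:aa:bb:cc")
customer_rogue = bridge_id(0, "00:de:ad:be:ef:00")

print(root_guard_action(our_root, customer_default))
print(root_guard_action(our_root, customer_rogue))
```

The point is that the customer's BPDUs are still received and evaluated; the port is only blocked when the customer side tries to take over as root.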

On all your switches, enable Storm-Control (or equivalent) with aggressive limits on broadcast traffic.  Even with all of the above in place, there is nothing to stop one of your customer's downstream switches that isn't running spanning-tree from having its own loop and sending the resulting broadcast storm back to you, and there is very little you can do about it.
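
To get a feel for what an "aggressive" limit means in packets per second, here is a back-of-the-envelope Python sketch. It assumes the limit is given as a whole percentage of link bandwidth and that the storm consists of minimum-size frames; how a given platform actually measures and enforces the threshold varies, so treat the numbers as rough guidance only.

```python
# On-the-wire size of a minimum Ethernet frame: 64 bytes of frame plus
# 8 bytes of preamble/SFD and a 12-byte inter-frame gap, in bits.
MIN_FRAME_WIRE_BITS = (64 + 8 + 12) * 8


def percent_to_pps(link_bps, percent):
    """Worst-case packets per second allowed by a percentage-of-bandwidth limit."""
    return link_bps * percent // 100 // MIN_FRAME_WIRE_BITS


def over_limit(broadcast_pps, link_bps, percent):
    """True if the observed broadcast rate exceeds the configured limit."""
    return broadcast_pps > percent_to_pps(link_bps, percent)


GIGABIT = 1_000_000_000
print(percent_to_pps(GIGABIT, 1))       # 14880
print(over_limit(50_000, GIGABIT, 1))   # True
```

Even 1% of a gigabit port is close to 15,000 broadcast frames per second, which is far more than a healthy access port should ever generate.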

> Does it make sense to change RSTP to MSTP on the MX routers in the first place?

Since you've only configured the CIST, your RSTP and MSTP operation is basically equivalent, i.e. you have a single spanning-tree instance across all your VLANs, and convergence time and operation will be pretty much identical.

MSTP is a bit more work to configure and troubleshoot (especially if you're running multiple regions), but gives you that flexibility to lay out different trees across VLAN groups if required.

> Is there any configuration I should perform on any of those devices involved?

Hard set your STP's point-to-point mode on all your uplinks; you may find it improves convergence time slightly on some vendors.

Without knowing anything else about your set-up (L3 termination or the capabilities of the SX and HP boxen) you could configure Q-in-Q and use layer2-protocol-tunneling for all your customer's traffic (BPDUs included).  Let them manage their own VLANs (give each customer a dedicated S-VLAN to go nuts with) and provide a level of separation between your STP and theirs.  The MX can do plenty of packet-fu to Q-in-Q tagged frames in order to terminate to any L3 interfaces.
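
For anyone unfamiliar with what Q-in-Q does to a frame, here is a minimal Python sketch of pushing and popping the outer S-VLAN tag. It assumes the standard 802.1ad TPID of 0x88A8 for the outer tag (plenty of gear is configured with 0x8100 or 0x9100 instead), and the MAC addresses and VLAN IDs are made up.

```python
import struct

TPID_S_TAG = 0x88A8  # 802.1ad service tag (assumed; often configurable)
TPID_C_TAG = 0x8100  # 802.1Q customer tag


def vlan_tag(tpid, vid, pcp=0, dei=0):
    """Build a 4-byte VLAN tag: TPID followed by PCP/DEI/VID packed into the TCI."""
    if not 0 <= vid <= 4095:
        raise ValueError(f"invalid VLAN ID {vid}")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack(">HH", tpid, tci)


def push_s_tag(frame, s_vid):
    """Insert the S-tag right after the 12 bytes of destination and source MAC."""
    return frame[:12] + vlan_tag(TPID_S_TAG, s_vid) + frame[12:]


def pop_s_tag(frame):
    """Strip the outer S-tag; return (s_vid, frame as the customer sent it)."""
    tpid, tci = struct.unpack(">HH", frame[12:16])
    if tpid != TPID_S_TAG:
        raise ValueError("frame has no S-tag")
    return tci & 0x0FFF, frame[:12] + frame[16:]


# Hypothetical customer frame, already tagged with the customer's own VLAN 10.
customer_frame = (
    bytes.fromhex("ffffffffffff" "001122334455")
    + vlan_tag(TPID_C_TAG, 10)
    + struct.pack(">H", 0x0800)
    + b"payload"
)

tagged = push_s_tag(customer_frame, 2001)  # S-VLAN 2001 dedicated to this customer
s_vid, inner = pop_s_tag(tagged)
print(s_vid, inner == customer_frame)
```

Whatever the customer sends, including their own C-VLAN tags, rides unchanged inside the S-VLAN, which is what gives you the separation between their L2 domain and yours.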

Good luck!

Ben