[c-nsp] Loop Prevention in VPLS

Tue Oct 18 04:46:38 EDT 2016

Maybe I'm missing something here, but I don't see the problem with VPLS
itself; surely it's about how you deploy it? How do you manage large
campus LAN sections and DC LAN layer 2 sections? What about your
office? We have about 20 switches in a contiguous layer 2 broadcast
domain and we don't have any loops. So what about 2 VPLS hub sites with 10
spokes off each? The topology would be the same as our office LAN.

I have deployed VPLS for customers and "it works", but we went
through much pain to simplify the designs as much as possible. I don’t
see much difference between a small LAN and a small VPLS deployment:
plan it properly and think through all the possible failure scenarios
(just as you would for any layer 2 domain).
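Part of planning it properly is checking that the planned layer 2 topology has no cycles before anything is cabled. A minimal sketch, assuming the design is modelled as a plain graph with sites/switches as nodes and each layer 2 link as an edge (the VPLS cloud, with split horizon doing its job, counted as one hub-to-hub link). `find_loop` and all site names are hypothetical. It uses union-find: any link whose two ends are already connected closes a loop.

```python
def find_loop(links):
    """Return the first link that closes a loop, or None if the topology is loop-free.

    links: iterable of (node_a, node_b) pairs, one per layer 2 link.
    """
    parent = {}

    def root(node):
        # Find the representative of node's connected component,
        # halving the path as we walk up to keep later lookups cheap.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        ra, rb = root(a), root(b)
        if ra == rb:
            # a and b can already reach each other: this link forms a loop.
            return (a, b)
        parent[ra] = rb  # merge the two components
    return None


# Hypothetical design: two hub sites joined across the VPLS cloud,
# each with 10 spoke sites hanging off it.
tree_links = [("hub1", "hub2")]
for hub in ("hub1", "hub2"):
    for i in range(10):
        tree_links.append((hub, f"{hub}-spoke{i}"))

# The same design plus a customer "back-door" link between two spokes.
looped_links = tree_links + [("hub1-spoke0", "hub2-spoke0")]

print(find_loop(tree_links))    # None
print(find_loop(looped_links))  # ('hub1-spoke0', 'hub2-spoke0')
```

This only catches loops you know about on paper; it says nothing about a customer patching two ports together later, which is what the edge protections discussed below are for.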

If you take a simple hub-and-spoke topology, the VPLS cloud itself is
loop-free thanks to split horizon (even if you have multiple
hub sites). In our experience the loops tend to appear at the edge,
after your PE/CE/circuit demarc (it's not impossible for loops to occur
in the VPLS cloud, but that will be your fault :D ). On premises the
design can fork. If we have some old VPLS PEs that can’t forward BPDUs
across the VPLS domain, the CE device we supply must process BPDUs to
ensure that there are no loops in the client LAN up to our CE
devices (so the CE might be a switch rather than a router, for example).
If the customer wants to connect their switch to ours, we will work
with them, as they MUST exchange BPDUs, they MUST use Loop Guard, and so
on. If we have VPLS PEs that support BPDU forwarding, the customer can send
BPDU frames into the cloud, and again we will work with them and instruct
them that various BPDU and loop prevention techniques must be used on
their LAN.
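To illustrate why the cloud itself stays loop-free, here is a toy model of the split-horizon flooding rule used in a full-mesh VPLS (as described in RFC 4762): a PE never forwards a frame received on one pseudowire out of another pseudowire in the same mesh. It is a sketch, not vendor behaviour: `flood`, the PE and attachment-circuit (AC) names, and the `max_events` guard are hypothetical, MAC learning is ignored (every frame is treated as broadcast/unknown), and loops on the customer side of the ACs are deliberately not modelled, since that is where the loops tend to appear.

```python
from collections import deque


def flood(pes, ingress_pe, ingress_ac, split_horizon=True, max_events=1000):
    """Simulate flooding one broadcast frame through a full mesh of VPLS PEs.

    pes: dict mapping PE name -> list of attachment circuits (ACs) on that PE.
    Returns the list of (pe, ac) pairs that received a copy of the frame.
    Raises RuntimeError if the frame is still circulating after max_events.
    """
    delivered = []
    # Each event is (pe, arrived_on, from_port): arrived_on is "ac" or "pw";
    # from_port is the AC name, or the PE at the far end of the pseudowire.
    queue = deque([(ingress_pe, "ac", ingress_ac)])
    events = 0
    while queue:
        events += 1
        if events > max_events:
            raise RuntimeError("frame still circulating: forwarding loop")
        pe, arrived_on, from_port = queue.popleft()
        # Flood to every local AC except the one the frame arrived on.
        for ac in pes[pe]:
            if not (arrived_on == "ac" and ac == from_port):
                delivered.append((pe, ac))
        # Split horizon: a frame that arrived on a pseudowire is never sent
        # out on another pseudowire; only AC-sourced frames enter the mesh.
        if arrived_on == "ac" or not split_horizon:
            for other in pes:
                if other != pe and not (arrived_on == "pw" and other == from_port):
                    queue.append((other, "pw", pe))
    return delivered


pes = {"PE1": ["hub-a"], "PE2": ["spoke-1", "spoke-2"], "PE3": ["spoke-3"]}
print(flood(pes, "PE1", "hub-a"))
# [('PE2', 'spoke-1'), ('PE2', 'spoke-2'), ('PE3', 'spoke-3')]
```

With three PEs and `split_horizon=False` the same call raises `RuntimeError`: PE2 and PE3 keep relaying the frame to each other and back to PE1, which also hands a copy back to the sender's own AC.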

Ultimately, if I am being honest, we can only design around nine out of
ten issues; we can’t provide a guaranteed loop-free topology. Our CE can
send BPDU frames and loopback frames into the customer LAN, but if a
device on the LAN mangles or absorbs those frames, we might not get them
back and see that there is a loop. At that point, though, we are leaving
the realm of normal operation. I’ve seen many devices do weird stuff at
layer 3 when not functioning normally or due to bugs. So I am happy that
under normal circumstances it works well.
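The loopback-frame idea, and its blind spot, can be sketched as follows. This is an assumption-laden model, not any vendor's keepalive feature: `LoopDetector`, its methods and the interface names are hypothetical, and how the token is carried inside a frame is left out. The CE tags each probe with a random token; if a probe it sent comes back in on any port, the customer LAN has looped it. If a device absorbs or rewrites the probe, `on_receive` never sees a known token and the loop goes unnoticed.

```python
import secrets


class LoopDetector:
    """Hypothetical loopback-frame loop detector for a CE's customer-facing ports."""

    def __init__(self):
        self.outstanding = {}  # probe token -> port the probe was sent out of

    def make_probe(self, port):
        """Create a unique token to embed in a probe frame sent out of `port`."""
        token = secrets.token_hex(8)
        self.outstanding[token] = port
        return token

    def on_receive(self, port, token):
        """Check a received probe token.

        Returns (sent_on, received_on) if the frame is one of our own probes,
        meaning the customer LAN looped it back to us; otherwise None.
        """
        sent_on = self.outstanding.get(token)
        if sent_on is None:
            # Unknown token: not our probe, or a device mangled it in transit.
            # Either way there is no evidence of a loop.
            return None
        return (sent_on, port)


detector = LoopDetector()
token = detector.make_probe("Gi0/1")
print(detector.on_receive("Gi0/2", token))       # ('Gi0/1', 'Gi0/2')
print(detector.on_receive("Gi0/2", "deadbeef"))  # None
```

What the CE does on detection (block or shut the port, raise an alarm) is a policy choice outside this sketch.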

As someone mentioned, storm-control is vital. Also use policers/shapers
where you can: if the customer has a 20 Mbps service and you’ve put in
a 100 Mbps fibre, make sure you limit it, because if/when they bugger it
up that will help. When thinking about worst-case scenarios, let's
imagine the customer is hammering all their links at 100% utilisation
with genuine traffic; if you can’t support that scenario and that’s what
they’re asking for, don’t sell to them. We have some customers who
could have all sites looped, generating 100% link utilisation, and
they’d just get a big bill from us because they are connected to some
of our larger PEs/PoPs.
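Why a policer helps in the 20 Mbps-on-100 Mbps case can be seen with a classic single-rate token bucket. This is a sketch of the general technique with a hypothetical `TokenBucketPolicer` class and a made-up burst size; real router policers (and storm-control, which compares broadcast/multicast/unknown-unicast rates against a threshold per interval) have their own parameters and defaults, so treat it as a model rather than a configuration.

```python
class TokenBucketPolicer:
    """Single-rate, single-bucket policer: frames that fit in the bucket
    conform and are forwarded; everything else is dropped."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last = 0.0                   # arrival time of the previous frame (s)

    def allow(self, now, frame_bytes):
        """Return True if a frame of frame_bytes arriving at time `now` conforms."""
        # Refill for the time elapsed since the last frame, capped at the bucket depth.
        self.tokens = min(self.burst_bytes,
                          self.tokens + (now - self.last) * self.rate_bytes_per_s)
        self.last = now
        if frame_bytes <= self.tokens:
            self.tokens -= frame_bytes
            return True
        return False


# A 20 Mbps service on a 100 Mbps port; a loop (or the customer) offers
# about one second of back-to-back 1500-byte frames at 100 Mbps line rate.
policer = TokenBucketPolicer(rate_bps=20_000_000, burst_bytes=125_000)
frame = 1500
interval = frame * 8 / 100_000_000  # seconds between frames at line rate
passed = sum(frame for i in range(8333) if policer.allow(i * interval, frame))
# passed is roughly 2.6 MB (about 21 Mbit): 20 Mbps for ~1 s plus the initial burst,
# instead of the ~12.5 MB the port would otherwise carry into the network.
```

The burst size is the main design knob: too small and legitimate TCP bursts get clipped, too large and a looped customer can still push a sizeable spike into the core before the policer bites.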

Once, to meet demand in a region where VPLS isn’t available, we built
out a partial mesh of port-based L2TP tunnels between CE devices, as
that will forward BPDUs and LACP frames. That meant the WAN saw regular
layer 3 traffic and the CE devices bore the brunt of the work. Works
fine.
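The reason a port-based tunnel helps here is that STP BPDUs and LACP frames are sent to reserved link-local multicast addresses that a standards-compliant 802.1D bridge does not relay between ports. A minimal sketch of that distinction, assuming a simplified model in which the port-based L2TP tunnel is a fully transparent pipe (real platforms may still intercept some protocols depending on configuration); `carried_across_wan` and `is_reserved_control` are hypothetical helpers.

```python
# Destination MACs of the link-local control frames mentioned above.
STP_BPDU = bytes.fromhex("0180c2000000")        # STP/RSTP/MSTP BPDUs
SLOW_PROTOCOLS = bytes.fromhex("0180c2000002")  # LACP (IEEE 802.3 Slow Protocols)
BROADCAST = bytes.fromhex("ffffffffffff")


def is_reserved_control(dst_mac: bytes) -> bool:
    """True for the IEEE 802.1D reserved range 01-80-C2-00-00-00 .. 01-80-C2-00-00-0F,
    which a standards-compliant MAC bridge does not relay between ports."""
    return (len(dst_mac) == 6
            and dst_mac[:5] == bytes.fromhex("0180c20000")
            and dst_mac[5] <= 0x0F)


def carried_across_wan(dst_mac: bytes, port_based_tunnel: bool) -> bool:
    """Hypothetical decision: does a frame arriving on the customer port reach the far CE?"""
    if port_based_tunnel:
        # Modelled as a transparent pipe: every frame is encapsulated as-is.
        return True
    # Modelled as an ordinary bridge: reserved control frames stay local.
    return not is_reserved_control(dst_mac)


for name, mac in [("STP BPDU", STP_BPDU), ("LACP", SLOW_PROTOCOLS), ("broadcast", BROADCAST)]:
    print(name, carried_across_wan(mac, False), carried_across_wan(mac, True))
# STP BPDU False True
# LACP False True
# broadcast True True
```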

I haven’t tried MPLS to the CE for layer 2 services; if you want to
run VPLS between MPLS-capable CEs, I’d be interested in your results :)

Cheers,
James.