[c-nsp] PVLANs in a Hosting Environment

Sat Mar 6 01:35:50 EST 2010

On Fri, Feb 26, 2010 at 6:59 AM, Matthew Melbourne <matt at melbourne.org.uk> wrote:

> We are investigating options to provide a "VLAN-per-customer" within a
> hosting environment. Inside each VLAN could be hosting services, e.g.
> hosted web servers, AD, Exchange (etc). In order to maximise the number
> of supported VLANs, the use of Private VLANs has been raised.
>

I have used this configuration in a large scale hosting environment with
numerous /24's on a single SVI interface.  This works reasonably well if you
want to throw a bunch of servers into a VLAN and provide some form of
isolation, but still allow full IP connectivity.  It isn't as good as a VLAN
per customer though, and really if you are going to add a firewall you
pretty much need a real VLAN for each customer anyway.  PVLAN +
local-proxy-arp is really only useful for non-firewalled servers, or perhaps
for the outside interfaces of firewalls with real VLANs behind those
firewalls.

However, there are many caveats.

"Transparent" bridging firewalls, such as the Cisco ASA in transparent mode,
don't really work in this environment.  If you have 2 servers downstream of
your transparent firewall and they ARP for each other, it is a race as to
who will answer.  The other server may answer first - or the upstream
router's "local proxy arp" may answer first.  If local proxy arp wins, then
server A's path to server B is up through the firewall, through the upstream
router, and then back down to the firewall, which sees packets sourced from
behind it and simply drops them.

Semi-transparent firewalls (such as the Sonicwall), that don't actually
bridge traffic, but instead do more of a proxy ARPing router type behavior,
will work fine.

Routed mode firewalls work, however it should be noted that if you don't put
an HA pair in a community, then the HA pair's heartbeats won't be seen.  For
ASA this seems to end up working OK, but if you look at "show failover" it
complains about not seeing the other firewall on the outside interface.

PVLAN communities are hard to maintain.  I have pretty much stopped using
them.  You have to associate the primary VLAN with the isolated and all
community VLANs on EVERY switch that will carry them.  I can't tell you how
many times I've seen someone forget to associate a new community on a switch
that is in-between the router and the client.  When this happens, packets
will pass through the middle switch, however since the middle switch doesn't
know to share a single CAM table between the VLAN tags, your traffic will be
unknown unicast flooded.  If you have a lot of automation and/or a
simple/static network, this may not be a problem for you.  However, if you
have a lot of switches in a chaotic environment with lots of engineers
manually making changes, this can be a problem.
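
If you do go the automation route, the check itself is simple.  Here is a
minimal sketch, assuming you have already collected each switch's
primary-to-secondary associations by some means (for example by parsing
"show vlan private-vlan" output, which is not shown here).  The switch names,
VLAN numbers and the find_missing_associations helper are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def find_missing_associations(
    primary: int,
    secondaries: Set[int],
    switch_associations: Dict[str, Dict[int, Set[int]]],
) -> List[Tuple[str, List[int]]]:
    """Return (switch, missing secondary VLANs) for every switch whose
    association for `primary` does not cover all expected secondaries."""
    problems = []
    for switch, assoc in sorted(switch_associations.items()):
        configured = assoc.get(primary, set())
        missing = sorted(secondaries - configured)
        if missing:
            problems.append((switch, missing))
    return problems


# Hypothetical data: primary VLAN 100, isolated VLAN 101,
# community VLANs 102 and 103.
expected = {101, 102, 103}
switches = {
    "router-a": {100: {101, 102, 103}},
    "dist-1": {100: {101, 102}},  # community 103 was never associated here
    "access-7": {100: {101, 102, 103}},
}

audit = find_missing_associations(100, expected, switches)
print(audit)
```

Run against every switch in the path after each change, this would flag the
in-between switch before the flooding starts.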

PVLAN with local-proxy-arp provides full protection against default-gateway
IP conflicts.  However, it only provides mitigation (sticky arp) against
conflicts between host IPs.  Whoever gets an IP first gets to keep it.
But one day a year from now you might reboot your upstream router and
find that the 100 IP conflicts you have been "protected" against all happen
at once, since you lost your sticky arp table.
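
To make the "first one wins, until the table is lost" behavior concrete, here
is a toy model in Python.  It is only a sketch of the behavior described
above, not of how IOS implements sticky ARP; the StickyArpTable class and
the addresses are made up for illustration.

```python
class StickyArpTable:
    """Toy model of an ARP table where the first claimant of an IP wins."""

    def __init__(self):
        self.entries = {}  # ip -> mac

    def learn(self, ip, mac):
        """Record ip -> mac unless another MAC already holds the IP.
        Returns True if the claim was accepted, False if it was refused."""
        owner = self.entries.setdefault(ip, mac)
        return owner == mac

    def reboot(self):
        """A reload loses every sticky entry."""
        self.entries.clear()


table = StickyArpTable()
first = table.learn("192.0.2.10", "0050.5600.0001")   # legitimate server
second = table.learn("192.0.2.10", "0050.5600.0002")  # misconfigured server
owner_before = table.entries["192.0.2.10"]

table.reboot()
# After the reload it is a race; here the misconfigured server ARPs first.
third = table.learn("192.0.2.10", "0050.5600.0002")
fourth = table.learn("192.0.2.10", "0050.5600.0001")
owner_after = table.entries["192.0.2.10"]
print(owner_before, owner_after)
```

The conflict was there all along; the reload just changes who holds the IP.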

For load balancing, I offer it by placing a large shared load balancer with
a single interface going to another VLAN on the router.  I then use policy
based routing (applied to the PVLAN interface) to ensure that traffic
originating on the PVLAN that is part of a load balanced real-server goes
through the load balancer on the way upstream.  This works fine, and even
customers on the PVLAN in the same subnet can hit other customers' VIPs
since it all gets forced through the upstream router.  You could place
dedicated load balancers with their outside interfaces in a PVLAN as long as
you watch out for HA heartbeats as discussed above.

Finally, understand that this is a relatively poorly documented
configuration, and also may not be 100% bug free.  For example, imagine a
basic PVLAN configuration with local proxy arp, upstream redundant routers
running HSRP, and a subnet on the PVLAN interface of those routers.  Each
router is configured with local-proxy-arp, which means it answers every ARP.
Now, how does redundant router B know not to respond to router A's ARPs?
I've never seen this documented, but it seems to be a behavior triggered by
an HSRP association: if a router sees an ARP from its HSRP partner, it does
not respond as it normally would because of its local-proxy-arp
configuration.

However, imagine the first 5-10 seconds when a router has just booted up
but HSRP is not yet active.  What happens?  They both answer each other's
ARPs.  Router A's PVLAN ARP table is full of router B's MAC, and router B's
PVLAN ARP table is full of router A's MAC.  All of these IPs are now in a
routing loop.  On top of that, the "sticky arp" feature ensures that these
ARP entries are never overwritten and the routing loop lasts forever.
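
The loop is easy to see in a small simulation.  This is a sketch under
simplifying assumptions: each router forwards purely on its ARP entry for
the destination, and a packet is dropped when a hop budget (standing in for
the IP TTL) runs out.  The forward function, router names and MAC addresses
are hypothetical.

```python
def forward(dst_ip, start_router, arp_tables, router_macs, ttl=8):
    """Follow a packet hop by hop using each router's ARP table.
    Returns (routers visited, outcome)."""
    mac_to_router = {mac: name for name, mac in router_macs.items()}
    path = [start_router]
    current = start_router
    while ttl > 0:
        next_mac = arp_tables[current].get(dst_ip)
        if next_mac is None:
            return path, "no ARP entry"
        if next_mac not in mac_to_router:
            return path, "delivered"  # the entry points at a real host
        current = mac_to_router[next_mac]  # handed to the other router
        path.append(current)
        ttl -= 1
    return path, "TTL expired"


router_macs = {"A": "0000.0c00.000a", "B": "0000.0c00.000b"}
host_ip, host_mac = "192.0.2.10", "0050.5600.0001"

# Healthy state: both routers hold the host's real MAC.
healthy = {"A": {host_ip: host_mac}, "B": {host_ip: host_mac}}
# Poisoned state: each router answered the other's ARP before HSRP came up.
poisoned = {
    "A": {host_ip: router_macs["B"]},
    "B": {host_ip: router_macs["A"]},
}

ok = forward(host_ip, "A", healthy, router_macs)
loop = forward(host_ip, "A", poisoned, router_macs, ttl=4)
print(ok)
print(loop)
```

In the poisoned case the packet bounces A, B, A, B until the hop budget is
gone, and with sticky entries nothing ever corrects the tables.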

I'm not aware of any resolution to this problem other than to clear ARP on
the PVLAN interfaces after HSRP becomes active.  Personally, my routers
rarely reboot so this isn't a huge issue.

At this point, I no longer deploy this (PVLAN with local proxy ARP).
Everyone gets a firewall, and thus everyone needs a real VLAN behind that
firewall.  The downside of this is that my pain point is now the number of
VLANs.  I haven't yet moved to MST, and the 1,800 logical-port-per-slot
limit of the 6500 is terribly easy to reach with 500 VLANs that the
distribution layer needs to send down to the many switches of the access
layer.
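
Some back-of-the-envelope arithmetic shows how quickly that limit arrives.
This sketch assumes a per-VLAN spanning tree mode in which every VLAN active
on a trunk counts as one logical port and every access port counts as one;
the 1,800 figure is the one quoted above, and the function names are made up
for illustration.

```python
LIMIT_PER_SLOT = 1800  # logical ports per slot, the figure quoted in the post


def logical_ports(trunks, vlans_per_trunk, access_ports=0):
    """Logical port count for one slot: one per (trunk, VLAN) pair,
    plus one per access port."""
    return trunks * vlans_per_trunk + access_ports


def max_trunks(vlans_per_trunk, limit=LIMIT_PER_SLOT):
    """Largest number of trunks on a slot that stays within the limit."""
    return limit // vlans_per_trunk


four_trunks = logical_ports(trunks=4, vlans_per_trunk=500)
print(four_trunks, four_trunks > LIMIT_PER_SLOT)
print(max_trunks(500))
```

Under those assumptions, four trunks each carrying all 500 VLANs already come
to 2,000 logical ports, so only three such trunks fit on a slot.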

Scaling lots of VLANs to lots of switches, all in the same domain (aka every
VLAN on every switch throughout the data center), has become the number 1
architectural issue I focus on these days.