[c-nsp] Q: L2 vs. L3 High Availability

Wed Dec 1 01:02:35 EST 2004

John Ferriby wrote:

>We've been using a high-availability configuration using
>dual switches in a switch-fault-tolerant configuration.
>Rapid PVST+ and Intel adapters on the servers in using
>Intel's Adapter teaming software.
>
>We're considering going to an L3 multihomed configuration
>where the servers advertise (by way of OSPF) an "internal"
>network via the multiple adapters and then using HSRP on
>nearby L3 switches.
>
>Anybody else been through this configuration?  We're looking
>to build a more stable, controllable fail-safe environment.
>The L2 environment fails-over very rapidly, but the level
>of control and detection seems pretty rudimentary.

I've set up both types (layer 2 NIC failover and layer 3 routed
failover) at various times for various applications.
Layer 2 failover is generally faster and easier to deploy, but harder
to manage: you can't detect things like someone putting the standby
interface into the wrong VLAN until it's too late.
Layer 3 failover is more reliable: you can always be pinging all 3 IP
addresses (2 NICs, one VIP), so you know if anything is amiss, and the
2 NICs can be not only on different switches but on different subnets.
Failover will not be as fast, but it is still fast enough if you tune
your OSPF hello and hold (dead) intervals down.  However, having 200
OSPF stub networks, plus 200 adjacencies (or 400 if on the same
subnet), for 200 servers, gets to be non-trivial in terms of OSPF
packet overhead and CPU load, or at least in terms of humans looking
at routing tables or OSPF databases for debugging.
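
A rough sketch of the "ping all 3 addresses" check, in Python.  The
state names, the classify() logic and the addresses in the usage note
are illustrative assumptions on my part, not any particular monitoring
tool; the ping flags assume Linux iputils.

```python
import subprocess


def ping(addr: str) -> bool:
    """Send one ICMP echo with a 1 s timeout (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", addr],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(nic_a_up: bool, nic_b_up: bool, vip_up: bool) -> str:
    """Map reachability of the two NIC addresses and the VIP to a state.

    The state names are hypothetical labels chosen for this sketch.
    """
    if not vip_up:
        # The service address is unreachable, whatever the NICs say.
        return "service-down"
    if nic_a_up and nic_b_up:
        return "ok"
    if nic_a_up or nic_b_up:
        # Service is still up, but one path is gone: redundancy is lost.
        return "degraded"
    # VIP answers but neither NIC does (e.g. ICMP filtered on the NICs).
    return "inconsistent"


def check_server(nic_a: str, nic_b: str, vip: str) -> str:
    """Probe all three addresses of one server and classify the result."""
    return classify(ping(nic_a), ping(nic_b), ping(vip))
```

For example, check_server("192.0.2.10", "198.51.100.10", "203.0.113.10")
(documentation addresses, made up here) returns "degraded" when the VIP
still answers but one NIC does not, which is exactly the case a layer 2
teaming setup tends to hide until the second failure.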
So currently I do layer 2 if there are lots of servers in the cluster; 
layer 3 if only one or two.
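
To put rough numbers on the trade-off between fast timers and many
adjacencies, here is a back-of-the-envelope sketch in Python.  It
assumes each neighbor sends one hello per hello interval and that the
dead interval is 4x the hello interval (the usual default ratio); the
function names and the 1-second tuned value are just illustrations.

```python
def dead_interval(hello_interval_s: float, multiplier: int = 4) -> float:
    """Approximate worst-case neighbor-loss detection time, in seconds."""
    return hello_interval_s * multiplier


def hellos_received_per_second(adjacencies: int, hello_interval_s: float) -> float:
    """Hellos a router must process per second from its neighbors."""
    return adjacencies / hello_interval_s


if __name__ == "__main__":
    # 200 and 400 adjacencies come from the figures above; 10 s is the
    # default broadcast hello interval, 1 s is an assumed tuned value.
    for adjacencies in (200, 400):
        for hello in (10.0, 1.0):
            rate = hellos_received_per_second(adjacencies, hello)
            print(
                f"{adjacencies} adjacencies, hello {hello:g}s: "
                f"{rate:g} hellos/s, detection within ~{dead_interval(hello):g}s"
            )
```

Tuning hello from 10 s to 1 s cuts detection from about 40 s to about
4 s, but multiplies the hello rate by ten, so the cost grows with both
the server count and how aggressively the timers are tuned.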