[c-nsp] MAC addresses unlearned with HSRP

Wed Jun 7 13:12:24 EDT 2006

"Vincent De Keyzer" <vincent at dekeyzer.net> wrote:

> Problem with this set-up is that R2 might very well send out packets for
> the H via S2, which at some point will have timed out the MAC address of
> H, and will hence have to flood all of its ports (like a good switch is
> supposed to do). This will result in increased and useless traffic on the
> ports of S2.

I have a network with thousands of hosts on a single VLAN with 30+ /24's 
using the "private vlans" feature.  Thus, I am especially exposed to this 
issue.  Here is how I have dealt with it.

First, in my R1/S1 and R2/S2 (6500 switches with Sup720, so my router and
switch are a single unit), I upped the MAC address aging time to 86400
seconds.  The idea here is to be significantly longer than the ARP timeout.
Thus, in theory, R2 will re-ARP for each host (and the reply will refresh
the host's MAC entry) long before the MAC aging timer expires.
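
A minimal sketch of what that first step might look like, assuming
12.2SX-era IOS syntax on the 6500 (newer releases spell the command
"mac address-table").  The VLAN interface name is a placeholder, and the
ARP timeout shown is simply the IOS default, included for comparison:

```
! Global: age out learned MAC entries after 24 hours
mac-address-table aging-time 86400
!
! IOS default ARP timeout is 14400 seconds (4 hours), well under 86400.
! Vlan100 is a hypothetical interface name.
interface Vlan100
 arp timeout 14400
```

With these values the router should refresh each ARP entry several times
within one MAC aging interval.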

Second, be aware that any time a port flaps anywhere in the network, a
topology change is flooded out to all switches on that VLAN, and they then
drop the remaining age for all entries in that VLAN to 6 seconds (or delete
them immediately with rapid STP!).  So, be very careful that you always use
the "portfast" command on ALL edge ports (anything that isn't another
switch/hub).  Never miss one.  Once this is done, only your backbone
switch-to-switch links lack portfast and can generate topology changes.
Hopefully these don't flap very often.  :)
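
As an illustration only, an edge port with portfast might be configured
roughly like this on a Catalyst running IOS.  The interface name is made
up, and the global default shown in the comment is an optional alternative
rather than something described above:

```
! Hypothetical host-facing access port
interface GigabitEthernet2/1
 switchport
 switchport mode access
 spanning-tree portfast
!
! Optional: enable portfast on all access ports by default
! spanning-tree portfast default
```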

Third, I switched to GLBP with the round-robin distribution.  I haven't done 
careful testing, but in my experience most hosts have very low ARP timeouts. 
When we deploy PIXes, we always drop the ARP timeout down to 15 minutes so 
that they are similarly low.  My thinking here is that if a host flips back 
and forth between the two gateways several times an hour, it is very likely 
to stay active in the MAC tables.
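
A rough sketch of the GLBP side, assuming a VLAN interface on each 6500.
The group number and addresses are placeholders, not values from my
network:

```
! Hypothetical SVI on R1; R2 would use its own address in the same subnet
interface Vlan100
 ip address 192.0.2.2 255.255.255.0
 glbp 1 ip 192.0.2.1
 glbp 1 load-balancing round-robin
```

On a PIX, the 15-minute ARP timeout corresponds to something like:

```
arp timeout 900
```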

Anyway, that is what I did and it seems to work reasonably well.  When I add 
a switch or otherwise flap a backbone link, there is a bit of flooding but 
it clears up quickly.  So, this isn't a perfect fix, but it does a 
reasonably good job of minimizing the issue.

I have seen people play with OSPF route maps and HSRP priorities so that (in 
theory) the OSPF path to the prioritized HSRP router is always preferred.  I 
found this configuration to be too complex and messy with my many subnets 
and many interfaces.  Plus, it skews the load balancing of your uplinks if 
the HSRP is not evenly balanced (as with my large VLAN that had 30+ /24's on 
a single HSRP instance, so all on the same side).  With my current setup, 
even traffic to a single host IP is load balanced (per-flow) across my two 
uplink paths.