[c-nsp] Useful HSRP feature additions WAS: Rate limiting questions

Christopher E. Brown chris.brown at acsalaska.net
Sun Oct 28 13:55:08 EDT 2007


Dale Shaw wrote:
> Hi all,
> 
> On 10/28/07, Christopher E. Brown <chris.brown at acsalaska.net> wrote:
>> 5 min later, the MAC entry times out, but the ARP entries are there for
>> another 4hr 55min...  Now we have our layer2 network with no target for
>> that MAC and flooding everywhere.
> 
> (3hr 55min?)

	Yes, adjacent keys and all that.
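
For anyone checking the arithmetic, here is a rough sketch of the
worst-case flood window using the two default timers quoted in this
thread.  The function names are made up for illustration, and the model
assumes the switch never sees another frame sourced from that MAC once
the ARP entry has been refreshed.

```python
# Illustrative arithmetic only; defaults are the ones quoted in this thread.
ARP_TIMEOUT_S = 4 * 60 * 60   # 4 hr default ARP timeout
MAC_AGING_S = 5 * 60          # 5 min default MAC aging time


def flood_window(arp_timeout_s: int, mac_aging_s: int) -> int:
    """Worst-case seconds during which hosts still ARP-resolve to a MAC
    that the switch has already aged out (so frames to it are flooded)."""
    return max(0, arp_timeout_s - mac_aging_s)


def fmt(seconds: int) -> str:
    """Format a duration in seconds as 'XhrYmin' style text."""
    hours, rem = divmod(seconds, 3600)
    return f"{hours}hr {rem // 60}min"


window = flood_window(ARP_TIMEOUT_S, MAC_AGING_S)
print(fmt(window))  # 3hr 55min
```

The window drops to zero once the ARP timeout is at or below the MAC
aging time, which is what both of the options quoted below aim for from
opposite directions.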

> I was tempted to start a new thread re: this, but since it's on topic
> and people-who-know are reading, I decided not to..
> 
> There is conflicting advice about the 'correct' fix in this scenario.
> The options appear to be:
> 
> 1. reduce ARP aging timer on a per-interface basis from (4hr) default
> to something less than the default MAC aging timer (5 minutes)
> 
> 2. increase MAC aging timer globally or on a per-VLAN basis from
> (5min) default to something equal-to or less-than the default ARP
> aging timer (4 hours)
> 
> Even Cisco's web site has separate documents that provide conflicting
> advice. A search of the list archives reveals differing views. A
> recent post in this (or the related) thread suggests low ARP aging
> timers are bad things.
> 
> Is there an authoritative guide 'out there', or can someone provide a
> solution and back it up with the rationale? Perhaps there are pros and
> cons to both approaches, but I haven't been able to find these
> documented anywhere.


The problem is that both of these /solutions/ work, but both have issues
depending on the network.  These issues have grown over time as traffic
volume and network scale have increased.  Back when virtual MAC use was
introduced there were very few networks that would be bitten by this.


The simple thing to do (when possible) is to lower the ARP timeout of
the clients.

One of the common problems with this is that many devices/systems do not
allow the ARP timeout to be changed.  Others are just fairly stupid
(management interfaces with limited CPU), and take issue with more than
a certain number of ARPs/interval.  Of course these same "stupid"
devices generally have issues with flooding as well.  Destination
filtration (is the packet for me?) is often done in software, so
deliver their own traffic to them plus the flood traffic from their 50
neighbors and they can fall over.



The MAC aging fix has its own issues.  It only scales well in small
switch groups with known MAC loads.  As soon as you start dealing with
more switches, or a large number of MACs...  This one used to be an easy
fix, but with larger and larger L2 networks in use these days, it can be
an issue.  This one is totally useless to me; most of my flooding issues
are happening in switching areas where both HSRP redundant feeds and
PPPoE are being delivered.  If I have 16,000 - 20,000 MACs in the table
with a 4hr timeout and approx 35,000 MACs total I *cannot* lower the
timers.  (And this issue kicks in at *MUCH* lower levels when you are
talking about 3550 and smaller switches.)



The key point is that HSRP has a minor breakage.  This issue was
introduced when HSRP started using virtual MACs instead of the burned-in
interface MACs.  At the time that virtual MAC use was introduced (and
don't get me wrong, this was a big improvement), this breakage was seen
as very minor.  A couple of 3640s or 7200s providing service to a small
switch stack and some server groups, no big deal.

But, when you start talking larger switching areas and tens or even
hundreds of HSRP groups across dozens of physically diverse routers...
(Many customers can't/won't L3 agg and speak OSPF or BGP, yet want a
redundant feed into a city wide VLAN.)  It is very easy for a flooding
problem to grow into hundreds of Mbit, eating capacity and even causing
issues with clients that have trouble dealing with an extra 500pps of
flood traffic in a VLAN.


The "simple low overhead" fix would be to have the HSRP master send a
*single* extra packet every X seconds.  Just one gratuitous ARP every
200 seconds would solve the whole issue, since that refreshes the
virtual MAC well inside the default 5 min MAC aging time.
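
To make the "single extra packet" concrete, here is a rough sketch of
what such a frame could look like on the wire.  This is only an
illustration, not how IOS does (or would do) it: the function names are
hypothetical, 192.0.2.1 is a documentation address, and the frame is
only built, not sent.  It assumes HSRP version 1, where the virtual MAC
is 0000.0c07.acXX with XX being the group number.

```python
import socket
import struct


def hsrp_v1_vmac(group: int) -> bytes:
    """HSRP version 1 virtual MAC: 0000.0c07.acXX, XX = group number."""
    if not 0 <= group <= 255:
        raise ValueError("HSRPv1 group must be 0-255")
    return bytes.fromhex("00000c07ac") + bytes([group])


def gratuitous_arp(vmac: bytes, vip: str) -> bytes:
    """Build a broadcast gratuitous ARP request (sender IP == target IP)
    sourced from the virtual MAC, so switches relearn where it lives."""
    ip = socket.inet_aton(vip)
    # Ethernet header: broadcast destination, vMAC source, EtherType ARP.
    eth = b"\xff" * 6 + vmac + struct.pack("!H", 0x0806)
    # ARP header: Ethernet/IPv4, 6-byte MACs, 4-byte IPs, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += vmac + ip + b"\x00" * 6 + ip
    return eth + arp


def keeps_mac_fresh(interval_s: int, mac_aging_s: int = 300) -> bool:
    """True if the refresh interval is shorter than the MAC aging time."""
    return interval_s < mac_aging_s


frame = gratuitous_arp(hsrp_v1_vmac(10), "192.0.2.1")
print(len(frame), keeps_mac_fresh(200))
```

Actually putting this on the wire would need a raw socket or a packet
library, which is left out here.  The point is only that one 42 byte
frame per group every 200 seconds is enough to stay under the default
5 min MAC aging time.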


-- 
------------------------------------------------------------------------
Christopher E. Brown   <chris.brown at acsalaska.net>   desk (907) 550-8393
                                                     cell (907) 632-8492
IP Engineer - ACS
------------------------------------------------------------------------

