[c-nsp] unicast storm

Thu Apr 19 04:32:04 EDT 2012

On (2012-04-19 08:26 +0100), Phil Mayers wrote:

>  1. Cause the host to emit traffic
>  2. Lower the ARP time to < FDB timeout

ACK. 4h is brutally long as the default ARP timeout in IOS. Some other defaults:

FreeBSD:
> sysctl net.link.ether.inet.max_age
net.link.ether.inet.max_age: 1200
Linux:
% sysctl net.ipv4.neigh.eth0.gc_stale_time
net.ipv4.neigh.eth0.gc_stale_time = 60
OSX: (not sure if it actually uses/honors this)
% sysctl net.link.ether.inet.max_age
net.link.ether.inet.max_age: 1200

Windows appears to have had 2min but has since decreased to a random
sub-minute value. So the syslog server would need to be neither Linux nor
Windows to cause problems.
JunOS seems to have 1200s-ish, with a randomized component (after clear arp,
I'm seeing 1100s through 1500s).
I would encourage the BSD core team to change the default to below 5min. If
both Windows and Linux can live at 1min or less, I think it's fairly well
proven that it works in real life. Hopefully the fix would propagate to
JunOS and OSX too.
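
To put rough numbers on why the ARP timeout should sit below the FDB
timeout, here is a minimal sketch. The ARP values are the ones quoted in
this post; the 300s FDB aging time is my assumption (a common switch
default), and flood_window is a hypothetical helper, not anything a vendor
ships. It computes the worst-case time a router keeps sending to a MAC that
the switches have already forgotten, assuming both entries were last
refreshed at the same moment.

```python
# ARP timeouts in seconds, as quoted in this post (JunOS is approximate).
ARP_TIMEOUT_S = {
    "IOS": 4 * 3600,
    "FreeBSD": 1200,
    "Linux": 60,
    "JunOS": 1200,
}

# Assumption: 300 s FDB (MAC table) aging time on the switches.
FDB_AGING_S = 300


def flood_window(arp_timeout_s, fdb_aging_s=FDB_AGING_S):
    """Worst-case seconds during which traffic to a silent host is flooded.

    Once the host stops sending, the switch forgets its MAC after
    fdb_aging_s, but the router keeps sending to that MAC until its ARP
    entry expires. The gap between the two is the flooding window.
    """
    return max(0, arp_timeout_s - fdb_aging_s)


for name, timeout in ARP_TIMEOUT_S.items():
    window = flood_window(timeout)
    print(f"{name:8s} ARP {timeout:6d}s  flood window {window:6d}s")
```

With these assumed numbers only the Linux-style 60s timeout gives a zero
window; anything above the FDB aging time leaves a gap in which unknown
unicast gets flooded.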

One less common and tricky storm can occur if you have an L2 metro ring to
which you've attached two PE routers. When some CPE dies in the metro ring,
its ARP entry will of course remain on the PE for 4h, long after the ring
switches have aged its MAC out of their FDB. So the PE will happily send
frames to the metro ring, where they'll get flooded to all ports.

Now if the CPE which went down was redundantly terminated to both PEs, the
backup PE will receive the flooded frame, and as it sees the best path via
BGP (instead of local) it'll send it over the core back to the primary PE,
causing a loop.

Obviously the DMAC isn't that of the backup PE, so this situation will only
arise if you are running your interface in promisc mode. Not all routers
have VLAN-specific promisc mode, so configuring one L2VPN (xconnect) might
cause all VLANs to receive all DMACs.
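
The loop described above can be sketched as a toy model. Everything here is
a simplifying assumption for illustration: ring_floods is a hypothetical
function, each PE decrements the IP TTL by one when it routes the packet,
and any extra hops inside the core are ignored. It counts how many times a
single packet ends up flooded into the ring before the TTL runs out.

```python
def ring_floods(initial_ttl, backup_promisc):
    """Count how many times one packet is flooded into the metro ring.

    Toy model: the primary PE still holds an ARP entry for the dead CPE,
    so it routes the packet into the ring. The ring floods it because the
    DMAC is unknown. A promiscuous backup PE picks up the copy, routes it
    via its BGP best path over the core, and the primary PE then routes it
    into the ring again.
    """
    ttl = initial_ttl
    floods = 0
    while True:
        # Primary PE routes the packet into the ring. A router drops the
        # packet instead if the decrement would take the TTL to 0.
        if ttl <= 1:
            break
        ttl -= 1
        floods += 1
        if not backup_promisc:
            break  # backup PE ignores the frame: the DMAC is not its own
        # Backup PE routes the copy back over the core to the primary PE.
        if ttl <= 1:
            break
        ttl -= 1
    return floods


print(ring_floods(64, backup_promisc=False))  # 1
print(ring_floods(64, backup_promisc=True))   # 32
```

In this model a packet arriving with TTL 64 is flooded to every port in the
ring 32 times instead of once, and that holds for every packet sent to the
dead CPE until the ARP entry finally expires.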

-- 
  ++ytti