[c-nsp] c6k msfc - hsrp flapping

Thu May 18 01:59:23 EDT 2006

Hi Yuri,

I appreciate the help.

> HSRP will change state when it misses three consecutive hellos. It's possible
> you have an underlying layer 1 or layer 2 problem. Can you tell if the
> %STANDBY-6-STATECHANGE messages are occurring for certain vlans only?

One of the first things we tried was making msfc1 the hsrp primary for all
vlans.  So now, when we have an hsrp flapping incident on msfc2, it happens
on every vlan with hsrp configured on it except for one - and that one
doesn't have multicast enabled on it.

The only time we have an hsrp state change on msfc1 is when somebody does
something to cause it - like shutting down a trunk port.
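
For reference, here is a rough sketch of the hold-timer behaviour described in the quoted text, as I understand it. It assumes the default HSRP timers (hello every 3 seconds, hold time 10 seconds); the function name and the sample arrival times are made up for illustration and are not taken from our routers.

```python
# Rough model of the hold timer on an HSRP standby router.
# Assumed defaults: hello every 3 s, hold time 10 s.
HELLO_INTERVAL = 3.0
HOLD_TIME = 10.0


def hold_timer_expiries(hello_arrivals, hold_time=HOLD_TIME):
    """Return the times (seconds) at which the hold timer would expire,
    given the arrival times of hellos from the active router."""
    expiries = []
    arrivals = sorted(hello_arrivals)
    # The timer restarts on every hello, so it only expires when the
    # gap between two consecutive hellos exceeds the hold time.
    for prev, nxt in zip(arrivals, arrivals[1:]):
        if nxt - prev > hold_time:
            expiries.append(prev + hold_time)
    return expiries


# Hellos at 0, 3 and 6 s, then three lost (9, 12, 15 s), then 18 s onward.
print(hold_timer_expiries([0, 3, 6, 18, 21, 24]))  # [16.0]

# Only two lost hellos (9 and 12 s): the gap is 9 s, so no expiry.
print(hold_timer_expiries([0, 3, 6, 15, 18]))  # []
```

So with default timers it takes three hellos in a row going missing before the standby router reacts, which is why a handful of scattered drops wouldn't be expected to cause a state change.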

> The following doc has some good troubleshooting steps that can be taken:
>
> http://www.cisco.com/en/US/partner/tech/tk648/tk362/technologies_tech_note09186a0080094afd.shtml#t1

The closest match is the section on HSRP Intermittent State Changes on
Multicast Stub Network.  But I'm pretty sure that isn't the problem here
because the non-DR router is msfc1.  Msfc2 is the router having all the
problems with hsrp flapping.

Another thing is that we did have that problem a few years ago with
multicast traffic hitting the non-dr router and it was pretty obvious that
the router was in trouble.  CPU busy would go over 90% and there were
large numbers of input queue drops.  We've also had problems with the old
version of Ghost that would set the multicast TTL to the exact value needed
to get the traffic to the destination machine.  Again, it was real obvious
the router was having problems.  But the worst case cpu busy I've seen on
msfc2 is 49% and the worst-case input queue drop counter is 2728 flushes in
a bit over 36 hours.
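
To put that flush counter in perspective, here is a quick back-of-the-envelope calculation. It assumes exactly 36 hours (the real interval was slightly longer, so the true average is a little lower), and the function name is just something I made up for the example.

```python
def flush_rate(flushes, hours):
    """Return (flushes per hour, average seconds between flushes)."""
    per_hour = flushes / hours
    seconds_between = hours * 3600 / flushes
    return per_hour, seconds_between


# 2728 flushes; 36.0 hours is an assumption ("a bit over 36 hours").
per_hour, seconds_between = flush_rate(2728, 36.0)
print(f"{per_hour:.1f} flushes/hour, one every {seconds_between:.1f} s on average")
# 75.8 flushes/hour, one every 47.5 s on average
```

An average like this hides bursts, so it doesn't rule out a short spike of drops, but it's nowhere near what we saw when the router was obviously in trouble.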

I'm not seeing large numbers of drops on the switch ports either.
Everything I've looked at has me thinking that the packets are being
dropped somewhere inside the switch.  I'm guessing the drops are happening
at the interface between the switch and the MSFC but I don't know what to
look at to see if that's really the case or not.

In any case, it does seem to be multicast related.
We've got two T3 links connecting this site to the rest of the network.  We
blocked multicast going across the T3s and there was no more hsrp flapping.
About 9.5 hours after blocking multicast on the T3s I tried allowing it
again.  20 minutes later we had another instance of hsrp flapping on
msfc2.

Thanks,
Lee