[c-nsp] c6k msfc - hsrp flapping

Yuri Lukin lists at swaggi.com
Thu May 18 16:11:10 EDT 2006


See replies inline....

lee.e.rian at census.gov wrote ..
> Hi Yuri,
> 
> I appreciate the help.
>
> One of the first things we tried was making msfc1 the hsrp primary for all
> vlans.  So now, when we have an hsrp flapping incident on msfc2, it happens
> on every vlan with hsrp configured on it except for one - and that one
> doesn't have multicast enabled on it.
> 

This makes me think the problem is multicast related: the multicast traffic
may be causing HSRP hellos to be dropped or delayed until the hold timer
expires.
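
To make the timeout idea concrete, here is a minimal sketch in Python. It
assumes the default HSRP timers (hello 3 seconds, hold 10 seconds); the
thread does not say which timers are configured, and the function name and
the sample timestamps are made up for illustration.

```python
def hold_timer_expiries(hello_arrivals, hold_time=10.0):
    """Return the times at which a listener's hold timer would expire.

    hello_arrivals: times (seconds) at which hellos were actually received.
    hold_time: seconds without a hello before the peer is declared down
               (10 s is the assumed default, not taken from the thread).
    """
    arrivals = sorted(hello_arrivals)
    expiries = []
    for prev, nxt in zip(arrivals, arrivals[1:]):
        # The timer restarts on every hello; it fires if the gap exceeds hold_time.
        if nxt - prev > hold_time:
            expiries.append(prev + hold_time)
    return expiries


# Hypothetical hellos sent every 3 s: 0, 3, ..., 33
sent = [3.0 * i for i in range(12)]
two_lost = [t for t in sent if t not in (12.0, 15.0)]
three_lost = [t for t in sent if t not in (12.0, 15.0, 18.0)]

print(hold_timer_expiries(two_lost))    # no expiry: largest gap is 9 s
print(hold_timer_expiries(three_lost))  # one expiry: gap of 12 s exceeds 10 s
```

With these assumed timers, two consecutive lost hellos are tolerated but
three in a row are enough to cause a state change, so even a short burst of
drops could show up as a flap.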

> The only time we have an hsrp state change on msfc1 is when somebody does
> something to cause it - like shutting down a trunk port.
> 

That probably causes msfc1 to lose its standby router, which triggers another election.

> The closest match is the section on HSRP Intermittent State Changes on
> Multicast Stub Network.  But I'm pretty sure that isn't that problem
> because the non-dr router is msfc1.  Msfc2 is the router having all the
> problems with hsrp flapping.

Think about it from another perspective: what if too much multicast traffic
hitting your DR (msfc2) is causing HSRP issues?

> Another thing is that we did have that problem a few years ago with
> multicast traffic hitting the non-dr router and it was pretty obvious that
> the router was in trouble.  CPU busy would go over 90% and there were
> large numbers of input queue drops.  We've also had problems with the old
> version of Ghost that would set the multicast TTL to the exact value needed
> to get the traffic to the destination machine.  Again, it was real obvious
> the router was having problems.  But the worst case cpu busy I've seen on
> msfc2 is 49% and the worst-case input queue drop counter is 2728 flushes in
> a bit over 36 hours.
> 
> I'm not seeing large numbers of drops on the switch ports either.
> Everything I've looked at has me thinking that the packets are being
> dropped somewhere inside the switch.  I'm guessing the drops are happening
> at the interface between the switch and the MSFC but I don't know what to
> look at to see if that's really the case or not.
> 

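Are these switches running hybrid (CatOS on the supervisor, IOS on the MSFC)
or native IOS? How are they interconnected? Can you check the interface
counters to see which interfaces are showing excessive multicast traffic?
Also, HSRP (version 1) sends its hello messages to 224.0.0.2 on UDP port
1985, so the hellos are themselves multicast.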
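
If you end up capturing the hellos, a minimal Python sketch like the one
below can decode them. It assumes the 20-byte HSRP version 1 layout from
RFC 2281 (carried in UDP port 1985); the function name and the sample
packet values are made up for illustration and are not from your network.

```python
import socket
import struct

# Opcode and state values as defined in RFC 2281 (HSRP version 1).
HSRP_OPCODES = {0: "Hello", 1: "Coup", 2: "Resign"}
HSRP_STATES = {0: "Initial", 1: "Learn", 2: "Listen",
               4: "Speak", 8: "Standby", 16: "Active"}


def parse_hsrp_v1(payload: bytes) -> dict:
    """Decode the UDP payload of an HSRP version 1 message."""
    if len(payload) < 20:
        raise ValueError("HSRP v1 message must be at least 20 bytes")
    # Eight 1-byte fields, 8 bytes of authentication data, 4-byte virtual IP.
    (version, opcode, state, hello, hold,
     priority, group, _reserved, auth, vip) = struct.unpack("!8B8s4s", payload[:20])
    return {
        "version": version,
        "opcode": HSRP_OPCODES.get(opcode, opcode),
        "state": HSRP_STATES.get(state, state),
        "hellotime": hello,
        "holdtime": hold,
        "priority": priority,
        "group": group,
        "auth": auth.rstrip(b"\x00").decode("ascii", errors="replace"),
        "virtual_ip": socket.inet_ntoa(vip),
    }


# Hypothetical hello from an active router in group 1 with default timers.
sample = (bytes([0, 0, 16, 3, 10, 100, 1, 0])
          + b"cisco\x00\x00\x00"
          + socket.inet_aton("10.0.0.1"))
print(parse_hsrp_v1(sample))
```

Comparing the timestamps of decoded hellos on the wire against what msfc2
reports in its debug output would show whether the hellos are arriving and
being dropped internally, or never arriving at all.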

Perhaps you can run "debug standby" while the problem is happening and paste
the output to the list? See the following on how to debug HSRP:
http://www.cisco.com/en/US/partner/tech/tk648/tk362/technologies_tech_note09186a0080094a91.shtml#hsrpdebug


> In any case, it does seem to be multicast related.
> We've got two T3 links connecting this site to the rest of the network.  We
> blocked multicast going across the T3s and there was no more hsrp flapping.
> About 9.5 hours after blocking multicast on the T3s I tried allowing it
> again.  20 minutes later we had another instance of hsrp flapping on
> msfc2.
> 
> Thanks,
> Lee

I agree, so I would focus on the multicast traffic between these two switches.
Hope this helps...

-Yuri

