Re: [nsp] 6500 stops ARPing

From: Steve Francis (steve@expertcity.com)
Date: Fri May 31 2002 - 12:32:23 EDT


Sounds like a CEF issue.

I'd try the stuff in http://www.cisco.com/warp/customer/473/128.html#case1
and see if the mls on the switch and the cef adjacencies agree.
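
One way to check whether the two tables agree is to diff the sets of IP addresses each one holds. The Python below is only a rough sketch: it assumes you have already pulled the addresses out of the relevant show output into plain text, one IP per line. It does not parse any real IOS output format, and the names parse_ips and compare_tables are made up for illustration.

```python
import ipaddress


def parse_ips(text):
    """Collect the valid IPv4 addresses found one per line in text."""
    ips = set()
    for line in text.splitlines():
        token = line.strip()
        if not token:
            continue
        try:
            ips.add(ipaddress.IPv4Address(token))
        except ipaddress.AddressValueError:
            # Skip anything that is not a bare IPv4 address.
            continue
    return ips


def compare_tables(hw_text, cef_text):
    """Return (only in hardware list, only in CEF list), each sorted."""
    hw = parse_ips(hw_text)
    cef = parse_ips(cef_text)
    return sorted(hw - cef), sorted(cef - hw)
```

Any address that lands in only one of the two lists is a candidate for the kind of mismatch described at that URL.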

Matt Buford wrote:

>Lately I've been having a problem with some 6509s that doesn't seem to make
>any sense, and I was wondering if others had any ideas or have run into this
>before. I spent the last few hours searching the web without finding any
>discussion that sounded like the same problem I'm seeing.
>
>I have some 6509s running Supervisor IOS 12.1(11b)E. The 6509s are in pairs
>with VLAN interfaces, and are running HSRP on these interfaces. These VLANs
>feed down to smaller switches where the host connects. The smaller switches
>are connected to both of the 6509s in the pair. The smaller switches
>consist of Cisco models plus some HPs. ARP table sizes of 15,000 to 30,000
>are common.
>
>Sometimes a router seems to just refuse to try to ARP certain IPs.
>Traceroutes to the broken IPs show the last hop as the 6509. The router
>shows no ARP entry. The 6509 does have the host's MAC address in the
>forwarding table. "debug arp" shows nothing matching the broken IP when
>left running for 30 minutes while a continuous ping to a broken IP runs from
>another host. The debug does show other addresses generating ARPs (and
>getting responses) so it isn't like ARP completely stops working. When a
>few IPs break but most things are working, a "clear arp" will result in only
>a small percentage of the ARPs returning, with the majority of the IPs
>suddenly being broken and not coming back (at least not anytime soon).
>
>Here's the strange part. If I log onto the affected 6509 that has no ARP
>entry, all I have to do is ping the broken IP from the management interface.
>This generates an ARP, and instantly the ping I left running from a remote
>host to a broken IP starts responding. Another way to fix it is to do
>"shut" then "no shut" on the affected VLAN interface. This seems to clear
>up something on the interface, as suddenly all the IPs on that interface get
>ARP entries. A few hours ago I had identified roughly 10 IPs (all on the
>same VLAN interface) that were having this problem. I decided to try "clear
>arp" to see if that would reset things and correct the problem. After about
>5 minutes, the ARP table was only back up to about 5,900 entries compared to
>the normal 15,000 to 20,000 and there were huge numbers of IPs that were
>unreachable and not generating ARPs. After 10 minutes, the table was only
>up to about 6,000. I pasted in "int vlanXXX", "shut", "no shut" commands
>for every VLAN interface, and within 1 minute after that the ARP table was
>up to a reasonable 15,000 entries and everything became reachable.
>
>The fact that a ping from the management interface fixes it along with the
>lack of any ARP attempts even showing up in the debug arp output leads me to
>believe the fault is clearly within the 6509, and not any other part of the
>network. It doesn't seem to be throttling or broadcast storm control as
>evidenced by the fact that the ARPs *CAN* be learned very quickly if you can
>just get the 6509 to generate the ARPs (by the shut and no shut). I'm
>guessing perhaps the packets destined for the broken IPs are (incorrectly)
>being switched in hardware, and thus never being seen by the CPU and never
>generating ARPs.
>
>This problem seems to happen regularly now. I'm open to ideas, suggestions,
>and show/debug to collect next time this happens...
>
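
The shut / no shut workaround described above is easy to script if it has to be pasted for many VLAN interfaces. A minimal sketch, assuming the commands are pasted while already in global configuration mode; the function name bounce_commands and the VLAN IDs in the usage line are hypothetical, not taken from the original post. Bouncing an interface interrupts traffic on that VLAN, so this is a workaround, not a fix.

```python
def bounce_commands(vlan_ids):
    """Build shutdown / no shutdown lines for each VLAN interface."""
    lines = []
    for vid in vlan_ids:
        # Valid 802.1Q VLAN IDs are 1 through 4094.
        if not 1 <= vid <= 4094:
            raise ValueError(f"invalid VLAN ID: {vid}")
        lines.append(f"interface Vlan{vid}")
        lines.append(" shutdown")
        lines.append(" no shutdown")
    return "\n".join(lines)
```

For example, print(bounce_commands([10, 20])) produces a block of six lines ready to paste, three for each VLAN interface.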


