Re: [nsp] 6500 stops ARPing

From: Matt Buford (matt@overloaded.net)
Date: Fri May 31 2002 - 15:00:59 EDT


Good info in that URL. Thanks.

In "Remarks and Conclusions" #1, it talks about a "frc drop" (force drop)
entry being entered into the adjacency table while the ARP is in the
incomplete state. I have started running pings to nonexistent hosts, and I
see the incomplete ARP in the ARP table. However, I am not able to see any
kind of entry for the host in the adjacency table while the ARP is
incomplete. Is there some way to see these "frc drop" entries?

My guess is that I'm somehow getting force drop adjacency entries left
behind after an incomplete ARP times out and is removed from the ARP
table. These force drop adjacency entries may then be orphaned, with
nothing ever coming along to remove them. If so, the MSFC never sees any
packets destined for the host, so it never feels the need to generate any
further ARPs. However, I need to find a way to show these table entries to
really know.
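
To make the guess above concrete, here is a toy model of it in Python. It is
only a sketch of the hypothesis, not of how the 6500 actually works: the
class, the method names, and the orphan_bug flag are all made up for
illustration, and the assumption that a locally originated ping clears the
drop entry is inferred only from the symptom that pinging from the router
fixes the host.

```python
# Toy model of the hypothesis above. Nothing here is Cisco code; the class,
# method names, and the orphan_bug flag are all hypothetical.

class ToyRouter:
    def __init__(self, orphan_bug):
        self.orphan_bug = orphan_bug   # True = drop entry survives ARP timeout
        self.arp = {}                  # ip -> "incomplete" or a MAC string
        self.drop_adj = set()          # IPs with a hardware "frc drop" entry
        self.arp_requests = []         # every ARP request the CPU has sent

    def packet_for(self, ip):
        """A transit packet arrives for a directly connected host."""
        if ip in self.drop_adj:
            return "dropped in hardware"   # CPU never sees it, so no ARP
        if ip in self.arp and self.arp[ip] != "incomplete":
            return "forwarded"
        # Punted to the CPU: send an ARP and install the drop entry.
        self.arp[ip] = "incomplete"
        self.drop_adj.add(ip)
        self.arp_requests.append(ip)
        return "punted, ARP sent"

    def arp_timeout(self, ip):
        """The incomplete ARP entry ages out of the ARP table."""
        if self.arp.get(ip) == "incomplete":
            del self.arp[ip]
            if not self.orphan_bug:
                self.drop_adj.discard(ip)  # the cleanup one would expect

    def ping_from_router(self, ip, mac):
        """Assumed: a local ping ARPs from the CPU and the host replies."""
        self.arp_requests.append(ip)
        self.arp[ip] = mac
        self.drop_adj.discard(ip)
```

With orphan_bug=True, one packet followed by an ARP timeout leaves the host
with no ARP entry and every later packet dropped without a new ARP, which
matches what I am seeing. With orphan_bug=False, the next packet is punted
and a fresh ARP goes out.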

----- Original Message -----
From: "Steve Francis" <steve@expertcity.com>
To: "Matt Buford" <matt@overloaded.net>
Cc: <cisco-nsp@puck.nether.net>
Sent: Friday, May 31, 2002 12:32 PM
Subject: Re: [nsp] 6500 stops ARPing

> Sounds like a CEF issue.
>
> I'd try the stuff in http://www.cisco.com/warp/customer/473/128.html#case1
> and see if the mls on the switch and the cef adjacencies agree.
>
>
> Matt Buford wrote:
>
> >Lately I've been having a problem with some 6509s that doesn't seem to
> >make any sense, and I was wondering if others had any ideas or have run
> >into this before. I spent the last few hours searching the web without
> >finding any discussion that sounded like the same problem I'm seeing.
> >
> >I have some 6509s running Supervisor IOS 12.1(11b)E. The 6509s are in
> >pairs with VLAN interfaces, and are running HSRP on these interfaces.
> >These VLANs feed down to smaller switches where the host connects. The
> >smaller switches are connected to both of the 6509s in the pair. The
> >smaller switches consist of cisco models plus some HPs. ARP table sizes
> >of 15,000 to 30,000 are common.
> >
> >Sometimes a router seems to just refuse to try to ARP certain IPs.
> >Traceroutes to the broken IPs show the last hop as the 6509. The router
> >shows no ARP entry. The 6509 does have the host's mac address in the
> >forwarding table. "debug arp" shows nothing matching the broken IP when
> >left running for 30 minutes while a continuous ping to a broken IP runs
> >from another host. The debug does show other addresses generating ARPs
> >(and getting responses) so it isn't like ARP completely stops working.
> >When a few IPs break but most things are working, a "clear arp" will
> >result in only a small percentage of the ARPs returning, with the
> >majority of the IPs suddenly being broken and not coming back (at least
> >not anytime soon).
> >
> >Here's the strange part. If I log onto the affected 6509 that has no ARP
> >entry, all I have to do is ping the broken IP from the management
> >interface. This generates an arp, and instantly the ping I left running
> >from a remote host to a broken IP starts responding. Another way to fix
> >it is to do "shut" then "no shut" on the affected VLAN interface. This
> >seems to clear up something on the interface, as suddenly all the IPs on
> >that interface get ARP entries. A few hours ago I had identified roughly
> >10 IPs (all on the same VLAN interface) that were having this problem. I
> >decided to try "clear arp" to see if that would reset things and correct
> >the problem. After about 5 minutes, the ARP table was only back up to
> >about 5,900 entries compared to the normal 15,000 to 20,000 and there
> >were huge numbers of IPs that were unreachable and not generating ARPs.
> >After 10 minutes, the table was only up to about 6,000. I pasted in
> >"int vlanXXX", "shut", "no shut" commands for every VLAN interface, and
> >within 1 minute after that the ARP table was up to a reasonable 15,000
> >entries and everything became reachable.
> >
> >The fact that a ping from the management interface fixes it along with
> >the lack of any ARP attempts even showing up in the debug arp output
> >leads me to believe the fault is clearly within the 6509, and not any
> >other part of the network. It doesn't seem to be throttling or broadcast
> >storm control as evidenced by the fact that the ARPs *CAN* be learned
> >very quickly if you can just get the 6509 to generate the arps (by the
> >shut and no shut). I'm guessing perhaps the packets destined for the
> >broken IPs are (incorrectly) being switched in hardware, and thus never
> >being seen by the CPU and never generating ARPs.
> >
> >This problem seems to happen regularly now. I'm open to ideas,
> >suggestions, and show/debug to collect next time this happens...
> >
>
>


