Re: [nsp] 6500 stops ARPing

From: Matt Buford (matt@overloaded.net)
Date: Fri May 31 2002 - 15:28:35 EDT


How about if I'm not running CatOS? There seems to be a similar command that passes the request off to the supervisor, but I'm not seeing any entries for the incomplete arps:

aggr2.ewr#sh ip arp 64.27.90.21
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  64.27.90.21             0   Incomplete      ARPA
aggr2.ewr#sh mls cef ip 64.27.90.21

aggr2.ewr-sp#
Index Prefix Mask Adjacency
aggr2.ewr#sh mls cef adj | i 64.27.90.21

aggr2.ewr-sp#
aggr2.ewr#
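
The manual check above (find an Incomplete ARP entry, then look the same address up in the hardware CEF table) could be scripted roughly as follows. This is only a sketch: it assumes the "sh ip arp" output has already been captured as text, the function names are made up for illustration, and the second sample row (address, MAC and Vlan90) is invented so that there is a resolved entry to skip.

```python
import re

# Matches one data row of IOS "sh ip arp" output, e.g.
# "Internet 64.27.90.21 0 Incomplete ARPA"
ARP_ROW = re.compile(
    r"^Internet\s+(?P<ip>\d+\.\d+\.\d+\.\d+)\s+(?P<age>\S+)\s+(?P<hw>\S+)\s+ARPA"
)


def incomplete_arp_ips(show_ip_arp_output):
    """Return the IPs whose ARP entry is listed as Incomplete."""
    ips = []
    for line in show_ip_arp_output.splitlines():
        match = ARP_ROW.match(line.strip())
        if match and match.group("hw").lower() == "incomplete":
            ips.append(match.group("ip"))
    return ips


def cef_check_commands(ips):
    """Build the per-host lookup used in this message, one command per IP."""
    return ["sh mls cef ip " + ip for ip in ips]


# First data row is from the output above; the second row is hypothetical.
sample = """\
Protocol Address Age (min) Hardware Addr Type Interface
Internet 64.27.90.21 0 Incomplete ARPA
Internet 64.27.90.1 - 0000.0c07.ac01 ARPA Vlan90
"""

for command in cef_check_commands(incomplete_arp_ips(sample)):
    print(command)
```

The printed commands would still have to be pasted into the router by hand; nothing here talks to the device.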
  ----- Original Message -----
  From: Steve Francis
  To: Matt Buford
  Cc: Steve Francis ; cisco-nsp@puck.nether.net
  Sent: Friday, May 31, 2002 3:10 PM
  Subject: Re: [nsp] 6500 stops ARPing

  on the cat:

   sho mls entry cef ip 10.16.1.155/32 ad
  Mod: 15
  Destination-IP: 10.16.1.155 Destination-Mask: 255.255.255.255
  FIB-Type: resolved

  AdjType  NextHop-IP      NextHop-Mac       Vlan Encp Tx-Packets   Tx-Octets
  -------- --------------- ----------------- ---- ---- ------------ -------------
  frc drop 10.16.1.155

  (I was pinging a non-existent host on a directly connected net from the MSFC.)

  Matt Buford wrote:

Good info in that URL. Thanks.

In "Remarks and Conclusions" #1, it talks about a "frc drop" (force drop)
entry being entered into the adjacency table while the arp is in the
incomplete state. I have started running pings to nonexistent hosts, and I
see the incomplete arp in the arp table, however I am not able to see any
kind of entry for the host in the adjacency table while the arp is
incomplete. Is there some way to see these "frc drop" entries?

A guess is that perhaps somehow I'm getting force drop adjacency entries
left behind after an incomplete arp times out and is removed from the arp
table. These force drop adjacency entries may then be orphaned, with
nothing ever coming along to remove them, and thus the MSFC never sees any
packets destined for this host, so it never feels the need to generate any
further arps. However, I need to find a way to show these table entries to
really know.

----- Original Message -----
From: "Steve Francis" <steve@expertcity.com>
To: "Matt Buford" <matt@overloaded.net>
Cc: <cisco-nsp@puck.nether.net>
Sent: Friday, May 31, 2002 12:32 PM
Subject: Re: [nsp] 6500 stops ARPing

Sounds like a CEF issue.

I'd try the stuff in http://www.cisco.com/warp/customer/473/128.html#case1
and see if the mls on the switch and the cef adjacencies agree.

Matt Buford wrote:

Lately I've been having a problem with some 6509s that doesn't seem to make
any sense, and I was wondering if others had any ideas or have run into this
before. I spent the last few hours searching the web without finding any
discussion that sounded like the same problem I'm seeing.

I have some 6509s running Supervisor IOS 12.1(11b)E. The 6509s are in pairs
with VLAN interfaces, and are running HSRP on these interfaces. These VLANs
feed down to smaller switches where the host connects. The smaller switches
are connected to both of the 6509s in the pair. The smaller switches
consist of cisco models plus some HPs. ARP table sizes of 15,000 to 30,000
are common.

Sometimes a router seems to just refuse to try to ARP certain IPs.
Traceroutes to the broken IPs show the last hop as the 6509. The router
shows no ARP entry. The 6509 does have the host's mac address in the
forwarding table. "debug arp" shows nothing matching the broken IP when
left running for 30 minutes while a continuous ping to a broken IP runs from
another host. The debug does show other addresses generating ARPs (and
getting responses) so it isn't like ARP completely stops working. When a
few IPs break but most things are working, a "clear arp" will result in only
a small percentage of the ARPs returning, with the majority of the IPs
suddenly being broken and not coming back (at least not anytime soon).

Here's the strange part. If I log onto the affected 6509 that has no ARP
entry, all I have to do is ping the broken IP from the management interface.
This generates an arp, and instantly the ping I left running from a remote
host to a broken IP starts responding. Another way to fix it is to do
"shut" then "no shut" on the affected VLAN interface. This seems to clear
up something on the interface, as suddenly all the IPs on that interface get
ARP entries. A few hours ago I had identified roughly 10 IPs (all on the
same VLAN interface) that were having this problem. I decided to try "clear
arp" to see if that would reset things and correct the problem. After about
5 minutes, the ARP table was only back up to about 5,900 entries compared to
the normal 15,000 to 20,000 and there were huge numbers of IPs that were
unreachable and not generating ARPs. After 10 minutes, the table was only
up to about 6,000. I pasted in "int vlanXXX", "shut", "no shut" commands
for every VLAN interface, and within 1 minute after that the ARP table was
up to a reasonable 15,000 entries and everything became reachable.

The fact that a ping from the management interface fixes it along with the
lack of any ARP attempts even showing up in the debug arp output leads me to
believe the fault is clearly within the 6509, and not any other part of the
network. It doesn't seem to be throttling or broadcast storm control as
evidenced by the fact that the ARPs *CAN* be learned very quickly if you can
just get the 6509 to generate the arps (by the shut and no shut). I'm
guessing perhaps the packets destined for the broken IPs are (incorrectly)
being switched in hardware, and thus never being seen by the CPU and never
generating ARPs.

This problem seems to happen regularly now. I'm open to ideas, suggestions,
and show/debug to collect next time this happens...
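
One thing that might be worth collecting next time is the exact set of IPs that fail to come back after a "clear arp", rather than just the table size. A minimal sketch, assuming two saved copies of the "sh ip arp" output (one taken before the clear, one some minutes after); the function names and the 192.0.2.x sample addresses and MACs are hypothetical, not taken from the affected routers:

```python
import re

# One data row of "sh ip arp": protocol, IP, age, hardware address, type.
IP_AND_HW = re.compile(r"^Internet\s+(\d+\.\d+\.\d+\.\d+)\s+\S+\s+(\S+)\s+ARPA")


def resolved_ips(show_ip_arp_output):
    """Collect IPs that have a real hardware address (not Incomplete)."""
    resolved = set()
    for line in show_ip_arp_output.splitlines():
        match = IP_AND_HW.match(line.strip())
        if match and match.group(2).lower() != "incomplete":
            resolved.add(match.group(1))
    return resolved


def missing_after_clear(before_output, after_output):
    """IPs resolved before the clear that have not come back afterwards."""
    return sorted(resolved_ips(before_output) - resolved_ips(after_output))


# Hypothetical snapshots for illustration only.
before = """\
Internet 192.0.2.10 12 0000.0000.000a ARPA Vlan10
Internet 192.0.2.11 3 0000.0000.000b ARPA Vlan10
Internet 192.0.2.12 7 0000.0000.000c ARPA Vlan10
"""
after = """\
Internet 192.0.2.10 0 0000.0000.000a ARPA Vlan10
Internet 192.0.2.11 0 Incomplete ARPA
"""

print(missing_after_clear(before, after))
```

With these made-up snapshots the list contains 192.0.2.11 (now Incomplete) and 192.0.2.12 (gone from the table entirely), which are the two cases described above. Grouping the result by VLAN interface would show whether the failures cluster on particular interfaces.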


