Re: [nsp] 6500 stops ARPing

From: Matt Buford (matt@overloaded.net)
Date: Wed Jun 05 2002 - 14:36:10 EDT

Next message: Tejal Shah: "[nsp] Minimum prefix for BGP"
Previous message: Josh Duffek: "Re: stange behavior of cisco router"
In reply to: Matt Buford: "Re: [nsp] 6500 stops ARPing"
Next in thread: Matt Buford: "Re: [nsp] 6500 stops ARPing"

Aha, found something:

#sh mls cef | i drop

6 66.24.203.64 255.255.255.255 drop
37921 66.70.12.0 255.255.255.0 drop
52350 216.109.150.0 255.255.255.0 drop
52486 64.27.110.0 255.255.255.0 drop
52594 216.109.149.0 255.255.255.0 drop
52721 216.109.148.0 255.255.255.0 drop
53678 216.109.142.0 255.255.255.0 drop
(and so on for roughly 30 entries)

A lot of directly connected networks are showing up as drop. That certainly
isn't right. No wonder they won't arp. At least I see the invalid entries
now so I can clearly show them to Cisco. I've got a case open...

On the other 6509 in the redundant pair, which is configured almost
identically, only a few entries are marked drop, and all of them are static
null0 routes, so the problem doesn't seem to happen there. Strange.
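
For anyone wanting to run the same comparison on both boxes, here is a rough
Python sketch that parses the "sh mls cef | i drop" lines and flags drop
entries that fall inside a directly connected network, while skipping the
static null0 routes that are expected to be drop. The function names and the
"connected" and "null0" lists are placeholders I made up for illustration;
the real lists would have to come from the router's own config.

```python
import ipaddress

# Three lines copied from the output above.
SAMPLE = """\
6 66.24.203.64 255.255.255.255 drop
37921 66.70.12.0 255.255.255.0 drop
52350 216.109.150.0 255.255.255.0 drop
"""


def parse_drop_entries(text):
    """Return (index, network) for every 'index prefix mask drop' line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[3] != "drop":
            continue
        try:
            index = int(fields[0])
            network = ipaddress.ip_network(f"{fields[1]}/{fields[2]}")
        except ValueError:
            continue  # not a table line, skip it
        entries.append((index, network))
    return entries


def suspicious_drops(entries, connected, null0_routes):
    """Drop entries inside a connected network that are not null0 statics."""
    expected = set(null0_routes)
    return [
        (index, net)
        for index, net in entries
        if net not in expected and any(net.subnet_of(c) for c in connected)
    ]


if __name__ == "__main__":
    # ASSUMPTION: these two lists are invented for the example only.
    connected = [ipaddress.ip_network("216.109.150.0/24")]
    null0 = [ipaddress.ip_network("66.70.12.0/24")]
    for index, net in suspicious_drops(parse_drop_entries(SAMPLE), connected, null0):
        print(index, net)
```

Running the same script against the output from the healthy 6509 should
return nothing if its only drop entries really are the null0 statics.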

----- Original Message -----
From: "Matt Buford" <matt@overloaded.net>
To: "Steve Francis" <steve@expertcity.com>
Cc: <cisco-nsp@puck.nether.net>
Sent: Friday, May 31, 2002 3:00 PM
Subject: Re: [nsp] 6500 stops ARPing

> Good info in that URL. Thanks.
>
> In "Remarks and Conclusions" #1, it talks about a "frc drop" (force drop)
> entry being entered into the adjacency table while the arp is in the
> incomplete state. I have started running pings to nonexistent hosts, and I
> see the incomplete arp in the arp table, however I am not able to see any
> kind of entry for the host in the adjacency table while the arp is
> incomplete. Is there some way to see these "frc drop" entries?
>
> A guess is that perhaps somehow I'm getting force drop adjacency entries
> left behind after an incomplete arp times out and is removed from the arp
> table. These force drop adjacency entries may then be orphaned, with
> nothing ever coming along to remove them, and thus the MSFC never sees any
> packets destined for this host, so it never feels the need to generate any
> further arps. However, I need to find a way to show these table entries to
> really know.
>
> ----- Original Message -----
> From: "Steve Francis" <steve@expertcity.com>
> To: "Matt Buford" <matt@overloaded.net>
> Cc: <cisco-nsp@puck.nether.net>
> Sent: Friday, May 31, 2002 12:32 PM
> Subject: Re: [nsp] 6500 stops ARPing
>
>
> > Sounds like a CEF issue.
> >
> > I'd try the stuff in
> > http://www.cisco.com/warp/customer/473/128.html#case1
> > and see if the mls on the switch and the cef adjacencies agree.
> >
> >
> > Matt Buford wrote:
> >
> > >Lately I've been having a problem with some 6509s that doesn't seem to make
> > >any sense, and I was wondering if others had any ideas or have run into this
> > >before. I spent the last few hours searching the web without finding any
> > >discussion that sounded like the same problem I'm seeing.
> > >
> > >I have some 6509s running Supervisor IOS 12.1(11b)E. The 6509s are in pairs
> > >with VLAN interfaces, and are running HSRP on these interfaces. These VLANs
> > >feed down to smaller switches where the hosts connect. The smaller switches
> > >are connected to both of the 6509s in the pair. The smaller switches
> > >consist of Cisco models plus some HPs. ARP table sizes of 15,000 to 30,000
> > >are common.
> > >
> > >Sometimes a router seems to just refuse to try to ARP certain IPs.
> > >Traceroutes to the broken IPs show the last hop as the 6509. The router
> > >shows no ARP entry. The 6509 does have the host's mac address in the
> > >forwarding table. "debug arp" shows nothing matching the broken IP when
> > >left running for 30 minutes while a continuous ping to a broken IP runs from
> > >another host. The debug does show other addresses generating ARPs (and
> > >getting responses) so it isn't like ARP completely stops working. When a
> > >few IPs break but most things are working, a "clear arp" will result in only
> > >a small percentage of the ARPs returning, with the majority of the IPs
> > >suddenly being broken and not coming back (at least not anytime soon).
> > >
> > >Here's the strange part. If I log onto the affected 6509 that has no ARP
> > >entry, all I have to do is ping the broken IP from the management interface.
> > >This generates an arp, and instantly the ping I left running from a remote
> > >host to a broken IP starts responding. Another way to fix it is to do
> > >"shut" then "no shut" on the affected VLAN interface. This seems to clear
> > >up something on the interface, as suddenly all the IPs on that interface get
> > >ARP entries. A few hours ago I had identified roughly 10 IPs (all on the
> > >same VLAN interface) that were having this problem. I decided to try "clear
> > >arp" to see if that would reset things and correct the problem. After about
> > >5 minutes, the ARP table was only back up to about 5,900 entries compared to
> > >the normal 15,000 to 20,000 and there were huge numbers of IPs that were
> > >unreachable and not generating ARPs. After 10 minutes, the table was only
> > >up to about 6,000. I pasted in "int vlanXXX", "shut", "no shut" commands
> > >for every VLAN interface, and within 1 minute after that the ARP table was
> > >up to a reasonable 15,000 entries and everything became reachable.
> > >
> > >The fact that a ping from the management interface fixes it along with the
> > >lack of any ARP attempts even showing up in the debug arp output leads me to
> > >believe the fault is clearly within the 6509, and not any other part of the
> > >network. It doesn't seem to be throttling or broadcast storm control as
> > >evidenced by the fact that the ARPs *CAN* be learned very quickly if you can
> > >just get the 6509 to generate the arps (by the shut and no shut). I'm
> > >guessing perhaps the packets destined for the broken IPs are (incorrectly)
> > >being switched in hardware, and thus never being seen by the CPU and never
> > >generating ARPs.
> > >
> > >This problem seems to happen regularly now. I'm open to ideas, suggestions,
> > >and show/debug to collect next time this happens...
> > >
> >
> >
>
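
The orphaned-entry guess in the quoted May 31 message could be tested with
something like the sketch below: take the host (/32) drop lines and list the
ones whose address has no ARP entry at all, complete or incomplete. This
assumes that leftover force drop entries would show up as /32 drop lines in
"sh mls cef", which is only a guess, and the function name and the arp_ips
argument are made up for the example. The ARP addresses would need to be
pulled from the router separately.

```python
import ipaddress


def orphaned_host_drops(drop_lines, arp_ips):
    """Return /32 drop addresses that have no entry in the ARP table.

    drop_lines: lines in 'index prefix mask drop' form.
    arp_ips: addresses currently in the ARP table (complete or incomplete).
    """
    known = {ipaddress.ip_address(ip) for ip in arp_ips}
    orphans = []
    for line in drop_lines:
        fields = line.split()
        if len(fields) != 4 or fields[3] != "drop":
            continue
        try:
            net = ipaddress.ip_network(f"{fields[1]}/{fields[2]}")
        except ValueError:
            continue  # not a table line, skip it
        # Only host routes are candidates; wider prefixes are ignored here.
        if net.prefixlen == 32 and net.network_address not in known:
            orphans.append(net.network_address)
    return orphans


if __name__ == "__main__":
    lines = [
        "6 66.24.203.64 255.255.255.255 drop",
        "37921 66.70.12.0 255.255.255.0 drop",
    ]
    # ASSUMPTION: an empty ARP list stands in for a table with no matching entry.
    for addr in orphaned_host_drops(lines, []):
        print("possible orphan:", addr)
```

Any address it reports that stays in the list well after the incomplete ARP
has aged out would fit the orphan theory.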

This archive was generated by hypermail 2b29 : Sun Aug 04 2002 - 04:13:46 EDT