[c-nsp] Ethernet Freezeup

Andre Beck cisco-nsp at ibh.net
Mon Apr 7 09:28:12 EDT 2008


Hi,

(Sent directly to Ed and Cc'd to the list because the original post is
quite old; feel free to reply to the list only.)

On Sat, Jul 15, 2006 at 05:23:20PM -0400, Ed Ravin wrote:
> A few times on this list, people have discussed how a Cisco 1700 series
> router can suddenly "freeze up" on its main Ethernet interface.  The
> problem as I've observed it hits routers that have a single Ethernet
> interface (and no other interfaces in use).  The symptom is that the router
> no longer receives traffic on the Ethernet - it still transmits ARP requests
> and retries of routing protocol packets, but nothing is received.  Getting to
> the console of the router and issuing "clear int faste0" always fixes the
> problem.

Sadly, I have come to know this bug over the last few months as well.
 
> We've had this problem every 1-2 months on a 1720 in the field, which
> was tolerable since the router didn't have that many users on it, but now
> it has started happening on one of our core 7206 routers.  We used this
> same router in a similar configuration for years in a different location
> with no problems, but back then it had multiple interfaces (a DS3 and the
> FastEthernet).

I was seeing this with a 7206/IO-FE that *has* other interfaces, though
what seemed to trigger it there was indeed single-armed routed traffic.
 
> The freezeups have happened on various IOS 12.1 versions on the 1720,
> and on 12.3.17 on the 7206 (non-VXR, NPE-225).

After the problem had hit us regularly (mostly in the middle of the night
while backups ran), I finally did something I hoped would rule out any
hardware issues:

1) Placed a new 7204VXR chassis next to the problem box (7206);
2) Plugged an NPE225 and an IO-FE into the new chassis (different modules
   from the ones in the 7206) and copied over the configuration and IOS;
3) Powered off the old box and moved over the required PAs (one 8BRI,
   one MC-8E1 and one FE-TX) and the cabling;
4) Booted the new box.

Initially all seemed well. Even the next backup ran without a problem.
But the next day, without any excessive traffic there to trigger it as
before, exactly the same thing happened to the new box, even though it
is a different chassis, a different NPE225 and a different IO-FE. It
struck again today, again without heavy trigger traffic, so in a way the
situation is worse than before: now it seems to hit completely at random.

For us, the issues actually seemed to start when the old NPE200 in the
7206 was replaced with an NPE225. Given that the two have quite different
architectures, I'm wondering whether what we see is actually a software
problem that hits NPE225s in general when they are used heavily one-armed
with an IO-FE. I've seen it with 12.4 mainline and with the 12.2(31)SB
train, so it might have been introduced after 12.2S. I remember the boxes
with an NPE225 being rock solid when running 12.2(25)S; I never saw this
issue crop up before. Now I have it on two chassis...

BTW, I'm seeing an SNMP memory leak in 12.2(31)SB (up to SB11); I can't
tell whether it is related. I've also noticed that the RTTs of packets
going through the affected box were distorted for several seconds before
the interface actually froze, so the effect seems to announce itself.
This could mean something is hitting the CPU badly, but it's hard to
tell what it is after the fact.

> Any thoughts about what might be going on in the innards of the IOS,
> and how to troubleshoot or prevent recurrence?

Ed, did you find a solution (other than moving to an NPE-G1/G2 or NPE-400)
or a workaround? Is anyone else here on c-nsp still using these good old
chassis who has some advice?

TIA,
Andre.
-- 
   Real men don't make backups of their mail. They just send it out
    on the Internet and let the secret services do the hard work.

-> Andre Beck    +++ ABP-RIPE +++      IBH IT-Service GmbH, Dresden <-

