Re: [nsp] Extremely flakey 7206VXR

From: Brian Vowell (brian@digitalix.net)
Date: Thu Nov 22 2001 - 18:10:29 EST


We had the same problem when one of our PAs was not inserted all the way. We
powered ours down, pulled each PA and reinserted it all the way, and after that
things were fine. We never figured out which one it was, but the problem went
away and never came back.

--bv

Blaz Zupan wrote:

> We're having extreme trouble with a 7206VXR and are quickly running out of
> ideas. We have an open ticket with Cisco, but they're not particularly helpful
> either.
>
> Here's the history: a 7206VXR NPE-400 was running 12.0(16)S and 12.0(18)S mostly
> fine for a relatively long time (two or three months). Then suddenly it
> started rebooting twice a day (almost exactly every twelve hours) with various
> errors (bus error, software forced crash, etc.). We tried 12.0(19)S and 12.2(5)
> with no visible fix. We then replaced the whole box except the PAs. One day
> later, the machine rebooted with a memory parity error; a couple of hours later
> it started rebooting in a loop and never came up again. We replaced the NPE-400
> in the new box with an NPE-225. The machine was mostly stable again for about two
> days (so it seems the new NPE-400 was indeed broken), but after two days the
> weird reboots started again: first a bus error, then a couple of software forced
> crashes. Since most of the reboots were software forced crashes, Cisco
> suggested we turn on exception handling and create a core dump file.
> Unfortunately, after three software forced crashes no core dump was generated
> (although it is configured correctly). So we went to 12.2(6), which was
> released recently, and today the box again rebooted 5 times in a row with bus
> errors and software forced crashes.
>
> We're monitoring the temperature in the facility, everything looks normal.
> Also "show environment all" shows no anomalies in the voltage readings.
>
> We've had enough of Cisco and will be replacing this box with a Juniper M5
> anyway, but until it arrives we'll have to do *something*, as this box is
> extremely mission critical. We'll be replacing the NPE once again (with a new
> NPE-400) and putting in another UPS in addition to the one the whole facility
> is on. We'll probably be replacing the PAs as well. But other than that, I'm
> completely out of ideas.
>
> Configuration: 7206VXR NPE-400, 128MB RAM, 2FE I/O controller, PA-2E3,
> PA-MC-8E1, typical ISP software configuration (some OSPF, some BGP, 802.1q
> VLANs to a 3com switch).
>
> *Any* ideas how to solve this mess will be much appreciated!



This archive was generated by hypermail 2b29 : Sun Aug 04 2002 - 04:13:24 EDT