[nsp] Extremely flakey 7206VXR

From: Blaz Zupan (blaz@inlimbo.org)
Date: Thu Nov 22 2001 - 13:50:13 EST


We're having extreme trouble with a 7206VXR and are quickly running out of
ideas. We have an open ticket with Cisco, but they're not particularly helpful
either.

Here's the history: a 7206VXR with an NPE-400 ran 12.0(16)S and 12.0(18)S
mostly fine for a relatively long time (two or three months). Then it suddenly
started rebooting twice a day (almost exactly every twelve hours) with various
errors (bus error, software forced crash, etc.). We tried 12.0(19)S and
12.2(5) with no visible improvement.

We then replaced the whole box except the PAs. One day later, the new machine
rebooted with a memory parity error; a couple of hours after that it started
rebooting in a loop and never came up again. We replaced the NPE-400 in the
new box with an NPE-225, and the machine was mostly stable again for about two
days (so it seems the new NPE-400 was indeed broken). After those two days,
however, the weird reboots started again: first a bus error, then a couple of
software forced crashes.

Since most of the reboots were software forced crashes, Cisco suggested we
turn on exception handling and have the router write a core dump file.
Unfortunately, after three more software forced crashes, no core dump had been
generated (although it is configured correctly). So we moved to the recently
released 12.2(6), and today the box again rebooted 5 times in a row with bus
errors and software forced crashes.
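For anyone who wants to check how regular the "every twelve hours" pattern
really is, the reload times (collected by hand, e.g. from the syslog server)
can be run through a small script like the one below. This is only a sketch:
the function names, the 12-hour period and the 30-minute tolerance are
assumptions chosen for illustration, not anything Cisco provides.

```python
from datetime import datetime, timedelta


def reboot_intervals(times):
    """Return the gaps between consecutive reload timestamps (sorted by time)."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(times, period=timedelta(hours=12),
                   tolerance=timedelta(minutes=30)):
    """True if every gap is within `tolerance` of a whole multiple of `period`.

    Multiples are accepted so that one missed or unlogged reload does not
    hide an otherwise regular pattern. Assumed defaults: 12 h period, 30 min
    tolerance.
    """
    gaps = reboot_intervals(times)
    if not gaps:
        return False  # a single reload says nothing about periodicity
    for gap in gaps:
        multiple = round(gap / period)  # timedelta / timedelta -> float
        if multiple < 1 or abs(gap - multiple * period) > tolerance:
            return False
    return True
```

A strongly periodic result would point more toward something scheduled or
timer-driven (in the software or in the environment) than toward randomly
failing hardware, though it proves neither.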

We're monitoring the temperature in the facility and everything looks normal.
"show environment all" also shows no anomalies in the voltage readings.

We've had enough of Cisco and will be replacing this box with a Juniper M5
anyway, but until it arrives we have to do *something*, as this box is
extremely mission critical. We'll be replacing the NPE once again (with a new
NPE-400) and adding another UPS in addition to the one the whole facility is
on. We'll probably replace the PAs as well. Beyond that, I'm completely out
of ideas.

Configuration: 7206VXR NPE-400, 128MB RAM, 2FE I/O controller, PA-2E3,
PA-MC-8E1, typical ISP software configuration (some OSPF, some BGP, 802.1Q
VLANs to a 3Com switch).

*Any* ideas how to solve this mess will be much appreciated!


