RE: [nsp] Extremely flakey 7206VXR

From: KF (kf@reign.sk)
Date: Fri Nov 23 2001 - 04:24:41 EST


Hi,

We had a similar problem with a 7206... The box was pre-configured on the table and later mounted in the rack.

After mounting, *s**t* happened... and once again, when we removed the box and placed it on the table, it worked fine...

So I found that when the box is mounted in the rack with the rack mounting kit, the chassis is torsion stressed, so something goes
wrong: it reloaded with a bus error (the PAs were firmly attached...)

So when I removed all the PAs and plugged them back in *after* the chassis was mounted in the rack, everything started to work well...

for a wonder...

alex.

-----Original Message-----
From: Blaz Zupan [mailto:blaz@inlimbo.org]
Sent: Thursday, November 22, 2001 7:50 PM
To: cisco-nsp@puck.nether.net
Subject: [nsp] Extremely flakey 7206VXR

We're having extreme trouble with a 7206VXR and are quickly running out of
ideas. We have an open ticket with Cisco, but they're not particularly helpful
either.

Here's the history: 7206VXR NPE-400 was running 12.0(16)S and 12.0(18)S mostly
fine for a relatively long time (two or three months). Then suddenly it
started rebooting twice a day (almost exactly every twelve hours) with various
errors (bus error, software forced crash, etc.). Tried 12.0(19)S and 12.2(5)
with no visible fix. We then replaced the whole box except the PA's. One day
later, machine reboots with memory parity error, a couple of hours later it
starts rebooting in a loop and never comes up again. Replaced NPE-400 in the
new box with an NPE-225. Machine is mostly stable again for about two days (so
it seems the new NPE-400 was broken indeed), but after two days the weird
reboots started again, first a bus error, then a couple of software forced
crashes. Considering that most of the reboots were due to software forced
crashes, Cisco suggested we turn on exception handling and create a core dump
file. Unfortunately after three software forced crashes, no core dump was
generated (although it is configured correctly). So we went to 12.2(6), which
was released recently; today we again have the box rebooting 5 times in a row
with bus errors and software forced crashes.

We're monitoring the temperature in the facility, everything looks normal.
Also "show environment all" shows no anomalies in the voltage readings.

We have enough of Cisco and will be replacing this box with a Juniper M5
anyway, but until it arrives we'll have to do *something* as this box is
extremely mission critical. We'll be replacing the NPE once again (with a new
NPE-400) and putting in another UPS in addition to the one the whole facility
is on. Also, we'll probably be replacing the PA's as well. But other than
that, I'm completely out of ideas.

Configuration: 7206VXR NPE-400, 128MB RAM, 2FE I/O controller, PA-2E3,
PA-MC-8E1, typical ISP software configuration (some OSPF, some BGP, 802.1q
VLANs to a 3com switch).

*Any* ideas how to solve this mess will be much appreciated!


