RE: [nsp] Extremely flakey 7206VXR

From: Yuval Ben-Ari (yuvalba@netvision.net.il)
Date: Thu Nov 22 2001 - 15:10:24 EST


We are having NPE-400 problems too.
First of all, take a look at CSCdv12450 (NPE400 restarted by hardware
watchdog), although you did not mention watchdog crashes. We had a few
such crashes.
Secondly, did you by any chance enable NBAR at some point?
There is also an NBAR bug, which you can find on CCO.

However, if the problem persisted after going to the NPE-225, I'm not
sure how relevant those are.

And finally, of course, power / environment issues as Ryan suggested.

Yuval.

> -----Original Message-----
> From: Blaz Zupan [mailto:blaz@inlimbo.org]
> Sent: Thursday, November 22, 2001 20:50
> To: cisco-nsp@puck.nether.net
> Subject: [nsp] Extremely flakey 7206VXR
>
>
> We're having extreme trouble with a 7206VXR and are quickly running out of
> ideas. We have an open ticket with Cisco, but they're not particularly
> helpful either.
>
> Here's the history: the 7206VXR NPE-400 was running 12.0(16)S and 12.0(18)S
> mostly fine for a relatively long time (two or three months). Then suddenly
> it started rebooting twice a day (almost exactly every twelve hours) with
> various errors (bus error, software forced crash, etc.). We tried 12.0(19)S
> and 12.2(5) with no visible fix. We then replaced the whole box except the
> PAs. One day later, the machine rebooted with a memory parity error; a
> couple of hours later it started rebooting in a loop and never came up
> again. We replaced the NPE-400 in the new box with an NPE-225. The machine
> was mostly stable again for about two days (so it seems the new NPE-400 was
> indeed broken), but after two days the weird reboots started again: first a
> bus error, then a couple of software forced crashes. Considering that most
> of the reboots were due to software forced crashes, Cisco suggested we turn
> on exception handling and create a core dump file. Unfortunately, after
> three software forced crashes, no core dump was generated (although it is
> configured correctly). So we went to 12.2(6), which was released recently.
> Today the box again rebooted 5 times in a row with bus errors and software
> forced crashes.
>
> We're monitoring the temperature in the facility, and everything looks
> normal. Also, "show environment all" shows no anomalies in the voltage
> readings.
>
> We have had enough of Cisco and will be replacing this box with a Juniper M5
> anyway, but until it arrives we'll have to do *something*, as this box is
> extremely mission critical. We'll be replacing the NPE once again (with a
> new NPE-400) and putting in another UPS in addition to the one the whole
> facility is on. Also, we'll probably be replacing the PAs as well. But
> other than that, I'm completely out of ideas.
>
> Configuration: 7206VXR NPE-400, 128MB RAM, 2FE I/O controller, PA-2E3,
> PA-MC-8E1, typical ISP software configuration (some OSPF, some BGP, 802.1q
> VLANs to a 3com switch).
>
> *Any* ideas how to solve this mess will be much appreciated!
>
>


