[c-nsp] Unstable IOS Version for LNS on Cisco 7206 NPE-G2

Nick Hilliard nick at foobar.org
Fri Nov 12 06:56:35 EST 2010


On 12/11/2010 11:17, Gert Doering wrote:
> If you have a bad RAM that will change a pointer value for you *without*
> triggering a RAM parity error (because it's on a platform that has no
> ECC/parity, like 'early 7200 NPEs'), and the software uses that pointer,
> and accesses a bad memory location *because the RAM corrupted the pointer*,
> how can the resulting SegV be "always a software bug"?

This is true: an untrapped DRAM bit error can cause a SEGV in the right
circumstances.  Sorry Eninja, but Cisco is misleading on this point;
undetected DRAM corruption can produce any type of exception available.
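
To make the mechanism concrete, here is a minimal sketch of what a single
undetected bit flip does to a stored pointer.  The address range and the
pointer value are made up for illustration and are not taken from any real
NPE memory map; flip_bit, is_mapped and classify_flips are hypothetical
helper names.

```python
# Hypothetical mapped region, chosen only for illustration.
HEAP_START = 0x6000_0000
HEAP_END = 0x6100_0000  # exclusive upper bound


def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted, as an undetected DRAM error might."""
    return value ^ (1 << bit)


def is_mapped(addr: int) -> bool:
    """True if addr falls inside the (assumed) valid region."""
    return HEAP_START <= addr < HEAP_END


def classify_flips(pointer: int, width: int = 32) -> dict:
    """Map each bit position to whether the flipped pointer is still mapped."""
    return {bit: is_mapped(flip_bit(pointer, bit)) for bit in range(width)}


pointer = 0x6012_3450  # an assumed, valid pointer inside the region
results = classify_flips(pointer)

# Bit positions whose flip moves the pointer outside the mapped region.
unmapped_bits = [bit for bit, still_mapped in results.items() if not still_mapped]
print(unmapped_bits)
```

In this toy layout a flip in one of the high-order bits sends the pointer
outside the mapped region, which is the SEGV case.  A flip in a low-order
bit leaves a pointer that is still valid but aimed at the wrong data, which
is how the same kind of corruption can surface as some other exception
later on.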

But what are the chances of seeing a DRAM corruption problem which
1) didn't cause an ECC error (an NPE-G2 uses ECC RAM) and 2) consistently
caused the same SEGV, even after reboots and with different software?  I
would say the chances of this are pretty low.

I would put my money on either a software bug which was present in all the
software trains the OP tried and was tickled by some odd configuration, or
else a hardware problem which was not related to DRAM.

You can check if it's the latter by doing a hardware swapout with a similar
box.

Nick

