[j-nsp] recovering from failed HD during jinstall?

Dave Diller dave at maxgigapop.net
Wed Feb 3 14:30:42 EST 2010


The hard drive on a backup RE appeared to fail during a code upgrade last night:

ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=72519695
ad1 removed from the boot list

I rebooted, expecting it would then ignore the removed drive and complete the install on the compact flash, but instead it reported:
ad1: missing in Boot List, restoring Boot List to default

and then, of course, failed again. 

I note that it failed while trying to copy the package from /mnt/tmp/preinstall (on ad0, I assume) to /tmp/preinstall (which must be on ad1). So it would seem the hard drive is required for code upgrades: the installer needs to newfs ad0, which would be rather difficult if it were running the install from there. But given where it failed, it never gets to the point of actually installing the new code on ad0, and the RE is dead in the water.  Is there any way to recover from this remotely?

I managed to recover it this morning via a clean install of the pre-maintenance code that I already had on PCMCIA, which newfs'd/installed on both ad0 and ad1 without issue.  I was then successful at adding the same software that failed last night, followed by a system snapshot, and a reboot onto the hard disk.  So it SEEMS fully functional now, but is it trustworthy - or a time bomb?  What would account for the read errors, if not a dying disk?  I opened an RMA last night, so they're going to get it back - this is mostly just curiosity now about what and how.
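
For anyone reading the ad1 FAILURE line above, here is a minimal Python sketch that decodes the hex status and error values into their flag names. It assumes the line format is exactly as quoted in this message. The names READY, DSC, ERROR and UNCORRECTABLE come straight from that log line; the remaining names in the two tables are my assumption about how the FreeBSD ATA driver labels the other bits. parse_failure() and decode() are hypothetical helper names, not part of any Junos or FreeBSD tool.

```python
import re

# Bit names for the ATA status register. READY, DSC and ERROR appear in the
# quoted log line; the other names are assumptions, not taken from the post.
STATUS_BITS = {
    0x80: "BUSY", 0x40: "READY", 0x20: "DMA_READY", 0x10: "DSC",
    0x08: "DRQ", 0x04: "CORRECTABLE", 0x02: "INDEX", 0x01: "ERROR",
}

# Bit names for the ATA error register. UNCORRECTABLE appears in the quoted
# log line; the other names are assumptions.
ERROR_BITS = {
    0x80: "ICRC", 0x40: "UNCORRECTABLE", 0x20: "MEDIA_CHANGED",
    0x10: "NID_NOT_FOUND", 0x08: "MEDIA_CHANGE_REQUEST", 0x04: "ABORTED",
    0x02: "NO_MEDIA", 0x01: "ILLEGAL_LENGTH",
}

# Matches lines shaped like the one quoted in this message (assumed format).
LINE_RE = re.compile(
    r"^(?P<dev>\w+): FAILURE - (?P<cmd>\w+) "
    r"status=(?P<status>[0-9a-fA-F]+)<[^>]*> "
    r"error=(?P<error>[0-9a-fA-F]+)<[^>]*> "
    r"LBA=(?P<lba>\d+)"
)


def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]


def parse_failure(line):
    """Parse one FAILURE line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "device": m.group("dev"),
        "command": m.group("cmd"),
        "status": decode(int(m.group("status"), 16), STATUS_BITS),
        "error": decode(int(m.group("error"), 16), ERROR_BITS),
        "lba": int(m.group("lba")),
    }


if __name__ == "__main__":
    sample = ("ad1: FAILURE - READ_DMA status=51<READY,DSC,ERROR> "
              "error=40<UNCORRECTABLE> LBA=72519695")
    print(parse_failure(sample))
```

Run against the line above, status 0x51 breaks down into 0x40 + 0x10 + 0x01 and error 0x40 is a single bit, which matches the names the kernel printed. That only confirms the drive reported an uncorrectable read at that LBA; it does not by itself say whether the disk is dying.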


-dd

