[j-nsp] Routing stopped due to failed hard disk?

Richard A Steenbergen ras at e-gerbil.net
Fri May 6 20:35:11 EDT 2005


On Fri, May 06, 2005 at 07:47:02AM +0200, Blaz Zupan wrote:
> We had a complete catastrophe this night. All routing through one of the main 
> POPs stopped. Rebooting the M10i that is servicing the POP fixed the problem, 
> but after the reboot it seems like the hard disk is no longer recognized and 
> the system is running purely off of flash:
> 
> ad0: 245MB <SanDisk SDCFB-256> [980/16/32] at ata0-master using PIO4
> rd0: ATA SW-RAID configuring 1 subdisks
> rd0: mirrordisk #ad/0x1000a not found, mirroring disabled
> rd0: stripe 0: subdisk rad0 mirrordisk -
> Mounting root from ufs:/dev/rd0s1a
> 
> We have "system mirror-flash-on-disk" configured on all boxes. In retrospect 
> this seems like a bad idea because apparently this caused the trouble. Looking 
> at our logs, I can see this:
> 
> May  6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: atareadsmartvalues: ioctl: Resource temporarily unavailable
> May  6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: checkdevices: Non zero return from atacheckdevice
> 
> Ten minutes later, everything stopped. So apparently the hard disk failed 
> and took the whole box with it - not good. Unfortunately, the box currently 
> does not have a redundant routing engine (not my decision); only the power 
> supply is redundant.
> 
> Has anybody seen something like this? Is mirror-flash-on-disk a good idea?

The problem with software RAID is that it behaves poorly when one of the 
devices goes down. The good thing about it is that your config has been 
duplicated on the other media. 

First off, you may want to get console on the box and check the logs to 
figure out what actually happened. Assuming the HD died and really did 
take down the system, you should be able to boot the device into single 
user mode (boot -s at the boot loader, just like you would do to recover 
passwords) and remove the mirror-flash-on-disk command.
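
Roughly, and assuming the usual password-recovery flow on your JUNOS 
release, the sequence looks something like the sketch below. The prompts 
shown are illustrative and vary by version, the "recovery" step is the 
one from the standard root password recovery procedure, and the final 
reboot reflects my understanding that a mirror-flash-on-disk change only 
takes effect after a reboot. Treat it as an outline, not a transcript.

```
# Interrupt the boot at the loader prompt and boot single user
loader> boot -s

# When asked for a shell path, enter "recovery" to get a CLI session
# (same step as in root password recovery)

# Remove the mirroring statement and commit
root> configure
root# delete system mirror-flash-on-disk
root# commit
root# exit

# Reboot so the change takes effect (assumption: required on this release)
root> request system reboot
```

Before committing, "show system" in configuration mode should confirm 
whether the statement is present at all.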

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)