[j-nsp] Routing stopped due to failed hard disk?

Blaz Zupan blaz at inlimbo.org
Fri May 6 01:47:02 EDT 2005


We had a complete catastrophe last night. All routing through one of our main 
POPs stopped. Rebooting the M10i that services the POP fixed the problem, 
but since the reboot the hard disk no longer seems to be recognized and 
the system is running purely off flash:

ad0: 245MB <SanDisk SDCFB-256> [980/16/32] at ata0-master using PIO4
rd0: ATA SW-RAID configuring 1 subdisks
rd0: mirrordisk #ad/0x1000a not found, mirroring disabled
rd0: stripe 0: subdisk rad0 mirrordisk -
Mounting root from ufs:/dev/rd0s1a

We have "system mirror-flash-on-disk" configured on all boxes. In retrospect 
this seems like a bad idea, because it apparently caused the trouble. Looking 
at our logs, I can see this:

May  6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: atareadsmartvalues: ioctl: Resource temporarily unavailable
May  6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: checkdevices: Non zero return from atacheckdevice

Ten minutes later, everything stopped. So apparently the hard disk failed 
and took the whole box with it, which is not good. Unfortunately, the box 
currently has no redundancy apart from the power supply; in particular, it 
does not have a redundant routing engine (not my decision).
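
Since the smartd messages showed up about ten minutes before the outage, a
log watcher could in principle have given some warning. Below is a minimal
sketch in Python, assuming syslog lines shaped like the two quoted above. The
regular expression, the function name find_smartd_warnings and the keyword
checks are illustrative assumptions derived only from those two lines, not
from any documented smartd message format, and the third sample line is made
up to show a non-match.

```python
import re

# Pattern inferred from the two smartd lines in this post (an assumption,
# not a documented format): "<Mon> <day> <hh:mm:ss> <host> smartd[<pid>]: <msg>"
SMARTD_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"smartd\[\d+\]:\s+(?P<msg>.*)$"
)

# Substrings taken from the two messages above; hypothetical, extend as needed.
SUSPICIOUS = ("ioctl", "Non zero return")


def find_smartd_warnings(lines):
    """Return (timestamp, host, message) for smartd lines that look like errors."""
    hits = []
    for line in lines:
        match = SMARTD_LINE.match(line.strip())
        if match is None:
            continue  # not a smartd line
        msg = match.group("msg")
        if any(token in msg for token in SUSPICIOUS):
            hits.append((match.group("stamp"), match.group("host"), msg))
    return hits


sample = [
    "May  6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: atareadsmartvalues: "
    "ioctl: Resource temporarily unavailable",
    "May  6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: checkdevices: "
    "Non zero return from atacheckdevice",
    # Hypothetical unrelated line, invented for illustration only.
    "May  6 01:47:00 example-host otherd[1]: hypothetical unrelated line",
]

if __name__ == "__main__":
    for stamp, host, msg in find_smartd_warnings(sample):
        print(f"{stamp} {host}: {msg}")
```

Something like this would only flag the symptom. It says nothing about
whether mirror-flash-on-disk itself was responsible for the hang.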

Has anybody seen something like this? Is mirror-flash-on-disk a good idea?

Blaz Zupan,  Medinet d.o.o, Trzaska 85, SI-2000 Maribor, Slovenia
E-mail: blaz at amis.net, Tel: +386 2 320 6320, Fax: +386 2 320 6325

