[j-nsp] Routing stopped due to failed hard disk?
Blaz Zupan
blaz at inlimbo.org
Fri May 6 01:47:02 EDT 2005
We had a complete catastrophe last night. All routing through one of our main
POPs stopped. Rebooting the M10i that serves the POP fixed the problem,
but after the reboot it seems the hard disk is no longer recognized and
the system is running purely off flash:
ad0: 245MB <SanDisk SDCFB-256> [980/16/32] at ata0-master using PIO4
rd0: ATA SW-RAID configuring 1 subdisks
rd0: mirrordisk #ad/0x1000a not found, mirroring disabled
rd0: stripe 0: subdisk rad0 mirrordisk -
Mounting root from ufs:/dev/rd0s1a
We have "system mirror-flash-on-disk" configured on all boxes. In retrospect
this seems like a bad idea because apparently this caused the trouble. Looking
at our logs, I can see this:
May 6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: atareadsmartvalues: ioctl: Resource temporarily unavailable
May 6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: checkdevices: Non zero return from atacheckdevice
Ten minutes later, everything stopped. So apparently the hard disk failed
and took the whole box with it, which is not good. Unfortunately, the box
currently has no redundancy other than the power supply; in particular it
has no redundant routing engine (not my decision).
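Since the smartd errors showed up before routing stopped, it might be worth
alerting on them. Below is a minimal sketch, assuming syslog lines in the
same format as the ones quoted above are collected on a central log host.
The function name find_smartd_errors and the third sample line are
hypothetical, made up for illustration; only the two smartd message texts
come from the actual log.

```python
import re

# Matches the two smartd messages seen in the log above. The message texts
# are copied from that log; treating them as a failure signature is an
# assumption, not documented Juniper behaviour.
SMARTD_ERROR = re.compile(
    r"smartd\[\d+\]: "
    r"(atareadsmartvalues: ioctl: .+"
    r"|checkdevices: Non zero return from atacheckdevice)"
)


def find_smartd_errors(lines):
    """Return the syslog lines that contain one of the smartd errors."""
    return [line for line in lines if SMARTD_ERROR.search(line)]


if __name__ == "__main__":
    sample = [
        "May 6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: atareadsmartvalues: "
        "ioctl: Resource temporarily unavailable",
        "May 6 01:46:15 maribor2-lo0.ipv4 smartd[2597]: checkdevices: "
        "Non zero return from atacheckdevice",
        # Hypothetical unrelated line, included only to show it is skipped.
        "May 6 01:50:00 maribor2-lo0.ipv4 example[1]: unrelated message",
    ]
    for line in find_smartd_errors(sample):
        print(line)
```

Something like this could be run from cron against the collected logs, so
that a failing disk gets noticed before it takes the box down.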
Has anybody seen something like this? Is mirror-flash-on-disk a good idea?
Blaz Zupan, Medinet d.o.o, Trzaska 85, SI-2000 Maribor, Slovenia
E-mail: blaz at amis.net, Tel: +386 2 320 6320, Fax: +386 2 320 6325