[j-nsp] High failure rates for M7i/M10i hard disks?

Sun Aug 21 01:52:03 EDT 2005

--On August 20, 2005 9:38:40 AM +0200 sthaug at nethelp.no wrote:

> We have around 25 M7i/M10i in production, and we have had 3 hard disks
> fail during the last year. The last instance was rather ridiculous - M7i
> with failed hard disk, we RMAed the RE, it was swapped - and after one
> week, the new hard disk failed also.
>
> When the hard disks have failed, they have failed in such a way that the
> rest of the router ceases functioning. We *assume* this is because the
> hard disk problem ties up the CPU so much (interrupts or otherwise) that

This has been my experience with every IDE HDD I've ever had fail.  The
system basically locks up solid.  Mostly the fault of how IDE operates.
With SCSI I usually get better behavior: timeouts happen, life lurches on,
and the kernel gets a chance to handle the failures.  With IDE, the system
just WHAM! stops.  Thus most of our critical stuff is SCSI.  It makes me
nervous that the HDD in the Juniper is IDE, but there's nothing I can do
about it except watch SMART and hope it gives enough warning.  It usually
doesn't.
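
For what "watching SMART" can look like in practice, here is a rough sketch,
not anything Juniper ships. It assumes you can get an attribute table in the
column layout that smartmontools' "smartctl -A" prints for ATA disks onto a
box that runs Python; whether that is possible on an RE is an assumption. The
sample table, the function name smart_warnings and the choice of watched
attributes are all made up for illustration.

```python
import re

# Illustrative sample only (not output from a real router), laid out in the
# column order "smartctl -A" uses for ATA disks.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       14230
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
"""

# Assumption: these two raw counters are the ones worth alerting on early.
WATCHED_RAW = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}

# ID, name, flag, VALUE, WORST, THRESH, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def smart_warnings(text):
    """Return a list of warning strings for suspicious attribute rows."""
    warnings = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or unrecognized line
        name = m.group(2)
        value, thresh, raw = int(m.group(3)), int(m.group(5)), int(m.group(6))
        if thresh > 0 and value <= thresh:
            # Normalized value has dropped to the vendor threshold.
            warnings.append(
                f"{name}: normalized value {value} at or below threshold {thresh}"
            )
        elif name in WATCHED_RAW and raw > 0:
            # Raw counter is nonzero well before the threshold trips.
            warnings.append(f"{name}: raw count {raw} is nonzero")
    return warnings


if __name__ == "__main__":
    for w in smart_warnings(SAMPLE):
        print(w)
```

Alerting on the raw counters as well as the thresholds is deliberate: by the
time the normalized value crosses the threshold the disk is often already
gone, which matches the "usually isn't enough warning" experience.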

> the RE cannot generate the necessary protocol keepalives - thus the
> routing protocols go down. Pinging the router from a connected network
> still works, but we're unable to login to the box. Powering the box off
> and on again brings it up, running from compact flash only (the
> hard disk is not detected on boot).
>
> Has anybody else seen similar problems?
>
> Steinar Haug, Nethelp consulting, sthaug at nethelp.no

--
"Genius might be described as a supreme capacity for getting its possessors
into trouble of all kinds."
-- Samuel Butler