[j-nsp] High failure rates for M7i/M10i hard disks?

Scott Morris swm at emanon.com
Fri Aug 26 17:14:32 EDT 2005


Agreed.  Somehow I think that the RMA process will well exceed the $30
differential there.  :) 

-----Original Message-----
From: juniper-nsp-bounces at puck.nether.net
[mailto:juniper-nsp-bounces at puck.nether.net] On Behalf Of Richard A
Steenbergen
Sent: Friday, August 26, 2005 5:05 PM
To: sthaug at nethelp.no
Cc: juniper-nsp at puck.nether.net
Subject: Re: [j-nsp] High failure rates for M7i/M10i hard disks?

On Fri, Aug 26, 2005 at 10:33:16PM +0200, sthaug at nethelp.no wrote:
> > > And that field alert is now out: PSN-2005-08-014
> > > 
> > > https://www.juniper.net/alerts/viewalert.jsp?actionBtn=Search&txtAlertNumber=PSN-2005-08-014&viewMode=view
> > 
> > I hope they're not actually saying that the hard drive can't handle 
> > being written to every 10 secs?
> 
> I'm not going to defend Juniper here - I think we have suffered quite 
> enough of these disk problems (got woken by a phone call from our NOC 
> this morning - *another* M7i had stopped working during the night, 
> from the same problem).
> 
> However, I *think* what they're saying is that writes every 10 seconds 
> for a while is not a problem, but writes every 10 seconds 24x7 may be 
> a problem. Remember what the disk manufacturers have been trying to 
> tell us - there are differences between disks made for heavy-duty 
> server use (typically SCSI) and disks made for PC/home use (typically 
> ATA). The M7i/M10i disks are 2.5" ATA disks (laptop type disks) and 
> are probably not made for continuous use.

As I understand it, they're claiming that writing to the drive every 10 secs
is preventing the thermal recal. I'm not a hard drive engineer so I can't
say for certain what is or isn't necessary, but like I said my bullshit
meter is going off on log file writes every 10 secs preventing thermal
recal.

However, I will definitely say that the rest is nonsense. First, the drives
are exactly the same; the only difference is the type of interface attached
to the drive. Yes, the drive manufacturers will reserve the fastest and
nicest drives for the more expensive commercial-use interfaces (fibre
channel, SCSI, etc), but there are in fact specifically targeted server
grade 2.5" ATA drives for use in blade servers. These drives are subjected
to the same high-volume 24/7 reads and writes as any other server drive; I
can't imagine how a small log file every 10 secs could possibly compare.
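
For a sense of scale, here is a rough sketch of how many writes a 10-second
log interval adds up to. It assumes exactly one write per interval, which is
a simplification; the real count depends on how the logging and the
filesystem batch things up.

```python
# Rough count of log writes at a fixed interval.
# Assumption: exactly one disk write per interval (not confirmed by the alert).
SECONDS_PER_DAY = 24 * 60 * 60


def writes_per_day(interval_secs):
    """Number of writes in one day at one write per interval_secs."""
    return SECONDS_PER_DAY // interval_secs


per_day = writes_per_day(10)
per_year = per_day * 365
print(per_day, per_year)
```

That works out to 8640 writes a day, or a bit over 3 million a year.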

I checked on the specific drive used in a RE-5.0/RE-400, based on
information provided by some actual users of it. The drive detects as a:

ad1: 19077MB <HTS548020M9AT00> [38760/16/63] at ata0-slave using UDMA33

Which seems to be a Travelstar 5K80-20 5400RPM 20G ATA-6 drive:

http://www.hitachigst.com/portal/site/en/menuitem.4a8443e5524e0c5deb4703e3aac4f0a0/
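
As a sanity check on the probe line, the bracketed geometry
(cylinders/heads/sectors per track) can be multiplied out to see whether it
agrees with the reported 19077MB. This is only a sketch: the 512-byte sector
size and MB meaning 2^20 bytes are my assumptions; neither is stated in the
probe output.

```python
# Check that the CHS geometry [38760/16/63] matches the reported capacity.
# Assumptions: 512-byte sectors, and "MB" meaning 2**20 bytes.
SECTOR_BYTES = 512
BYTES_PER_MB = 1024 * 1024


def chs_capacity_mb(cylinders, heads, sectors_per_track):
    """Capacity in whole MB implied by a cylinders/heads/sectors geometry."""
    total_sectors = cylinders * heads * sectors_per_track
    return total_sectors * SECTOR_BYTES // BYTES_PER_MB


capacity_mb = chs_capacity_mb(38760, 16, 63)
print(capacity_mb)  # 19077
```

Under those assumptions the geometry gives 19077MB, matching the probe line
and consistent with a nominal 20G drive.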

It seems the drives they are marketing for blade server use are the E#K##'s
not the regular #K##'s. For example:

http://www.hitachigst.com/portal/site/en/menuitem.ec03cadee7c6fb5deb4703e3aac4f0a0/

vs

http://www.hitachigst.com/portal/site/en/menuitem.c8c3966a526cfb5deb4703e3aac4f0a0/

A search on pricewatch seems to put the price for these models (remember
this is 60G, 3x the capacity of the RE drive, and 7200RPM) at $179 for the
non-E, and $209 for the E ($30 difference). Now, I don't know for certain if
this drive is actually any "longer lived", or if it just offers faster
access rates, but we do know that an actual server blade grade HD is
obtainable for very cheap. Given that the list price of a RE-400-256 is
$15,000, and RE-850-1536 is $20,000, you're going to have a pretty damn hard
time convincing me that Juniper couldn't make sure $30 more per unit was
spent to get drives which could handle updating a log file every 10 seconds.
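
To put the $30 in proportion, here is a quick sketch using only the
pricewatch and list prices quoted above. Applying the 60G drives' price gap
to the RE is itself an assumption, since the RE ships with a smaller 20G
drive.

```python
# Drive price premium as a share of routing engine list price.
# Prices are the ones quoted in this thread; carrying the 60G price gap
# over to the RE's 20G drive is an assumption.
NON_E_PRICE = 179
E_PRICE = 209
RE_LIST_PRICES = {"RE-400-256": 15000, "RE-850-1536": 20000}


def premium_share_percent(premium, list_price):
    """Premium expressed as a percentage of the list price."""
    return 100.0 * premium / list_price


premium = E_PRICE - NON_E_PRICE
for model, price in RE_LIST_PRICES.items():
    share = premium_share_percent(premium, price)
    print(f"{model}: ${premium} is {share:.2f}% of ${price}")
```

That is roughly 0.2% of the RE-400 list price and 0.15% of the RE-850.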

I'm just not buying it. Just speculating without any facts here, but a
manufacturing defect or bad BIOS/firmware interaction that would require an
RMA seems far more likely to me. Maybe Juniper doesn't want to RMA every M7i
and M10i routing engine they've sold, and thinks that reducing the error
rates by writing to the drive less is a fix for some of the problems?

-- 
Richard A Steenbergen <ras at e-gerbil.net>       http://www.e-gerbil.net/ras
GPG Key ID: 0xF8B12CBC (7535 7F59 8204 ED1F CC1C 53AF 4C41 5ECA F8B1 2CBC)
_______________________________________________
juniper-nsp mailing list juniper-nsp at puck.nether.net
http://puck.nether.net/mailman/listinfo/juniper-nsp


