Google Hard Drive Failure Rate Study

8bitbytes

For those of you preparing for the External Drive feature, you may wish to read this article when deciding where to throw your hard-earned dollars. Here is an excerpt and the link:

Earlier this month, Google researchers released a fascinating paper called "Failure Trends in a Large Disk Drive Population" that examined hard drive failure rates in Google's infrastructure. Two conclusions stood out: self-monitoring data isn't useful for predicting individual drive failures, and temperature and activity levels don't correlate well with drive failure. This throws conventional wisdom about predicting drive failures into question, so we sought out some independent expert analysis to weigh in on the findings. First, we briefly recap last week's Google study.

The Google study
First, though, let's take a look at what the Google researchers found. They examined the data from more than 100,000 drives deployed in Google's servers, all of which were consumer-grade serial and parallel ATA units with spindle speeds of 5400rpm and 7200rpm. Drives were considered "failed" if they were replaced as part of a repair procedure. Self-Monitoring, Analysis and Reporting Technology (SMART) information was recorded from all drives, and spurious readings were filtered out of the resulting database.

When they looked at annualized failure rates, they saw the expected "infant mortality" effect, where drives die more often very early in their life cycle. The thinking behind this is that poorly-made drives fail quickly, while well-made ones then have a few trouble-free years before they begin to reach their end-of-life stage at around five years. This is sometimes referred to as the "bathtub curve" for its shape, but Google's researchers found that the failure rate ticked up much sooner—starting at two years—and remained steady for the next several years.



Experts: No cure in sight for unpredictable hard drive loss
 
Wow! Google uses cheap consumer-grade disks in its data centers!
"all of which were consumer-grade serial and parallel ATA units with spindle speeds of 5400rpm and 7200rpm"
While I agree the study benefits us, servers normally require almost military-grade disks :) for companies that rely on their data. Google purposely cut costs and used consumer-grade disks. BTW, I should look into which manufacturers/models were used - the IBM 'DeathStar', maybe?
I want to quote Gibson about temp issues:
When it comes to the question of temperature and drive failure, though, Gibson says that the "very high temperatures" which can affect drive longevity are actually quite common in home PCs. "SpinRite will stop mid-run when drives become too hot," he says, "but SpinRite will only stop when the drive gets really hot." And Gibson knows from user reports that his tool does regularly stop to let drives cool down. "So this informs us that these 'extreme' temperatures are actually being encountered in the real world and are limiting drive lifetimes."
 
Regarding SpinRite, if the drive isn't already failing, the drive shouldn't be doing a lot of seeking which, I assume, is what generates the most heat. Obviously, the electronics generate some heat that may be associated with extra activity, but I can't imagine that it is as much as the physical energy of exciting the voice coil.
 
You forget the BIG coil(s) that move the head arm through a strong magnetic field - that takes a lot of juice, plus the spindle motor's current. The write/read coils themselves are driven by relatively low current; perhaps V-technology requires more.
 
Heat warps the platter, causing more errors. I have two 80GB Caviars going strong since June 2002, with two front fans blowing directly on the drives. I have never had a Deskstar HD last 12 months.
 
Wow! Google uses cheap consumer-grade disks in its data centers!
"all of which were consumer-grade serial and parallel ATA units with spindle speeds of 5400rpm and 7200rpm"

Okay. You deploy 100K spindles then and get back to us on what you spend for which types of drive.

While I agree the study benefits us, servers normally require almost military-grade disks :) for companies that rely on their data. Google purposely cut costs and used consumer-grade disks.

I am nearly 100% certain that Google has multiple levels of backups and (at a minimum) drives will be mirrored. Replacing a drive does not ipso facto create a loss of data. That's the point of RAID-1 (mirrored) and RAID-5 (striped with distributed parity) environments. It would be an exposure that would require restoration from a slower backup system only in the event of a second drive failure. While the chances of this aren't zero, properly monitored it would be very low.
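For anyone who hasn't seen how the "distributed parity" part works, here's a toy sketch of the idea (purely illustrative - a real RAID-5 controller stripes data and rotates the parity block across all the drives, and the Python below isn't anyone's actual implementation):

Code:
# Toy illustration of RAID-5-style parity: the parity block is the XOR of the
# data blocks, so any single lost block can be rebuilt from the survivors.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# One stripe across a 4-drive array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# "Drive 2" dies: rebuild its block from the surviving data plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt:", rebuilt)

Lose a second drive before the rebuild finishes and the XOR no longer has enough information - that's the second-failure exposure mentioned above, which is where the slower backup tier comes in.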
 
You forget the BIG coil(s) that move the head arm through a strong magnetic field - that takes a lot of juice, plus the spindle motor's current. The write/read coils themselves are driven by relatively low current; perhaps V-technology requires more.
You seem to be ignoring that the spindle motor isn't working very hard. It only requires enough power to overcome spindle bearing friction and perhaps some air bearing drag from the heads. I'm guessing that you've got to go out some decimal places to measure that energy consumption.

The voice coil in a "high seek" situation is working hard against itself. It is constantly throwing the heads back and forth, using lots of power to accelerate toward the target cylinder and then stop suddenly once it gets there.
 
I know how HDDs are built - I'm just using some different words/terms. At the same time, you shouldn't ignore the spindle motor's current - I could measure it for you, no problem, but you can also find the figures in the service manuals.
 
Moot point - those drives weren't all deployed at once - it took years.

I probably could get more info from my peers there.

But my point was: want it cheap now, pay more later (been there, seen that, with more than 25 years of experience in server farms).

 
I just read an interesting article in EE Times saying that nanoimprint technology will be used to imprint single magnetic particles on a non-ferrous platter, in effect "fixing" the number of bits that are on the platter. The article predicted a two-orders-of-magnitude increase in capacity from this method.
 
I just read an interesting article in EE Times saying that nanoimprint technology will be used to imprint single magnetic particles on a non-ferrous platter, in effect "fixing" the number of bits that are on the platter. The article predicted a two-orders-of-magnitude increase in capacity from this method.

Two spindles, a primary and a mirror, would likely be all the storage you'd ever need.

1TB spindles are just now being released. Two orders of magnitude pops us into Exabyte territory. Note: these are storage numbers (powers of 10), not binary values where 1K = 1024.

Code:
Name           Value                  Bytes
Megabyte       10^6                             1,000,000 
Gigabyte       10^9                         1,000,000,000
Terabyte       10^12                    1,000,000,000,000
Petabyte       10^15                1,000,000,000,000,000
Exabyte        10^18            1,000,000,000,000,000,000

So let's break this down into space for a DVR. Assuming 10GB/hour for HD storage, that would mean:

100 hours for 1 terabyte
100,000 hours for 1 Petabyte
100,000,000 hours for 1 Exabyte.

This is what 2 orders of magnitude buys you.

There are 8,766 hours in a year (using 365.25 days to account for leap years). 1 Exabyte of storage gets you > 11,400 years of HD storage.

How many streams simultaneously would it take to fill up this much storage?

Oh, you really would need a 64-bit File System to address all this storage space :D
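If anyone wants to sanity-check those numbers, it's just division - here's the back-of-the-envelope version (decimal units and the 10GB/hour assumption above):

Code:
# Back-of-the-envelope DVR math: decimal units, ~10 GB per hour of HD video assumed.
GB_PER_HOUR = 10
HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours

capacities_gb = {
    "1 Terabyte": 10**3,
    "1 Petabyte": 10**6,
    "1 Exabyte": 10**9,
}

for name, gb in capacities_gb.items():
    hours = gb / GB_PER_HOUR
    print(f"{name}: {hours:,.0f} hours of HD (~{hours / HOURS_PER_YEAR:,.0f} years)")

Plug in 10**5 GB (the corrected 100TB figure that comes up a couple of posts down) and you get 10,000 hours.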

Cheers,
 
I think they were talking decimal orders; in other words, a 10²-fold increase. So we're only talking 10,000 hours of HD storage for now. Maybe when the Holographic storage crystals come on-line...
 
I think they were talking decimal orders; in other words, a 10²-fold increase. So we're only talking 10,000 hours of HD storage for now. Maybe when the Holographic storage crystals come on-line...

Foxbat: doh. Yep. 100TB/spindle would be correct.

So 10K hours of non-stop recording. Still a very stupid number of hours IMO.

What is the cost per unit of storage compared to current-technology drives? The 1 TB spindles are coming in with a suggested list price of $400, give or take. That's ~$0.40/GB.

If these are going to be affordable then the manufacturing costs will have to take a nose dive too.
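For what it's worth, the per-gigabyte arithmetic (using the ~$400 list price above, decimal gigabytes, and a purely hypothetical 100 TB drive for comparison):

Code:
# Rough cost-per-GB check using the figures quoted above (decimal units).
price_1tb = 400.0                 # suggested list price for a 1 TB drive, give or take
cost_per_gb = price_1tb / 1000
print(f"~${cost_per_gb:.2f}/GB")  # ~$0.40/GB

# What a hypothetical 100 TB drive would cost at today's per-GB price.
print(f"100 TB at that rate: ${cost_per_gb * 100_000:,.0f}")  # $40,000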

Cheers,
 
If these are going to be affordable then the manufacturing costs will have to take a nose dive too.
The nanoimprint technology is supposed to let manufacturers print ICs and other nano-scale devices much the same way CDs and DVDs are made: instead of having to write the bits of a movie to a DVD serially, all of the data are pressed in one step. With the disk platter, the magnetic bits would be printed on the platter in one operation instead of being layered on by vapor deposition.

I couldn't find the electronic version of the article in the June 11th Electronic Engineering Times, but here's what they said:
EE Times said:
Consider the disk drive sector, where for decades vendors have increased storage capacity by shrinking the tiny magnetic grains that constitute the data bits in a system. For traditional, longitudinal recording media, data integrity would be "harmed if we continue to shrink the magnetic grains," drive giant Hitachi Global Storage Technologies warned in a recent presentation.

The inability to scale the magnetic grains is called the superparamagnetic effect. Drive vendors have skirted the issue of late by moving to perpendicular recording technology, but now they are looking beyond that technique.

One alternative is the use of bit-patterned media, wherein "each bit is stored in a single, deliberately formed magnetic switching volume," according to the Hitachi presentation. "Since we no longer need on the order of 100 grains per bit, but just one single, grain-sized switching volume, density can be increased by roughly two orders of magnitude compared with conventional recording media."

Today's drives have media capacities of 100 Gbits/square inch, with a center-to-center spacing of 86 nm, according to the presentation. Increasing capacity to 1 Tbit/inch² would require a center-to-center spacing of 27 nm. At 10 Tbits/inch², spacing would be just 9 nm. Hitachi believes optical lithography will run out of gas before the bit-patterned era and is exploring nanoimprint as an alternative.
Of course, that doesn't say anything about the cost, but like most technologies, it should get cheaper as time goes on.
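Incidentally, the spacing numbers in that quote line up pretty well with a naive square-grid estimate - this is just my own back-of-the-envelope check, not Hitachi's actual bit geometry:

Code:
# Naive check of the EE Times spacing figures: treat the bits as a square grid,
# so center-to-center spacing is roughly sqrt(area per bit).
from math import sqrt

NM_PER_INCH = 25.4e6  # 1 inch = 25,400,000 nm

for density in (100e9, 1e12, 10e12):  # bits per square inch
    spacing_nm = sqrt(1.0 / density) * NM_PER_INCH
    print(f"{density:.0e} bits/in^2 -> ~{spacing_nm:.0f} nm center-to-center")

The quoted 86/27/9 nm figures come out a little larger than this estimate - presumably the real track and bit geometry isn't a perfect square grid - but the scaling is the same.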
 
Foxbat:

Unfortunately it'll be too damned expensive if they can't cut the cost to roughly $.004/GB. At current per-GB prices, 100 TB is going to run about $40K. Not a lot of sales there. At $1K for a drive that you might not ever fill, well, that's awesome.

Here's an example where a consumer RAID-5 + hot spare would make perfect sense. Buy five of them for a 4-drive RAID-5 array with one hot spare. You'd have a great solution with practically limitless storage.
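To put rough numbers on that idea (assuming the hypothetical 100TB drives, today's ~$0.40/GB, and the $.004/GB target above - all big assumptions):

Code:
# Rough numbers for the 4-drive RAID-5 + hot spare idea, using the hypothetical
# 100 TB drives and the per-GB prices mentioned in this thread (decimal units).
drive_gb = 100_000               # 100 TB per drive
drives_bought = 5                # 4 in the array + 1 hot spare
usable_gb = (4 - 1) * drive_gb   # RAID-5 usable capacity is (n - 1) drives

for label, per_gb in (("today's price", 0.40), ("target price", 0.004)):
    total = drives_bought * drive_gb * per_gb
    print(f"{label}: ${total:,.0f} for {usable_gb / 1000:,.0f} TB usable")

So at today's per-GB pricing it's a six-figure array, but at the target price it's a couple of thousand dollars for 300 TB of protected storage.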

It should be interesting to see how long this takes to come to fruition, if ever.

Also, for data centers the number of spindles is important for throughput, so it might be quite some time before these make it to a data center near you.

The only practical backup method for drives of this density is more drives, though.

Cheers,
 
