An article in the Proceedings of the 5th USENIX Conference on File and Storage Technologies this month offers perhaps the most in-depth study of hard drive failures to date. Google uses hundreds of thousands of hard drives to store its data, and a sample of one hundred thousand of Google’s drives was studied for five years to determine common causes of failure. Since this very interesting article is a little dense to read in its entirety, I thought Tech Tails readers would enjoy reading some highlights.
• Going against conventional thought, the study determined that increased temperature and/or activity had little or no correlation to failure rate. By contrast, it found that the drives that spun up and down most often had the highest failure rates. This means it’s best to uncheck the “Put the hard disk(s) to sleep when possible” box in Energy Saver, at least in terms of hard drive health.
• Some SMART (Self-Monitoring, Analysis, and Reporting Technology) parameters are strong indicators of impending mechanical failure. However, a good chunk of the failed drives gave no SMART warning beforehand, even though the eventual failure involved parameters that SMART monitors. For this reason, SMART is most useful as a statistical predictor of failure across a population of drives rather than for any individual device. With that in mind, if your drive reports SMART errors, you should at the very least perform a full backup immediately.
• Failure rates are reported as annualized rates by drive age: about 3% for drives in their first three months, 1.8% for drives around six months old, and 1.7% for drives around one year old. From there, failure rates jump to approximately 8% in the second year and 9% in the third year, fall to 6% in the fourth year, and rise back to 7% in the fifth year.
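To get a feel for what the figures in the last bullet add up to over a drive's life, here is a rough Python sketch. It assumes each yearly figure can be read as the chance that a drive still running at the start of that year fails during it, and that the years can simply be chained together. That is a simplification for illustration, not a calculation from the study, and the names `AFR_BY_YEAR` and `cumulative_failure` are made up for this example.

```python
# Yearly failure rates by drive age, in percent, as quoted in the list above.
# Assumption: each figure is the chance that a drive which is still working
# at the start of that year fails before the year is over.
AFR_BY_YEAR = {1: 1.7, 2: 8.0, 3: 9.0, 4: 6.0, 5: 7.0}


def cumulative_failure(afr_by_year):
    """Return {year: fraction of drives that have failed by the end of that year}."""
    surviving = 1.0
    failed_by = {}
    for year in sorted(afr_by_year):
        # Only the drives that survived earlier years can fail in this one.
        surviving *= 1.0 - afr_by_year[year] / 100.0
        failed_by[year] = 1.0 - surviving
    return failed_by


if __name__ == "__main__":
    for year, failed in cumulative_failure(AFR_BY_YEAR).items():
        print(f"By the end of year {year}: about {failed:.0%} of drives failed")
```

Under these assumptions, roughly 28% of drives would have failed by the end of the fifth year, which is a good argument for keeping backups current no matter what SMART says.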
The whole article can be read at: