The internal temperature of a hard drive is an important factor not only for its correct functioning but also for the reliability and lifetime of a storage system. The mechanical components of the hard disk drive, especially the fluid-dynamic bearing of the platter-stack spindle, wear out faster at higher temperatures because the bearing oil leaks out at a faster rate.
So, controlling the temperature of hard drives in storage systems is necessary to ensure optimal functionality and high reliability.
How to measure the hard drive temperature
Modern hard drives have a built-in internal temperature sensor, which can be read via the S.M.A.R.T. parameters, by tools within the operating system, or by the management tools of the host bus adapter or RAID controller.
The Open-E JovianDSS GUI provides access to the management tools of Broadcom and Microchip-Adaptec.
In case of Microchip-Adaptec: <node-ip-address>:8443, login with aac, password raid
In case of Broadcom: <node-ip-address>:9000, login with root, password admin
Expanding the controller, enclosure, and port views lists the connected hard disk drives and displays their temperatures.
On the operating system level, S.M.A.R.T. values can be read out with “smartmontools”, a common freeware tool available for Windows and Linux. The command line is:
For this example, the hard disk drive’s internal temperature is currently 36 °C. The initial starting temperature was 16 °C and its historic maximum was 45 °C.
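As a rough illustration of how such a reading could be extracted automatically, the following Python sketch parses a single attribute line in the layout that `smartctl -A <device>` commonly prints for ATA drives. The sample line is constructed for illustration from the values mentioned above (it is not captured output), the function name `parse_temperature` is hypothetical, and the raw-value layout `current (Min/Max min/max)` is an assumption that varies between drive models and vendors.

```python
import re

# Illustrative attribute line (assumption): built from the values in the text,
# using the 10-column layout smartctl typically shows for ATA attributes.
SAMPLE_LINE = (
    "194 Temperature_Celsius     0x0022   100   100   000    "
    "Old_age   Always       -       36 (Min/Max 16/45)"
)

# Assumed raw-value layout: "<current>" optionally followed by
# "(Min/Max <min>/<max>)". Other drives may report it differently.
RAW_RE = re.compile(r"(\d+)(?:\s+\(Min/Max\s+(\d+)/(\d+)\))?")


def parse_temperature(line):
    """Return (current, minimum, maximum) in °C, or None if the line
    is not attribute 194. Minimum/maximum are None when not reported."""
    # Columns: ID, name, flag, value, worst, thresh, type, updated,
    # when_failed, raw value (the raw value may contain spaces).
    fields = line.split(None, 9)
    if len(fields) < 10 or fields[0] != "194":
        return None
    match = RAW_RE.match(fields[9])
    if match is None:
        return None
    current = int(match.group(1))
    minimum = int(match.group(2)) if match.group(2) else None
    maximum = int(match.group(3)) if match.group(3) else None
    return current, minimum, maximum


if __name__ == "__main__":
    print(parse_temperature(SAMPLE_LINE))
```

In a real setup the line would come from the tool's output for the drive in question; some drives report the temperature under a different attribute ID, so the ID check would need adjusting.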
Temperature is too high?
The manufacturer specifies the correct functionality of a hard disk drive only within a certain temperature range. Enterprise HDDs assume controlled, data-center-type cooling, so they are usually specified to operate from 5 to 60 °C, with an ambient temperature of max. 55 °C. NAS drives are specified from 5 to 65 °C and surveillance-specific models from 0 to 70 °C, because surveillance systems may operate in less stable environments.
Temperature and Reliability
The average hard drive temperature has a direct impact on its reliability. The rated reliability of a hard drive, expressed as Mean Time To Failure (MTTF), is only achieved if the average hard drive temperature stays below 40 °C. Average here means that periods above 40 °C need to be compensated by periods below 40 °C. In data center environments with active cooling, the user should make sure that 40 °C is never exceeded.
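The idea of compensating warmer periods with cooler ones can be made concrete with a time-weighted average. The sketch below is only an illustration: the function name `average_temperature` and the sample periods are hypothetical, and it assumes that "average" means a simple average weighted by operating hours.

```python
def average_temperature(periods):
    """Time-weighted average temperature in °C.

    periods: sequence of (hours, temperature_c) pairs, e.g. one entry
    per logged interval. The weighting by hours is an assumption about
    how the 40 °C average is meant to be evaluated.
    """
    periods = list(periods)
    total_hours = sum(hours for hours, _ in periods)
    if total_hours <= 0:
        raise ValueError("total duration must be positive")
    weighted_sum = sum(hours * temp for hours, temp in periods)
    return weighted_sum / total_hours


if __name__ == "__main__":
    # Hypothetical day: 16 hours at 38 °C and 8 hours at 44 °C.
    print(average_temperature([(16, 38.0), (8, 44.0)]))
```

With these hypothetical figures, eight hours at 44 °C are just balanced by sixteen hours at 38 °C, so the daily average lands exactly on the 40 °C limit.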
A typical enterprise hard drive is rated with an MTTF of 2 million hours. MTTF can be translated into an “annualized failure rate” (AFR) using the following formula:
AFR = 1 - e^(-8760 h / MTTF [h]) (1 year = 8760 hours)
This exponential formula takes into account that drives which have already failed drop out of the population when the failure rate for the remaining drives is calculated. With low failure rates, the formula can be simplified to:
AFR ≈ 8760 h / MTTF [h]
An MTTF of 2 million hours would result in an AFR of 0.438 %. So, out of 1000 drives in operation, 4 to 5 would be expected to fail during a year of operation.
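The conversion from MTTF to AFR can be checked numerically. The following sketch compares the exponential formula with its first-order approximation, 8760 h divided by the MTTF; the function names are hypothetical and chosen only for this illustration.

```python
import math

HOURS_PER_YEAR = 8760


def afr_exact(mttf_hours):
    """AFR from the exponential formula: 1 - e^(-8760 / MTTF)."""
    # -expm1(-x) equals 1 - e^(-x) and stays accurate for small x.
    return -math.expm1(-HOURS_PER_YEAR / mttf_hours)


def afr_approx(mttf_hours):
    """First-order approximation for low failure rates: 8760 / MTTF."""
    return HOURS_PER_YEAR / mttf_hours


if __name__ == "__main__":
    mttf = 2_000_000  # hours, the rating used in the text
    print(f"exact AFR:  {afr_exact(mttf):.4%}")
    print(f"approx AFR: {afr_approx(mttf):.4%}")
    print(f"expected failures per 1000 drives: {1000 * afr_approx(mttf):.2f}")
```

For an MTTF of 2 million hours, the two results differ only in the third significant digit (about 0.437 % versus 0.438 %), which is why the simplified form is sufficient at such low failure rates.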
But this MTTF/AFR commitment of the HDD manufacturer is restricted to certain conditions, which are:
Valid only within the warranty period (typically 5 years),
Valid only when the rated workload is not exceeded (less than 550 TB/year read and written),
Valid only when the maximum power-on hours per period are not exceeded (only for non-24/7 models),
Valid only when the average temperature of 40 °C is not exceeded.
If any of these conditions are violated, the hard drive will not necessarily fail straight away. In particular, the workload limit is not an endurance limit in the sense of SSD write endurance (where a violation would always result in the failure of a component). For a hard drive operating beyond these limits, the AFR would slowly increase, depending on the level of the violation. The same applies to using a hard drive beyond its 5 years of warranted service lifetime: a mass failure should not necessarily be expected, but over time, yearly failure rates above the level of 4 to 5 drives out of 1000 could be experienced after the first 5 years.
Once the average temperature exceeds 40 °C, the AFR rises above 0.438 %. As a rough indication, each 5 °C over 40 °C could increase the failure rate by around 30 %. At a sustained hard drive temperature of 55 °C, the failure rate is expected to double.
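This rule of thumb can be turned into a small estimate. The sketch below assumes that the 30 % increase compounds for every 5 °C above 40 °C; the text gives only a rough indication, so the compounding model, the function name `afr_at_temperature`, and its parameters are illustrative assumptions rather than a manufacturer formula.

```python
def afr_at_temperature(avg_temp_c, base_afr=0.00438, reference_c=40.0,
                       step_c=5.0, increase_per_step=0.30):
    """Rough AFR estimate for a given average drive temperature.

    Assumption: the failure rate grows by `increase_per_step` for each
    `step_c` degrees above `reference_c`, compounding per step. At or
    below the reference temperature the base AFR is returned unchanged.
    """
    excess_c = max(0.0, avg_temp_c - reference_c)
    return base_afr * (1.0 + increase_per_step) ** (excess_c / step_c)


if __name__ == "__main__":
    for temp in (40, 45, 50, 55):
        afr = afr_at_temperature(temp)
        print(f"{temp} °C: AFR {afr:.3%} "
              f"(factor {afr / afr_at_temperature(40):.2f})")
```

Under this assumption, 55 °C gives a factor of 1.3 cubed, roughly 2.2, which is in line with the expectation that the failure rate about doubles at that temperature.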
Keeping the average hard drive temperature below 40 °C shouldn’t be a problem in any well-designed thermal system with appropriate airflow. However, a system without proper airflow and a set of fans will not meet the thermal requirements for 24/7 continuous operation of enterprise drives. When operating at room temperature, high room temperatures of >30 °C may result in hard drive temperatures of over 40 °C, but these can be offset by periods of lower temperatures, as explained above.
For installation in data centers, thermally well-designed servers and JBODs keep the hard drives at no more than 10 to 15 °C above the air inlet temperature. Air inlet temperatures below 20 °C will therefore support proper hard drive operation, even for drives located in a rear position of a large 4U top-loader JBOD or server.
If a hard drive runs at a sustained temperature of more than 15 °C above the ambient/air-inlet temperature, something is fundamentally wrong in the thermal design of the system. You should check the airflow/fan operation and analyze the system for potential blockages of the airflow.
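A monitoring script could apply this check directly. The following minimal sketch flags a drive whose temperature is more than 15 °C above the inlet temperature; the function name `thermal_delta_ok` and the sample readings are hypothetical, and the check says nothing about whether the excess is sustained, which would require evaluating readings over time.

```python
def thermal_delta_ok(drive_temp_c, inlet_temp_c, max_delta_c=15.0):
    """Return True if the drive is at most `max_delta_c` degrees
    warmer than the air inlet, the limit described in the text."""
    return (drive_temp_c - inlet_temp_c) <= max_delta_c


if __name__ == "__main__":
    # Hypothetical readings: (drive temperature, inlet temperature) in °C.
    for drive, inlet in [(36.0, 22.0), (45.0, 22.0)]:
        status = "ok" if thermal_delta_ok(drive, inlet) else "check airflow"
        print(f"drive {drive} °C, inlet {inlet} °C: {status}")
```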
For correct functioning and the highest possible reliability, it is important to monitor the temperature of the hard disk drive in operation. Maximum temperatures of 60 °C and more should be avoided at all costs, and the average temperature should ideally not exceed 40 °C.
Rainer W. Kaese
Senior Manager Business Development Storage Products, Toshiba Electronics Europe
Rainer Kaese has been with Toshiba for almost 30 years. He initially specialized in application specific ICs, managing the ASIC Design Center, and later the Business Development Team for ASIC- and Foundry Products. He is currently responsible for the introduction of Toshiba’s Enterprise HDD products into Datacenters, Cloud Computing and Enterprise applications.