How Server Rack Temperature Monitoring Prevents Downtime
Server rooms and data centres try to maintain a comfortable ambient temperature. In the main this is for the benefit of the people working within the IT rooms and to protect the lead acid batteries within any associated uninterruptible power supplies (UPS). Servers and other electronic IT devices can work at higher ambient temperatures. So why is it important to monitor the ambient temperature?
UPS Lead Acid Batteries
Firstly, let’s consider lead acid batteries in a UPS system. The modern valve regulated lead acid (VRLA) battery requires an ambient temperature of 20-25˚C in order to reach its design life, which may be 5 or 10 years. The general rule of thumb is that for roughly every 10˚C rise above this range the design life of a VRLA battery halves. In practice, most UPS battery sets with a 5 or 10 year design life require replacement around years 3-4 and 7-8 respectively, and as an expensive consumable item it makes sense to protect these temperature-sensitive elements.
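As a rough illustration of how such a rule of thumb plays out, the sketch below estimates service life from the average ambient temperature. It assumes the commonly quoted form of the rule, in which life halves for roughly every 10˚C above a 25˚C reference. The function name, parameters and defaults are illustrative assumptions rather than a manufacturer's formula, and the battery datasheet should always take precedence.

```python
def estimated_battery_life_years(design_life_years, ambient_c,
                                 reference_c=25.0, halving_interval_c=10.0):
    """Estimate VRLA service life using a 'life halves per interval' rule of thumb.

    reference_c and halving_interval_c are assumed defaults; check the
    battery manufacturer's datasheet for real derating figures.
    """
    if ambient_c <= reference_c:
        # At or below the reference temperature, assume full design life.
        return float(design_life_years)
    excess_c = ambient_c - reference_c
    return design_life_years * 0.5 ** (excess_c / halving_interval_c)


if __name__ == "__main__":
    for ambient in (20, 25, 30, 35):
        life = estimated_battery_life_years(10, ambient)
        print(f"{ambient}C ambient: about {life:.1f} years from a 10-year design life")
```

Keeping the reference temperature and halving interval as parameters makes it easy to substitute the figures from a specific battery datasheet.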
Server Rack Temperatures
Secondly, whilst servers include heatsinks and fan ventilation, reliability problems can occur at higher temperatures. Most electronic systems (including the electronics in a UPS system) will operate up to 40˚C without derating in a cooled environment, with the most heat typically generated within the CPU (central processing unit) and any related power electronics. For every 1˚C rise in ambient temperature, the CPU temperature will rise by roughly 1˚C accordingly.
Within a server rack the amount of processing power can lead to power demands of 15kW or more, and the risk of runaway temperatures poses a management issue. Server CPUs will typically approach meltdown if ambient temperatures within a server rack rise above 30-35˚C. Runaway temperatures therefore present a fire risk.
Most data centres and server rooms aim for lower ambient temperatures and run to the ASHRAE recommended guideline of 18-27˚C, depending on humidity and dew points. However, within a data centre or server room environment, temperature fluctuations are common, with hot-spots both within server racks and around the rooms themselves. These hot-spots are created by poor airflow and equipment layout design, as well as by equipment starting to fail.
Environmental temperature monitoring is therefore paramount to keeping a server room or data centre operational. The right monitoring system should not only record what has happened in the form of historic data but should also be able to provide predictive scenarios of what could happen and where. Continuous, real-time monitoring with appropriate alarms is also important for those server rooms and data centres that want to push up their ambient temperatures as part of an overall energy efficiency drive. Cooling costs money, and so the higher the ambient temperature, the lower the cooling needs and the lower the operational costs and electricity demanded by the air conditioning systems, whether air or liquid based.
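One simple way to turn historic readings into a prediction is to fit a straight line through recent samples and estimate when a threshold would be crossed. The following is a minimal sketch under that linear-trend assumption. The function name and sample format are hypothetical, and commercial monitoring platforms may well use more sophisticated models.

```python
def minutes_until_threshold(samples, threshold_c):
    """Extrapolate a least-squares line through (minute, temp_c) samples.

    Returns the estimated minutes from the last sample until threshold_c
    is reached, 0.0 if it is already reached, or None if the trend is
    flat or falling (or there are too few samples to fit a line).
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, so no slope exists
    slope = sum((t - mean_t) * (c - mean_c) for t, c in samples) / var_t
    last_t, last_c = samples[-1]
    if last_c >= threshold_c:
        return 0.0
    if slope <= 0:
        return None
    intercept = mean_c - slope * mean_t
    crossing_t = (threshold_c - intercept) / slope
    return max(0.0, crossing_t - last_t)


if __name__ == "__main__":
    # Hypothetical inlet readings taken every five minutes.
    history = [(0, 24.0), (5, 24.5), (10, 25.0), (15, 25.5)]
    print(minutes_until_threshold(history, threshold_c=27.0))
```

A straight-line fit is deliberately crude. It is useful as an early warning of a steady climb, but it will not anticipate sudden events such as a cooling unit failure.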
Real-Time Temperature and Humidity Monitoring
Within a server rack ASHRAE recommends a minimum of six temperature sensors: three at the front, at the top, middle and bottom of the rack, and three in corresponding positions at the back. This allows the temperature monitoring system to monitor the temperature of the air drawn into the front of the rack and the exhaust temperature at the rear of the cabinet. For more information see: https://tc0909.ashraetcs.org/documents/ASHRAE_TC0909_Power_White_Paper_22_June_2016_REVISED.pdf
Temperature-sensitive installations may use more than six sensors per rack to build a more detailed picture of airflow and temperature dynamics. This is recommended for those looking to push the ambient temperature envelope.
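The six-sensor layout maps naturally onto a small data structure. The sketch below is one possible way to hold the front and rear readings for a rack and derive the temperature rise across it at each height. The class, field and method names are hypothetical and not taken from any particular monitoring product.

```python
from dataclasses import dataclass

POSITIONS = ("top", "middle", "bottom")


@dataclass
class RackReadings:
    """Six readings for one rack: inlet (front) and exhaust (rear) by height."""
    rack_id: str
    front_c: dict  # e.g. {"top": 24.0, "middle": 22.5, "bottom": 21.0}
    rear_c: dict

    def delta_t(self):
        """Exhaust minus inlet temperature at each height."""
        return {p: self.rear_c[p] - self.front_c[p] for p in POSITIONS}

    def hottest_inlet(self):
        """Return (position, temperature) of the warmest front sensor."""
        pos = max(POSITIONS, key=lambda p: self.front_c[p])
        return pos, self.front_c[pos]


if __name__ == "__main__":
    rack = RackReadings(
        rack_id="R1",
        front_c={"top": 25.0, "middle": 22.0, "bottom": 21.0},
        rear_c={"top": 37.0, "middle": 33.0, "bottom": 30.0},
    )
    print(rack.delta_t())
    print(rack.hottest_inlet())
```

Comparing inlet temperatures by height is a quick way to spot warm exhaust air recirculating over the top of a rack, while the front-to-rear difference gives a feel for how hard the equipment at each level is working.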
The golden rule for any management system is that you cannot control what you do not monitor. Real-time temperature monitoring systems connected to the IP network can help to quickly identify rising hot-spots and issue alerts via SNMP, SMS or email.
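To make the alerting step concrete, here is a minimal sketch that checks the latest readings against warning and critical thresholds and builds an email summary using Python's standard library. The sensor names, threshold defaults and email addresses are illustrative assumptions. Actually sending the message (for example via an SMTP server), and SNMP or SMS delivery, are left out.

```python
from email.message import EmailMessage


def find_hot_spots(readings_c, warning_c=27.0, critical_c=32.0):
    """Return [(sensor, temp_c, severity)] for sensors at or above a threshold.

    readings_c maps a sensor name to its latest temperature in Celsius.
    The threshold defaults are illustrative assumptions only.
    """
    alerts = []
    for sensor, temp_c in sorted(readings_c.items()):
        if temp_c >= critical_c:
            alerts.append((sensor, temp_c, "critical"))
        elif temp_c >= warning_c:
            alerts.append((sensor, temp_c, "warning"))
    return alerts


def build_alert_email(alerts, sender, recipient):
    """Build (but do not send) an email summarising the alerts."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Temperature alert: {len(alerts)} sensor(s) above threshold"
    lines = [f"{sensor}: {temp_c:.1f} C ({severity})"
             for sensor, temp_c, severity in alerts]
    msg.set_content("\n".join(lines))
    return msg


if __name__ == "__main__":
    latest = {"rack1-inlet-top": 28.0, "rack2-inlet-top": 22.0, "rack3-inlet-top": 35.0}
    alerts = find_hot_spots(latest)
    if alerts:
        print(build_alert_email(alerts, "monitor@example.com", "ops@example.com"))
```

In practice inlet and exhaust sensors would need different thresholds, since exhaust air is expected to be considerably warmer than the air drawn in at the front.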
Once a ‘hot-spot’ or temperature spike triggers an alert, it is important to investigate and identify the root cause. The outcome may be to continue to monitor, or to take corrective and then preventative action(s). Thermodynamic modelling techniques can help in this regard as they can provide a 3D view of the entire data centre environment and drill down into specific server racks.
It is also important to make sure that ambient temperatures do not fall too low. Below 18˚C there are further risks to the server environment. Cooler air can hold less moisture, and so if there is a high relative humidity in a low-ambient environment, condensation can occur, and this moisture can lead to corrosion or even catastrophic short-circuits. Low humidity levels can also cause problems, as dry air increases the potential for electrostatic discharges which can damage sensitive electronics.
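The link between temperature, relative humidity and condensation can be expressed through the dew point. The sketch below uses the Magnus approximation, a widely used formula for estimating dew point from air temperature and relative humidity. The function names and the 2˚C safety margin are assumptions chosen for illustration.

```python
import math


def dew_point_c(temp_c, relative_humidity_pct):
    """Approximate dew point (Celsius) using the Magnus formula."""
    if not 0 < relative_humidity_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    a, b = 17.62, 243.12  # Magnus coefficients for water vapour over liquid water
    gamma = math.log(relative_humidity_pct / 100.0) + (a * temp_c) / (b + temp_c)
    return (b * gamma) / (a - gamma)


def condensation_risk(surface_temp_c, air_temp_c, relative_humidity_pct, margin_c=2.0):
    """True if a surface is within margin_c of (or below) the air's dew point."""
    return surface_temp_c <= dew_point_c(air_temp_c, relative_humidity_pct) + margin_c


if __name__ == "__main__":
    print(f"Dew point at 20C, 50% RH: {dew_point_c(20.0, 50.0):.1f}C")
    print(condensation_risk(surface_temp_c=10.0, air_temp_c=20.0,
                            relative_humidity_pct=50.0))
```

Any surface at or below the dew point will collect condensation, so alarming a little above it gives some time to react before moisture forms.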
So what are the choices when it comes to ambient temperature and humidity monitoring? There are two. The first is to install a dedicated temperature, humidity and general environment monitoring system. This type of system will typically monitor other factors in addition to temperature and humidity, including water leakage, smoke, fire, power and access. The alternative is to use power distribution units (PDUs) with optional sensors for the factors to be monitored. Whichever solution is chosen, it is important that it is IP-connected and can be monitored through its own software or via a data centre infrastructure management (DCIM) software package.
Over the last 30 years we have seen data processing move from centralised data centre locations with plug-in terminals to decentralised on-site server rooms operating standalone IT networks. The latest trend is to push the decentralised concept even further using Edge computing and connection to hyperscale data centres via Cloud-based applications.
Whether you run a small server room, enterprise data centre or colocation facility, it is vital that you have an environmental monitoring system in place as part of your critical infrastructure. The information from an environmental monitoring system will help to keep the facility resilient, efficient and optimised. Such a system can also help to prevent downtime and damage to mission-critical data and servers.