12/02/2021

How to Prevent Comms Room Power Outages Using Business Continuity Principles


Whether you operate a comms room or a server room, it is important to have a power protection plan in place that will prevent unplanned downtime due to a power outage. Many IT networks expand rapidly, and the rooms and racks used to house their critical servers and network devices can quickly become cluttered. This can lead to several health & safety issues, from trip hazards to fire hazards. It can also create single points of failure in terms of power continuity planning, as some new devices may not be supported when there is a mains power supply failure.

What is a Power Outage?

A power outage is a break in the mains power supply that can last from a few milliseconds to minutes or hours. Momentary power outages can occur during lightning storms when part of the electricity network infrastructure, such as an overhead line or substation transformer, suffers a nearby lightning strike or electrical storm interference. Electricity grids are set to automatically recover supplies from momentary outages.

A complete power outage, lasting several minutes or longer, is generally caused by a complete failure in the electrical supply network. The most recent wide-scale blackout in the UK occurred in 2019, when power was lost by several electrical utilities over a wide area including England and Wales. The cause was two generating stations going off-line, which triggered automatic protection systems on the national grid as the frequency started to drift.

Other examples of power outage causes include substation transformers shorting and breaks in electrical supply cables. The latter is more frequent and is generally down to poor planning of road works and building excavations close to electricity supply cable ducts.

More information:
https://www.bbc.co.uk/news/topics/cgemkklp29nt/uk-power-cut

Why Power Protection Plans are Important

Business continuity planning is a process for creating systems of prevention and recovery to deal with potential threats that could limit or prevent an organisation from operating and providing its products and services.

The IT environment is one area of an organisation that requires specialist audit and review. Cyber security is a 24/7 unseen threat to every organisation that runs an IT network, and most organisations that are dependent upon their IT have gone down the Cyber Essentials or ISO 27001 certification routes to ensure that they have some form of robust protection and information system management in place.

Power outage protection is another 24/7 threat, but one for which there is no specific standard endorsed by the International Organization for Standardization (ISO). There are specific standards for electrical aspects of a building, including the electrical wiring (BS 7671, 18th Edition IET Wiring Regulations) and UPS topologies (IEC 62040-3:2011). There are standards for data centres, including ISO/IEC TS 22237-1:2018 Information technology — Data centre facilities and infrastructures, but nothing specifically for smaller comms rooms or server rooms.

What is needed is to adopt a common-sense approach and implement a core of best practices to prevent downtime from a power outage. One of the primary ways to prevent downtime is to install an uninterruptible power supply.

Uninterruptible Power Supplies

A UPS system is designed to provide an ‘uninterruptible’ source of power from a battery set. The rating of the UPS defines the maximum load size that can be supported, and the battery is sized to provide a specific amount of runtime either at the full UPS load rating or at a derated level, e.g. 0.7 or 0.8 of the load rating. This means that a UPS rated at 1kVA may have a battery sized to provide 800W of power for a set number of minutes.
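The relationship between the VA rating and the supportable load can be sketched in a few lines of Python. This is a minimal illustration only: the function names are hypothetical, and the default factor of 0.8 is simply the figure used in the 1kVA example. The real figure (often quoted as the UPS output power factor) should come from the manufacturer's datasheet.

```python
def supported_watts(rating_va: float, factor: float = 0.8) -> float:
    """Real power (W) available from a UPS with the given VA rating.

    `factor` is the derating figure (0.7 or 0.8 in the text); it is an
    assumption here and should be taken from the UPS datasheet.
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must be greater than 0 and at most 1")
    return rating_va * factor


def load_fits(rating_va: float, load_watts: float, factor: float = 0.8) -> bool:
    """True if the connected load is within the derated UPS capacity."""
    return load_watts <= supported_watts(rating_va, factor)


if __name__ == "__main__":
    # A 1kVA UPS derated to 0.8 supports roughly 800W.
    print(round(supported_watts(1000)))
    print(load_fits(1000, 650))
    print(load_fits(1000, 900))
```

A check like this is most useful when it is re-run each time a device is added to a rack, so that a UPS is not overloaded unnoticed as the network grows.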

There are three main types of uninterruptible power supply: standby/off-line UPS, line interactive and on-line UPS. Of the three types, on-line UPS provide the highest quality of power for a critical IT server application in a comms room, server room or data centre.

Under normal conditions, the inverter in an on-line UPS continuously powers the load, using either a rectified mains power supply or stored energy in the connected battery set. The output to the load is ‘break-free’ and uninterruptible, and the output from the digitally controlled inverter is a pure sinewave. On-line UPS systems therefore protect the load even when the mains power supply is present, guarding against brownouts, sags, surges, spikes, and transient voltages.

Line interactive UPS provide the next level down in power protection. A built-in automatic voltage regulator or stabiliser provides some protection from brownouts, sags, surges, spikes, and transient voltages. The inverter is powered but not connected to the load, and only engages when the UPS cannot maintain its output power using the automatic voltage stabiliser. If the mains power supply voltage or frequency drifts or collapses outside a pre-set operating window, the inverter is brought on-line to power the load. There can be a 2-4ms break in supply as this occurs, but it is generally accepted that the switch mode power supplies in modern electronics and IT equipment can withstand this due to their electrical circuit capacitance.
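The transfer decision described here amounts to a window check on the incoming supply. The sketch below shows the idea only. The class and function names are hypothetical, and the voltage and frequency limits are illustrative assumptions for a 230V, 50Hz supply rather than values taken from any particular UPS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingWindow:
    """Pre-set input limits; the defaults are illustrative, not vendor values."""
    min_volts: float = 184.0
    max_volts: float = 264.0
    min_hz: float = 47.0
    max_hz: float = 53.0


def inverter_required(volts: float, hz: float,
                      window: OperatingWindow = OperatingWindow()) -> bool:
    """True when the mains input is outside the window and the inverter must take over."""
    volts_ok = window.min_volts <= volts <= window.max_volts
    hz_ok = window.min_hz <= hz <= window.max_hz
    return not (volts_ok and hz_ok)


if __name__ == "__main__":
    print(inverter_required(230.0, 50.0))  # healthy mains
    print(inverter_required(170.0, 50.0))  # deep sag below the window
    print(inverter_required(0.0, 0.0))     # complete mains failure
```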

Aside from the quality of its output power, an on-line UPS is superior to a line interactive UPS for two other reasons. The first is that an on-line UPS has an automatic bypass supply. If the UPS develops a fault or is overloaded, the critical load is automatically transferred to the mains power supply. The second is that the inverter is rated for continuous running, and this allows the UPS to be installed with runtime battery packs that can run for up to several hours at full load. Line interactive UPS do not have an automatic bypass, and long runtimes are not achievable with all line interactive models.

Power Protection Risk Assessments

Power protection planning should be a sub-set of the business continuity planning cycle. Risk assessments specifically for power protection and outages should be completed and reviewed annually.

The first step in the process is to complete a power audit to identify all the devices within the IT environment that must be protected with uninterruptible power. A complete asset list should be created detailing device-specific information including device type, location, model number, serial number, VA, Watts, power input requirements (single or three phase), power supplies (single or A/B redundant), plug type, size (U height, width, and depth in mm), warranty, supplier, and last service and clean dates. This list should be cross-linked to a PAT register for any devices that require Portable Appliance Testing. When devices are disposed of or new ones commissioned, the registers should be updated. Without a comprehensive approach, rogue devices can be missed, including broadband routers and network switches vital to the overall availability of the IT network.
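An asset list of this kind maps naturally onto a simple record structure. The following is a minimal sketch that uses only a subset of the fields listed. The `Asset` class, the `ups_protected` flag and the sample entries are all hypothetical and are there purely to show how unprotected devices could be picked out of a register.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Asset:
    """One row of the asset list (a subset of the fields named in the text)."""
    device_type: str
    location: str        # e.g. cabinet or rack identifier
    model: str
    serial: str
    va: float
    watts: float
    dual_psu: bool       # True for A/B redundant power supplies
    u_height: int
    ups_protected: bool  # hypothetical flag: is the device fed from a UPS?


def unprotected(assets: List[Asset]) -> List[Asset]:
    """Return devices that would drop when the mains supply fails."""
    return [a for a in assets if not a.ups_protected]


if __name__ == "__main__":
    register = [
        Asset("server", "Cabinet A", "example-model-1", "SN001", 500, 450, True, 2, True),
        Asset("network switch", "Cabinet A", "example-model-2", "SN002", 120, 100, False, 1, True),
        Asset("broadband router", "Wall shelf", "example-model-3", "SN003", 30, 20, False, 0, False),
    ]
    for asset in unprotected(register):
        print(f"Not on UPS: {asset.device_type} at {asset.location}")
```

In practice the register is more likely to live in a spreadsheet or asset management tool. The point of the sketch is that recording the UPS status of each device explicitly makes rogue devices easy to report on.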

An asset list provides the information required for capacity planning and a power protection single line diagram. These are important considerations. Consider two server cabinets in a comms room.

From the asset list it is possible to calculate the overall power and cooling requirements for each cabinet or rack. From the single line diagram or layout drawing, the electrical power connections can be drawn up and decisions made on how best to connect the loads to power distribution units and a source of uninterruptible power.

Support time must also be considered, and this goes together with remote interface monitoring decisions.
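The per-cabinet power and cooling calculation can be sketched as follows. This is an illustration under stated assumptions: the function name and the input format are hypothetical, and the heat figure assumes that all of the electrical load ends up as heat in the room, converted at roughly 3.412 BTU/hr per watt.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# 1 W of electrical load dissipates about 3.412 BTU/hr of heat.
BTU_PER_HOUR_PER_WATT = 3.412


def cabinet_totals(devices: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Sum the load per cabinet and convert it to a heat output figure.

    `devices` is an iterable of (cabinet name, load in watts) pairs.
    """
    watts_by_cabinet: Dict[str, float] = defaultdict(float)
    for cabinet, watts in devices:
        watts_by_cabinet[cabinet] += watts
    return {
        cabinet: {"watts": watts, "btu_per_hour": watts * BTU_PER_HOUR_PER_WATT}
        for cabinet, watts in watts_by_cabinet.items()
    }


if __name__ == "__main__":
    # Sample loads for two cabinets; the figures are made up for illustration.
    loads = [("Cabinet A", 450), ("Cabinet A", 150), ("Cabinet B", 300)]
    for name, totals in cabinet_totals(loads).items():
        print(f"{name}: {totals['watts']:.0f} W, {totals['btu_per_hour']:.0f} BTU/hr")
```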

A UPS system and smart/intelligent PDUs can be installed with SNMP interface cards. For IT environments running UPS monitoring and control software, this allows for broadcast messages when a UPS system alarms and provides for automated unattended server shutdowns should a battery reach a low threshold alarm.
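The decision logic behind alerting and unattended shutdown can be sketched independently of any particular monitoring product. In the example below the `UpsStatus` record, the function name and the thresholds are all hypothetical. In a real installation the status values would be read from the UPS over its SNMP card, and the thresholds would be set to allow enough time for the servers to shut down cleanly.

```python
from dataclasses import dataclass


@dataclass
class UpsStatus:
    """A snapshot of UPS state, as might be polled from its network card."""
    on_battery: bool
    battery_percent: float
    runtime_minutes: float


def decide_action(status: UpsStatus,
                  low_battery_percent: float = 30.0,
                  min_runtime_minutes: float = 5.0) -> str:
    """Return 'none', 'notify' or 'shutdown' for the given UPS status."""
    if not status.on_battery:
        return "none"
    if (status.battery_percent <= low_battery_percent
            or status.runtime_minutes <= min_runtime_minutes):
        # Battery has reached the low threshold: start the server shutdown.
        return "shutdown"
    # Running on battery with reserve left: broadcast an alert only.
    return "notify"


if __name__ == "__main__":
    print(decide_action(UpsStatus(False, 100.0, 60.0)))
    print(decide_action(UpsStatus(True, 80.0, 25.0)))
    print(decide_action(UpsStatus(True, 25.0, 4.0)))
```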

UPS systems can be relatively compact in design, but their overall footprint increases significantly as runtime battery packs are added to extend uptime. A 3kVA UPS system in a 19-inch rack may be 3U high, but with a 4-hour runtime it may require 3-4 times that U-height. For long runtime applications it may be necessary to move from a distributed power protection plan to a centralised one, i.e. a larger UPS and battery set protecting a sub-distribution board from which the server racks and IT devices are powered.

Further runtime may be provided by a local standby power generator or even an energy storage system such as a Tesla Powerwall or similar device.

Regular Power Outage and Black Out Testing

It is easy to test a UPS system. Remove or disconnect the input supply and the UPS will run on its batteries. A short test proves this but does not go far enough to test the integrity of the power protection plan design. There are two further aspects to consider.

Batteries age over time and should be treated as a consumable. The most common type of battery in a single-phase UPS is a 5-year design life lead acid battery. This will provide about 300-400 complete charge/discharge cycles at 20-25°C and require replacement in years 3-4. The older the battery, the more likely its output will collapse when placed under load, i.e. just when you need it during a prolonged power outage. If there is a standby power generator and/or energy storage system, the ‘black out’ test should ensure that these devices also start and perform as intended.
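Battery age can be tracked against the replacement window with a simple date calculation. The sketch below is illustrative only: the function name and status labels are hypothetical, and the 3 and 4 year thresholds are taken from the replacement guidance for a 5-year battery. Operating temperature and cycle count are ignored here, although both also affect battery life.

```python
from datetime import date


def battery_status(installed: date, today: date,
                   plan_after_years: float = 3.0,
                   overdue_after_years: float = 4.0) -> str:
    """Classify a battery set by age as 'ok', 'plan replacement' or 'overdue'."""
    age_years = (today - installed).days / 365.25
    if age_years >= overdue_after_years:
        return "overdue"
    if age_years >= plan_after_years:
        return "plan replacement"
    return "ok"


if __name__ == "__main__":
    fitted = date(2021, 2, 12)
    for check_date in (date(2022, 2, 12), date(2024, 6, 1), date(2026, 1, 1)):
        print(check_date.isoformat(), battery_status(fitted, check_date))
```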

Where communications interfaces are installed, it is important to ensure that alert emails or SMS text messages are received by key personnel on a controlled distribution list. Each person should have a specific role to play when there is a power outage and provide cover as required. If automated server shutdown scripts are enabled, these should also be activated during the test to ensure that all systems are shut down safely to secure data integrity and prevent hardware damage.

Summary

A formal approach to power protection can ensure that an organisation has a more robust power protection and business continuity plan in place for its comms room or server room. Disruption caused by power outages can be prevented, with IT systems kept running until the mains power supply is restored or servers shut down safely if there is a prolonged outage. Just as important is the need to regularly stress-test the power continuity plan to make sure that it is comprehensive and that no additional rogue devices have been introduced that will become single points of failure the next time the building experiences a power outage.
