Earlier this year, almost every organisation in the UK had to initiate its business continuity plan and move to wide-scale remote working in a bid to protect the NHS and the general population from COVID-19. For many, the speed and scale of the move increased the pressure on already stretched IT teams. It also brought greater focus to remote monitoring solutions: an area associated with larger-scale datacentre operations, and one that up to now has typically received more investment there than in small-to-medium sized computer or server room installations.
For computer and server room operators, datacentres often set the standards, including those for information security and energy efficiency. Though smaller in scale than those found in a datacentre, on-site IT networks will be supported by similar critical infrastructure systems. Examples include air conditioners and uninterruptible power supplies and/or local standby power generation. Fire suppression systems may also be installed, though this is less likely for a small on-site computer room with only one or two servers. A fire suppression system is typically installed when there are more servers and racks within a server room, because of the confined space and higher power demand.
In more normal times, IT managers may have the luxury of being able to review the best practices deployed by datacentre managers. The use of datacentre infrastructure management (DCIM) packages like Schneider EcoStruxure within a datacentre environment is a good example. This comprehensive package provides a complete overview of the entire IT, power and cooling estate. Because it supports multiple protocols, almost any device can be connected to the platform for remote monitoring, control and support.
The speed and scale with which organisations had to move to remote working prevented such a review and meant that quick solutions had to be identified and deployed. With fewer workers on-site, there was less opportunity for smaller organisations to notice on-site alarms and respond to them. Added to this were travel restrictions, which prevented some IT personnel from easily returning to their sites in an emergency for an alarm or system reset and problem diagnosis.
Fortunately, a solution already existed, and one that is well developed and in use in far more environments than purely IT. Remote monitoring is a specialist niche within the wider IT marketplace, with solutions for monitoring a wide range of environmental factors not just within computer and server rooms, but also in datacentre, industrial, retail and food distribution, pharmaceutical and telecoms applications.
Most environment monitoring devices can be installed within a relatively short time frame. Typical examples include the STE2 and Room Alert 4ER. These devices are powered via an AC adapter or can use Power over Ethernet (PoE). They have a built-in web server to assist connection to the local network. Sensors are either built-in, for example for temperature, or connected as external plug-in units for temperature, temperature and humidity, water leakage, smoke, fire, air flow and even power outages.
The on-premise software packages available provide similar features to DCIM packages. They provide an overview of the IT environment based on the sensors and detectors installed, but not on the holistic scale required for a larger datacentre. DCIM packages provide more comprehensive information covering cooling and power usage per server rack to better assist loading and capacity planning.
Smaller organisations may describe their IT operations as a datacentre and to some degree they are right. A datacentre is a managed and secure environment in which to run IT servers. Computer and server rooms will have some if not all of the critical infrastructure elements of a datacentre, and most will have air conditioning and some form of uninterruptible power. The differences are ones of scale, of the comprehensiveness of the deployed infrastructure solutions, and of their resilience and levels of N+(x) redundancy.
Environment monitoring may be overlooked as a necessity, however. In a typical set-up, IT system components including servers, storage devices and networking switches will be housed in server racks, and the racks will be arranged into an array that makes best use of the local air conditioning and cooling. For most this will be a wall-mounted air conditioner. Uninterruptible power supplies may also be deployed to provide emergency backup power if the mains power supply fails. The UPS may be rack-mounted or installed as a floor-standing tower system, in such a way as to provide power to the server racks and their power distribution units (PDUs).
One of the most commonly monitored critical infrastructure devices within the server room will be the uninterruptible power supply. This will typically be installed with a slot-in SNMP card to allow remote monitoring, either directly from a web browser over HTTPS or through a locally installed UPS monitoring and control software package.
Most air conditioners are installed without any remote monitoring, even though most can provide signal contact status and alarms via a plug-in interface card. More modern systems have Wi-Fi capability and can send remote alarms via a mobile app.
During normal working hours and conditions, both approaches may be sufficient. In addition, a UPS system and air conditioner will provide visual and audible alarms via their front panels. These alarms may or may not be noticed as employees pass by computer or server rooms.
The addition of a dedicated environment monitoring solution into this type of installation provides a centralised platform and alert system. Sensors for specific concerns can be added to the monitoring device and local on-premise software used to monitor over the local network and send alert messages when readings move outside pre-set ‘normal operation’ ranges.
Temperature is the most commonly monitored aspect. An over-temperature reading can indicate that the local air conditioning has failed or is not performing to specification. Higher temperatures in a server room may or may not be critical for short periods, especially in unmanned rooms. However, temperatures above 25°C can start to ‘cook’ UPS batteries and lead to increased component failures. If the general room ambient temperature is 25°C, there could be far hotter ‘hot-spots’ inside server racks which, if left unchecked, could become a fire risk.
Humidity and water leakage are other areas that are important to monitor. Higher levels of humidity can lead to increased condensation on cooler surfaces, and the liquid that pools there can cause a short circuit. Water leakage from poor cooling infrastructure or local plumbing bursts can also disrupt IT operations.
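As a rough illustration of the range-based alerting described above, the sketch below checks sensor readings against pre-set ‘normal operation’ ranges and returns an alert message for any reading outside its range. It is a minimal Python sketch, not the logic of any particular monitoring product: the `SensorRange`, `check_reading` and `evaluate` names are hypothetical, and apart from the 25°C upper temperature limit taken from the text, the range values are placeholder assumptions that would need to be set for each site.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SensorRange:
    """Pre-set 'normal operation' range for one sensor type."""
    name: str
    low: float
    high: float
    unit: str


# Assumed ranges: the 25°C upper limit follows the text; the rest are placeholders.
RANGES: Dict[str, SensorRange] = {
    "temperature": SensorRange("temperature", 10.0, 25.0, "°C"),
    "humidity": SensorRange("humidity", 20.0, 80.0, "%RH"),
}


def check_reading(rng: SensorRange, value: float) -> Optional[str]:
    """Return an alert message if the reading is outside the range, else None."""
    if value > rng.high:
        return f"ALERT {rng.name}: {value}{rng.unit} is above {rng.high}{rng.unit}"
    if value < rng.low:
        return f"ALERT {rng.name}: {value}{rng.unit} is below {rng.low}{rng.unit}"
    return None


def evaluate(readings: Dict[str, float]) -> List[str]:
    """Check every reading that has a configured range and collect the alerts."""
    alerts = []
    for name, value in readings.items():
        rng = RANGES.get(name)
        if rng is None:
            continue  # no range configured for this sensor, so nothing to check
        message = check_reading(rng, value)
        if message is not None:
            alerts.append(message)
    return alerts


for alert in evaluate({"temperature": 27.5, "humidity": 45.0}):
    print(alert)
```

With the assumed ranges, only the temperature reading in this example falls outside its range, so a single alert is produced.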
Most environment monitoring platforms offer on-premise and Cloud-based software portals. Cloud-based monitoring portals remove the need to connect to the local network over a VPN and are easier to monitor remotely than LAN-based platforms. Cloud-based systems can also provide more functionality when managing multiple locations and estates, such as geographic maps and more comprehensive dashboards.
Both types can be configured to send email alerts and status updates for any of the sensors installed. If the IP/Ethernet network they are connected to goes down, a ‘disconnected’ alert is raised.
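One common way to produce a ‘disconnected’ alert is a heartbeat timeout: the portal records when each monitoring device last reported in and flags any device that has been silent for too long. The Python sketch below shows the idea under that assumption. How any specific platform detects a lost connection is not stated here, and the `find_disconnected` function, device names and 300-second timeout are illustrative.

```python
from typing import Dict, List

# Assumed timeout: flag a device after five minutes of silence (illustrative value).
DEFAULT_TIMEOUT_S = 300.0


def find_disconnected(last_seen: Dict[str, float], now: float,
                      timeout_s: float = DEFAULT_TIMEOUT_S) -> List[str]:
    """Return the names of devices whose last report is older than timeout_s.

    last_seen maps a device name to the time (in seconds) of its last report;
    now is the current time on the same clock.
    """
    return sorted(name for name, seen in last_seen.items() if now - seen > timeout_s)


# Hypothetical example: two monitoring devices reporting to one portal.
last_seen = {"server-room-1": 1000.0, "comms-cupboard": 400.0}
disconnected = find_disconnected(last_seen, now=1100.0)
for device in disconnected:
    print(f"ALERT: {device} is disconnected")
```

The timeout is a trade-off: a short value raises the alert sooner but may trigger on brief network glitches, while a longer one is quieter but slower to report a genuine outage.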
In addition, it may be possible to configure SMS text alerts and phone calls via a software platform. These may be actioned via an SMS gateway connected to the network or an email-to-SMS service. With the latter, the Cloud monitoring portal sends an email to a third-party platform, where a subscription account is set up to convert the email to an SMS text and distribute it to a defined list of mobile phone numbers. For IT managers swamped with emails, a text alert can be noticed and responded to more quickly.
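The email-to-SMS route can be sketched in Python as follows. A common pattern is to address an email to the recipient's number at the provider's domain, but the exact address format, domain and any authentication are provider-specific. The `sms.example.invalid` domain, sender address and `build_sms_emails` function here are hypothetical placeholders. The sketch only builds the messages. Sending them (for example with Python's standard `smtplib`) is left out because it depends on the mail server in use.

```python
from email.message import EmailMessage
from typing import List

# Hypothetical values: replace with the details given by the email-to-SMS provider.
GATEWAY_DOMAIN = "sms.example.invalid"
SENDER = "alerts@example.invalid"
SMS_MAX_CHARS = 160  # length of a single standard SMS; longer texts may be split


def build_sms_emails(alert_text: str, numbers: List[str]) -> List[EmailMessage]:
    """Build one email per mobile number for an email-to-SMS gateway."""
    messages = []
    for number in numbers:
        # Keep digits only, so "+44 7700 900001" becomes "447700900001".
        digits = "".join(ch for ch in number if ch.isdigit())
        msg = EmailMessage()
        msg["From"] = SENDER
        msg["To"] = f"{digits}@{GATEWAY_DOMAIN}"
        msg["Subject"] = "Server room alert"
        # Truncate so the alert fits in one text message.
        msg.set_content(alert_text[:SMS_MAX_CHARS])
        messages.append(msg)
    return messages


# Example with a fictional UK number from the range reserved for TV and radio drama.
emails = build_sms_emails("Server room temperature above 25 C", ["+44 7700 900001"])
print(emails[0]["To"])
```

Keeping the alert text short and putting the key fact first (which room, which sensor, which reading) matters more for a text message than for an email, since only the first line may be seen on a lock screen.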
In addition to monitoring the local environment, it is also important that a business continuity plan covers other aspects that can affect information security, availability and uptime.
For more information on Information Security Management and Cyber Security standards visit: https://www.bsigroup.com/en-GB/iso-27001-information-security/ or https://www.gov.uk/government/publications/cyber-essentials-scheme-overview.
Business continuity plans have rarely been tested or implemented on the scale seen earlier this year. The move to remote working may be temporary or could become the norm for many organisations. Some global multinationals have already stated that their employees will not return to work in their offices until at least 2021, if ever. The change in how we work means fewer people on site and a greater need for remote monitoring of IT infrastructure. With such monitoring systems in place, IT managers are in a stronger position to maintain the availability and uptime of their computer and server rooms and to prevent downtime from air conditioning failures, over-temperatures, humidity problems, water leaks and other environmental factors that could disrupt operations and services.