Server rooms should be designed to provide a secure and managed environment in which to run critical IT servers and network infrastructures. Facilities vary in size from IT closets to computer rooms, Edge to regional and even hybrid datacentres. Whatever the size of the setup, each will have a core set of systems that require regular inspection and maintenance in order to ensure their uptime and resilience.
Aside from servers, software, storage and IT networking devices, most organisations will operate several core critical infrastructure systems including:
Each of these critical systems has its own maintenance requirements and needs at least one inspection per year, if not more.
Most organisations have a dedicated IT manager or IT team responsible for monitoring the IT servers and the software that runs on the network. Support for these may be outsourced to a mixture of specialist providers, e.g. hardware providers, Cloud hosting and Cybersecurity.
A similar approach can be adopted for critical infrastructure systems, but this can lead to the server room or datacentre operator having to work with a complex supply chain. A more pragmatic approach is to work with a facilities management and systems integration company like Server Room Environments.
In addition to the maintenance requirements for critical infrastructure systems, there are other important aspects of a server room facility that also require regular inspection and potential work plans:
Within any maintenance plan, one of the most important aspects is visual inspection. This can identify potential issues that are not automatically or immediately captured via monitoring systems. The person responsible for the inspection should be suitably trained and from time to time it can be advantageous for them to be accompanied by a third-party consultant who can challenge accepted practices and thinking.
Visual inspections can also be assisted using technologies such as thermal camera imaging. The hand-held devices can be used to identify both hot and cold areas within the room and within targeted systems and components and should be operated by suitably trained personnel.
A ‘hot-spot’ within a server rack can indicate a need for additional cooling and/or a rearrangement of the equipment in the rack. High temperatures elsewhere can point to a failing battery, an overloaded electrical conductor or electrical inefficiencies that should be investigated. Temperatures that vary from too cold to too hot within a server room or datacentre can also indicate a poorly functioning cooling strategy.
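The hot and cold checks described above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended tool: the `Reading` class, the location names and the 18-27 °C limits are assumptions made for the example (the limits loosely follow the commonly cited ASHRAE recommended inlet range) and should be replaced with the site's own policy.

```python
from dataclasses import dataclass

# Assumed limits for the example only; not taken from the article.
COLD_LIMIT_C = 18.0
HOT_LIMIT_C = 27.0


@dataclass
class Reading:
    """One spot temperature from a thermal camera or local sensor (hypothetical type)."""
    location: str
    temp_c: float


def classify(reading, cold=COLD_LIMIT_C, hot=HOT_LIMIT_C):
    """Return 'hot', 'cold' or 'ok' for a single reading."""
    if reading.temp_c > hot:
        return "hot"
    if reading.temp_c < cold:
        return "cold"
    return "ok"


def flag_readings(readings):
    """Map each out-of-range location to 'hot' or 'cold'."""
    flagged = {}
    for reading in readings:
        status = classify(reading)
        if status != "ok":
            flagged[reading.location] = status
    return flagged


def temperature_spread(readings):
    """Difference between the warmest and coolest reading in the room."""
    temps = [r.temp_c for r in readings]
    return max(temps) - min(temps)


# Hypothetical survey of three points in one room.
survey = [
    Reading("Rack A1 top", 31.5),
    Reading("Rack A1 bottom", 21.0),
    Reading("Rack B2 inlet", 16.2),
]
print(flag_readings(survey))
print(round(temperature_spread(survey), 1))
```

A wide spread between the warmest and coolest points, even when few readings are flagged, is the kind of pattern that would prompt a closer look at the cooling strategy.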
Additional information could also be available from any local monitoring software including a central DCIM software platform, and especially where local sensors have been utilised for specific issues including temperature and humidity.
The most common type of maintenance is termed Preventative Maintenance, or a PM visit. This type of maintenance visit generally adheres to a set task checklist covering various aspects of the system, including consumables and any items notified to the visiting engineer, e.g. via a manufacturer’s service bulletin. The aim of a preventative maintenance visit is to check for and identify areas of concern, check and/or replace consumables and carry out firmware upgrades. Another term for this type of visit is Planned Maintenance.
The visit is a scheduled one and performed before or at a system manufacturer’s recommended service interval. The objective is to ensure a system is healthy and protected from a future potential breakdown. The PM visit should be carried out by a manufacturer certified engineer. Using certified engineers and approved suppliers also ensures access to manufacturer approved consumables, spares, firmware upgrades and technical information.
Without preventative or planned maintenance, a product manufacturer may void its product warranty, or an insurance company may remove cover, if it cannot be provided with a record of a current inspection/maintenance visit. An organisation can also be in breach of its statutory requirements in terms of electrical wiring and systems testing (BS7671 and the Electricity at Work Regulations) or cooling system inspection (F-Gas regulations).
More info on BS7671 and the Electricity at Work Regulations:
More info on F-Gas Regulations:
Other types of maintenance visit include:
Most organisations will adopt a mixture of preventative/planned maintenance (to meet regulatory and insurance-driven requirements, service level agreements and operational certifications) and corrective/condition-based maintenance, depending on their CAPEX and OPEX budgets.
How often should a server room be maintained? This is dependent on the size and complexity of the installation. Generally maintenance tasks can be split over time into daily, weekly and monthly tasks.
|Task||Visual Inspection||Preventative Maintenance|
Note: 6-12 months indicates a need for annual or six-monthly (twice-yearly) preventative maintenance visits.
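As a rough illustration of how 6 or 12 month service intervals translate into due dates, the sketch below works out when the next preventative maintenance visit falls and whether a system is overdue. The system names, last-visit dates and intervals are hypothetical; actual intervals should come from the manufacturer's recommendations and the maintenance contract.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` after `start`, clamping to the month's last day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_due(last_visit, interval_months):
    """Date by which the next preventative maintenance visit should take place."""
    return add_months(last_visit, interval_months)


def is_overdue(last_visit, interval_months, today):
    """True if the service interval has already elapsed."""
    return today > next_due(last_visit, interval_months)


# Hypothetical register: system -> (last PM visit, interval in months).
register = {
    "UPS": (date(2024, 1, 15), 6),
    "Cooling": (date(2024, 8, 31), 6),
    "Fire suppression": (date(2024, 3, 1), 12),
}

today = date(2024, 8, 1)  # fixed date so the example is repeatable
for system, (last_visit, interval) in register.items():
    due = next_due(last_visit, interval)
    status = "OVERDUE" if is_overdue(last_visit, interval, today) else "ok"
    print(f"{system}: next visit due {due.isoformat()} ({status})")
```

Clamping to the last day of the month keeps a visit logged on 31 August from producing an invalid date six months later.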
We have already mentioned the need to use manufacturer certified engineers for maintenance visits. Even when an emergency callout requires a maintenance engineer on site at short notice, consideration should be given to:
Other aspects could include permits to work, working hours, parking and access times which will be site-specific.
For a server room or datacentre to achieve continuous operation, it is important to ensure that all critical systems are inspected and maintained regularly. Most systems will be automatically monitored for alarms via product-specific software or a DCIM package. Even with this in place, regular inspections and planned, preventative maintenance will still be required to meet manufacturers’ service and warranty requirements, insurance requirements and specific regulations.
The projects team at Server Room Environments can assist with all aspects of server room maintenance including the drawing up of inspection checklists, maintenance contracts and schedules, personnel training, facilities management and maintenance provision, emergency call-outs and reactive service visits.
Uninterruptible power supplies are no different to other critical systems (cooling and fire suppression) within a server room or data centre environment. For a UPS system to provide no-break backup power on demand, it must be regularly inspected and maintained. Most UPS failures are down to poor battery health, leading to shorter than expected runtime and IT load downtime. Other factors can also limit the ability of a UPS to keep critical loads running and, like battery health, they can only be identified and/or prevented through regular inspection and maintenance.
When you purchase a UPS system or any other type of capital goods like a precision cooling system, it is important to consider what is referred to in engineering as the bathtub curve. Why is this important? Well, at some point during the working life of your new system you may have an alarm condition that requires technical support and an emergency call out, and if you are not covered by a UPS maintenance contract then this could be a chargeable service.
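For readers unfamiliar with the term, the bathtub curve describes a failure rate that is high early in life, low and roughly constant in mid-life, and rising again as components wear out. One common way to model it is as the sum of three Weibull hazard rates. The sketch below uses that approach; every shape and scale value is an illustrative assumption and not data for any real UPS or cooling product.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Sum of three illustrative failure modes (all parameters are assumptions)."""
    early_life = weibull_hazard(t_years, shape=0.5, scale=50.0)  # falling rate
    random_faults = weibull_hazard(t_years, shape=1.0, scale=20.0)  # constant rate
    wear_out = weibull_hazard(t_years, shape=5.0, scale=12.0)  # rising rate
    return early_life + random_faults + wear_out


# Failure rate (per year) shortly after installation, in mid-life and late in life.
for age in (0.1, 5.0, 12.0):
    print(f"age {age:>4} years: hazard {bathtub_hazard(age):.3f} per year")
```

With these assumed parameters the rate is highest at the two ends of the working life, which is why cover matters both soon after installation and as the system ages.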