03/12/2019

How Important are Server Room Maintenance Checklists?

Server Room Maintenance

Server rooms should be designed to provide a secure and managed environment in which to run critical IT servers and network infrastructures. Facilities vary in size from IT closets to computer rooms, Edge to regional and even hybrid datacentres. Whatever the size of the setup, each will have a core set of systems that require regular inspection and maintenance in order to ensure their uptime and resilience.

Critical Infrastructure Systems

Aside from servers, software, storage and IT networking devices, most organisations will operate several core critical infrastructure systems including:

  1. Power Protection: uninterruptible power supplies, energy storage battery packs, power distribution units and standby power generators.
  2. Cooling: air conditioners, condensers, air handlers, chillers and humidifiers.
  3. Electrical Circuits and Distribution: LV switchboards, power factor correction (PFC) systems, electrical distribution and sub-distribution boards, breakers and cable connections, automatic transfer switches, UPS bypass switch arrangements, lighting and emergency lighting.
  4. Energy Metering: energy metering at the distribution, sub-distribution and rack levels.
  5. Environment Monitoring: monitoring devices and sensors including temperature (indoor and outdoor), humidity, moisture, power, smoke, fluid and water.
  6. Fire Suppression: suppression agent containers, fire suppression monitoring and alarm panels, power connections/battery replacement and sprinkler systems.
  7. Security: access control systems at building, room and rack level, and CCTV systems including any motion detectors and IP cameras within the server room itself; where there is no access control, check the room visitor list.

Each of these critical systems has its own maintenance requirements and requires at least one inspection per year, and often more.

Server Room Equipment Management

Most organisations have a dedicated IT manager or IT team responsible for monitoring the IT servers and the software running on the network. Support for these may be outsourced to a mixture of providers who specialise in their areas, e.g. hardware providers, Cloud hosting and Cybersecurity.

A similar approach can be adopted for critical infrastructure systems, but this can lead to the server room or datacentre operator having to work with a complex supply chain. A more pragmatic approach is to work with a facilities management and systems integration company like Server Room Environments.

General Maintenance Inspections

In addition to the maintenance requirements for critical infrastructure systems, there are other important aspects of a server room facility that also require regular inspection and potential work plans:

  • Access and Space Usage Reviews: an inspection comparing the layout with the design floor plan to ensure all racks and equipment are where they should be, with no additions and no use of the space for storage. Good access is essential for safe preventative maintenance and to ensure that airflow into and around server racks and hot and cold aisles is as planned.
  • Building and Rooms Usage: an inspection covering the general condition of the floors, ceilings and walls, windows/seals (if present), leak or water damage, damp or heat marks, exit and entry signs, and pest presence.
  • Cleanliness, Dust, Particles and Rubbish Removal: dust and particulates build up within servers, IT devices, electronic and electrical devices, floor tiles, walls, server racks and cooling systems, and require regular removal, and potentially a deep clean, to avoid a harmful build-up. It is also important to remove any rubbish from the room, as it could present a fire hazard or help to fuel a fire if one were to break out.
  • IT Network Servers, Cables and Connectivity: an inspection to ensure servers and network devices are clean with unobstructed airflow, and that there are no loose or bare connections or unlabelled cables that could lead to a short-circuit, trip hazard or accidental disconnection.
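
To make inspections like these repeatable, each area can be captured as a checklist whose results are recorded on every walk-round, so failed items feed straight into corrective work. The sketch below is a minimal Python illustration; the class names, fields and example items are hypothetical and not taken from any product or standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckItem:
    area: str                      # inspection area, e.g. "Cleanliness"
    description: str               # what the inspector should confirm
    passed: Optional[bool] = None  # None means not yet checked
    note: str = ""                 # free-text observation


@dataclass
class Inspection:
    items: list[CheckItem] = field(default_factory=list)

    def record(self, description: str, passed: bool, note: str = "") -> None:
        """Record the result for the checklist item with this description."""
        for item in self.items:
            if item.description == description:
                item.passed = passed
                item.note = note
                return
        raise KeyError(f"No checklist item: {description}")

    def outstanding(self) -> list[CheckItem]:
        """Items not yet checked on this inspection."""
        return [i for i in self.items if i.passed is None]

    def failures(self) -> list[CheckItem]:
        """Items that failed and may need corrective work."""
        return [i for i in self.items if i.passed is False]


# Hypothetical checklist drawn from the inspection areas above.
inspection = Inspection([
    CheckItem("Access and space", "Racks match the design floor plan"),
    CheckItem("Access and space", "No storage in the server room"),
    CheckItem("Cleanliness", "Rubbish removed from the room"),
    CheckItem("Cabling", "No loose, bare or unlabelled cables"),
])
inspection.record("Racks match the design floor plan", True)
inspection.record("Rubbish removed from the room", False, "Packaging behind rack 3")

for item in inspection.failures():
    print(f"{item.area}: {item.description} ({item.note})")
# prints: Cleanliness: Rubbish removed from the room (Packaging behind rack 3)
```

Items still marked as unchecked at the end of a visit show up in `outstanding()`, which makes incomplete inspections easy to spot.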

Within any maintenance plan, one of the most important aspects is visual inspection. This can identify potential issues that are not automatically or immediately captured via monitoring systems. The person responsible for the inspection should be suitably trained and from time to time it can be advantageous for them to be accompanied by a third-party consultant who can challenge accepted practices and thinking.

Visual inspections can also be assisted by technologies such as thermal imaging cameras. Hand-held devices can identify both hot and cold areas within the room and within targeted systems and components, and should be operated by suitably trained personnel.

A ‘hot-spot’ within a server rack can indicate a need for additional cooling and/or a rearrangement of the equipment in the rack. Elevated temperatures elsewhere can point to a failing battery, an overloaded electrical conductor or electrical inefficiencies that should be investigated. Temperatures that vary from too cold to too hot within a server room or datacentre can also indicate a poorly functioning cooling strategy.
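
Once thermal readings have been taken across a rack front, the same two checks described above (a hot-spot, and a wide spread between hot and cold areas) can be applied to the numbers. The sketch below assumes the readings are available as a simple grid of temperatures in °C; the 27 °C limit (the upper end of ASHRAE's recommended IT inlet range) and the 5 °C margin over the average are illustrative assumptions, not values from this article.

```python
from __future__ import annotations

from statistics import mean


# Assumed limits for illustration: 27 °C is the upper end of ASHRAE's
# recommended IT inlet range; the 5 °C margin over the average is arbitrary.
def find_hot_spots(grid: list[list[float]], limit_c: float = 27.0,
                   margin_c: float = 5.0) -> tuple[list[tuple[int, int, float]], float]:
    """Return hot cells as (row, col, temp) plus the spread between the hottest and coldest cells."""
    values = [t for row in grid for t in row]
    average = mean(values)
    hot = [
        (r, c, t)
        for r, row in enumerate(grid)
        for c, t in enumerate(row)
        if t > limit_c or t > average + margin_c
    ]
    return hot, max(values) - min(values)


# Hypothetical readings (°C) sampled from a thermal image of a rack front.
rack_front = [
    [22.0, 22.5, 23.0],
    [23.0, 31.5, 24.0],
    [22.5, 23.5, 23.0],
]
hot, spread = find_hot_spots(rack_front)
print(hot, spread)  # [(1, 1, 31.5)] 9.5
```

Flagging both an absolute limit and a deviation from the average catches a single overheating device even when the rest of the rack is comfortably cool.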

Additional information may also be available from local monitoring software or a central DCIM software platform, especially where local sensors have been installed to track specific conditions such as temperature and humidity.
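
At its simplest, the alerting that monitoring software or a DCIM platform performs on these sensors is a comparison of each reading against configured limits. The sketch below shows that logic in Python; the sensor names are hypothetical and the limits are placeholders that should be set from the room design and manufacturer guidance, not recommendations.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    low: float
    high: float


# Illustrative limits only: set real values from the room design,
# equipment manufacturers' guidance and your own DCIM configuration.
LIMITS = {
    "temperature_c": Limits(18.0, 27.0),
    "humidity_pct": Limits(20.0, 80.0),
}


def check_reading(sensor: str, metric: str, value: float) -> Optional[str]:
    """Return an alert message if the reading is outside its limits, else None."""
    limits = LIMITS[metric]
    if value < limits.low:
        return f"{sensor}: {metric} {value} below {limits.low}"
    if value > limits.high:
        return f"{sensor}: {metric} {value} above {limits.high}"
    return None


# Hypothetical sensor names and readings.
readings = [
    ("rack-A1-inlet", "temperature_c", 24.5),
    ("rack-B3-inlet", "temperature_c", 29.0),
    ("room-north", "humidity_pct", 15.0),
]
alerts = [msg for msg in (check_reading(*r) for r in readings) if msg is not None]
for msg in alerts:
    print(msg)
```

Each alert produced this way is the kind of trigger that would prompt a condition-based maintenance visit.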

Types of Maintenance

The most common type of maintenance is the Preventative Maintenance (PM) visit. This type of visit generally follows a set task checklist covering various aspects of the system, including consumables and any issues notified to the visiting engineer, e.g. via a manufacturer’s service bulletin. The aim of a preventative maintenance visit is to check for and identify areas of concern, check and/or replace consumables, and carry out firmware upgrades. Another term for this type of visit is Planned Maintenance.

The visit is scheduled and performed at or before the system manufacturer’s recommended service interval. The objective is to ensure a system is healthy and protected from a future potential breakdown. The PM visit should be carried out by a manufacturer certified engineer. Using certified engineers and approved suppliers also ensures access to manufacturer approved consumables, spares, firmware upgrades and technical information.

Without preventative or planned maintenance, a product manufacturer may void its product warranty, or an insurance company may withdraw cover if it cannot be shown a current inspection/maintenance record. An organisation can also be in breach of its statutory requirements for electrical wiring and systems testing (BS7671 and the Electricity at Work Regulations 1989) or cooling system inspection (F-Gas regulations).

More info on the Electricity at Work Regulations 1989 (HSE guidance HSR25):
http://www.hse.gov.uk/pUbns/priced/hsr25.pdf

More info on F-Gas Regulations:
https://ec.europa.eu/clima/policies/f-gas/legislation_en

Other types of maintenance visit include:

  • Corrective Maintenance: used to rectify an actual or potential fault within a server room system. A potential fault identified during a preventative maintenance visit may require a second visit to site, following a quote for additional works and/or the sourcing of the correct spares.
  • Condition-Based Maintenance: similar to corrective maintenance, but instigated when a system raises an alert indicating the need for an inspection or maintenance visit. The alarm can be reported via a DCIM or other software-based system, or by an audible or LED alarm.
  • Predictive Maintenance: designed to identify the deterioration of a system or its components before it leads to a breakdown and higher repair costs. A typical example is battery testing, with the test results stored for comparative algorithmic analysis to identify a single failing battery that could reduce the performance of the entire UPS battery string. This type of battery testing can be built into a preventative/planned maintenance visit.
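
The comparative battery analysis described under Predictive Maintenance can be sketched as follows. This minimal example assumes each battery block's internal resistance (in milliohms) is the stored test result, and flags a block when it stands out from the rest of the string or has risen since its own earlier baseline. The 20% and 25% thresholds and the block names are illustrative assumptions; real limits should come from the battery manufacturer.

```python
from __future__ import annotations

from statistics import median


def flag_weak_blocks(
    resistance_mohm: dict[str, float],
    baseline_mohm: dict[str, float],
    vs_string_pct: float = 20.0,    # assumed threshold vs. the string median
    vs_baseline_pct: float = 25.0,  # assumed threshold vs. the block's baseline
) -> dict[str, list[str]]:
    """Return blocks that look weak, with the reason(s) each was flagged."""
    string_median = median(resistance_mohm.values())
    flagged: dict[str, list[str]] = {}
    for block, r in resistance_mohm.items():
        reasons = []
        # Compare against the other blocks in the same string.
        if r > string_median * (1 + vs_string_pct / 100):
            reasons.append("above string median")
        # Compare against this block's own earlier result, if one exists.
        base = baseline_mohm.get(block)
        if base is not None and r > base * (1 + vs_baseline_pct / 100):
            reasons.append("risen since baseline")
        if reasons:
            flagged[block] = reasons
    return flagged


# Hypothetical test results for a four-block string.
latest = {"B01": 4.1, "B02": 4.0, "B03": 5.6, "B04": 4.2}
baseline = {"B01": 4.0, "B02": 3.9, "B03": 4.1, "B04": 4.1}
print(flag_weak_blocks(latest, baseline))
# {'B03': ['above string median', 'risen since baseline']}
```

Using the string median rather than the mean keeps one badly degraded block from dragging the reference value towards itself.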

Most organisations will adopt a mixture of preventative/planned maintenance (to meet regulatory and insurance requirements, service level agreements and operational certifications) and corrective/condition-based maintenance, depending on their CAPEX and OPEX budgets.

Maintenance Timetable

How often should a server room be maintained? This depends on the size and complexity of the installation. Generally, tasks can be split into regular visual inspections (daily, weekly or monthly) and longer-interval preventative maintenance visits, as summarised below.

Task                  Visual Inspection    Preventative Maintenance
General Inspection    Weekly               -
Power                 Monthly              6-12 months
Cooling               Monthly              Annual
Electrical            6 months             Annual
Energy                6 months             Annual
Environment           Weekly               Annual
Fire                  Weekly               6-12 months
Security              -                    Annual

Note: 6-12 months indicates that preventative maintenance visits are needed either twice-yearly or annually.
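
The timetable can also drive a simple due-date check, so that an overdue preventative maintenance visit, and the warranty or insurance exposure that comes with it, is spotted early. The sketch below is an illustration rather than a scheduling tool: it assumes the shorter end of each 6-12 month range, covers only some of the systems in the table, and uses hypothetical last-visit dates.

```python
from __future__ import annotations

import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add whole months, clamping the day to the end of shorter months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# PM intervals in months, using the shorter end of each 6-12 month range.
PM_INTERVAL_MONTHS = {
    "Power": 6,
    "Cooling": 12,
    "Electrical": 12,
    "Energy": 12,
    "Environment": 12,
    "Fire": 6,
}


def overdue(last_pm: dict[str, date], today: date) -> list[str]:
    """Return systems whose next PM date falls on or before today."""
    return sorted(
        system for system, last in last_pm.items()
        if add_months(last, PM_INTERVAL_MONTHS[system]) <= today
    )


# Hypothetical dates of the last preventative maintenance visit.
last_visits = {
    "Power": date(2019, 5, 31),
    "Cooling": date(2019, 1, 15),
    "Fire": date(2019, 8, 1),
}
print(overdue(last_visits, today=date(2019, 12, 3)))  # ['Power']
```

Clamping the day matters for visits made at the end of a month: a UPS serviced on 31 May falls due on 30 November, not on a date that does not exist.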

Other Service Visit Considerations

We have already mentioned the need to use manufacturer certified engineers for maintenance visits. Even when an emergency call-out means an engineer must be brought onto site quickly, consideration should be given to:

  • Supplier Approval Process: any sub-contractor or supplier brought onto site should be subject to the organisation’s supplier approval process. This could cover background checks, references and reviews, creditworthiness, and manufacturer certifications.
  • Risk Assessments and Method Statements (RAMS): safe working practices and risk mitigation within the server room environment are vital for any task. A site-specific RAMS document should be provided and accepted before work begins.
  • Inductions and Cybersecurity: most industrial sites have a 1-2 hour induction process, which must be built into a site visit, even during an emergency call-out. Inductions are also becoming more important within server room and datacentre environments due to Cybersecurity concerns. Any visiting service engineer should be made aware of on-site restrictions, including the use of mobile phones in the server environment and access to the IP/Wi-Fi network for communications.

Other aspects could include permits to work, working hours, parking and access times which will be site-specific.
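
Because these checks still apply during an emergency call-out, it can help to treat them as a gate that must be cleared before an engineer is admitted. The sketch below is a hypothetical Python illustration: the `VisitRequest` fields mirror the considerations above, but the names and structure are assumptions rather than any particular site's process.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VisitRequest:
    engineer: str
    supplier_approved: bool       # passed the organisation's supplier approval process
    rams_accepted: bool           # site-specific RAMS provided and accepted
    induction_completed: bool     # site induction, including cybersecurity rules
    manufacturer_certified: bool  # engineer holds the relevant manufacturer certification


def blockers(visit: VisitRequest) -> list[str]:
    """Return the reasons the visit cannot yet go ahead (empty if it can)."""
    checks = {
        "supplier not approved": visit.supplier_approved,
        "RAMS not accepted": visit.rams_accepted,
        "induction not completed": visit.induction_completed,
        "engineer not manufacturer certified": visit.manufacturer_certified,
    }
    return [reason for reason, ok in checks.items() if not ok]


# Hypothetical emergency call-out where the induction is still outstanding.
callout = VisitRequest("J. Smith", True, True, False, True)
print(blockers(callout))  # ['induction not completed']
```

Site-specific items such as permits to work or access times could be added as further fields in the same way.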

Summary

For a server room or datacentre to achieve continuous operation, it is important to ensure that all critical systems are inspected and maintained regularly. Most systems will be automatically monitored for alarms via product-specific software or a DCIM package. Even with this in place, regular inspections and planned, preventative maintenance will still be required to meet manufacturer’s service and warranty requirements, insurance requirements and specific regulations.

The projects team at Server Room Environments can assist with all aspects of server room maintenance, including drawing up inspection checklists, maintenance contracts and schedules, personnel training, facilities management and maintenance provision, emergency call-outs and reactive service visits.
