Effective server room monitoring: An investment worth making

We can support you to reduce business downtime and save money

Human error causes outages mitigated by effective monitoring

IT outages cost businesses money. Effective server room monitoring software and devices have been available for years, and new appliances have been developed more recently, yet monitoring is still not always prioritised as it should be. If you decide to invest, we can help devise the right solutions for you, which will save your business time and money.

Implications of IT Outages can be significant

In 2019, cloud-based monitoring solutions provider LogicMonitor commissioned an international survey of 300 companies, including 100 based in the UK. The IT decision makers questioned were at organisations with 2,500 or more employees. The study reported that 96% of respondents said their organisations had suffered at least one IT outage in the previous three years. LogicMonitor found that in the UK 51% of the companies had experienced five or more outages over the previous three years, while 49% of UK-based IT decision makers had experienced four or fewer.

These outages could have resulted in high costs and a poor experience for customers. Meanwhile companies operating in regulated sectors, such as healthcare and finance, could also be penalised for compliance failure.

Additionally, the survey showed that over half of the respondents admitted that the outages and brownouts could have been avoided. A brownout means the system remains available but suffers a significant slowdown.

Meanwhile, research by the US-based Ponemon Institute into 63 data centres found that the cost of IT outages rose to $740,358 (£544,336) in 2016, a 38% rise since 2010. Separately, Schneider Electric has suggested that organisations lose as much as $100m (£74m) a year to downtime related to information and communication technology. You cannot afford to neglect monitoring your IT spaces.

Causes of IT outages

Human Error

In our video from 2015 we reported that over 70% of IT outages in server rooms, IT rooms or communication rooms were directly attributed to human error.

The survey published by LogicMonitor in 2019 found that, across the 300 IT managers questioned, human error was the third most common cause of an IT outage, and the fifth most common among the 100 UK companies. In the Ponemon Institute research published in 2016, human or accidental error caused outages in 22% of the US data centres surveyed.

The consequences of human error can be devastating. For example, it was reported by Network World that an IT outage in May 2017 was caused by an engineer who had disconnected a power supply at a data centre near London’s Heathrow airport. When the power supply was reconnected, it caused a surge of power that resulted in major damage that forced British Airways (BA) to cancel more than 400 flights and strand 75,000 passengers in one day.

UPS System and Battery Failure

UPS battery failure is another common cause of IT outages. We reported in our video that 65% of IT directors had experienced outages because of it.

UPS (uninterruptible power supply) appliances provide power protection and surge protection for electronic equipment. A UPS sits between your IT device and the power circuit. It provides level power to that device and smooths out any peaks or troughs in the power supply.

In addition, in the event of a power outage the UPS can provide power from its onboard battery to your device, giving you continuity in your IT systems. However, you need to keep the UPS system maintained by monitoring the batteries to ensure they retain enough charge to provide that backup when it is required.
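To illustrate why battery condition matters, here is a minimal sketch of a runtime estimate in Python. It assumes a simple model (usable battery energy divided by load) and ignores battery age, temperature and discharge curves, so treat it as a rough guide only. The function names, the 90% inverter efficiency and the example figures are all hypothetical and are not taken from any UPS vendor's software.

```python
def estimated_runtime_minutes(battery_wh, load_w, inverter_efficiency=0.9):
    """Rough runtime estimate: usable battery energy divided by the load.

    battery_wh: battery capacity in watt-hours (assumed figure)
    load_w: power drawn by the protected equipment in watts
    inverter_efficiency: assumed conversion efficiency, not a measured value
    """
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    return battery_wh * inverter_efficiency / load_w * 60


def battery_needs_attention(runtime_min, required_min):
    """Flag the UPS when estimated runtime falls below the window you need."""
    return runtime_min < required_min


# Hypothetical example: a 600 Wh battery supporting a 400 W load,
# where at least 10 minutes of backup is required.
runtime = estimated_runtime_minutes(battery_wh=600, load_w=400)
print(round(runtime, 1), battery_needs_attention(runtime, required_min=10))
```

As batteries age their usable capacity falls, so the same check run against a lower capacity figure may start to flag the UPS well before an actual power cut exposes the problem.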

The Ponemon Institute report from 2016 found that UPS system failure was the single main cause of unplanned data centre outages, accounting for one-quarter of all such events. Among the 100 US managers surveyed for LogicMonitor, loss of electrical power was the fourth most commonly identified cause of an IT outage.

Network and other Failures

Network failure was recorded as the main reason for IT outages among the 100 UK IT managers surveyed for LogicMonitor in 2019 and it was the top cause identified across 300 IT managers internationally.

Usage spikes and surges were named as the second most common cause of outages in the 2019 LogicMonitor study of 300 companies. Among the UK companies they were named as the third most common reason. Meanwhile, hardware and software failures were also identified as a common cause of IT outages.

According to the 300 global respondents, the top two missed opportunities when it came to preventing downtime are:

  • Passing a capacity threshold: Failing to notice when usage is trending towards a danger level. For example, this might be more traffic than the network can efficiently handle, or a primary storage share running out of space.
  • Failure of hardware/software: Failing to notice that critical hardware/software performance is trending downward.
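Noticing that usage is trending towards a danger level can be sketched in a few lines of Python. The example below fits a straight line to recent usage samples and projects when a capacity threshold would be crossed. It is an illustration under stated assumptions, not how any particular monitoring product works: the function name, the sample data and the 90% threshold are hypothetical, and real usage is rarely perfectly linear.

```python
def hours_until_threshold(samples, threshold):
    """Fit a least-squares line to (hour, usage) samples and project the crossing.

    Returns the hours from the last sample until the threshold is reached,
    0.0 if the last sample is already at or above it,
    or None if usage is flat or falling.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        raise ValueError("samples must span more than one time point")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / denom
    intercept = mean_u - slope * mean_t

    last_t, last_u = samples[-1]
    if last_u >= threshold:
        return 0.0
    if slope <= 0:
        return None
    crossing_t = (threshold - intercept) / slope
    return max(crossing_t - last_t, 0.0)


# Hypothetical storage share filling by two percentage points an hour.
usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]
print(hours_until_threshold(usage, threshold=90.0))
```

The same projection applies equally to network traffic or any other metric with a known capacity limit. The value of the approach is that it raises a warning while there is still time to act, rather than after the limit has been hit.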

The importance of effective IT monitoring

With effective server room monitoring you are able to concentrate on running your core business rather than firefighting to retain business continuity.

  • Has somebody unplugged the wrong device?
  • Is the air conditioning starting to fail causing overheating?
  • Has the air become too dry or too humid for the electronic circuits?
  • Is the power spiking?
  • Has a water pipe started leaking?
  • Are the UPS batteries charging properly?
  • Has one of the servers started overheating?

These are questions we can help you answer. Our range of monitoring solutions helps you be smarter about protecting your IT devices and networks, and deal with issues before they actually cause an outage.
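In practice, several of the questions above reduce to comparing sensor readings against limits. The following Python sketch shows the idea under stated assumptions: the `Reading` class, the `check_reading` function and the limit values are hypothetical placeholders for illustration, not the interface of any monitoring appliance. Take real limits from your equipment vendors.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One snapshot from hypothetical rack sensors."""
    temperature_c: float
    humidity_pct: float
    leak_detected: bool
    ups_on_battery: bool


# Placeholder limits for illustration; use your vendors' recommended values.
LIMITS = {
    "temp_max_c": 27.0,
    "humidity_min_pct": 20.0,
    "humidity_max_pct": 80.0,
}


def check_reading(reading, limits=LIMITS):
    """Return a list of alert messages for one reading (empty if all is well)."""
    alerts = []
    if reading.temperature_c > limits["temp_max_c"]:
        alerts.append("temperature too high")
    if reading.humidity_pct < limits["humidity_min_pct"]:
        alerts.append("air too dry")
    if reading.humidity_pct > limits["humidity_max_pct"]:
        alerts.append("air too humid")
    if reading.leak_detected:
        alerts.append("water leak detected")
    if reading.ups_on_battery:
        alerts.append("UPS running on battery")
    return alerts


# Hypothetical reading: a hot, dry room during a power cut.
print(check_reading(Reading(31.5, 15.0, False, True)))
```

Each alert corresponds to one of the questions in the list, which is why a small set of well-placed sensors can cover most of the common causes of an outage.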

Server Room monitoring and protecting your IT network

In addition to the monitoring devices and software, we also future-proof your networks in our installations by using Excel cabling, a world-class premium performance end-to-end infrastructure solution. We can also offer access control and surveillance camera solutions that will help monitor IT areas and reduce the risk of human error.

APC NetBotz 250 rack monitor: This provides a comprehensive and effective monitoring solution for environmental factors such as leaking, overheating and fire or smoke, as well as access control risks. Temperature and humidity sensors are included when you purchase the NetBotz 250, and you can connect up to six sensors for other factors, such as fluid, directly to the appliance. If you also want to protect against human error or deliberate human interference, you can purchase APC NetBotz rack access handle kits, which provide an audit trail of who is accessing the enclosures.

APC NetBotz Rack Monitor 750: This is the first of the V5 NetBotz appliances, providing integrated surveillance, sensing, access control and advanced alerting for IT environments of all sizes, from single-rack edge networks to large data centres. The NetBotz 750 includes an integrated suite of sensors, access control pods and the HD Camera Pod 165.

NetBotz Camera Pod 165: This is an IP-based, Power over Ethernet (PoE) camera managed by a NetBotz Rack Monitor 750. It is either plugged directly into the rack monitor’s private switch or discovered over the network, and it aids intrusion detection and helps maintain compliance. The APC NetBotz appliances can operate independently, or we can group them together to offer a completely scalable solution.

The appliances can be managed through Schneider Electric’s StruxureWare Data Center Expert software to provide effective monitoring for your organisation. As this is installed through a web browser it is easy to set up and use. IT staff can monitor the computer room whether at their desk or working from home. Units can also be configured to alert technical support staff to potential issues via email, SMS or SNMP.

For effective real-time monitoring solutions contact us

We encourage all our customers to be smart and stay fully informed about their IT environments, and our advanced and effective monitoring solutions let you do just that. IT outages are bad for your business and can be prevented, so investing in the right equipment and software to keep you in control will pay off in the long term. To find out more, or to request a server room or data centre audit, contact us.