The Which Group recently published a study [1] stating that the UK banking sector was hit by IT outages on a daily basis in the last nine months of 2018, with 302 reported failures. The major banks each suffered at least one incident every two weeks. This is a highly concerning statistic, and it shows that bank outages and IT issues occur much more often than was previously thought [2]. The impact can cause significant setbacks, financial and otherwise. In one recent example [3], a major bank suffered an outage with costs amounting to over £330 million.

Nick Coleman, Channel Director EMEA of Virtual Instruments, explains why this is happening and what the issues surrounding this trend are.

Regulatory pressure

Firstly, because banks are now obliged to report any IT issues to the Financial Conduct Authority (FCA), and because organisations such as Which use this data to form their reports, IT problems are much more visible. There is greater recognition than ever before of how much serious disruption IT outages at financial services institutions can cause for people and businesses. The FCA now regards IT system performance as more important than staff performance. So if a member of staff is signed off work, it is considered normal business, but if an IT system fails to deliver, it is viewed as a violation. Under regulations enforced by the FCA in August 2018, banks and financial services firms have to report on how they recover from outages within three months and are subject to a mandated maximum acceptable time for systems to be down. Because they are so reliant on IT systems, it is critical that they take the necessary steps to ensure the business can get back up and running as soon as possible after an outage.

Infrastructure complexity

Secondly, knowing the true root cause of problems before taking any action is key, but a lack of proper infrastructure visibility is preventing banks from effectively managing the situation. Today’s hybrid infrastructure is inherently complex: new procurements are layered over legacy systems that are not necessarily cohesive, so interoperability issues often ensue. The knock-on effects of systems fighting for resources during busy periods can cause latency issues, which in turn seriously affect the performance of business-critical applications. Here, it is not a matter of if, but when.

With digital transformation, IT systems have become too complex for people to manage unaided and require automation and AI-powered IT operations management (AIOps[4]), also known as ‘algorithmic IT operations’, to run efficiently. Unfortunately, IT doesn’t have the investment or influence at board level that it needs to put the proper performance safeguards and assurances into place. The business insists that its customer-facing applications run as planned, but doesn’t really care who runs the IT infrastructure for it. It sees the infrastructure as an overhead rather than as a vital, profit-generating differentiator that gives a competitive advantage.

Lack of performance benchmarks in the cloud

Thirdly, the banking sector has been advised to embrace the cloud and is struggling to migrate applications, often written in the 1980s and 1990s, to a new platform. The cloud suppliers are reluctant to provide a service level agreement (SLA) on application performance, as they do not know the quality of the application code they will be hosting, so there is effectively no one fully accountable if problems occur. This means that, at present, a bank’s customer-facing applications can slow down for an hour and, because the cloud provider is not accountable for application performance, the provider is not in breach of contract.

For example, performance issues can affect the banking applications that handle customer transactions. If that capability goes down, not only will it be difficult for the IT team to locate the issue and get systems back up and running quickly, but the business will also have to deal with the reputational damage that follows such an incident.

How can this imbalance be addressed?

Bringing insights and the value of IT back to the business

Over the years, the perceived value of the IT infrastructure has eroded in the eyes of the business. This type of thinking is not unique to IT. For example, years ago people used to care about the engine in their cars, but now they just expect it to work and are really irritated if it fails. The same goes for domestic appliances: washing machines, dishwashers, stereo systems, etc. These are now just viewed as commodity items to be used and replaced with no emotion. The value of IT in general needs to be recognised, and the way to do this is to report on exactly how it is helping the business, in language the business understands.

So how can IT be of more value to the business? Organisations must recognise that a shift in the perception of the importance of the application (which relates to the customer) is needed. As the organisation cares about application performance, and as IT supports the applications, IT should logically also show how well they are running, how cost-efficient they are compared to other suppliers, and how in-house IT has a better understanding of the company direction than any outsourced partner.

Outmoded infrastructure monitoring methods

When it comes to tackling the issue, traditional monitoring capabilities are falling short. The tools are commonly proprietary and simply not able to keep pace with the digital transformation occurring today. The core of this recurring outage problem in the financial services industry is that IT teams are simply unable to holistically ‘see’ or create a map of their entire systems environment. Greater infrastructure transparency is required.

Currently, the applications themselves can be monitored using application performance monitoring (APM) tools, but these only show the application performance outside the data centre with perhaps a bit of hypervisor information. It is a similar story with the switch providers and network monitors, as they really only look at their own devices and lack context about other devices and the applications themselves. The entire hybrid IT infrastructure supporting the application processing needs to be viewed live across the hypervisor, VM, server, network fabric and storage together.

The AIOps solution

AIOps-driven app-centric infrastructure management will be a significant part of the solution. Artificial intelligence applied to IT operations (AIOps) utilises AI and ML (machine learning) to help ensure application and infrastructure performance.

With this holistic approach, AI-based analytics are app-centric, with correlation capabilities that provide highly insightful and integrated views across silos and end-to-end across the entire infrastructure. In this way, a shared context can be seen across all infrastructure management tools, so that the trends and behaviour of resources can be easily read and understood. With a visual representation of the current infrastructure, IT teams can see all of the dependencies and exactly which applications are utilising or competing for different infrastructure resources. Thus, potential problems can be avoided in advance, making a meaningful shift from reactive to proactive troubleshooting that saves time and money and protects business revenue and customer loyalty.
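To make the idea of correlation across silos more concrete, the following is a minimal sketch and not a description of any vendor's product. It assumes that application latency and a few infrastructure metrics have already been sampled at the same points in time; the metric names and sample values are hypothetical. It ranks the infrastructure metrics by how strongly they move with application latency, which is one simple way of suggesting where to look first.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length and at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no signal to correlate against
    return cov / sqrt(var_x * var_y)


def rank_suspects(app_latency_ms, infra_metrics):
    """Rank infrastructure metrics by absolute correlation with app latency."""
    scores = {
        name: pearson(app_latency_ms, series)
        for name, series in infra_metrics.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical samples taken at the same six points in time.
app_latency_ms = [20, 22, 21, 60, 95, 24]
infra_metrics = {
    "storage_queue_depth": [2, 2, 3, 14, 25, 3],
    "vm_cpu_percent": [40, 55, 38, 47, 42, 52],
    "switch_port_utilisation": [30, 30, 30, 30, 30, 30],
}

ranking = rank_suspects(app_latency_ms, infra_metrics)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In this made-up data the storage queue depth rises and falls with application latency, so it comes out at the top of the list. Correlation alone does not prove cause, and a real AIOps platform would work with far more data and far richer models; the point is only that the view has to span every layer at once for such a comparison to be possible.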

And this is not solely for on-premises systems; cloud-based and outsourced applications also need this level of scrutiny to ensure performance-based SLAs can be set and then met.
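As an illustration of what checking a performance-based SLA might involve, here is a minimal sketch. It assumes the SLA is expressed as "a given percentile of response times must stay at or below a threshold"; that form, the threshold and the sample values are all assumptions made for the example rather than terms from any actual contract.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list, for 0 < pct <= 100."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def sla_met(latencies_ms, pct, threshold_ms):
    """True if the chosen percentile of response times is within the threshold."""
    return percentile(latencies_ms, pct) <= threshold_ms


# Hypothetical response times (ms) for one reporting window, with one slow outlier.
window = [120, 135, 110, 140, 125, 130, 900, 115, 128, 132]

p90_ok = sla_met(window, 90, 500)  # 90th percentile against a 500 ms target
p95_ok = sla_met(window, 95, 500)  # 95th percentile against the same target
print(p90_ok, p95_ok)
```

With only ten samples, a single slow transaction is enough to decide the outcome at the 95th percentile but not at the 90th, which shows why the exact wording of a performance SLA matters as much as the monitoring behind it.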

With the ability to assure the performance of their mission-critical applications, banks and financial services organisations place themselves in a position to successfully manage their digital transformation journey, whilst ensuring they meet business goals and, most importantly, keep their customers happy.

 

[1] https://www.which.co.uk/news/2019/03/revealed-uk-banks-hit-by-major-it-glitches-every-day/

[2] https://www.parliament.uk/business/committees/committees-a-z/commons-select/treasury-committee/

[3] https://www.theguardian.com/business/2019/feb/01/tsb-computer-meltdown-bill-rises-to-330m

[4] “Artificial intelligence for IT operations (AIOps) platforms are software systems that combine big data and AI or machine learning functionality to enhance and partially replace a broad range of IT operations processes and tasks, including availability and performance monitoring, event correlation and analysis, IT service management, and automation.” Gartner, Market Guide for AIOps Platforms (published August 2017).