
How We Help Our Customers Get The Right Monitoring Technology


In a competitive marketplace, decision-makers have multiple routes to market and, in many cases, only limited time to ensure that their chosen tool meets their requirements. KedronUK have over 15 years' experience of designing, installing and supporting IT management projects, meaning we have a tried and tested consultative approach that ensures intelligent solutions and happy Customers!

We asked one of our Account Executives, Phil Simms, to describe how we do that, and below is what he said.

“I’ve worked at KedronUK now for 3 years, and what I love about the role is that if my colleagues in the Technical team say they can do something for a Customer, I am confident we can deliver. My role is Sales, but I’d find it impossible to do that if I wasn’t sure we could deliver a quality solution and build a long-term Customer relationship.

Our engagements are broken down into 3 key areas – PLAN, PROVIDE, ASSURE.

PLAN - All of our engagements start with a discovery meeting, where we need to understand what the Customer's objectives are, what other stakeholders need and where the business wants to get to with their investment. We also need to understand the Customer's unique environment, how it comes together and how our solutions will work within it.

Now, many companies have just a couple of options when it comes to IT management tools and will look to bend those available features to fit the Customer’s requirements. Something that makes KedronUK unique is that we have a broad range of established partnerships, each a leader in its field, meaning we can look pragmatically at a Customer’s requirements and select the best-fit technology to meet them.

Based on that, we produce a recommendation document with a high-level solution design which also provides an idea of costs. This is provided free of charge.

When the deliverables are agreed, KedronUK can deliver a fully project-managed proof of concept, where the Customer can see the solution working in their own environment and delivering on their requirements.

PROVIDE - Provide is our installation and configuration service. Again, something that makes us stand out is that we produce a Statement of Work (SoW) that captures the mutually agreed technical specification, so there is no doubt about what our objectives are.

The Customer is then assigned a Project Manager, who applies a proven project management and delivery process, including robust assurance and quality testing. The Project Manager then produces a project plan, which is followed by our delivery teams and managed to make sure everything runs to schedule. At that point, handover training is completed.

ASSURE - Post-installation, we provide various levels of service depending on our Customers’ support requirements: sometimes this is just 9-5 telephone support, and sometimes it might be governing the system on the Customer’s behalf, integrating with their teams on a 24x7 basis.

Whatever service level they choose, all Customers get ongoing Technical Account Management, with a core goal of making sure the solution stays up to date with the Customer’s dynamic environment and continuously improves.”

That’s a good explanation from Phil of how we work here at KedronUK. Please get in touch here if you’re interested in finding out what our Technical Experts would recommend for you; there’s no charge for the initial consultation!



There’s SAP Monitoring and Then There’s SAP Monitoring


The following Blog Post was written by Chris Booth, our Solution Architect.

As specialists in Network and Application Performance Monitoring (NPM/APM), we often speak to customers who have business-critical but hard-to-monitor applications. These applications can be a source of considerable frustration across a business. End-users find their everyday work held up by poor performance, whilst the IT teams supporting the application crave more information to understand where issues lie. Every IT professional would love to have no issues to find and fix, but the modern IT environment, with layers of virtualisation, middleware, databases, and more, means tracing issues is not easy. Application vendors may supply management tools as standard, but these can lack functionality and thus provide little insight into problems.

Ranked #1 for ERP revenue by Gartner (“Market Share Analysis: ERP Software, Worldwide, 2018”), SAP will often be found in the business-critical application space. By definition, ERP underpins a business, and any issues with it can quickly be felt in lost productivity and, ultimately, revenue. With SAP being accessible through both a thick client and a web browser, the infrastructure supporting SAP access has become more complex. However, to an end-user, who is quite rightly ignorant of the complexity of a modern SAP deployment, the problem will often be a simple “SAP is slow”.

Virtualisation can also make troubleshooting more difficult. A Java middleware VM may be running slowly, but is the source of the issue the VM itself? Or could another VM on the same virtualisation host be consuming excessive resources and thus impacting other guests?


Thus, to monitor and troubleshoot issues effectively, a tool needs to look beyond just the core components of SAP. Incorporating metrics from layers such as virtualisation and storage allows the “big picture” to be considered. Caution is needed, though: whilst knowing more is generally considered a benefit, it can also lead to instances of the “alert cannon”, which overwhelms IT teams with multiple alerts arriving at the same time. It can then take the team longer to fix an issue as they hunt for the root cause, which may be buried in the alerts.

Alongside supporting a wide range of technologies, the ability to correlate events becomes a key requirement for the monitoring system. This means the tool can assess multiple alerts, highlighting the root alert and helping the IT team find the root cause more quickly.




The eG Innovations Enterprise Suite offers end-to-end monitoring and performance analytics, collecting metrics from over 180 applications (including SAP modules such as ABAP, ITS, and Web Dispatcher), operating systems and virtualisation platforms. With automatic baselining and event correlation it provides a very powerful tool to help IT teams manage and support complex and critical applications such as SAP.


Part 2 Machine Learning


The following Blog Post is written by Brian Steele, our Technical Development Lead.

These days, the terms “Machine Learning” and “Artificial Intelligence” are phrases that are bandied around a great deal when vendors talk about the features of their Monitoring Solutions. But “What is Machine Learning?” and “How does Machine Learning help Monitoring Tools?”

In our previous Blog we discussed “What is Machine Learning?”, now we ask “How does Machine Learning help Monitoring Tools?”

Part 2 – How does Machine Learning help Monitoring Tools?

The Goal of Machine Learning in Monitoring

The goal of Machine Learning in monitoring is to reduce the administrative and diagnostic workload of engineers to help them get to the root cause of a problem faster. 

This is achieved by deploying a Monitoring Solution that uses the following main approaches, augmented by Machine Learning, to simplify and automate the root cause analysis process:

1. Correlate ALERTS - Identify alerts that correlate to a single root cause alert.
2. Relevant METRICS - Identify which metrics are relevant to diagnosing the root cause issue.
3. WHAT IS NORMAL - Learn what is “normal” for any given metric.

If the environment to be monitored is a simple one, then a single monitoring tool may be enough. In a more complex environment, multiple tools may be required to collect the different types of metrics needed to perform comprehensive root cause analysis. Either way, an MLMS (Machine Learning Monitoring Solution) can be implemented by deploying a single tool that has ML (Machine Learning) features or by integrating multiple tools into an AI-Ops tool.

Correlate Alerts

We are all used to getting “hundreds” of alerts for the same root cause issue! A Machine Learning Monitoring Solution can help with this problem through the following mechanisms:

1. SERVICE TOPOLOGY – The Machine Learning Monitoring Solution is configured with an understanding of service topology. In other words, it knows the parent/child relationships between configuration items. This understanding of service topology can be configured manually or learned from monitoring tools that have discovered the hierarchy. Because Machine Learning can see the parent/child relationships, it knows that if a parent configuration item fails, there is little point in taking heed of the alerts coming from its child configuration items; so it mutes them or labels them as secondary to the root cause alert.



2. ALL ALERTS ALL THE TIME - During model training, Machine Learning looks at all alerts to see if there is any correlation between seemingly unrelated alerts. By monitoring all alerts all the time, Machine Learning may observe that two alerts always occur at roughly the same time, making them a prime candidate for correlation. This can be true even though they don’t appear within the same service topology tree and therefore have no parent/child relationship. By correlating alerts in this way, a Machine Learning Monitoring Solution can consolidate many alerts under a single root cause alert, reducing the alert cannon. In the following example, the four alerts (A01, A02, A03, and B01) are consolidated to a single root cause alert, which could be A02.
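As a rough illustration of both mechanisms, here is a minimal Python sketch using the alert names above. The topology, the learned pairing and the consolidation logic are all invented for this example rather than taken from any particular product:

```python
# Hypothetical service topology: child -> parent configuration item.
TOPOLOGY = {"A01": "A02", "A03": "A02"}

# Alert pairs observed to co-occur during model training (hard-coded here;
# a real solution would learn them from historical alert data).
LEARNED_PAIRS = {frozenset({"A02", "B01"})}

def consolidate(alerts):
    """Reduce a burst of alerts to its candidate root cause alert(s)."""
    roots = set(alerts)

    # 1. SERVICE TOPOLOGY: mute a child alert when its parent is also alerting.
    for alert in alerts:
        if TOPOLOGY.get(alert) in alerts:
            roots.discard(alert)

    # 2. ALL ALERTS ALL THE TIME: fold together alerts that historically
    #    co-occur, even when they share no parent/child relationship.
    for pair in LEARNED_PAIRS:
        first, second = sorted(pair)
        if first in roots and second in roots:
            roots.discard(second)  # keep one representative as the root

    return roots

print(consolidate({"A01", "A02", "A03", "B01"}))  # -> {'A02'}
```

A commercial MLMS learns the co-occurrence pairs statistically from alert history rather than having them hard-coded, but the consolidation effect is the same: four alerts collapse to one root cause alert.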




Relevant Metrics
Consider a critical application that is running slowly. The application is hosted on a single server, so only the metrics from that server are being monitored to determine the health of the application. However, the application is particularly sensitive to the performance of its disc storage, which is hosted on a SAN that is in turn serving multiple applications across the wider landscape. Taking all of this into account, it becomes obvious that the availability and performance of the application are dependent on much more than just its hosting server.

Machine Learning will determine which metrics are relevant to the availability and performance of an application during the selection of the hyper-parameters in the model training phase. In this way, a Machine Learning Monitoring Solution arrives at a much more open and comprehensive determination of service availability and performance, free of the restrictions of human assumptions. This is a massive help when it comes to predicting potential problems (before they manifest themselves) and performing root cause analysis.
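As a toy illustration of metric relevance (not the specific algorithm any vendor uses), candidate metrics can simply be ranked by the strength of their correlation with the target metric. The scenario, names and numbers below are all synthetic, matching the SAN example above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # synthetic history: one sample per minute, say

# Invented scenario: the application's response time is actually driven
# by SAN latency, not by the metrics of its own hosting server.
san_latency = rng.normal(5, 1, n)        # ms
host_cpu = rng.uniform(10, 90, n)        # %
host_memory = rng.uniform(30, 70, n)     # %
response_time = 2 * san_latency + rng.normal(0, 0.5, n)

candidates = {"san_latency": san_latency,
              "host_cpu": host_cpu,
              "host_memory": host_memory}

# Rank each candidate metric by |correlation| with the target metric.
relevance = {name: abs(np.corrcoef(series, response_time)[0, 1])
             for name, series in candidates.items()}

for name, score in sorted(relevance.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} relevance: {score:.2f}")
# san_latency scores near 1.0 and the host metrics near 0.0, so the
# solution learns to watch the SAN without a human telling it to.
```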




What is Normal
For a traditional monitoring tool, the engineer is required to select an appropriate threshold for warning and critical alerts against each metric being monitored. The problem with this approach is that an appropriate threshold for one configuration item will be something completely different for another. Therefore, what tends to happen is that engineers accept the default system-wide thresholds set by the monitoring tool vendor. This gets you by, but it is inappropriate as a long-term solution because it leads to some conditions not generating alerts while others generate too many.

A Machine Learning Monitoring Solution will learn the correct threshold for any metric based on its history during normal behaviour. The Machine Learning will also consider the time of day, day of the week and month to cater for regular processes, like backups, which are expected to have an impact. This is a great help in keeping the Monitoring Solution sufficiently sensitive to issues while reducing or eliminating false alerts.
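A minimal sketch of the idea, with deliberately simplified statistics (a per-hour mean and standard deviation) standing in for what a commercial solution does, and entirely synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two weeks of hourly CPU history (%): quiet in normal operation, with a
# nightly backup at 02:00 that legitimately pushes utilisation up.
hours = np.tile(np.arange(24), 14)
history = np.where(hours == 2,
                   rng.normal(85, 3, hours.size),   # backup window
                   rng.normal(30, 5, hours.size))   # normal operation

# Learn a baseline per hour of day: mean and standard deviation.
baseline = {h: (history[hours == h].mean(), history[hours == h].std())
            for h in range(24)}

def is_anomalous(value, hour, k=3.0):
    """Flag a reading more than k standard deviations from its hour's norm."""
    mean, std = baseline[hour]
    return abs(value - mean) > k * std

print(is_anomalous(86, hour=2))    # False: high, but normal during the backup
print(is_anomalous(86, hour=14))   # True: the same value mid-afternoon is an issue
```

The same CPU reading is normal at 02:00 and an issue at 14:00, which is exactly the behaviour a single static threshold cannot express.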




In Conclusion

  • You don’t have to configure what data to monitor; Machine Learning will work that out for you.
  • You don’t have to configure rules and thresholds; Machine Learning will work that out for you.
  • Machine Learning will predict results even if the metrics have never occurred before.
  • The predictions improve over time as the Machine Learning observes more edge cases.

If you have found this introduction interesting then download our mini-presentation here to get more insights into Machine Learning including the fundamentals of Neural Networks.

Part 1 What is Machine Learning?


The following Blog Post is written by Brian Steele, our Technical Development Lead.

These days, the terms “Machine Learning” and “Artificial Intelligence” are phrases that are bandied around a great deal when vendors talk about the features of their monitoring solutions. But “What is Machine Learning?” and “How does Machine Learning help Monitoring Tools?”

This is a two-part Blog that aims to answer these questions starting with “What is Machine Learning?”:

The Goal of Machine Learning


The goal of Machine Learning is to take some existing real-world data (like monitoring metrics for CPU, Memory, and Disc, etc…) along with a target metric (like device availability, application performance or user experience) then work out the following:

  • Which of the real-world data is important? So, for example… By examining the CPU, Memory and Disc utilisation of a server over a two-week period, I may be able to determine that only the CPU utilisation has any correlation to the observed server application performance. So, the Machine Learning has to work out that it needs to focus on the CPU utilisation, and can afford to ignore all other metrics if it wants to be able to predict the availability and performance of the server in question.
  • What is the relationship between the real-world data and the target metric? So, for example… By examining the CPU, Memory and Disc utilisation of a server over a two-week period, I should be able to develop a formula (function) that takes these metrics as parameters and then provides an answer which closely matches the observed availability and performance of the server in question.
  • Accept new, unseen, real-world data and predict what the target metric is likely to be. So, by monitoring the required metrics from (1) and feeding them into the formula developed in (2), I should be able to take any new, unseen set of metrics for a given server and then calculate the expected availability and performance.
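Putting the three steps together, here is a toy Python sketch with invented numbers: it identifies the relevant metric, develops the formula, and predicts the target for unseen values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Two weeks of synthetic observations for one server (numbers invented).
cpu = rng.uniform(5, 95, 200)                 # % utilisation
memory = rng.uniform(20, 80, 200)             # % utilisation (irrelevant here)
latency = 0.5 * cpu + rng.normal(0, 2, 200)   # observed performance (ms)

# (1) Which real-world data is important? Correlate each candidate
#     metric with the target metric and keep the strong one.
for name, series in [("cpu", cpu), ("memory", memory)]:
    print(name, round(abs(np.corrcoef(series, latency)[0, 1]), 2))
# cpu scores near 1.0 and memory near 0.0, so focus on CPU alone.

# (2) What is the relationship? Develop a formula from the history.
slope, intercept = np.polyfit(cpu, latency, deg=1)

# (3) Accept new, unseen data and predict the target metric.
for new_cpu in (40, 75):
    print(f"CPU {new_cpu}% -> expected latency {slope * new_cpu + intercept:.1f} ms")
```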

What does Machine Learning look like?


The graph below helps us to visualise what is going on inside Machine Learning in a scenario where it has worked out that it only needs to monitor the CPU percentage utilisation (horizontal axis) to predict the server latency (vertical axis). Remember, it has done this by observing real-life data; nobody had to sit down and program this understanding through the configuration of rules.



The Machine Learning has further observed that it is able to group servers together by their role, which enables a more accurate prediction of the expected latency for any given server as a function of its CPU utilisation and server role. The Machine Learning also develops a formula (function) which draws a line through the data points to closely approximate the observed results. This line can then be used to predict the expected results at any intermediate point where we have previously had no data to guide us.
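A hypothetical sketch of that per-role grouping, with invented roles and numbers: a separate line is fitted for each server role, and each line then predicts the expected latency at a CPU level that was never directly observed:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic observations of (role, cpu %, latency ms): each server role
# has its own relationship between CPU utilisation and latency.
observations = []
for role, slope, base in [("app", 0.4, 5.0), ("db", 0.8, 10.0)]:
    cpu = rng.uniform(10, 90, 100)
    latency = base + slope * cpu + rng.normal(0, 1, 100)
    observations += [(role, c, l) for c, l in zip(cpu, latency)]

# Group the data points by role, then draw a line through each group.
models = {}
for role in ("app", "db"):
    points = [(c, l) for r, c, l in observations if r == role]
    x, y = map(np.array, zip(*points))
    models[role] = np.polyfit(x, y, deg=1)  # (slope, intercept)

# Use each line to predict latency at an intermediate, unobserved point.
for role, (m, b) in models.items():
    print(f"{role}: expected latency at 55% CPU = {m * 55 + b:.1f} ms")
```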

We don’t want to memorise results, we want to calculate them


In the following example, we see that the “Web Servers” have results that don’t follow a straight line through the other server roles. In this case, the Machine Learning will develop a more complex formula (function) which closely follows the observed results while still allowing us to predict the expected results for any intermediate point. During the development of the formula (function), the Machine Learning continually compares its predicted results with the real-world results to see how closely it is tracking. Once the formula (function) is within a close enough approximation, it is considered to be “Right Fitting” and the development of the formula (function) stops.



By contrast, in the following example, the Machine Learning has over-developed the formula (function) so that the predicted results exactly match the real-life data. This is called “Over Fitting” and means that the Machine Learning has basically “memorised” the results, so its effectiveness will be compromised when it comes to predicting the results for intermediate points.
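The difference is easy to demonstrate. In this minimal sketch (synthetic data), an over-developed formula scores almost perfectly on the points it has effectively memorised but tends to predict the unseen intermediate points worse than the simpler, right-fitting formula:

```python
import numpy as np

rng = np.random.default_rng(4)

# A noisy but fundamentally simple (linear) relationship.
x = np.linspace(0.0, 1.0, 30)
y = 1 + 2 * x + rng.normal(0, 0.3, x.size)

# Hold back alternate points: the formula never sees these intermediate
# values, so they test prediction rather than memorisation.
train = np.arange(x.size) % 2 == 0
test = ~train

for degree in (1, 9):  # "Right Fitting" vs "Over Fitting"
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    seen_err = np.abs(np.polyval(coeffs, x[train]) - y[train]).mean()
    unseen_err = np.abs(np.polyval(coeffs, x[test]) - y[test]).mean()
    print(f"degree {degree}: error on seen points {seen_err:.2f}, "
          f"on unseen intermediate points {unseen_err:.2f}")
# The degree-9 formula hugs the training points (low seen error) but its
# predictions between those points are typically noticeably worse.
```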


In Conclusion

  • You don’t have to configure what data to monitor; Machine Learning will work that out.
  • You don’t have to configure rules and thresholds; Machine Learning will work that out.
  • Machine Learning will predict results even if the metrics have never occurred before.
  • The predictions improve over time as the Machine Learning observes more edge cases.

If you have found this introduction interesting then download our mini-presentation here to get more insights into Machine Learning including the fundamentals of Neural Networks.

Part 2 in this series will explore the topic “How Does Machine Learning Help Monitoring Tools?”
