When you ask machine learning (ML) engineers about their biggest challenges, monitoring and observability often top the list. There are a number of reasons for this, including data drift, concept drift, bias and fairness, and adversarial inputs, to name a few. Discovering a problem is usually the first step toward solving it, and this is where monitoring and observability come in.
ML Monitoring and Observability help you discover the issues that appear during the ML lifecycle. As ML applications become more common across industries, monitoring the performance of these models becomes even more critical. In this article, I will explain what monitoring and observability are, how they differ, and how you can put both into practice using Aporia.
What is ML Monitoring?
ML monitoring is the practice of tracking a model’s performance metrics from development to production and understanding the issues behind any change in that performance. Metrics that should be monitored include accuracy, recall, precision, F1 score, MAE, and RMSE.
One of the most important elements of ML Monitoring is its alert system, which notifies data scientists or ML engineers when a change/failure is detected.
This requires setting conditions and designing metrics or thresholds that make it clear when an issue arises.
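The alerting idea above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's API: the `AlertRule` class, the `check_alerts` function, and the threshold values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    """A hypothetical alert rule: fire when a metric violates its threshold."""
    metric: str
    threshold: float
    # Returns True when the observed value breaches the threshold.
    breached: Callable[[float, float], bool]


def check_alerts(metrics: Dict[str, float], rules: List[AlertRule]) -> List[str]:
    """Return one human-readable alert per rule whose condition is breached."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            # A metric that stops reporting is itself worth an alert.
            alerts.append(f"{rule.metric}: no value reported")
        elif rule.breached(value, rule.threshold):
            alerts.append(
                f"{rule.metric}={value:.3f} breached threshold {rule.threshold:.3f}"
            )
    return alerts


# Illustrative thresholds only; real ones depend on the model and use case.
rules = [
    AlertRule("f1_score", 0.80, lambda v, t: v < t),  # alert when F1 falls too low
    AlertRule("rmse", 5.0, lambda v, t: v > t),       # alert when error grows too high
]

latest = {"f1_score": 0.74, "rmse": 4.2}
alerts = check_alerts(latest, rules)
for alert in alerts:
    print(alert)
```

Here only the F1 rule fires, because 0.74 is below the assumed 0.80 floor while the RMSE stays under its ceiling. In practice the alerts would be routed to a notification channel rather than printed.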
ML Monitoring is an encompassing process that includes monitoring the:
- Data: The ML monitoring system monitors the data used during training and production to ensure its quality, consistency, and accuracy, as well as security and validity.
- Model: Monitoring the model comes after the model has been deployed. The monitoring system looks out for changes in the model and alerts the data scientist when changes occur.
- Environment: The environments where the model is developed and deployed also contribute to overall model performance. If there are issues with either environment, performance suffers. The ML monitoring system checks metrics such as system CPU, memory, disk, and I/O utilization.
Monitoring all of these helps identify issues as soon as they occur and enables the data scientist to intervene and resolve them.
The long-run performance of a model may be affected by changes in the data and in the environment over time. Since ML models are prone to errors, model monitoring helps an organization catch these issues early and focus more on improving performance in its projects.
Monitoring for data drifts, concept drifts, memory leaks, etc. for models in production is critical for the success of your machine learning project and the results you hope to achieve. It helps you identify model drift, data/feature drift, and data leakage, which can lead to poor accuracy, underperformance, or unexpected bias.
Let’s check out what these different issues actually mean:
- Data Drift: Once models are live in production, the input data can change over time, which degrades the model’s performance and accuracy. The primary issue here is that the data arriving in production no longer matches the data used during training, testing, and validation. Therefore, it’s important to consistently monitor for data drift.
- Concept Drift: Production models serve predictions in real time, and real-world data evolves, so the relationship between input and output data is bound to change. This is known as concept drift. Here, the data has evolved based on real-time events, changes in consumer patterns, etc.
- Adversarial Inputs: It’s important to be on the lookout for data inputs made by an attacker, which can cause performance degradation.
- Bias and Fairness: As users interact with a model, they unintentionally bring their own bias, and often the data used for training the model is biased as well. Monitoring for bias is essential to ensure a model provides fair and accurate predictions.
- Data Leakage: This occurs when the dataset used during training contains relevant information that is not obtainable when the model is in production. The result is a higher accuracy rate during training and lower performance in production, because the model relied on data it no longer has.
- For critical areas like healthcare and finance, where a model’s decisions might have serious implications, it’s important to keep logs and proactively monitor the model development.
- Bugs: When deploying ML models, a myriad of issues can arise that weren’t seen during testing or validation, not just within the data itself. It could be the system usage, a UX error, etc.
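To make the data drift item above concrete, here is a minimal sketch of one common way to quantify how far a production feature has moved from its training distribution: the Population Stability Index (PSI). The article does not prescribe this metric; it is swapped in here for illustration. The `psi` function, the bin count, and the synthetic data are assumptions, and the 0.1 and 0.25 cut-offs in the comments are a widely used rule of thumb rather than a fixed standard.

```python
import math
import random
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a training and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range production values into the first or last bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Small floor so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data for illustration: one stable feature, one shifted feature.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
prod_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
prod_shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

# Rule of thumb (an assumption, tune per feature):
# below 0.1 is stable, above 0.25 is a significant shift.
print(f"PSI, same distribution:    {psi(train, prod_same):.4f}")
print(f"PSI, shifted distribution: {psi(train, prod_shifted):.4f}")
```

A check like this only covers changes in the inputs. Concept drift changes the relationship between inputs and outputs, so detecting it generally also requires ground-truth labels or performance metrics from production.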
ML Monitoring is important for the following reasons:
- It enables you to analyze the accuracy of the predictions.
- It helps reduce prediction errors.
- It ensures the best performance by alerting the data scientist to issues as they arise.
What is ML Observability?
Observability measures the health of the internal states of a system by understanding the relationship between the system’s inputs, outputs, and environment. In machine learning, this means monitoring and analyzing the inputs, prediction requests, and generated predictions from your model so that you have the insight needed to understand an outage when one occurs.
The concept of observability comes from control theory, which tells us that you can only control a system to the extent that you can observe it. This means that controlling the accuracy of the results, usually across different components of a system, requires observability.
In ML systems, observability becomes more complex as you need to consider multiple interacting systems and services such as data inputs/pipelines, model notebooks, cloud deployments, containerized infrastructure, distributed systems, and microservices. This generally means that there are a substantial number of systems that you need to monitor and aggregate.
ML Observability combines performance data and metrics from every part of an ML system to provide insight into the problems facing that system. Beyond alerting the user to a problem with the model, ML observability provides insights and possible resolutions for solving it.
Making measurements is crucial for ML Observability. Just like when you’re analyzing your model performance during training, measuring top-level metrics is not enough and will provide an incomplete picture. You need to slice your data to understand how your model performs for various data subsets. ML Observability also has a slice and dice approach in evaluating the performance of the model.
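The slicing idea can be shown with a small, self-contained sketch. The `accuracy_by_slice` function, the `desktop`/`mobile` slice names, and the toy records are all hypothetical and exist only to illustrate how a reasonable top-level metric can hide a failing subset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_slice(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Compute accuracy per slice from (slice_key, y_true, y_pred) records."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for key, y_true, y_pred in records:
        total[key] += 1
        correct[key] += int(y_true == y_pred)
    return {key: correct[key] / total[key] for key in total}


# Toy data: (slice_key, y_true, y_pred)
records = [
    ("desktop", 1, 1), ("desktop", 0, 0), ("desktop", 1, 1), ("desktop", 0, 0),
    ("desktop", 1, 1), ("desktop", 0, 0), ("desktop", 1, 1), ("desktop", 0, 1),
    ("mobile", 1, 0), ("mobile", 0, 1),
]

overall = sum(y_true == y_pred for _, y_true, y_pred in records) / len(records)
per_slice = accuracy_by_slice(records)

print(f"overall: {overall:.3f}")
for key, acc in sorted(per_slice.items()):
    print(f"{key}: {acc:.3f}")
```

In this toy data the overall accuracy is 0.700, but the breakdown shows 0.875 on desktop and 0.000 on mobile, which is exactly the kind of problem a single aggregate number conceals.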
The concept of Observability doesn’t just stop at application performance and error logging. It also includes monitoring and analyzing prediction requests, performance metrics and the generated predictions from your models over time, in addition to evaluating the results.
Another important factor needed for ML Observability is having domain knowledge. Domain knowledge helps with precise and accurate insight into the changes that occur in the model. For example, when modeling and evaluating a recommender model for an eCommerce fashion store, you need to be aware of the fashion trends to properly understand the changes that occur in the model. Domain knowledge also helps during data collection & processing, feature engineering, and result interpretation.
ML Monitoring vs. ML Observability
The simple difference between ML Monitoring and ML Observability is “the What vs. the Why”.
ML Monitoring tells us “the What”, while ML Observability explains the What, Why, and sometimes How to Resolve It.
| ML Monitoring | ML Observability |
|---|---|
| ML Monitoring notifies us about the problem. | ML Observability is knowing the problem exists, understanding why the problem exists, and how to resolve it. |
| Monitoring alerts us to a component’s outage or failure. | ML Observability gives a system view on outages, taking the whole system into account. |
| ML Monitoring answers the what and when of model problems. Monitoring tells you whether the system works. | ML Observability gives the context of why and how. Observability lets you ask why it’s not working. |
| ML Monitoring is failure-centric. | ML Observability understands the system regardless of an outage. |
For instance, let’s say a model in production faces a concept drift problem. An ML Monitoring solution will be able to detect the performance degradation in the model. In contrast, an ML Observability solution will compare data distributions and other key indicators to help pinpoint the cause of the drift. This is something solutions like Aporia do well.
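The contrast can be sketched in code. This is a deliberately crude illustration under stated assumptions, not how Aporia or any other product works: `accuracy_dropped` plays the monitoring role (the what), while `rank_feature_shift` plays a small part of the observability role (a hint at the why) by ranking features by how far their production mean has moved, measured in training standard deviations. The function names, feature names, tolerance, and data are all hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def accuracy_dropped(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """Monitoring: report *that* performance degraded beyond a tolerance."""
    return baseline - current > tolerance


def rank_feature_shift(
    train: Dict[str, List[float]], prod: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Observability aid: rank features by shift of the production mean,
    expressed in units of the training standard deviation."""
    shifts = []
    for name, train_values in train.items():
        std = statistics.pstdev(train_values) or 1.0  # guard against zero spread
        delta = statistics.fmean(prod[name]) - statistics.fmean(train_values)
        shifts.append((name, abs(delta) / std))
    return sorted(shifts, key=lambda item: item[1], reverse=True)


# Hypothetical feature samples from training and from production.
train = {"basket_size": [1, 2, 3, 4, 5], "session_minutes": [10, 12, 11, 13, 14]}
prod = {"basket_size": [1, 2, 3, 4, 6], "session_minutes": [20, 22, 21, 23, 24]}

if accuracy_dropped(baseline=0.91, current=0.78):
    print("Alert: accuracy dropped")  # the what
    ranked = rank_feature_shift(train, prod)
    for name, shift in ranked:
        print(f"{name}: shifted by {shift:.2f} training std devs")  # a hint at the why
```

Comparing means is only a first approximation. A fuller investigation would compare whole distributions, look at individual slices, and, for concept drift specifically, examine how the relationship between features and labels has changed.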
ML Observability with Aporia
Aporia’s full-stack ML observability solution gives data scientists and ML engineers the visibility, monitoring and automation, investigation tools, and explainability to understand why models predict what they do, how they perform in production over time, and where they can be improved.
Aporia provides a customizable monitoring and observability system for your machine learning models. It lets you monitor prediction drifts, data drifts, missing values at input, freshness, F1 Score, etc.
[Figure: Aporia Monitor Dashboard]
Aporia enables you to set alerts for over 50 different types of monitors and gives you actionable insights on how to resolve those issues, i.e. ML Observability.
[Figure: Aporia Alert Dashboard]
Both ML Observability and ML Monitoring are integral parts of the ML lifecycle, and for a while they were an essential but missing piece of the ML infrastructure. ML observability is a competitive advantage for your ML team: the more observability you have, the more insights your team can gain from the models and their behavior.
As ML engineers and data scientists, we need a systematic approach to monitoring our machine learning models at all levels: data, model, and deployment. Aporia provides a flexible platform that enables you to build monitoring and observability that fit the needs of your particular use case and machine learning models.