When you ask machine learning (ML) engineers about their biggest challenges, monitoring and observability often top the list. There are a number of reasons for this, including data drift, concept drift, bias and fairness, and adversarial inputs. Discovering a problem is usually the first step toward solving it, and this is where monitoring and observability come in.
ML monitoring and observability help you discover the issues that appear throughout the ML lifecycle. As ML applications become more common across industries, monitoring the performance of these models becomes ever more critical. In this article, I will explain what monitoring and observability are, how they differ, and how you can put both into practice using Aporia.
ML monitoring is the practice of tracking a model's performance metrics from development through production and understanding the issues associated with that performance. Metrics worth monitoring include accuracy, recall, precision, F1 score, MAE, and RMSE.
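To make this concrete, here is a minimal sketch of computing a few of these metrics on a batch of predictions, assuming scikit-learn is available; the label and prediction arrays below are hypothetical placeholders for your own data.

```python
# Minimal sketch: computing common monitoring metrics on a batch of
# predictions. The arrays below are hypothetical placeholders.
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    mean_absolute_error, mean_squared_error,
)

# Classification example: ground-truth labels vs. model predictions
y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}

# Regression example: MAE and RMSE on continuous targets
y_true_reg = [3.1, 2.4, 5.0, 4.2]
y_pred_reg = [2.9, 2.8, 4.6, 4.5]
metrics["mae"] = mean_absolute_error(y_true_reg, y_pred_reg)
metrics["rmse"] = mean_squared_error(y_true_reg, y_pred_reg) ** 0.5

print(metrics)
```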
One of the most important elements of ML Monitoring is its alert system, which notifies data scientists or ML engineers when a change/failure is detected.
This requires setting conditions and designing metrics or thresholds that make it clear when an issue arises.
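For illustration, a bare-bones version of such a check might compare each tracked metric against a threshold and fire an alert when it falls below; the metric values, thresholds, and the notify() helper here are hypothetical stand-ins for a real alerting channel.

```python
# Minimal sketch of threshold-based alerting; the numbers and the
# notify() helper are hypothetical placeholders for a real alert channel.
def notify(message: str) -> None:
    # In practice this could post to Slack, PagerDuty, email, etc.
    print(f"ALERT: {message}")

# Thresholds chosen by the team for this model (illustrative numbers)
thresholds = {"accuracy": 0.85, "f1": 0.80}

# Latest metrics computed from a window of production predictions
latest_metrics = {"accuracy": 0.82, "f1": 0.79}

for name, minimum in thresholds.items():
    value = latest_metrics.get(name)
    if value is not None and value < minimum:
        notify(f"{name} dropped to {value:.2f} (threshold {minimum:.2f})")
```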
ML Monitoring is an encompassing process that covers every level of the system: the data, the model, and the deployment.
Monitoring all of these helps identify issues as soon as they occur and enables the data scientist to intervene and resolve them.
A model's long-run performance may be affected by changes in the data and the environment over time. Because ML models are prone to this kind of degradation, model monitoring helps an organization catch these issues early and keep the focus on improving performance.
Monitoring production models for data drift, concept drift, memory leaks, and similar issues is critical to the success of your machine learning project and the results you hope to achieve. It helps you identify model drift, data/feature drift, and data leakage, which can lead to poor accuracy, underperformance, or unexpected bias.
Let's check out what these different issues actually mean: model drift is a change in the relationship between a model's inputs and the target it predicts, so predictions that used to be accurate gradually stop being so; data/feature drift is a change in the statistical distribution of the input data itself compared to the data the model was trained on; and data leakage occurs when information that won't be available at prediction time slips into training, producing offline metrics that don't hold up in production.
ML Monitoring is important because it surfaces performance degradation as soon as it happens, catches drift and data-quality issues before they reach your users, and gives you the evidence you need to decide when a model should be retrained or rolled back.
Observability measures the health of a system's internal states by understanding the relationship between the system's inputs, outputs, and environment. In machine learning, this means monitoring and analyzing the inputs, prediction requests, and generated predictions from your model, so that when an outage occurs you can understand why.
The concept of observability comes from control theory, which tells us that you can only control a system to the extent that you can observe it. This means that controlling the accuracy of results, usually across different components of a system, requires observability.
In ML systems, observability becomes more complex because you need to consider multiple interacting systems and services, such as data inputs and pipelines, model notebooks, cloud deployments, containerized infrastructure, distributed systems, and microservices. This means there is a substantial number of systems whose signals you need to monitor and aggregate.
ML Observability combines performance data and metrics from every part of an ML system to provide insight into the problems the system faces. So beyond alerting the user that a problem has arisen in the model, ML observability provides the insight needed to resolve it.
Making measurements is crucial for ML Observability. Just as when you analyze model performance during training, measuring only top-level metrics provides an incomplete picture. You need to slice your data to understand how your model performs on various subsets; ML Observability takes the same slice-and-dice approach to evaluating production performance.
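As a rough sketch of the idea, assuming pandas and scikit-learn and a hypothetical dataframe with a segment column, true labels, and predictions, a slice-level evaluation could look like this:

```python
# Minimal sketch of slice-level evaluation: compute a metric per data
# subset instead of one global number. The dataframe is hypothetical.
import pandas as pd
from sklearn.metrics import f1_score

df = pd.DataFrame({
    "segment": ["mobile", "mobile", "web", "web", "web", "mobile"],
    "y_true":  [1, 0, 1, 1, 0, 1],
    "y_pred":  [1, 0, 0, 1, 0, 0],
})

# Overall metric hides how individual slices behave
overall_f1 = f1_score(df["y_true"], df["y_pred"])

# Per-slice metric surfaces segments where the model underperforms
per_slice_f1 = (
    df.groupby("segment")[["y_true", "y_pred"]]
      .apply(lambda g: f1_score(g["y_true"], g["y_pred"]))
)

print(f"overall F1: {overall_f1:.2f}")
print(per_slice_f1)
```

A global F1 score can look healthy while a single slice, such as one device type or region, performs far worse; the per-slice view is what surfaces that.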
The concept of observability doesn't stop at application performance and error logging. It also includes monitoring, analyzing, and evaluating prediction requests, performance metrics, and the predictions your models generate over time.
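One simple way to make that history available for later analysis, sketched here with an assumed JSON-lines file and record schema rather than any particular tool's API, is to log every prediction request together with its output, model version, and a timestamp:

```python
# Minimal sketch of prediction logging for later observability analysis.
# The JSON-lines file and record schema are illustrative assumptions.
import json
import time

def log_prediction(features: dict, prediction, model_version: str,
                   path: str = "predictions.jsonl") -> None:
    record = {
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example usage at serving time
log_prediction({"age": 34, "country": "DE"}, prediction=0.73,
               model_version="v1.2.0")
```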
Another important ingredient of ML Observability is domain knowledge, which lets you interpret changes in the model precisely and accurately. For example, when building and evaluating a recommender model for an eCommerce fashion store, you need to be aware of fashion trends to properly understand the changes that occur in the model. Domain knowledge also helps during data collection and processing, feature engineering, and result interpretation.
The simple difference between ML Monitoring and ML Observability is “the What vs. the Why”.
ML Monitoring tells us “the What”, while ML Observability explains the What, Why, and sometimes How to Resolve It.
| ML Monitoring | ML Observability |
| --- | --- |
| ML monitoring notifies us about the problem. | ML observability is knowing the problem exists, understanding why it exists, and how to resolve it. |
| Monitoring alerts us to a component's outage or failure. | ML Observability gives a system view on outages, taking the whole system into account. |
| ML Monitoring answers the what and when of model problems. Monitoring tells you whether the system works. | ML Observability gives the context of why and how. Observability lets you ask why it's not working. |
| ML Monitoring is failure-centric. | ML Observability understands the system regardless of an outage. |
For instance, let’s say a model in production faces a concept drift problem. An ML Monitoring solution will be able to detect the performance degradation in the model. In contrast, an ML Observability solution will compare data distributions and other key indicators to help pinpoint the cause of the drift. This is something solutions like Aporia do well.
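As a simplified sketch of what "comparing data distributions" can mean in practice (not how Aporia or any particular vendor implements it; the synthetic data and threshold are assumptions), a two-sample Kolmogorov-Smirnov test can flag when a production feature no longer looks like its training counterpart:

```python
# Minimal sketch of drift detection by comparing a feature's training
# distribution with its recent production distribution. Data and the
# p-value threshold are illustrative, not a specific vendor's method.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=42)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
production_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted

statistic, p_value = ks_2samp(training_feature, production_feature)

if p_value < 0.01:
    print(f"Possible drift: KS statistic={statistic:.3f}, p={p_value:.2e}")
else:
    print("No significant distribution change detected.")
```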
Aporia’s full-stack ML observability solution gives data scientists and ML engineers the visibility, monitoring and automation, investigation tools, and explainability to understand why models predict what they do, how they perform in production over time, and where they can be improved.
Aporia provides a customizable monitoring and observability system for your machine learning models. It lets you monitor prediction drift, data drift, missing values in inputs, data freshness, F1 score, and more.
Aporia enables you to set alerts for over 50 different types of monitors and gives you actionable insights on how to resolve those issues, which is ML Observability in practice.
Both ML Observability and ML Monitoring are integral parts of the ML lifecycle, and for a long time they were an essential yet missing piece of ML infrastructure. ML observability is a competitive advantage for your ML team: the more observability you have, the more insight your team can gain from the models and their behavior.
As ML engineers and data scientists, we need a systematic approach to monitoring our machine learning models at all levels: data, model, and deployment. Aporia provides a flexible platform that lets you build monitoring and observability to fit the needs of your particular use case and machine learning models.
Happy Monitoring!