Model Drift: What Is It and How to Prevent It
Model drift refers to the change in the statistical properties of the target function that a machine learning model is...
Machine learning models are only as good as the data they ingest during and after training. Data drift refers to a change in the distribution of a model’s input data over time. In other words, it refers to a situation where the input data that a machine learning model was trained on no longer accurately represents the data that the model is being applied to.
Data drift can significantly degrade the performance of machine learning models, because a model trained on one distribution of data may not accurately predict or classify data drawn from another. The decline can be gradual, or accuracy can drop sharply when the input distribution changes abruptly.
It’s important to keep an eye on the performance of the model over time and keep track of any changes in the input data, so that data drift can be identified and addressed as soon as possible.
This is part of an extensive series of guides about machine learning.
Concept drift occurs when the real-world relationship between a model’s inputs and outputs changes from what it was when the model was trained. In other words, predictions that used to be correct for certain inputs are no longer correct.
For example, a model that was trained to detect fraudulent credit card transactions may become less accurate over time as criminals change their tactics. This is the most basic form of data drift.
Covariate shift occurs when the distribution of the input data changes while the relationship between inputs and outputs stays the same. This distinguishes it from concept drift, where that relationship itself changes.
For example, a model that was trained on data from a specific geographical region may become less accurate when applied to data from a different region, where cultural influences or purchasing habits produce a different mix of inputs. The rules linking inputs to outputs may still hold, but the model now sees input values that were rare or absent in its training data.
Prior probability shift occurs when the proportion of the different classes in the data changes over time. For example, if a binary classification model was trained to detect spam email, and the proportion of spam email in the population changes, the model’s performance may suffer as its prior probability assumptions are not accurate anymore.
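A minimal sketch of how a change in class proportions could be checked, assuming labels (or predicted labels) are available for both the training data and recent data. The function names `class_shares` and `prior_shift` are hypothetical and used only for illustration:

```python
from collections import Counter


def class_shares(labels):
    """Return the share of each class in a list of labels."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


def prior_shift(train_labels, live_labels):
    """Return the change in share (live minus training) for every class."""
    train = class_shares(train_labels)
    live = class_shares(live_labels)
    return {
        cls: live.get(cls, 0.0) - train.get(cls, 0.0)
        for cls in set(train) | set(live)
    }


# Illustrative data: spam makes up 20% of training labels but 50% of recent labels.
train_labels = ["spam"] * 20 + ["ham"] * 80
live_labels = ["spam"] * 50 + ["ham"] * 50
shift = prior_shift(train_labels, live_labels)
```

Here `shift["spam"]` is positive and `shift["ham"]` is negative by the same amount, which signals that the class balance the model learned no longer matches the data it is scoring.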
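The Population Stability Index (PSI) measures how much the distribution of a feature has changed between the training and test data. The feature’s values are grouped into bins, and for each bin the share of observations in the new data (Actual%) is compared with the share in the training data (Expected%). A high PSI value indicates a significant change in the distribution of the feature, which may indicate data drift.
The formula to calculate PSI, summed over all bins i, looks like this:
$$ \mathrm{PSI} = \sum_{i} \left( \text{Actual\%}_i - \text{Expected\%}_i \right) \cdot \ln\left( \frac{\text{Actual\%}_i}{\text{Expected\%}_i} \right) $$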
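The following sketch shows one way PSI could be computed for a single numeric feature. It assumes the usual binned form of the index, in which each bin contributes (Actual% - Expected%) * ln(Actual% / Expected%). The equal-width binning over the baseline range, the bin count of 10, and the small `eps` floor for empty bins are illustrative assumptions, not something the article prescribes:

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a new sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if all values are equal.
    width = (hi - lo) / n_bins or 1.0

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range are clamped into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # The eps floor keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e_shares = bin_shares(expected)
    a_shares = bin_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


baseline = list(range(100))
shifted = [v + 50 for v in baseline]
psi_same = psi(baseline, baseline)
psi_shifted = psi(baseline, shifted)
```

With identical samples the index is 0, and it grows as the new sample moves away from the baseline. A widely used rule of thumb treats values below about 0.1 as stable and values above about 0.25 as a significant shift, but suitable thresholds depend on the feature and the binning.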
The Kolmogorov-Smirnov test is a non-parametric test that can be used to determine whether two samples come from the same distribution. This test can be used to detect data drift by comparing the distribution of the training data and the distribution of the test data.
The formula looks like this:
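$$ D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|, \qquad F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{(-\infty, x]}(X_i) $$
F_{1,n}(x) is the empirical distribution function of the previous data (n samples), while F_{2,m}(x) is the empirical distribution function of the new data (m samples). sup_x denotes the supremum over all values of x, so D_{n,m} is the largest absolute gap between the two distribution functions.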
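As a rough illustration, the two-sample statistic can be computed directly from its definition: the largest absolute gap between the two empirical distribution functions. The sketch below uses only the standard library and returns the statistic alone, without a p-value. The name `ks_statistic` is a hypothetical helper:

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed value is enough to find the maximum gap.
    for x in a + b:
        f_a = bisect_right(a, x) / n  # share of sample_a with value <= x
        f_b = bisect_right(b, x) / m  # share of sample_b with value <= x
        d = max(d, abs(f_a - f_b))
    return d


d_same = ks_statistic([1, 2, 3], [1, 2, 3])
d_overlap = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
d_disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap). In practice a library routine such as `scipy.stats.ks_2samp` is the more common choice, since it also reports a p-value for the test.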
KL divergence is a measure of the difference between two probability distributions. It can be used to detect data drift by comparing the distribution of the training data and the distribution of the test data.
Here is an example of the KL divergence formula with A and B representing the old and new data distributions, respectively:
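$$ KL(A \,\|\, B) = \sum_{x} A(x) \log \frac{A(x)}{B(x)} = -\sum_{x} A(x) \log \frac{B(x)}{A(x)} $$
The divergence can be anything between 0 and infinity, and a score of 0 means the distributions are identical. KL divergence is not symmetric: KL(A||B) and KL(B||A) generally differ.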
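A small sketch of the calculation for discrete distributions, using the standard definition KL(A||B) = sum over x of A(x) * log(A(x) / B(x)). It assumes both inputs are probability lists over the same bins that each sum to 1. The `eps` floor for zero probabilities in B is an illustrative choice. Strictly speaking, the divergence is infinite when B assigns zero probability to a bin where A does not:

```python
import math


def kl_divergence(a, b, eps=1e-12):
    """KL(A || B) for two discrete distributions over the same bins."""
    total = 0.0
    for p_a, p_b in zip(a, b):
        # Bins where A has zero probability contribute nothing to the sum.
        if p_a > 0:
            total += p_a * math.log(p_a / max(p_b, eps))
    return total


old = [0.5, 0.5]
new = [0.9, 0.1]
kl_old_new = kl_divergence(old, new)
kl_new_old = kl_divergence(new, old)
```

The two directions give different values for these inputs, which illustrates that the measure is not symmetric, so it matters which distribution is treated as the reference.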
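Jensen-Shannon (JS) divergence is a symmetric version of the KL divergence, which can be used to measure the similarity or dissimilarity between two probability distributions. It compares each distribution with their average, M = (A + B) / 2. Following is the formula used in JS divergence:
$$ JS(B \,\|\, A) = \frac{1}{2} \left( KL(B \,\|\, M) + KL(A \,\|\, M) \right), \qquad M = \frac{1}{2}(A + B) $$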
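The sketch below computes JS divergence for discrete distributions, assuming M is the pointwise average of the two distributions and that both inputs are probability lists over the same bins. The helper names are hypothetical:

```python
import math


def kl(p, q):
    """KL(P || Q), skipping bins where P has zero probability."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(a, b):
    """Jensen-Shannon divergence between two discrete distributions."""
    # M is the average of the two distributions. It is non-zero wherever
    # A or B is non-zero, so the KL terms below never divide by zero.
    m = [(pa + pb) / 2 for pa, pb in zip(a, b)]
    return 0.5 * kl(a, m) + 0.5 * kl(b, m)


old = [0.5, 0.5]
new = [0.9, 0.1]
js_forward = js_divergence(old, new)
js_backward = js_divergence(new, old)
js_disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Unlike KL divergence, the result is the same in both directions and stays finite: with the natural logarithm it never exceeds ln 2, which is reached when the two distributions do not overlap at all.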
Here are a few strategies that can be used to solve data drift:
By implementing these strategies, organizations can effectively address data drift and ensure that their machine learning models continue to perform well over time.
By identifying and addressing data drift early on, businesses can avoid the negative consequences of inaccurate predictions, such as lost revenue, reduced customer satisfaction, and increased operational costs. Thus, monitoring ML models for data drift is crucial for maintaining business continuity and maximizing the benefits of machine learning.
Aporia’s ML observability platform is the ideal partner for Data Scientists and ML engineers to visualize, monitor, explain, and improve ML models in production. Our platform fits naturally into your existing ML stack and seamlessly integrates with your existing ML infrastructure in minutes. We empower organizations with key features and tools to ensure high model performance:
Visibility
Monitoring
Root Cause Investigation
To get a hands-on feel for Aporia’s advanced model monitoring and deep visualization tools, we recommend:
Book a demo to get a guided tour of Aporia’s capabilities, see ML observability in action, and understand how we can help you achieve your ML goals.