Learn more about machine learning model monitoring and ML model management with our in-depth guide.
Machine learning model monitoring measures how well your machine learning model performs a task, both during training and in real-time deployment. As ML engineers, we define performance measures such as accuracy, F1 score, and recall, which compare the predictions of a machine learning model with the known values of the dependent variable in a dataset.
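To make these measures concrete, here is a minimal sketch that computes accuracy, precision, recall, and F1 for a binary classifier using only the Python standard library. The function name `classification_metrics` and the sample labels are illustrative assumptions, not part of any particular monitoring product.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compare predictions with known labels for a binary task.

    Hypothetical helper for illustration; returns a dict of metric values.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = fraud, 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In production, the same function could be run on each batch of predictions once the true labels become available.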
When models are deployed to production, there is often a discrepancy between the original training data and dynamic data in the production environment. This causes the performance of a production model to degrade over time.
For this reason, continuous tracking and monitoring of these performance metrics are critical for improving model performance. Monitoring can help by:
These insights allow ML teams to identify the root cause of problems, and make better decisions on how to evolve and update models to improve accuracy in production.
In life, as well as in business, feedback loops are essential. The concept of feedback loops is simple: You produce something, measure how it performs, and then improve it. This is a constant process of monitoring and improving. ML models can certainly benefit from feedback loops if they contain measurable information and room for improvement.
Suppose you trained your model to detect credit card fraud based on pre-COVID user data. During a pandemic, credit card use and buying habits change. Such changes can expose your model to data from a distribution on which it was not trained. This is an example of data drift, one of several sources of model degradation. Without ML monitoring, your model may output incorrect predictions with no warning signs, which will negatively impact your customers and your organization in the long run.
Model building is usually an iterative process, and feedback from the deployed ML model can be funneled back into the model building stage. Monitoring your model with a stack of metrics is therefore crucial for continuous improvement. It’s essential to know how well your model performs over time. To do this, you’ll need monitoring tools that track model performance metrics covering everything from concept drift to how well your algorithm performs on new data.
Several steps are involved in a typical ML workflow, including data ingestion, preprocessing, model building, evaluation, and deployment. Feedback, however, is missing from this workflow.
A primary goal of ML monitoring is to provide this feedback loop, feeding data from the production environment back into the model building phase. This allows machine learning models to improve continuously, either by updating the model or by continuing to use the existing one when it still performs well.
Here is a checklist you can use to monitor your ML models:
Data drift occurs due to changes in your input data. Therefore, to detect data drift, you must observe your model’s input data in production and compare it to your training data. If the production input data and the training data do not have the same format or distribution, that is an indication of data drift.
As an example of a change in data format, suppose you trained a model for house price prediction. In production, ensure that the input matrix has the same columns as the data you used during training. Changes in the distribution of the input data relative to the training data require statistical techniques to detect.
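The column check described above can be sketched in a few lines. This is a minimal illustration using plain Python lists; the column names and the `check_schema` helper are hypothetical, and a real pipeline would likely also compare data types.

```python
def check_schema(expected_columns, batch_columns):
    """Report columns that are missing from, or unexpected in, a production batch."""
    expected = list(expected_columns)
    actual = list(batch_columns)
    missing = [c for c in expected if c not in actual]
    unexpected = [c for c in actual if c not in expected]
    # Order matters when the model consumes a positional input matrix.
    same_order = expected == actual
    return {"missing": missing, "unexpected": unexpected, "same_order": same_order}


# Hypothetical feature names for a house price model.
TRAINING_COLUMNS = ["square_feet", "bedrooms", "year_built", "zip_code"]
production_batch_columns = ["square_feet", "bedrooms", "zip_code", "garage"]

report = check_schema(TRAINING_COLUMNS, production_batch_columns)
if report["missing"] or report["unexpected"] or not report["same_order"]:
    print("Schema mismatch:", report)
```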
The following tests can be used to detect changes in the distribution of the input data:
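One widely used test of this kind is the two-sample Kolmogorov-Smirnov (KS) test, which compares the empirical distribution of a feature in training with its distribution in production. The sketch below is a standard-library illustration under stated assumptions: the function names are hypothetical, the data is synthetic, and the decision threshold uses the common large-sample approximation for the critical value rather than an exact p-value.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, production):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref = sorted(reference)
    prod = sorted(production)
    if not ref or not prod:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in set(ref) | set(prod):
        f_ref = bisect_right(ref, x) / len(ref)
        f_prod = bisect_right(prod, x) / len(prod)
        d = max(d, abs(f_ref - f_prod))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample approximation of the KS critical value at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


# Synthetic feature values: production is shifted relative to training.
training_values = list(range(100))
production_values = [x + 30 for x in range(100)]

d = ks_statistic(training_values, production_values)
threshold = ks_critical_value(len(training_values), len(production_values))
drift_detected = d > threshold
print(f"KS statistic={d:.3f}, threshold={threshold:.3f}, drift={drift_detected}")
```

A test like this runs per feature, so a model with many inputs needs some care with multiple comparisons before alerting on any single result.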
You can detect concept drift by watching for changes in prediction probabilities given the input. A change in your model’s output for production inputs could indicate that the relationship between inputs and target has shifted because of factors your model does not capture.
For example, if your house price prediction model does not account for inflation, it will start underestimating house prices. You can also detect concept drift through ML monitoring techniques, such as performance monitoring. A change in the accuracy of your model or in its classification confidence could indicate concept drift.
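As a rough illustration of the performance monitoring approach, the sketch below tracks accuracy over a sliding window of labeled production predictions and flags a sustained drop relative to a baseline. The class name, window size, and `max_drop` threshold are assumptions chosen for the example, not recommended values.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Flags possible concept drift when recent accuracy falls well below a baseline."""

    def __init__(self, baseline_accuracy, window_size=100, max_drop=0.05):
        self.baseline = baseline_accuracy
        self.max_drop = max_drop
        self.window = deque(maxlen=window_size)  # keeps only the latest outcomes

    def update(self, y_true, y_pred):
        self.window.append(1 if y_true == y_pred else 0)

    def accuracy(self):
        return sum(self.window) / len(self.window) if self.window else None

    def drift_suspected(self):
        # Wait until the window is full so a few early errors do not trigger an alert.
        if len(self.window) < self.window.maxlen:
            return False
        return self.baseline - self.accuracy() > self.max_drop


monitor = RollingAccuracyMonitor(baseline_accuracy=0.9, window_size=10, max_drop=0.1)

for _ in range(10):            # ten correct predictions
    monitor.update(1, 1)
before = monitor.drift_suspected()

for _ in range(5):             # five wrong predictions push out five correct ones
    monitor.update(1, 0)
after = monitor.drift_suspected()
print(before, after, monitor.accuracy())
```

This approach depends on ground-truth labels arriving from production; when labels are delayed, the same pattern can be applied to average prediction confidence instead.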
Here are three ways to prevent concept drift:
Performance monitoring helps us detect that a production ML model is underperforming and understand why. Monitoring ML performance often includes monitoring model activity, metric change, model staleness (or freshness), and performance degradation. The insights gained through ML performance monitoring inform the changes needed to improve performance, such as hyperparameter tuning, transfer learning, model retraining, or developing a new model.
How you monitor performance depends on the model’s task. An image classification model might use accuracy as the performance metric, while mean squared error (MSE) is better suited to a regression model.
It is important to understand that a single bad result does not necessarily mean that model performance is degrading. For example, MSE is sensitive to outliers, so we can expect one outlier in a batch to make the model’s score on that batch much worse. This does not indicate that the model itself is getting worse. It is simply an artifact of having an outlier in the input data while using MSE as your metric.
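The effect of a single outlier on MSE can be seen in a small worked example. The numbers below are made up (house prices in thousands), and the median absolute error is included only as a point of comparison to show a metric that is less sensitive to one extreme record.

```python
from statistics import median


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def median_absolute_error(y_true, y_pred):
    return median(abs(t - p) for t, p in zip(y_true, y_pred))


y_pred = [210, 300, 160, 410, 285]

# Clean batch: every prediction is off by exactly 10.
y_true_clean = [200, 310, 150, 420, 275]

# Same batch, but the last record is an outlier (true value 2275 instead of 275).
y_true_outlier = [200, 310, 150, 420, 2275]

mse_clean = mean_squared_error(y_true_clean, y_pred)
mse_outlier = mean_squared_error(y_true_outlier, y_pred)
med_clean = median_absolute_error(y_true_clean, y_pred)
med_outlier = median_absolute_error(y_true_outlier, y_pred)

print(f"MSE: clean={mse_clean}, with outlier={mse_outlier}")
print(f"Median absolute error: clean={med_clean}, with outlier={med_outlier}")
```

The model made identical predictions in both batches, yet MSE jumps by several orders of magnitude while the median absolute error does not move.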
Defining what is considered poor performance
When monitoring the performance of an ML model, we need to clearly define what counts as poor performance. This typically means specifying an accuracy score or error as the expected value and observing any deviation from the expected performance over time.
In practice, data scientists understand that a model will not perform as well on real-world data as it did on the test data used during development. Additionally, real-world data is very likely to change over time. For these reasons, we can expect and tolerate some level of performance decay once the model is deployed. To this end, we use an upper and lower bound for the expected performance of the model. The data science team should carefully choose the parameters that define expected performance in collaboration with subject matter experts.
Performance decay has very different consequences depending on the use case. The acceptable level of performance decay therefore depends on the application of the model. For example, we may tolerate a 3% accuracy decrease on an animal sound classification app, but a 3% accuracy decrease would be unacceptable for a brain tumor detection system.
ML performance monitoring is a valuable tool to detect when a production model is underperforming and what we can do to improve it. To remediate issues in an underperforming model, it is helpful to:
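The idea of expected performance with application-specific bounds can be sketched as follows. All names and numbers here are hypothetical: the expected accuracy and the per-application tolerances would in practice be chosen by the data science team together with subject matter experts.

```python
def evaluate_performance(observed, expected, tolerance):
    """Classify an observed metric against expected value +/- tolerance."""
    lower = expected - tolerance
    upper = expected + tolerance
    if observed < lower:
        return "below_lower_bound"
    if observed > upper:
        return "above_upper_bound"
    return "within_bounds"


# Hypothetical tolerances: a hobby app can absorb more decay than a medical system.
TOLERANCES = {
    "animal_sound_classifier": 0.03,
    "brain_tumor_detector": 0.005,
}

expected_accuracy = 0.95
observed_accuracy = 0.93  # the same two-point drop observed for both applications

results = {
    app: evaluate_performance(observed_accuracy, expected_accuracy, tol)
    for app, tol in TOLERANCES.items()
}
print(results)
```

The same drop in accuracy stays within bounds for one application and triggers an alert for the other, which is the point of setting tolerances per use case.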
ML monitoring can be more effective with a dedicated monitoring solution. Look for the following features when selecting an ML monitoring solution:
Checking the input data establishes a short feedback loop to quickly detect when the production model starts underperforming.
Aporia is an ML observability solution that can help you monitor your ML models in production quickly and easily. Just follow these quick steps and you’ll get immediate, actionable insights about your models:
Easily sign up for Aporia’s Free Community Edition in a few clicks. Input the number of production models you have and let us know your focus areas.
Great! You’re all signed up. Now let’s add your first model.
Add as many models as you need, and get a live centralized view of all your production models.
Choose a model and dive into its predictions. Slice and dice segments, customize widgets, and get a full view of the status and health of your model in production.
Let’s start monitoring your model. You can choose from our automated pre-configured monitors, or…
Create a customized monitor to track your model for drift, performance degradation, model decay, and more.
Determine your detection method.
Now, it’s time to choose which behavior you want to monitor.
Configure alerts and integrate your preferred alert communication channels.
Drill down into your alerts, and understand where, when, and why they were triggered.
Easily explain your predictions in human-readable text and simulate “What if?” scenarios with Aporia’s XAI. Re-explain your prediction to determine impactful features.
Try it for yourself! Get started with Aporia’s ML monitoring solution