How to Choose the Right Solution for Machine Learning Monitoring


Many machine learning failures in production stem from either general software system failures or machine learning-specific failures. Some are caused by human error, such as a lack of domain knowledge or poor code logic.

Resolving these issues as they come up is best practice for ML engineers and data scientists building machine learning systems. To do so, the MLOps system must be able to detect errors, monitor the entire system for impending failures, and alert ML engineers and data scientists accordingly. To do all of this, you need an end-to-end machine learning monitoring solution.

For production models, machine learning model monitoring is essential to help identify issues like data drift, concept drift, bias, and performance degradation before they impact the business or its customers. Monitoring production machine learning isn’t an easy task. In fact, it becomes more complex as the number of features, the volume of data, or the number of models increases, which is typically the case in production machine learning.

In this article, you will learn about some of these complexities and how to choose the right solution for monitoring your machine learning models. Let’s start at the foundation: what is machine learning model monitoring?

 

What is Machine Learning Monitoring?

ML models train by following examples from a dataset and minimizing an error that measures how well the model performs the task it is being trained for. Production ML models perform inference on changing data from an ever-changing world, after training on a static set of examples in development. This discrepancy between static training data in development and dynamic data in production causes the performance of a production model to degrade over time.

Machine learning model monitoring is a set of techniques for observing ML models in production and ensuring that their performance stays reliable. It gives you visibility into what is happening with your models in production and how they interact with new production data, based on metrics defined by the ML engineer or data scientist. It also lets you evaluate the model’s performance to determine whether it is operating as expected.

ML model monitoring aims to use data science and statistical techniques to continuously assess the quality of machine learning models in production. Monitoring can serve different purposes, such as:

  1. Early detection of instabilities, anomalies, system pipeline issues, and bias. 
  2. Understanding how and why model performance degrades.
  3. Investigating the root cause of issues that arise.

 

Why is ML Model Monitoring Necessary?

What are the reasons for monitoring your model?

  • Gain insight into the accuracy of your predictions. 
  • Prevent prediction errors.
  • Tweak your models to perfection, and much more.

ML model monitoring is necessary because the accuracy of a machine learning model does not stay fixed: it changes during model development and keeps changing once the model is in production.

Over time, production data drifts away from the data you used for training as the environment, infrastructure, or business needs change. This can degrade the performance of the ML system, because what the model learned no longer matches the data it sees. Without proper monitoring, these problems could go unnoticed until they negatively impact the business or its customers, making it difficult to understand what went wrong.

As we have already established, model behavior changes as models interact with new data. Sometimes the data used to train the model differs from the data it receives in production, or the learning algorithm makes incorrect predictions on production data. In some cases, this can introduce bias into the ML system.

ML bias is a phenomenon where the model produces results that are systematically distorted by mistaken assumptions, leading to unintentionally wrong decisions related to various features (ethnic, religious, etc.). Bias can emerge as data distributions and target labels (“ground truth”) evolve, and it is a particular concern for models that make decisions about people. It must be addressed quickly, as ML bias can have serious consequences for business and society.

 

The Challenges of Machine Learning Model Monitoring

Every machine learning system has all the challenges of traditional software systems, coupled with its own machine learning-specific issues. Most of the common ML issues occur once the models are in a production environment, which makes them more difficult to monitor. Some of these issues include:

Data changes – The production environment is dynamic, and a direct implication of this is that the input data is constantly changing. These changes usually affect how the model performs and can cause model decay.

For ML engineers and data scientists, this means that you may need to train a new version each day or week, depending on the needs of your business. While this helps ensure that the ML model always uses the most recent data and learns from new information as it becomes available, it is a lot of work! Another issue with production data is data quality, which can suffer when the production data pipeline changes, for example during data collection or preprocessing.

Algorithm changes – During development, different learning algorithms are tested and the best one is chosen. In production, the chosen algorithm interacts with new data and might not generalize well to it. You might therefore need to retune or upgrade the algorithm periodically to improve its accuracy and performance for a given problem or use case.

Infrastructure changes – Hardware, software, or infrastructure updates can require retraining or revalidating the machine learning model. This may be the case, for example, if you switch to new servers with more memory or faster processors.

Now that you understand the general necessity and challenges of model monitoring, the question remains: what do you need to consider before adopting an ML model monitoring solution for your project?

 

Getting Started with Machine Learning Monitoring For Your Model

ML applications aren’t just about the model. They include everything that enables the model in production, from the input data to the algorithms, the infrastructure, and other upstream and downstream services.

To get started with monitoring, you need to consider your model and ask some important questions about it so that you can employ the right type of monitoring. Some of these questions include:

Data – What type of data? What is the quality? What is the volume?
Algorithm – Is it a predictive model or a deep learning model? Does the algorithm need periodic upgrades?
Infrastructure – What kind of infrastructure is the model deployed on?
Business metrics – What does your business define as success? Does the model meet the business’s KPIs? What metrics are important to you? How true is your ground truth?
Domain trends – What’s happening to users of your ML application in real time?

Data

The type of data used in a model often determines the kinds of issues it will have.

For labeled data, issues such as drift can be identified by tracking metrics like accuracy, precision, False Positive Rate, and Area Under the Curve (AUC). You can also use custom metrics.
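
As a minimal sketch of the labeled-data case, the snippet below computes accuracy, precision, and False Positive Rate from binary labels for a window of production predictions. The function names and the sample labels are hypothetical and only serve as an illustration; in practice you would compare a reference window against recent production traffic once the ground truth arrives.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, and false positive rate for one window."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical ground truth and predictions from a production window.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
production_preds = [1, 1, 0, 1, 0, 1, 1, 0]

production_window = classification_metrics(labels, production_preds)
print(production_window)
```

A drop in these values relative to the scores measured at validation time is a signal worth investigating.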

For unlabeled data, it’s best to start by setting up a periodic assessment of the data’s distribution. The training dataset goes stale over time, so comparing the distribution of the training set with that of the new data helps you understand what shift has occurred. Examples of tests to check for distribution changes include the Kullback-Leibler divergence, the Jensen-Shannon divergence, and the Kolmogorov-Smirnov test.
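
The distribution checks named above can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions (a single numeric feature, fixed histogram bins, a small smoothing constant for empty bins), not a production-grade one; the helper names, bin edges, and sample values are hypothetical. It computes the Kolmogorov-Smirnov statistic only, not its p-value.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges):
    """Share of values per bin [edges[i], edges[i+1]); the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the binned range are ignored in this sketch
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) in nats; eps keeps the logarithm finite when a bin of Q is empty."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )


# Hypothetical values of one feature at training time and in production.
training = [0.2, 0.4, 0.5, 1.1, 1.3, 1.6, 2.2, 2.7]
production = [1.2, 1.8, 2.1, 2.4, 2.5, 2.6, 2.8, 2.9]
edges = [0, 1, 2, 3]

p, q = bin_shares(training, edges), bin_shares(production, edges)
print("KL:", kl_divergence(p, q))
print("JS:", js_divergence(p, q))
print("KS:", ks_statistic(training, production))
```

The divergences work on binned shares, so their values depend on the bins you choose, while the KS statistic works directly on the raw samples.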

For data quality, assuming the data source is good, ensure your data pipeline has both versioning and metadata storage features. For deep learning models, you can help detect and filter out adversarial inputs in production by adding input filters and inspecting the inner convolutional layers of the network.

Model Algorithms

Machine learning is an iterative process, so continuous evaluation of, and experimentation with, different algorithms on your data is important. Continuous evaluation helps you understand how an algorithm performs on the data it receives. Continuous experimentation with various training algorithms (challenger algorithm vs. champion algorithm) lets you find the best-performing algorithm for your model.
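
A champion vs. challenger comparison can be sketched as below. The two "models" are plain functions standing in for trained models, and the promotion rule (a minimum relative improvement in mean absolute error on the same evaluation data) is an assumption made for illustration, not a rule prescribed by this article.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_winner(champion, challenger, features, targets, min_gain=0.05):
    """Promote the challenger only if it cuts the champion's error by at least min_gain (relative)."""
    champ_err = mean_absolute_error(targets, [champion(x) for x in features])
    chall_err = mean_absolute_error(targets, [challenger(x) for x in features])
    if champ_err > 0 and (champ_err - chall_err) / champ_err >= min_gain:
        return "challenger", champ_err, chall_err
    return "champion", champ_err, chall_err


# Hypothetical evaluation data and two stand-in models.
features = [1, 2, 3, 4]
targets = [2.1, 3.9, 6.2, 7.8]


def champion_model(x):
    return 2 * x + 0.5


def challenger_model(x):
    return 2 * x


winner, champ_err, chall_err = pick_winner(champion_model, challenger_model, features, targets)
print(winner, round(champ_err, 3), round(chall_err, 3))
```

Requiring a minimum gain avoids swapping models over differences that are too small to matter.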

For general predictive models, e.g., classification, regression, and time-series models, algorithms like Random Forest, Gradient Boosted Models (GBM), and Prophet are a good place to start.

With these kinds of models, you want to monitor metrics such as R-squared, Mean Squared Error (MSE), which penalizes large errors heavily, median absolute error, which is more robust when the data contains a lot of outliers, and average error.
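
To make the difference between these metrics concrete, here is a small sketch that computes them with the standard library only. The sample values are hypothetical and include one large outlier, and "average error" is interpreted here as the mean signed error, which is an assumption.

```python
from statistics import mean, median


def regression_report(y_true, y_pred):
    """Compute R-squared, MSE, median absolute error, and mean signed error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        "r_squared": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        "mse": ss_res / len(errors),
        "median_absolute_error": median([abs(e) for e in errors]),
        "average_error": mean(errors),
    }


# Hypothetical targets and predictions; the last prediction is a large outlier.
y_true = [10, 12, 14, 16, 18]
y_pred = [11, 12, 13, 16, 30]

report = regression_report(y_true, y_pred)
print(report)
```

In this example a single bad prediction dominates the MSE and pushes R-squared below zero, while the median absolute error stays small, which is why it helps to track more than one of these metrics.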

Business Objective

The goal of building ML models is to meet a particular set of business needs. For example, the metrics required for a weather forecasting model would be different from those for an image classification model.

For classical machine learning models such as time-series models and other predictive models, the general rule of thumb is to create a set of business baseline scores for every metric you want to monitor and to measure your model’s metrics against them. It’s also important to review your business baseline scores regularly to check that they are up to date with market research and trends.
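
A baseline comparison of this kind could look like the sketch below. The metric names, target values, and the `higher_is_better` flag are hypothetical placeholders; your own baselines would come from the business.

```python
# Hypothetical business baselines: one target per monitored metric.
BASELINES = {
    "precision": {"target": 0.90, "higher_is_better": True},
    "mse": {"target": 4.0, "higher_is_better": False},
}


def compare_to_baseline(observed, baselines):
    """Return the metrics that miss their baseline, with observed and target values."""
    breaches = {}
    for name, rule in baselines.items():
        if name not in observed:
            continue  # metric not reported in this window
        value = observed[name]
        if rule["higher_is_better"]:
            ok = value >= rule["target"]
        else:
            ok = value <= rule["target"]
        if not ok:
            breaches[name] = {"observed": value, "target": rule["target"]}
    return breaches


# Hypothetical metrics measured on the latest production window.
latest = {"precision": 0.85, "mse": 3.2}
breaches = compare_to_baseline(latest, BASELINES)
print(breaches)
```

Keeping the baselines in one place also makes it easier to review them when market conditions change.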

 

Choosing Your Machine Learning Monitoring Solution

When choosing a monitoring solution for your ML project, it is important to evaluate the features of that solution and whether it can meet the specific challenges and requirements of your ML project. 

Before you choose any ML model monitoring solution, check that it has the following features to help you monitor your models:

Model versioning – Machine learning monitoring solutions must be able to track and store the performance history of each monitored model, based on its key metrics over time, so that you can compare how well it performs from one period to the next.
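
To illustrate what tracking a performance history per model version involves, here is a minimal in-memory sketch. The class name, its methods, and the logged values are hypothetical; a real monitoring solution would persist this data and handle far more detail.

```python
from collections import defaultdict


class MetricHistory:
    """Keeps (date, value) points per (model version, metric) pair."""

    def __init__(self):
        self._records = defaultdict(list)

    def log(self, version, day, metric, value):
        self._records[(version, metric)].append((day, value))

    def series(self, version, metric):
        """Points sorted by date (ISO date strings sort chronologically)."""
        return sorted(self._records.get((version, metric), []))

    def change(self, version, metric):
        """Latest value minus earliest value, or None with fewer than two points."""
        points = self.series(version, metric)
        if len(points) < 2:
            return None
        return points[-1][1] - points[0][1]


# Hypothetical weekly accuracy for one model version, logged out of order.
history = MetricHistory()
history.log("v1", "2024-01-15", "accuracy", 0.86)
history.log("v1", "2024-01-01", "accuracy", 0.92)
history.log("v1", "2024-01-08", "accuracy", 0.90)

print(history.series("v1", "accuracy"))
print(history.change("v1", "accuracy"))
```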

Real-time monitoring – For production models, this is a must because it shows the current state of your models and provides observability into them. Real-time monitoring should include monitors for:

  • Data integrity – outliers, missing values, etc.
  • Data behavior – data drift, concept drift, predictive drift, etc.
  • Model performance – performance degradations, metric changes, etc.
  • System health – pipeline checks, CPU/GPU usage, etc.
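
As an example of the data integrity monitor in the list above, the sketch below reports the share of missing values and flags outliers with the common 1.5 × IQR rule. The choice of that rule, the function name, and the sample values are assumptions made for illustration.

```python
from statistics import quantiles


def integrity_report(values, k=1.5):
    """Report the missing-value rate and IQR-based outliers for one numeric feature."""
    present = [v for v in values if v is not None]
    missing_rate = (len(values) - len(present)) / len(values)
    q1, _, q3 = quantiles(present, n=4)  # quartiles of the non-missing values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing_rate": missing_rate, "outliers": outliers}


# Hypothetical feature values from a production batch; None marks a missing value.
batch = [10, 11, None, 12, 13, 11, 12, 95, None, 10]
integrity = integrity_report(batch)
print(integrity)
```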

Key alerts – Machine learning monitoring solutions should also send notifications when a model’s metrics drop below a certain threshold so that you can resolve issues quickly, before they impact the business. These alerts should have webhook integrations with other platforms like Slack, Jira, etc.
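
A threshold alert with a webhook could be sketched as follows. The webhook URL is a placeholder, the model name and threshold are hypothetical, and the JSON body with a single `text` field is an assumption about what the receiving platform expects. The sketch only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def build_alert(metric, value, threshold, model="demo-model"):
    """Return an alert payload if the metric is below its threshold, else None."""
    if value >= threshold:
        return None
    return {"text": f"[{model}] {metric} dropped to {value:.3f} (threshold {threshold:.3f})"}


def to_request(payload, webhook_url):
    """Wrap the payload in a JSON POST request (not sent here)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical reading and placeholder webhook URL.
alert = build_alert("accuracy", 0.81, 0.90)
if alert is not None:
    request = to_request(alert, "https://example.invalid/webhook")
    print(request.get_method(), request.full_url, alert["text"])
    # urllib.request.urlopen(request) would send it; omitted in this sketch.
```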

Model and metrics comparisons – The solution should be able to compare model outputs and metrics across multiple groups to identify problem areas and trends that can help improve your model’s predictions in future iterations.

Dashboards – Dashboards are critical for monitoring because they make presentation and visualization simple. A series of named numbers is inefficient and hard for stakeholders to interpret, whereas graphs and charts are easy to understand and show the relationships among those numbers.

Model History – View machine learning models over time with the option to compare them against each other, to identify trends and performance changes that may need your attention.

Operational Metrics – Tracking and monitoring the ML infrastructure capacity is another important part of an ML monitoring solution. The capacity of the CPUs, GPUs, memory, storage, and network I/O affects the performance of your model. If these are close to full capacity, maintenance is required for an effectively working ML model.

Metadata Store – ML monitoring solutions could also have metadata storage to help store versions and hyperparameters of your models in production. This helps with reproducibility, explainability, auditing, compliance, lineage traceability, and troubleshooting. 

Collaboration – ML monitoring solutions should let you collaborate with your team and share findings and notifications with teammates. They need to provide a collaborative environment in which teams can work together, take part in the creation process, and continue monitoring the models. Real-time insight into what is happening with your models makes it easier to exchange ideas and observations, spot errors, and resolve issues.

Explainability – One important aim of model monitoring is the ability to provide insight into what’s going on in your model based on the metrics it’s monitoring. This ability to provide context to observable patterns in a model, although it seems advanced, is very important to ML production teams. Moreover, explainability allows for additional inclusivity in the organization, as more stakeholders can understand the ML model predictions.

 

Open-source tools like Grafana, Prometheus, and others offer some of these features. However, they come with risks, such as technical debt, data security concerns, and scaling issues that require many workarounds, all of which you must maintain. So, what you save on service costs, you may well pay for in configuration hours or performance degradation.

Additionally, some ML monitoring platforms, like Aporia, can track and evaluate model performance, investigate, debug, explain model predictions, and improve model performance in production. Aporia’s solution for ML model monitoring offers a powerful toolkit at your fingertips to monitor ML models quickly and easily. With Aporia, you can focus on improving machine learning model performance and iterating your models for future use.

 

Conclusion

Monitoring is essential to keeping a machine learning model successful in production. Using ML model monitoring tools is one of the easiest ways to ensure ML engineers and data scientists get real-time visibility and alerts on their model’s performance and its data characteristics. This in turn helps them debug errors and take proactive action.

Lastly, with any solution, it is important to consider whether a change in model performance metrics is caused by sampling bias, or whether perceived drift is caused by randomness or outliers, rather than by a real shift in the data distribution or target concept.
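
One way to check whether an apparent change could be explained by randomness alone is a permutation test, sketched below. This is one illustrative technique among several, not a method prescribed by this article; the function name, the fixed seed, and the daily accuracy values are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(reference, current, n_permutations=2000, seed=0):
    """Estimate how often a random relabeling gives a mean gap at least as large as the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(reference) - mean(current))
    pooled = list(reference) + list(current)
    n_ref = len(reference)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_ref]) - mean(pooled[n_ref:]))
        if gap >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical daily accuracy for two consecutive periods.
week_1 = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91]
week_2 = [0.80, 0.82, 0.79, 0.81, 0.80, 0.78]

p_value = permutation_p_value(week_1, week_2)
print("p-value:", p_value)
```

A small p-value suggests the gap is unlikely to come from random fluctuation alone, while a large one suggests waiting for more data before acting.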

 

Happy Monitoring!
