Machine learning (ML) is an exciting field to work in, and it becomes even more intriguing once you discover its capabilities. We have all experienced this feeling while creating our first model in a Jupyter notebook. That is good practice for learning and sharpening your ML skills, but it is far from real-life ML. Deploying a model into production and operating it as a continuous service is a whole other world.
Creating such a system is a process that involves interrelated steps, from determining data requirements to model monitoring. Each step requires different skills and operational work. For instance, while data scientists focus on developing models, data engineers handle data acquisition and maintain data quality. The practice of standardizing and managing these steps properly is known as machine learning operations (MLOps).
Building an ML system is a continuous and iterative process rather than a one-time job. Therefore, deploying your model into production is not the final step. What comes after is model monitoring, which is of vital importance for the performance, reliability, and accuracy of your model.
What is Machine Learning Monitoring?
Machine learning monitoring involves tracking the performance of ML models in production to identify issues that can degrade the model's performance and add negative business value. Since ML lifecycles are iterative in nature, i.e., they involve continuous experimentation, testing, and tuning at all stages, model monitoring is used to collect feedback from these experiments and tests and feed it back into the system.
Model monitoring is a proactive approach to detecting issues in ML systems and improving them. In this article, we will go over four reasons why we need to monitor machine learning models in production.
Why We Need to Monitor Machine Learning Models in Production
Mitigating Risks in Production Models
The prevalence of ML in business decision-making has increased drastically. We aim to create systems that are data-driven and require little to no human intervention, with the main motivation of making processes more accurate and scalable. Consider the task of demand forecasting for a retail chain with over 100 locations. Performing this task with simple manual calculations is neither feasible nor manageable; ML offers a much better solution.
Let's say you build an ML system that forecasts the demand for products in over 100 stores. The results are fed into the supply chain system, and the products are automatically shipped to the stores. It sounds like a perfect use case of machine learning and automation. But what happens if your model forecasts an absurd amount for a particular product? For instance, it might produce a daily forecast that is 50 times the product's average daily sales. The result is sent to the supply chain and the products are shipped to the store. If we do not monitor our machine learning system, we only discover the excess stock at the store when it is too late.
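A simple guardrail for such a scenario is a sanity check that compares each forecast against the product's historical average before it reaches the supply chain. The function, product IDs, and threshold below are illustrative assumptions, not part of any particular forecasting stack:

```python
def flag_anomalous_forecasts(forecasts, historical_daily_avg, max_ratio=10.0):
    """Return product IDs whose forecast exceeds `max_ratio` times
    the product's historical daily average (a hypothetical threshold)."""
    flagged = []
    for product_id, forecast in forecasts.items():
        avg = historical_daily_avg.get(product_id)
        if avg and forecast > max_ratio * avg:
            flagged.append(product_id)
    return flagged

# Hypothetical model output (units/day) and long-run daily averages.
forecasts = {"sku-1": 12.0, "sku-2": 900.0}
history = {"sku-1": 10.0, "sku-2": 15.0}

print(flag_anomalous_forecasts(forecasts, history))  # prints ['sku-2']
```

Flagged forecasts could be held back for human review instead of being shipped automatically.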
Sometimes the risks come from inaccuracies or misrepresentations in the training data, since ML models are essentially representations of the data they are trained on. While the training data may approximate real life well, it can never fully reflect it. Therefore, there is always some risk associated with the results of ML models, and monitoring your models in production is essential to mitigate it.
Detecting Model Decay
You can create a state-of-the-art model that performs flawlessly. However, that model is only as good as the data it was trained on. The real world is constantly changing, and so is the data. As a result, the performance of a machine learning model decays over time. For example, people's shopping behavior might change, and the forecasts need to take that into account. Such changes in the incoming data relative to the training data are known as data drift (or input drift).
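One common way to quantify data drift is to compare the distribution of a feature in recent production data against the training data, for example with the two-sample Kolmogorov-Smirnov statistic. The pure-Python sketch below and its drift threshold are illustrative assumptions; in practice you would use a statistics library and a threshold tuned to your data:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_diff = 0.0
    for value in set(a).union(b):
        cdf_a = bisect.bisect_right(a, value) / len(a)
        cdf_b = bisect.bisect_right(b, value) / len(b)
        max_diff = max(max_diff, abs(cdf_a - cdf_b))
    return max_diff

training = list(range(100))        # feature values seen at training time
live = [x + 50 for x in training]  # shifted distribution in production

drift = ks_statistic(training, live)
if drift > 0.2:                    # threshold chosen for illustration
    print(f"data drift detected (KS = {drift:.2f})")
```

A large statistic on an important feature is a signal to investigate, and possibly retrain, before the model's accuracy visibly suffers.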
There are also cases where the statistical properties of the target variable change, meaning that the relationship between the independent variables and the dependent variable changes. Suppose we have a model in production that detects fraudulent transactions. What is classified as a safe transaction now might be considered fraudulent in the future; in a sense, our interpretation of the data changes. This situation is called concept drift, and it has a sizeable impact on model performance over time.
To eliminate the effects of data drift and concept drift on a model's performance, the model needs to be retrained with the most recent data. Retraining can be a costly operation, so we should not retrain at random; instead, we monitor the machine learning system in production to determine when retraining is needed.
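One way to decide when retraining is due is to watch a rolling window of live accuracy against ground-truth labels as they arrive, and trigger retraining when it drops below some fraction of the accuracy measured at deployment. The class below is a minimal sketch under those assumptions; the window size and tolerance are hypothetical defaults:

```python
from collections import deque

class RetrainingMonitor:
    """Flags retraining when rolling live accuracy falls below
    `tolerance` times the accuracy measured at deployment."""

    def __init__(self, baseline_accuracy, window=500, tolerance=0.9):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)

    def record(self, prediction, actual):
        # Ground-truth labels often arrive with a delay; record each
        # outcome whenever its label becomes available.
        self.outcomes.append(prediction == actual)

    def should_retrain(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough recent feedback yet
        rolling_accuracy = sum(self.outcomes) / len(self.outcomes)
        return rolling_accuracy < self.tolerance * self.baseline
```

This covers concept drift indirectly through its effect on accuracy; pairing it with a drift statistic on the inputs catches problems before labels arrive.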
Measuring IT Performance
While producing accurate predictions is of crucial importance, we cannot ignore how these predictions are made. ML systems require powerful processors, large storage capacity, data pipelines, and more, all of which are costly to maintain, especially for live ML systems in production.
Therefore, how much time it takes to train a model and to make predictions with it, and the amount of memory and compute consumed in the process, are highly important metrics to track.
You should monitor processing times and consumed resources. These can be grouped under environment-related metrics, or system health, and they can be a strong indicator of a model's performance. A sudden increase in one of these metrics might be due to a change in one of the libraries used in the code; a gradual increase might prompt you to redesign the ML model or its features. In any case, these metrics are key inputs to a cost-benefit analysis: if the benefit obtained from the model in production is less than the cost of maintaining it, you have a serious issue.
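Latency and memory usage can be captured with standard-library tools and pushed to whatever metrics store you use. The context manager below is a sketch assuming a simple in-memory log; note that `tracemalloc` measures only Python-level allocations, so real deployments typically add process-level metrics as well:

```python
import time
import tracemalloc
from contextlib import contextmanager

@contextmanager
def track_resources(label, log):
    """Record wall-clock time and peak Python memory for a code block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        log.append({"step": label, "seconds": elapsed, "peak_bytes": peak})

metrics = []
with track_resources("predict", metrics):
    _ = [i * i for i in range(100_000)]  # stand-in for model inference
```

Logging one such record per training run or prediction batch makes sudden and gradual regressions visible on a dashboard.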
Building Trust Through Explainability
As ML models and data drive impact across industries all over the world, especially in sectors like healthcare and finance, trust is critical. Before ML solutions can be used, the stakeholders involved (ML engineers, the product team, users, etc.) should understand what happens inside the model and how it arrives at its predictions. This understanding helps us trust the model, improves its fairness, and reduces bias.
This concept of understanding what goes on inside a model is called explainability. Building explainability into your ML system is one of the most powerful ways to build trust between a user and a model. But to explain what a model does in production, we have to track, log, and monitor everything that happens inside it; explainability in production is impossible without reliable, real-time model monitoring. Machine learning monitoring provides insights into how the model interacts with the data and users it comes in contact with. From these insights, we can explain the model's behavior and measure our degree of confidence and trust in its performance.
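What logging for explainability looks like depends on the model class, but for a linear model the per-feature contributions (weight times value) are an exact, easily logged explanation of every prediction. The weights, feature names, and fraud-scoring framing below are hypothetical:

```python
def explain_linear_prediction(weights, bias, features):
    """For a linear model, each feature's contribution is weight * value,
    and the prediction is the bias plus the sum of contributions."""
    contributions = {name: weights[name] * value
                     for name, value in features.items()}
    prediction = bias + sum(contributions.values())
    return prediction, contributions

# Hypothetical fraud-score model: log both the score and its breakdown.
weights = {"amount": 0.002, "num_prior_txns": -0.05}
score, contribs = explain_linear_prediction(
    weights, bias=0.1,
    features={"amount": 500.0, "num_prior_txns": 4},
)
```

Storing the breakdown alongside each prediction lets you answer, after the fact, why a particular transaction was flagged. More complex models need dedicated attribution techniques, but the monitoring principle is the same: log enough per-prediction detail to reconstruct the decision.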
Adding machine learning monitoring after deploying an ML model into production is as important as building the model itself. Monitoring and the feedback it creates are essential parts of the post-deployment process: they are used not only for detecting problems but also for improving model performance. Monitoring is a necessary component of any ML system you build.
In conclusion, the ability to track and monitor machine learning models in production is essential for building trust in AI solutions and products. Incorporating a monitoring solution like Aporia, which provides real-time monitoring, explainability, and observability for production models, is necessary to accurately monitor your models and build trust in your AI applications.