April 7, 2024 - last updated
MLOps

Azure MLOps: Implementing MLOps with Azure Machine Learning

Noa Azaria
9 min read Jul 27, 2022

What Is Azure MLOps?

Azure MLOps enables organizations to adopt and manage machine learning operations (MLOps) pipelines in the Azure cloud. Microsoft provides Azure Machine Learning (Azure ML) as its primary solution for practicing MLOps in Azure, but there are other ways to create ML pipelines in the Azure cloud.

Azure Machine Learning lets you bring your own models built using popular ML frameworks like TensorFlow, PyTorch, or scikit-learn.

ML professionals, engineers, and data scientists can use Azure Machine Learning to manage their ML workflows, including training, deploying, basic monitoring, and retraining production models. 

Here are key features of Azure Machine Learning: 

  • Automate and accelerate daily ML workflows
  • Integrate models into services and applications 
  • Platform developer tools backed by durable Azure Resource Manager APIs

Azure MLOps Features

Azure Machine Learning offers many MLOps capabilities. It enables you to create reproducible ML pipelines that define reusable and repeatable steps for data preparation and training. You can build a reusable software environment to train and deploy ML models, and package, register, and deploy ML models from any location. 
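As a rough, framework-agnostic illustration of what "reproducible steps" buys you, the sketch below pins every source of randomness so that two runs over the same data produce byte-identical artifacts. In Azure ML, each function here would correspond to a pipeline step or component; the step logic itself is a stand-in.

```python
import hashlib
import json
import random

def prep_step(raw, seed=42):
    # Deterministic split: sorting first removes any dependence on input
    # order, and a pinned seed makes the shuffle repeatable.
    rng = random.Random(seed)
    rows = sorted(raw)
    rng.shuffle(rows)
    cut = int(len(rows) * 0.8)
    return rows[:cut], rows[cut:]

def train_step(train_rows):
    # Stand-in for training: a fingerprint of the exact data the step saw.
    return hashlib.sha256(json.dumps(train_rows).encode()).hexdigest()

def pipeline(raw, seed=42):
    train_rows, _test_rows = prep_step(raw, seed)
    return train_step(train_rows)

# Same data + same seed -> the same "model" on every run.
assert pipeline(list(range(100))) == pipeline(list(range(100)))
```

The same principle scales up: pin seeds, library versions, and data versions in each step definition, and the pipeline becomes a repeatable recipe rather than a one-off script.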

Model Tracking

Azure Machine Learning lets you track metadata and capture governance information for your ML lifecycle. The logged lineage information can include details such as who published the model, why changes were made, and when the model was deployed and used in the production environment.
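To make this concrete, the sketch below builds the kind of lineage record Azure ML can hold as tags or properties on a registered model. The field names are illustrative, not an Azure API; adapt them to whatever governance fields your organization requires.

```python
from datetime import datetime, timezone

def model_lineage_tags(publisher, change_justification, dataset, commit):
    # Illustrative lineage metadata to attach to a registered model.
    # Field names are made up for this example, not an Azure ML schema.
    return {
        "publisher": publisher,
        "change_justification": change_justification,
        "training_dataset": dataset,
        "source_commit": commit,
        "registered_at_utc": datetime.now(timezone.utc).isoformat(),
    }

tags = model_lineage_tags(
    publisher="fraud-ml-team",
    change_justification="retrained on Q2 data after a drift alert",
    dataset="transactions:v12",
    commit="3f9a1c7",
)
```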

Automation

You can use Azure Pipelines and Azure Machine Learning to automate the entire ML lifecycle. Pipelines help you frequently update existing models, test new ML models, and periodically roll out new models with your services and applications.

Auditing

You can track the entire audit trail of all ML assets using metadata. Azure Machine Learning can integrate with Git to help you track information on the repositories, branches, and commits certain code originates from. You can employ Azure Machine Learning datasets to profile, version, and track data.
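One simple way to think about data versioning for audit purposes is content addressing: derive the version tag from the bytes themselves, so a training run can pin the exact data it consumed. The sketch below is a minimal stdlib illustration of that idea, not Azure ML's dataset implementation.

```python
import hashlib
import os
import tempfile
from pathlib import Path

def dataset_version(path: Path, chunk_size: int = 1 << 20) -> str:
    # Content-addressed version tag: identical bytes always map to the
    # same tag, so lineage records can pin the exact data used.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()[:12]

# Demo on a throwaway CSV file.
fd, tmp = tempfile.mkstemp(suffix=".csv")
os.write(fd, b"id,amount\n1,9.99\n")
os.close(fd)
try:
    assert dataset_version(Path(tmp)) == dataset_version(Path(tmp))
finally:
    os.remove(tmp)
```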

You can leverage Azure Machine Learning’s model interpretability capabilities to explain ML models, understand how your models reach the result for a specific input, and ensure regulatory compliance. Additionally, Machine Learning job history can store a snapshot of all data, compute, and code you use to train your model. Machine Learning Model Registry can capture all metadata related to ML models.

Azure MLOps Reference Architecture for Python Models

The image below shows a reference architecture for implementing continuous integration and continuous delivery (CI/CD) and a retraining pipeline for an artificial intelligence (AI) application with Azure DevOps and Azure Machine Learning.

Azure built this example solution on the scikit-learn diabetes dataset. However, you can adapt it for other AI scenarios and for other build systems such as Travis CI and Jenkins.

The build pipelines in the image include DevOps tasks for: 

  • Data sanity tests, unit tests, and integration tests
  • Model training on various compute targets
  • Model version management 
  • Model evaluation or model selection
  • Model deployment as a real-time web service
  • Staged deployment to testing or production
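The data sanity tests in the first task are often the cheapest guardrail in the pipeline. A hedged sketch of what such a check might look like (the column names and ranges are invented for illustration):

```python
def sanity_check(rows, required_cols, ranges):
    # Lightweight data sanity tests a build pipeline can run before
    # training: schema present, no nulls, values within expected ranges.
    errors = []
    for i, row in enumerate(rows):
        missing = required_cols - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        for col, (lo, hi) in ranges.items():
            val = row[col]
            if val is None:
                errors.append(f"row {i}: null in {col}")
            elif not lo <= val <= hi:
                errors.append(f"row {i}: {col}={val} outside [{lo}, {hi}]")
    return errors

rows = [
    {"age": 54, "bmi": 25.1},
    {"age": 230, "bmi": 24.0},  # out of range -> flagged
]
problems = sanity_check(rows, {"age", "bmi"}, {"age": (0, 120), "bmi": (10, 80)})
assert len(problems) == 1
```

In the reference architecture, failing checks like these would fail the build before any compute is spent on training.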

The architecture includes the following services:

  • Azure Pipelines—a building and testing system used for build and model release pipelines. The service breaks each pipeline into a series of logical steps or tasks. 
  • Azure Machine Learning—the architecture leverages Azure’s Machine Learning Python SDK to create compute resources, a workspace, an ML pipeline, and a scoring image. The workspace provides an area to train, deploy, and experiment with models.
  • Azure Machine Learning Compute—this architecture executes the training job on a cluster of virtual machines (VMs) available on demand. You can set up autoscaling and choose various CPU or GPU node options. 
  • Azure Machine Learning pipelines—this service offers reusable ML workflows. These pipelines include different steps for training, model registration, image creation, and model evaluation. The architecture publishes or updates the pipeline only upon completion of the build stage, and the pipeline is triggered when new data arrives.
  • Azure Blob Storage—the architecture uses blob containers to store logs from the model scoring service, collecting the input data as well as model predictions. After some transformation, you can use these logs for model retraining.
  • Azure Container Registry—the architecture packages the Python scoring script in a Docker image and versions it in the registry.
  • Azure Container Instances—the release pipeline includes tasks that mimic the staging and testing environment by deploying the scoring web service image to container instances. This serverless service offers an easy way to run containers.
  • Azure Kubernetes Service—after testing the scoring web service image in the testing environment, the architecture deploys it to a production environment on top of a managed Kubernetes cluster.
  • Azure Application Insights—the architecture employs this monitoring service to identify performance anomalies.
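The Python scoring script packaged into the Docker image conventionally exposes an `init()`/`run()` pair that Azure ML's serving stack invokes. Here is a minimal sketch of that contract with the model mocked out; a real script would load the registered model in `init()` (for example from the model directory Azure ML mounts into the container).

```python
import json

model = None

def init():
    # Azure ML calls init() once when the serving container starts.
    # A real script would load the registered model from disk here;
    # we substitute a trivial callable to stay self-contained.
    global model
    model = lambda features: sum(features)  # stand-in for model.predict

def run(raw_data: str) -> str:
    # Azure ML calls run() for each request, passing the request body.
    try:
        features = json.loads(raw_data)["data"]
        return json.dumps({"predictions": [model(row) for row in features]})
    except Exception as exc:  # surface errors to the caller as JSON
        return json.dumps({"error": str(exc)})

init()
print(run('{"data": [[1, 2, 3], [4, 5, 6]]}'))  # {"predictions": [6, 15]}
```

Because the same script runs unchanged in Container Instances (staging) and Kubernetes Service (production), the staged-deployment tasks in the release pipeline only swap the hosting target, not the code.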

MLOps Best Practices with Azure Machine Learning 

Here are best practices to follow when setting up MLOps with Azure Machine Learning:

  • Project teams—working in teams can help leverage domain and specialist knowledge across the organization. You can also comply with requirements for use case segregation by organizing a separate Azure Machine Learning workspace for each project.
  • Roles—you can define specific tasks and responsibilities across the organization for a given role, allowing one member in the ML project team to fulfill several roles. Custom roles in Azure enable you to define granular RBAC operations for your Azure Machine Learning project that certain roles can perform.
  • Templates—standardizing a code template ensures code reusability and helps reduce ramp-up time when starting a project or adding new team members. You can use several services to create templates, including Azure Machine Learning pipelines, CI/CD pipelines, and job submission scripts.
  • Feedback—you can run automated builds on data samples to reduce the time to feedback on your ML pipeline’s quality. Azure Machine Learning offers pipeline parameters to help you parameterize the input datasets.
  • Continuous deployment—you can automate the testing and deployment of real-time scoring services across various Azure environments, such as development and production, by setting up continuous deployment for ML models.
  • Naming conventions—standard tags and naming conventions for Azure ML experiments can help distinguish between retraining baseline ML pipelines and experimental work.
  • Automation—rather than submitting jobs manually through the studio UI, prefer submitting them via the CLI or SDK, and use Azure DevOps Machine Learning tasks or the CLI to configure automated pipeline steps. This helps reduce your code footprint, allowing you to reuse job submissions from your automation pipelines.
  • Hosting—when developing Python packages for an ML application, prefer hosting them as artifacts in an Azure DevOps repository and publishing them as a feed. It enables you to integrate the Azure Machine Learning workspace with your DevOps workflow to build packages.
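The naming-conventions practice above is easy to enforce in code. The sketch below builds standardized experiment names so baseline retraining runs are distinguishable from exploratory work at a glance; the convention itself is made up for illustration, so adapt it to your team's scheme.

```python
from datetime import date

ALLOWED_PURPOSES = {"baseline", "retrain", "explore"}

def experiment_name(project: str, purpose: str, model: str, run_date: date) -> str:
    # Standardized name: <project>-<purpose>-<model>-<yyyymmdd>.
    # This exact scheme is illustrative, not an Azure ML requirement.
    if purpose not in ALLOWED_PURPOSES:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return f"{project}-{purpose}-{model}-{run_date:%Y%m%d}"

print(experiment_name("churn", "retrain", "xgboost", date(2024, 4, 7)))
# churn-retrain-xgboost-20240407
```

Calling a helper like this from every job-submission script keeps the convention consistent without relying on humans to remember it.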

Azure MLOps Management with Aporia

Aporia is a natively supported monitoring solution for machine learning applications running on Azure. It integrates seamlessly at the end of your pipeline to provide real-time monitoring and tracking of your ML processes.

Aporia’s ML monitoring solution can be used in conjunction with Azure MLOps to provide additional capabilities for monitoring and managing machine learning models in production.

There are several reasons why an organization using Azure could benefit from integrating Aporia’s ML monitoring solution into their machine learning workflow:

  1. Improved model performance: By monitoring the performance of machine learning models in real-time, organizations can identify and address issues that may be affecting model performance. This can help improve the accuracy and reliability of the models, leading to better outcomes and a better overall user experience.
  2. Early detection of issues: Aporia can automatically detect issues such as drift or anomalies in model behavior, allowing organizations to identify and address problems before they become major issues. This can help prevent costly downtime or other disruptions caused by faulty models.
  3. Greater visibility into model behavior: Aporia provides detailed insights into the performance and behavior of machine learning models in production. This can help organizations understand how their models are being used and how they are performing under different conditions, allowing them to make informed decisions about how to optimize and improve their models.
  4. Enhanced security and compliance: By monitoring machine learning models in real-time, organizations can identify and address potential security or compliance issues before they become major problems. This can help organizations maintain compliance with industry regulations and standards, and protect the security and privacy of their data.
  5. Human readable Explainable AI: Aporia’s Explainable AI helps organizations understand the factors driving the predictions and decisions of their machine learning models, as well as identify any biases or inconsistencies. It provides insights into the feature impact that influences the model’s predictions, enabling organizations to make more informed decisions on how to optimize and use their models. Additionally, it helps organizations improve the fairness and transparency of their machine learning models, which is especially important as AI and machine learning become more prevalent in business operations.
  6. Root cause analysis: Aporia’s root cause analysis (RCA) feature is designed to help organizations identify the underlying cause of any issues or anomalies that they may be experiencing with their machine learning (ML) models in production. This can be especially useful in the context of Azure, where organizations may be using a range of different ML tools and services to train, deploy, and monitor their models. 
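Production monitors like Aporia use far richer statistics, but the core idea behind the drift detection in point 2 can be sketched in a few lines. The statistic and thresholds below are illustrative only: measure how far the production distribution has moved relative to the training-time reference.

```python
from statistics import mean, stdev

def drift_score(reference, production):
    # Toy drift metric: how many reference standard deviations the
    # production mean has shifted. Illustrative only; real monitors
    # use richer tests (PSI, KS, per-feature distributions, etc.).
    sigma = stdev(reference) or 1e-9  # guard against zero variance
    return abs(mean(production) - mean(reference)) / sigma

reference = [10.0, 11.0, 9.5, 10.5, 10.2]   # feature values at training time
stable = [10.1, 10.4, 9.9, 10.6]            # production traffic, no drift
shifted = [14.0, 15.2, 13.8, 14.9]          # production traffic, drifted

assert drift_score(reference, stable) < 1.0
assert drift_score(reference, shifted) > 3.0
```

A monitoring service runs checks like this continuously per feature and per prediction, and raises an alert when a score crosses a configured threshold.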

Overall, integrating Aporia with Azure MLOps can help organizations gain a more comprehensive view of their machine learning models and showcase their performance in production, helping to drive better outcomes and a better overall user experience.

To get a hands-on feel for Aporia’s ML Observability Platform, we recommend trying it out with your own models and data.
