Aporia and Databricks Team Up to Bring ML Observability to Your Lakehouse

Reah Miyara
May 31, 2023

We’re super excited to share that Aporia is now the first ML observability offering to integrate with the Databricks Lakehouse Platform. This partnership means that you can now effortlessly automate your data pipelines and monitor, visualize, and explain your ML models in production.

Aporia and Databricks: A Match Made in Data Heaven

One key benefit of this integration is that Aporia deploys directly on the Databricks platform in minutes, so no data leaves your lakehouse. This approach maintains data security and integrity while using Databricks’ native capabilities for optimized resource usage and reduced latency. Let’s dive into the cool stuff Aporia and Databricks have to offer.

Why Aporia and Databricks?

Our integration with Databricks addresses two primary challenges faced by ML practitioners:

  1. Navigating a saturated and often confusing MLOps landscape that offers no true end-to-end solution for managing the entire ML lifecycle, from training to production.
  2. Scaling production ML workloads, which can be pretty challenging without solutions designed to handle large-scale data.

Using Aporia on top of your Databricks environment completes the ML management lifecycle picture, providing a seamless and familiar experience from training into production. 

To solve the second challenge, our partnership with Databricks lets you automate your data pipelines so you can focus on the fun part: extracting insights and creating value from your production data. Smooth sailing from here on.

A Seamless Integration with Data Lakes

Aporia’s Direct Data Connectors (DDC) offer a smooth integration with Databricks Delta Lake and other data sources, providing a streamlined connection to your data lakes. You can start monitoring billions of predictions in minutes, with no need for data duplication or data sampling. Because Aporia reads the data where it already lives, insights are derived from a single source of truth, which simplifies data management and shortens the path from insight to action.

Enhanced Collaboration 

With our Databricks integration, ML teams can collaborate more effectively, sharing insights and expertise across the organization, especially when alerts sound off and root cause analysis (RCA) is initiated. The unified platform streamlines communication and fosters a culture of continuous learning, empowering teams to make data-driven decisions and optimize their ML models with confidence.

Getting Started with Aporia on Databricks: A Step-by-Step Guide

Integrating Aporia into your Databricks environment is simple, taking only a few minutes to start monitoring billions of predictions and extracting valuable insights to improve model performance. In three quick steps, you’re all set up:

  1. Establish a connection between Aporia and your Databricks Delta Lake using the provided Direct Data Connector (DDC).
  2. Link your training and inference datasets to Aporia.
  3. Define the schema for your model by specifying its essential components: features, predictions, raw inputs, and actual values.
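
To make the schema step more concrete, here is a minimal sketch in plain Python of what such a definition might contain and how you could sanity-check it against the columns of a linked dataset. It is an illustration only, not Aporia's API: the `ModelSchema` class, the `missing_columns` helper, the table names, and the column names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ModelSchema:
    """Hypothetical schema: maps each role to {column name: data type}."""
    features: Dict[str, str]
    predictions: Dict[str, str]
    raw_inputs: Dict[str, str] = field(default_factory=dict)
    actuals: Dict[str, str] = field(default_factory=dict)

    def columns(self) -> List[str]:
        """Return every column name the schema refers to, in declaration order."""
        names: List[str] = []
        for group in (self.features, self.predictions, self.raw_inputs, self.actuals):
            names.extend(group)
        return names


def missing_columns(schema: ModelSchema, table_columns: Iterable[str]) -> List[str]:
    """List the schema columns that are absent from a dataset's columns."""
    available = set(table_columns)
    return sorted(name for name in schema.columns() if name not in available)


# Hypothetical Delta table names for the linked training and inference datasets.
datasets = {"training": "ml.churn_training", "inference": "ml.churn_inference"}

# Hypothetical schema for a churn model.
schema = ModelSchema(
    features={"tenure_months": "numeric", "plan": "categorical"},
    predictions={"churn_probability": "numeric"},
    raw_inputs={"support_ticket_text": "text"},
    actuals={"churned": "boolean"},
)

# Assumed column list of the inference table; actual values often arrive later.
inference_columns = ["tenure_months", "plan", "churn_probability", "support_ticket_text"]
print(missing_columns(schema, inference_columns))  # prints ['churned']
```

Running a check like this before linking a dataset can surface a mismatch early, for example an inference table that does not yet carry the actual values.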

Upon completing the setup, we will monitor your ML models on Databricks, providing insights into performance and identifying potential issues. This enables teams to swiftly troubleshoot, rectify problems, and proactively enhance models based on real-time production feedback.

Let’s Talk Data Privacy and Security

Our integration with Databricks ensures local data processing within your Lakehouse, maintaining privacy and security. With no data leaving your Delta Lake, Aporia adheres to data sovereignty requirements while promoting efficient processing, benefiting teams that prioritize data confidentiality and performance.

For further information regarding this integration or to learn more about the benefits of using Aporia with Databricks, please don’t hesitate to contact us.

