
Introducing Production Investigation Room – Aporia exclusive

Reah Miyara 6 min read Jul 11, 2023

We are excited to introduce Production IR, a new tool to support your ML observability journey. Production IR is the first all-in-one root cause analysis (RCA) environment, allowing teams to explore and analyze their production data, pinpoint issues, visualize their effect, and gain insights to improve model performance.

Traditional root cause analysis is complex, labor-intensive, and offers limited insight into production. We set out to build a collaborative, notebook-like experience designed to help practitioners get to the root cause as quickly as possible.

In this guide, we’ll discuss the challenges you face when performing RCA and introduce a new all-in-one tool for seamlessly investigating production data and getting to the root cause of your ML issues.

Production IR in action

Click play and follow as I investigate an insurance claim assessment alert with Production IR, and see step by step how easy it is to turn insights into actions. 

The challenges of traditional root cause analysis

You’re an ML engineer and your high-performing credit risk model is suddenly acting up today. The performance metrics are like a roller coaster ride, the outputs are wonky, and you have no idea why. Sounds familiar, doesn’t it?

Traditional Root Cause Analysis (RCA) on ML systems is akin to untangling a Gordian Knot. It’s a difficult, cumbersome, and tedious task that eats up time and resources.

Before we dive into the solution, let’s first take a close look at the challenges ML practitioners face when trying to find and fix production ML issues.

  • It’s complicated: Our ML systems are a web of interdependent components. Pinpointing issues can feel like trying to find a particular star in the night sky.
  • Resisting the resistance: Sometimes, the biggest roadblocks are not in our systems but in our organization. Changing processes requires effort, resources, and most importantly, stakeholder buy-in. This means that stakeholders need to be involved or at least receive reports on RCA progress. 
  • Hunting in the dark: Navigating through production data for RCA is not a walk in the park. Dealing with the volume, speed, and variety of data, not to mention data quality, data drift, and anomalies, can feel like hunting in the dark.
  • Tick Tock: RCA is a time-consuming, iterative process, particularly with ML models. It demands precision, thoroughness, and a whole lot of patience.
  • The code life: Gaining insights often necessitates writing custom scripts or code. This code needs to be not just effective but also maintainable and comprehensible for others. This also further alienates business stakeholders from the RCA process, hampering data science-business alignment. 
  • The Beast of Big Data: Large datasets can be resource hogs. Efficient data processing methods are a must, and can save you from the chaos of big data. 
  • Gatekeepers of Data: Access to production data needs to be carefully managed. Not everyone has access to the data in question. Ensuring the right individuals have access to RCA while adhering to data privacy and security standards is a delicate balance.
  • Silos are for Farms: Teams working in isolation can hamper effective RCA. Cross-functional collaboration is not just nice to have, it’s a necessity.

Enter Aporia’s Production IR (Investigation Room): an all-in-one data exploration tool designed to tackle these challenges head-on and bring some well-deserved peace to your life.

How to use Production IR for effective RCA

The challenges we just laid out are exactly why we set out to build Production IR. It’s the first all-in-one root cause analysis tool, providing a notebook experience for collaboratively investigating your real, live production data and pinpointing model issues quickly and effectively.

Let’s see which tools Production IR offers to go from alert to issue resolved.

Segment analysis: The magnifying glass

This tool allows you to segment your data into meaningful groups, and identify which segments are excelling or facing issues. It considers the size of each segment and different comparison metrics, effectively serving as your magnifying glass into the data.
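To make the idea concrete, here is a minimal sketch of segment analysis in plain Python. It is not Aporia's API or implementation; the record fields (`region`, `prediction`, `label`) and the `segment_report` helper are hypothetical names chosen for illustration. It groups records by a segment key, then reports each segment's size and accuracy next to the overall accuracy.

```python
from collections import defaultdict


def segment_report(records, segment_key):
    """Return per-segment size and accuracy, sorted worst-first.

    Illustrative helper only (hypothetical, not part of any product API).
    """
    groups = defaultdict(list)
    for row in records:
        # True when the model's prediction matched the actual label
        groups[row[segment_key]].append(row["prediction"] == row["label"])

    overall = sum(sum(hits) for hits in groups.values()) / len(records)

    report = []
    for segment, hits in groups.items():
        accuracy = sum(hits) / len(hits)
        report.append({
            "segment": segment,
            "size": len(hits),
            "accuracy": accuracy,
            "delta_vs_overall": accuracy - overall,
        })
    return sorted(report, key=lambda r: r["accuracy"])


# Made-up sample data, for illustration only
records = [
    {"region": "north", "prediction": 1, "label": 1},
    {"region": "north", "prediction": 0, "label": 0},
    {"region": "north", "prediction": 1, "label": 1},
    {"region": "south", "prediction": 1, "label": 0},
    {"region": "south", "prediction": 0, "label": 1},
    {"region": "south", "prediction": 1, "label": 1},
]

report = segment_report(records, "region")
for row in report:
    print(row)
```

Reporting size alongside accuracy matters: a tiny segment with poor accuracy may be noise, while a large one dragging down the overall metric is usually where the investigation should start.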

Drift analysis: The time traveler

If your model’s performance is degrading over time, Drift Analysis is your time-travel tool. It helps you investigate and visualize data drift behavior over time, pinpointing why, when, and where drift originated.
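One common way to quantify drift over time is to compare each time window against a baseline using the Population Stability Index (PSI). The sketch below is an assumption about how such a check could look, not a description of how Production IR computes drift; the `psi` helper, the bin count, and the sample values are all made up for illustration.

```python
import math


def psi(baseline, current, bins=5):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the baseline's range (an illustrative choice).
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if baseline is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the baseline range into the edge bins
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty
        return [max(c / len(values), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Made-up feature values: a training baseline and two production days
baseline = [20, 22, 25, 27, 30, 32, 35, 37, 40, 42]
windows = {
    "day 1": [21, 23, 26, 28, 31, 33, 36, 37, 39, 41],
    "day 2": [30, 33, 35, 36, 38, 39, 40, 41, 42, 42],
}

scores = {day: psi(baseline, values) for day, values in windows.items()}
for day, score in scores.items():
    print(day, round(score, 3))
```

Computing the score per window is what tells you when the drift started. A widely used rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, though the right thresholds depend on your data.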

Data stats: Your data dashboard

Imagine having a dashboard that instantly displays the key statistical metrics of your data. That’s what Data Stats does. It ensures you’re not running blind and have all the relevant data to make informed decisions during your RCA.
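As a rough illustration of the kind of summary such a dashboard surfaces, the sketch below computes a few key statistics with Python's standard `statistics` module. The `describe` helper and the `claim_amounts` values are hypothetical, and the choice of metrics is an assumption rather than the tool's actual list.

```python
import statistics


def describe(values):
    """Summarize a numeric column, treating None as a missing value."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.fmean(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


# Made-up claim amounts with two missing entries
claim_amounts = [1200.0, 950.0, None, 4300.0, 1800.0, None, 2100.0]

stats = describe(claim_amounts)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Tracking the missing count next to the mean and spread is deliberate: a sudden jump in missing values is often the first visible symptom of an upstream data pipeline problem.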

Distribution analysis: The bird’s eye view

Being able to visualize the distribution of your data at specific points in time is crucial. Distribution Analysis allows you to see how your data spreads and changes, giving you insights that can inform your RCA.
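A simple way to picture this is to bucket the same feature at two points in time and compare the shares side by side. The following is a minimal sketch under stated assumptions: the `histogram` helper, the bucket edges, and the sample values are invented for illustration and do not reflect how Production IR renders distributions.

```python
from collections import Counter


def histogram(values, edges):
    """Share of values falling into each [edges[i], edges[i + 1]) bucket."""
    counts = Counter()
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = len(values)
    return [counts[i] / total for i in range(len(edges) - 1)]


# Made-up values of one feature at two points in time
edges = [0, 25, 50, 75, 100]
last_week = [12, 18, 30, 42, 47, 55, 61, 70, 80, 91]
today = [52, 58, 63, 66, 71, 77, 82, 88, 93, 97]

before = histogram(last_week, edges)
after = histogram(today, edges)

# Print a crude text chart: one row per bucket, one '#' per 10% of the data
for i, (b, a) in enumerate(zip(before, after)):
    label = f"{edges[i]:>3}-{edges[i + 1]:<3}"
    print(f"{label} before {'#' * round(b * 10):<10} after {'#' * round(a * 10)}")
```

In this toy data, the two lower buckets empty out entirely between the two snapshots, which is the kind of shift that is easy to miss in a single summary metric but obvious once the distributions are placed next to each other.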

Text: Your digital investigation diary

Documentation and collaboration are key. The Text tool allows you to keep notes, document your investigation process, and share insights and ideas with stakeholders – all within one place. It’s your digital investigation diary.

Embedding Projector: The unstructured data wizard

This is a powerful tool for visualizing unstructured data such as text or images in 2D/3D using UMAP dimension reduction. It lets you identify clusters in your data and uncover underlying patterns, detect drift, and learn more about your NLP, LLM, and CV models.

Incident response: Responsible AI in practice

Aporia’s Production IR tool puts Responsible AI into action by centralizing production data investigation and providing an effective path to incident response. When alerts fire, Production IR enables rapid, precise, and effective incident response. By providing an efficient tool for identifying and addressing issues, it ensures that AI systems remain reliable and perform as expected, reducing the potential for harm. 

The tool also facilitates greater transparency and collaboration in incident response. Its collaborative, notebook-like experience encourages detailed documentation and teamwork, which are integral to fostering accountability and openness in AI practices. With Production IR, you’re not just dealing with problems better, you’re also making your AI systems more reliable and trustworthy.

Wrapping it up

Aporia’s Production IR is the secret sauce that can transform the cumbersome, complex, and tedious process of performing RCA on production data into an efficient, insightful, and collaborative experience. It also supports a key aspect of Responsible AI practice: keeping ML models in production transparent and accountable.

Want to learn more about Production IR? Drop us a line and book a demo.
