
The state of production LLMOps: My takeaways from MLOps World 2023

Alon Gubkin
3 min read Nov 05, 2023


Recently, I was lucky enough to attend MLOps World in Austin. There were panels, thought-provoking keynotes, and parties, and while the crowd wasn't overwhelming in numbers, there was sheer quality in the interactions with those who made the trip.

I came in hot with LLMs on my mind, because over at Aporia we had just released a new Guardrails tool for LLMs and I wanted to see how ML platform engineers are tackling production issues. Everyone is doing RAGs, so naturally there was a lot of focus on their possibilities. But mostly, people seemed really interested in cool LLM tricks: things like transforming the LLM embedding space into something more useful, or using logit_bias to solve classification problems with LLMs.
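
To make the logit_bias trick concrete, here's a minimal sketch of single-token classification, assuming the OpenAI Chat Completions API and tiktoken; the model name, labels, and prompt are illustrative placeholders, not details from any specific talk.

```python
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# Illustrative labels -- the trick works best when each label's first token is distinct
labels = ["positive", "negative"]
label_token_ids = {label: enc.encode(label)[0] for label in labels}

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Classify the sentiment of the user's text in one word."},
        {"role": "user", "content": "The onboarding flow was smooth and fast."},
    ],
    max_tokens=1,  # force a single-token answer
    # A bias of +100 makes the label tokens effectively the only allowed output,
    # turning free-form generation into a constrained classification step
    logit_bias={str(token_id): 100 for token_id in label_token_ids.values()},
)

print(response.choices[0].message.content)  # e.g. "positive"
```

One caveat: if a label spans multiple tokens, you only get its first token back, so short single-token labels (or a mapping from first token back to label) keep the output unambiguous.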

Hearing big players like Roche and Instacart talk about their ML journeys in their respective industries offered some real eye-opening moments. But with every new insight and trick came challenges too. That's the good stuff, the reason we're all here. These are the very obstacles keeping LLMs and “ML 1.0” from reaching their full business potential:

• You’re spending too much time troubleshooting model issues – only to find out the root cause is a corrupted data pipeline. Data science teams are expensive and their time is valuable; they should primarily focus on building, fine-tuning, and improving models, not firefighting data pipeline issues that have nothing to do with them.
• RAGs are really hard – and they get even more complicated when you try to balance chunk size and chunk order. Determining optimal parameters intertwines with questions like: How do you handle long documents? What’s the ideal number of documents to retrieve? Which embedding model suits retrieval best? These dilemmas cloud the path to effective RAG deployment (a chunking-and-retrieval sketch follows this list).
• LLM evaluation is a major pain point – especially when trying to measure and track the performance of RAGs. Their dual retrieval-generation nature makes it hard to pinpoint whether errors come from fetching the wrong documents or from flawed text synthesis, which makes comprehensive assessment tricky (an evaluation sketch also follows this list).
• Fine-tuning cycles aren’t clear – raising questions like: Is fine-tuning even needed? How should I do it? Figuring out the best datasets, epochs, and loss functions can be tricky, and that makes it hard to nail down how to actually boost model performance.
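
To ground the chunking and retrieval knobs from the RAG bullet above, here's a minimal, library-agnostic sketch; the chunk_size, overlap, and top_k defaults and the toy embed() function are assumptions for illustration, not recommendations from the conference.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy stand-in for a real embedding model (hashed bag-of-words).
    In practice, which embedding model you pick here matters a lot for retrieval."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def chunk(document: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character windows.
    chunk_size and overlap are exactly the parameters that are hard to tune."""
    step = chunk_size - overlap
    return [document[i:i + chunk_size] for i in range(0, len(document), step)]

def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Rank chunks by cosine similarity to the query and keep the top_k.
    top_k (how many documents to retrieve) is another knob with no obvious default."""
    q = embed(query)
    scored = []
    for c in chunks:
        v = embed(c)
        sim = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9))
        scored.append((sim, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

Every default in that sketch (window size, overlap, number of chunks retrieved) is a choice you'd want to evaluate, which leads straight into the next pain point.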

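For the evaluation pain point, one pragmatic approach is to score retrieval and generation separately, so you can tell which side is failing. The sketch below assumes a small labeled set of questions with known relevant document IDs and reference answers; recall@k and token overlap are deliberately simple illustrative metrics (many teams swap in an LLM-as-judge for the generation side).

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Retrieval-side metric: share of relevant documents found in the top-k results."""
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant_ids)
    return hits / max(len(relevant_ids), 1)

def token_overlap(answer: str, reference: str) -> float:
    """Generation-side metric: crude token overlap with a reference answer."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    return len(a & r) / max(len(r), 1)

def evaluate(examples, retriever, generator, k: int = 5) -> dict[str, float]:
    """Run a labeled set through hypothetical retriever/generator callables and report
    retrieval quality and generation quality as separate numbers, making it clearer
    whether errors come from fetching the wrong documents or from flawed text synthesis."""
    retrieval_scores, generation_scores = [], []
    for ex in examples:  # ex: {"question": str, "relevant_ids": list[str], "reference": str}
        retrieved = retriever(ex["question"])           # ordered list of doc IDs
        answer = generator(ex["question"], retrieved)   # generated answer string
        retrieval_scores.append(recall_at_k(retrieved, set(ex["relevant_ids"]), k))
        generation_scores.append(token_overlap(answer, ex["reference"]))
    return {
        "recall_at_k": sum(retrieval_scores) / len(retrieval_scores),
        "answer_overlap": sum(generation_scores) / len(generation_scores),
    }
```
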
After immersing myself in the vibrant atmosphere of MLOps World 2023, I left with a genuine sense of excitement, innovation, and camaraderie. Yet even with these breakthroughs and dazzling LLM tricks, it's pretty clear that the road to seamless ML integration is riddled with challenges. From the nitty-gritty of RAGs’ optimal parameter configurations to the heart of ensuring context-derived responses in LLMs, we're reminded that innovation is a double-edged sword.

While the complexities of fine-tuning cycles and LLM evaluation can be daunting, they also underscore the depth and potential of this field. The discussions in Austin rekindled my belief that for every obstacle, there's a solution waiting to be discovered.

See you next time!
