
The state of production LLMs: My takeaways from MLOps World 2023

Alon Gubkin · 3 min read · Nov 05, 2023

Recently, I was lucky enough to attend MLOps World in Austin. There were panels, thought-provoking keynotes, and parties, and while the crowd wasn't overwhelming in numbers, the interactions with those who made the trip were of sheer quality.

I came in hot with LLMs on my mind, because over at Aporia we had just released a new Guardrails tool for LLMs, and I wanted to see how ML platform engineers are tackling production issues. Everyone is doing RAG, so naturally there was a lot of focus on its possibilities. But mostly, people seemed really interested in cool LLM tricks: things like transforming the LLM embedding space into something more useful, or using logit_bias to turn an LLM into a classifier.
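To make that logit_bias trick concrete, here's a minimal sketch of the classification version, assuming the OpenAI Python client (v1) and tiktoken; the prompt, labels, and model choice are illustrative placeholders, not what any speaker actually shipped:

```python
# Sketch: force an LLM to act as a classifier by heavily biasing the
# label tokens. Assumes OPENAI_API_KEY is set in the environment.
import tiktoken
from openai import OpenAI

client = OpenAI()
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

LABELS = ["positive", "negative"]  # illustrative; must be single tokens

def classify(text: str) -> str:
    # A +100 bias effectively restricts sampling to our label tokens,
    # and max_tokens=1 makes the model emit exactly one of them.
    bias = {}
    for label in LABELS:
        ids = enc.encode(label)
        assert len(ids) == 1, f"{label!r} must encode to a single token"
        bias[str(ids[0])] = 100

    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Classify the sentiment of this text: {text}"}],
        logit_bias=bias,
        max_tokens=1,
    )
    return resp.choices[0].message.content

print(classify("The keynote was fantastic!"))  # expected: "positive"
```

The nice part of this pattern is that the output space is closed by construction, so you never have to parse free-form text back into a label.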

Hearing big players like Roche and Instacart talk about their ML journeys in their respective industries offered some real eye-opening moments. But with every new insight and trick came challenges too. The good stuff, the reason we’re all here. These are the very obstacles keeping LLMs and “ML 1.0” from reaching their full business potential:

  • You’re spending too much time troubleshooting model issues – only to find out the root cause is a corrupt data pipeline. Data science teams are expensive and their time is valuable; they should primarily focus on building, fine-tuning, and improving models, not firefighting data pipeline issues that have nothing to do with them.
  • RAGs are really hard – and they get even more complicated when you try to balance chunk size against chunk order. The question of optimal parameters intertwines with others: How do you handle long documents? What’s the ideal document count for retrieval? Which embedding model suits retrieval best? These dilemmas cloud the path to effective RAG deployment (see the chunking sketch after this list).
  • LLM evaluation is a major pain point – especially when trying to measure and track the performance of RAGs. Their dual retrieval-generation nature makes it hard to pinpoint whether an error comes from fetching the wrong documents or from flawed text synthesis, which complicates any comprehensive assessment (see the recall@k sketch below).
  • Fine-tuning cycles aren’t clear – raising questions like: is fine-tuning even needed, and if so, how should I do it? Figuring out the right datasets, number of epochs, and loss function is tricky, and that makes it hard to nail down how to actually boost model performance (see the training-loop sketch below).
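To ground the chunking dilemma, here's a toy fixed-size chunker. The chunk_size and overlap parameters are exactly the knobs with no obvious right values; the defaults below are arbitrary, not recommendations:

```python
# Toy character-level chunker to make the "chunk size" dilemma concrete.
# Small chunks retrieve precisely but lose context; large chunks dilute
# the embedding; overlap trades storage for continuity at boundaries.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    assert 0 <= overlap < chunk_size
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```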
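As for the evaluation pain, one common way to de-tangle the retrieval-generation duality is to score retrieval on its own first. Here's a recall@k sketch, assuming you have labeled query-to-relevant-document pairs (which is the hard part in practice):

```python
# Scoring retrieval separately from generation: if recall@k is low,
# the problem is the retriever, no matter how good the LLM is.
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)

# e.g. the retriever returned [d3, d1, d9] but only d1 and d7 are relevant:
print(recall_at_k(["d3", "d1", "d9"], {"d1", "d7"}, k=3))  # 0.5
```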
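And finally, the fine-tuning questions in skeleton form. This is a generic PyTorch training loop, not anyone's recipe: the dataset, epoch count, learning rate, and loss function below are placeholders for the very decisions attendees found tricky.

```python
# Generic fine-tuning skeleton: every hardcoded value here is a choice
# with no universal answer, which is exactly the pain point.
import torch
from torch.utils.data import DataLoader

def fine_tune(model, dataset, epochs: int = 3, lr: float = 2e-5):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()  # one of many defensible choices
    model.train()
    for epoch in range(epochs):  # "how many epochs?" remains an open question
        for inputs, targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```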

After immersing myself in the vibrant atmosphere of MLOps World 2023, I came away with a genuine sense of excitement, innovation, and camaraderie. Yet even with these breakthroughs and dazzling LLM tricks, it’s pretty clear that the road to seamless ML integration is riddled with challenges. From the nitty-gritty of RAG parameter tuning to keeping LLM responses grounded in their retrieved context, we’re reminded that innovation is a double-edged sword.

While the complexities of fine-tuning cycles and LLM evaluation can be daunting, they also underscore the depth and potential of this field. The discussions in Austin rekindled my belief that for every obstacle, there’s a solution waiting to be discovered. 

See you next time!
