Alon is the CTO of Aporia.
In this article, I want to share a method for improving your LLM app’s reliability, making it produce consistent results for particular inputs, by creating something I call “Islands of Confidence”.
An island of confidence is basically a set of inputs where we choose NOT to run an LLM. Instead, we run normal deterministic code.
We’ll start with a very simple example and build it from there, step by step.
Let’s say we have a customer support chatbot, where users frequently ask: “How do I create a new account?”.
Since this question is so frequent, there’s no reason to run the LLM. Instead, we can simply add an ‘if’ statement before the model that checks whether the user input is equal to the question above. If it is, we return a cached, verified answer.
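Here’s a minimal sketch of that idea (the cached answer text and the `run_llm` fallback are placeholders for illustration):

```python
CACHED_ANSWER = (
    "To create a new account, click 'Sign up' on the homepage and follow the steps."
)

def handle_message(user_input: str) -> str:
    # Island of confidence: for this exact question, skip the LLM entirely
    # and return a cached, human-verified answer.
    if user_input.strip() == "How do I create a new account?":
        return CACHED_ANSWER
    # Everything else still goes through the LLM as usual.
    return run_llm(user_input)  # hypothetical LLM helper
```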
But this example is almost useless on its own, because another user might ask the same question a little differently. Let’s fix that.
What if the user asks the same question, just phrased differently: “yo how to register for new account?”.
In this case, we still want to detect it and run the same logic. Fortunately, there’s a simple solution: we can fine-tune a small binary text classifier to detect paraphrases of our question.
One method is to use Sentence Transformers with a model such as paraphrase-mpnet-base-v2. From what I’ve seen, you only need around 10-20 examples for good results. Check out the SetFit library by Hugging Face.
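Here’s a rough sketch of what that could look like with a recent SetFit version (the training examples below are made up; in practice you’d collect 10-20 real paraphrases plus a varied set of negatives):

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# A handful of labeled examples: 1 = paraphrase of our question, 0 = anything else.
train_dataset = Dataset.from_dict({
    "text": [
        "How do I create a new account?",
        "yo how to register for new account?",
        "what's the process for signing up?",
        "where do I sign up?",
        "How do I reset my password?",
        "What are your pricing plans?",
        "My payment failed, what do I do?",
    ],
    "label": [1, 1, 1, 1, 0, 0, 0],
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = Trainer(
    model=model,
    args=TrainingArguments(num_epochs=1),
    train_dataset=train_dataset,
)
trainer.train()

def on_island(user_input: str) -> bool:
    # True when the input is a paraphrase of our known question.
    return bool(model.predict([user_input])[0])
```

With this classifier in place, the exact-match ‘if’ from before becomes `if on_island(user_input): return CACHED_ANSWER`.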
Now, our island isn’t just a single string—it’s any paraphrase of the question “How do I create a new account?”.
But we’ve ignored one important detail: our chatbot is probably RAG-based, and to answer the question it usually needs to retrieve context from the knowledge base.
By creating an island that simply returns a string, we basically ignore the retrieval part. This creates a problem: what if a new version of the web app is deployed and the process of creating a new account changes?
Even though the knowledge base would probably be updated, our cached answer is now stale.
To solve this, instead of just returning a string, we can check if the context that was originally used to generate the answer is still relevant. If not, we can invalidate the island.
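One simple way to implement this, sketched below, is to store a fingerprint of the retrieved context next to the verified answer and recompute it on every request. Here `retrieve_from_kb` is a hypothetical retriever, and an exact hash is the strictest possible check; a semantic-similarity threshold over the chunks would be a softer variant:

```python
import hashlib

def fingerprint(chunks: list[str]) -> str:
    # Hash the retrieved chunks so any change in the knowledge base is detectable.
    return hashlib.sha256("\n".join(chunks).encode()).hexdigest()

# Stored when the island is created: the verified answer plus a fingerprint
# of the context that was used to generate it.
island = {
    "answer": CACHED_ANSWER,
    "context_hash": fingerprint(retrieve_from_kb("How do I create a new account?")),
}

def answer_from_island(user_input: str) -> str | None:
    chunks = retrieve_from_kb(user_input)  # hypothetical retriever
    if fingerprint(chunks) == island["context_hash"]:
        return island["answer"]  # context unchanged: cached answer still valid
    return None  # KB changed: invalidate the island, fall back to the full RAG pipeline
```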
Talk-to-your-data use cases are another great fit. Here’s how they typically work: the user asks a question in natural language, the LLM translates it into a SQL query, the query runs against the database, and the results are returned to the user.
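In code, the basic loop looks something like this (`run_llm` is again a hypothetical helper, and SQLite stands in for your real database):

```python
import sqlite3

def talk_to_your_data(question: str, conn: sqlite3.Connection) -> list:
    # 1. The LLM translates the natural-language question into SQL.
    sql = run_llm(f"Write a SQL query for our schema that answers: {question}")
    # 2. The generated query runs against the real database.
    return conn.execute(sql).fetchall()
```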
Unfortunately, a hallucination here can produce an incorrect SQL statement, which in turn returns completely incorrect data, and the user might not know how to interpret the results correctly.
Fortunately, our technique works out of the box here too! The island of confidence can return a verified SQL query for as long as the database schema doesn’t change, very similarly to the way we handled RAG.
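A minimal sketch, assuming SQLite and a made-up verified query: fingerprint the schema when the island is created, and only serve the cached SQL while that fingerprint is unchanged.

```python
import hashlib
import sqlite3

def schema_fingerprint(conn: sqlite3.Connection) -> str:
    # Hash all table definitions so any schema migration invalidates the island.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return hashlib.sha256("\n".join(r[0] for r in rows).encode()).hexdigest()

# A human-verified query, stored alongside the schema fingerprint at creation time.
VERIFIED_SQL = "SELECT COUNT(*) FROM users WHERE created_at >= date('now', '-1 month')"

def sql_from_island(conn: sqlite3.Connection, stored_hash: str) -> str | None:
    if schema_fingerprint(conn) == stored_hash:
        return VERIFIED_SQL  # schema unchanged: the verified query is still safe to run
    return None  # schema changed: invalidate and let the LLM regenerate the query
```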
In a future post, I’ll discuss how islands of confidence can work with more complex variations of a question (not just paraphrases), as well as tools. Let me know what you think!