Islands of Confidence: Make LLM apps more reliable by running *less* LLMs

Alon Gubkin

Alon is the CTO of Aporia.

3 min read Nov 28, 2023

In this article, I want to share a method for making LLM apps more reliable, so that they produce consistent results for particular inputs, by creating something I call “Islands of Confidence”.

An island of confidence is a set of inputs for which we choose NOT to run an LLM. Instead, we run ordinary deterministic code.

We’ll start with a very simple example and build it from there, step by step.

STEP 1: Exact Match

Let’s say we have a customer support chatbot, where users frequently ask: “How do I create a new account?”

Since this question is so frequent, there’s no reason to run the LLM. Instead, we can simply add an ‘if’ statement before the model that checks if the user input is equal to the question above. If it is – we can return a cached, verified answer. 
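As a minimal sketch of that ‘if’ statement (the names `VERIFIED_ANSWERS`, `answer`, and `call_llm`, as well as the answer text, are hypothetical and only for illustration):

```python
from typing import Callable, Dict

# Hypothetical cache of answers that a human has verified ahead of time.
VERIFIED_ANSWERS: Dict[str, str] = {
    "How do I create a new account?": "Click 'Sign up' on the home page and follow the steps.",
}


def answer(user_input: str, call_llm: Callable[[str], str]) -> str:
    """Return the verified answer on an exact match, otherwise call the LLM."""
    # Island of confidence: the input is known, so the LLM is never run.
    if user_input in VERIFIED_ANSWERS:
        return VERIFIED_ANSWERS[user_input]
    # Everything outside the island still goes through the model.
    return call_llm(user_input)
```

Here `call_llm` stands for whatever function your app already uses to query the model.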

But an exact match alone is rarely useful, because another user might phrase the same question a little differently. Let’s fix that.

STEP 2: Paraphrased Input

What if the user asks the same question with different wording, for example: “yo how to register for new account?”

In this case, we still want to detect it and run the same logic. Fortunately, there’s a simple solution: we can fine-tune a small binary text classifier to detect paraphrases of our question.

One option is to fine-tune a sentence-transformers model such as paraphrase-mpnet-base-v2 using the SetFit library by Hugging Face. From what I’ve seen, you only need around 10-20 labeled examples for good results.

Now, our island isn’t just a single string—it’s any paraphrase of the question “How do I create a new account?”.
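The routing logic around such a classifier could look roughly like the sketch below. The `Island` class, `route` function, and `toy_account_classifier` are hypothetical names; in particular, the keyword heuristic is only a dependency-free stand-in for the fine-tuned paraphrase model, whose prediction you would wrap in `matches` in a real system.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Island:
    name: str
    matches: Callable[[str], bool]  # binary classifier: is this input inside the island?
    answer: str                     # cached, verified answer


def route(user_input: str, islands: List[Island], call_llm: Callable[[str], str]) -> str:
    """Return a verified answer if any island claims the input, else call the LLM."""
    for island in islands:
        if island.matches(user_input):
            return island.answer
    return call_llm(user_input)


def toy_account_classifier(text: str) -> bool:
    # Stand-in for the fine-tuned paraphrase detector (an assumption, not a real model).
    words = set(text.lower().replace("?", "").split())
    return "account" in words and bool(words & {"create", "register", "open"})


create_account = Island(
    name="create-account",
    matches=toy_account_classifier,
    answer="Click 'Sign up' on the home page and follow the steps.",
)
```

Keeping the classifier behind a plain `Callable[[str], bool]` means the island does not care which model produces the yes/no decision, so the stand-in can be swapped for a trained model without touching `route`.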

But we ignored one important detail: our chatbot probably uses RAG (retrieval-augmented generation), and to answer the question, it usually needs to retrieve context from the knowledge base (KB).

STEP 3: RAG on an Infrequently Modified KB

By creating an island that simply returns a string, we basically ignore the retrieval part. This creates a problem: what if a new version of the web app is deployed and the process of creating a new account changes?

Even though the KB would probably be updated, our cached answer is now outdated.

To solve this, instead of just returning a string, we can check whether the context that was originally used to generate the answer is still relevant, for example whether the retrieved KB passages are unchanged. If not, we can invalidate the island and fall back to the LLM.
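One way to sketch this check is to store a hash of the retrieved context next to the cached answer and compare it on every request. Treating "still relevant" as "byte-for-byte unchanged" is an assumption made here for simplicity, and `fingerprint`, `CachedAnswer`, `answer_with_island`, `retrieve`, and `generate` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


def fingerprint(chunks: List[str]) -> str:
    """Hash the retrieved KB chunks so that any change in them is detectable."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk.encode("utf-8"))
        h.update(b"\x00")  # separator so chunk boundaries affect the hash
    return h.hexdigest()


@dataclass
class CachedAnswer:
    answer: str
    context_fingerprint: str  # fingerprint of the context used when the answer was verified


def answer_with_island(
    question: str,
    cached: Optional[CachedAnswer],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Tuple[str, Optional[CachedAnswer]]:
    """Return (answer, island); island is None once it has been invalidated."""
    chunks = retrieve(question)
    if cached is not None and cached.context_fingerprint == fingerprint(chunks):
        return cached.answer, cached  # context unchanged: skip the LLM
    # Context changed (or no island yet): run the LLM and drop the stale island.
    return generate(question, chunks), None
```

Note that retrieval still runs on every request in this sketch; only the generation step is skipped. A fresh answer would need to be verified again before it becomes a new island.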

STEP 4: Talk-to-your-Data with a Constant Question

Talk-to-your-data use cases are really useful. This is how they work:

  1. A user asks a question (e.g. “How many customers do we have and what’s the average ARR?”)
  2. A prompt is generated with the relevant part of the database schema as context
  3. The LLM generates a SQL query
  4. The SQL query is executed against the data warehouse (e.g. Snowflake)
  5. The application UI shows the results of the query

Unfortunately, a hallucination here can produce an incorrect SQL statement, which in turn can return completely incorrect data. The user might not know how to interpret it correctly.

Fortunately, our technique just works out of the box here! The island of confidence can return the verified SQL query, as long as the database schema doesn’t change, very similarly to the way we handled RAGs.
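A minimal sketch of this idea follows, using an in-memory SQLite database as a stand-in for the data warehouse so that the example is self-contained. The function names, the `customers` table, and the way the schema is fingerprinted are all assumptions; a real warehouse would expose its schema differently.

```python
import hashlib
import sqlite3
from typing import Callable


def schema_fingerprint(conn: sqlite3.Connection) -> str:
    """Hash the CREATE statements of all schema objects in the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return hashlib.sha256("\n".join(r[0] for r in rows).encode("utf-8")).hexdigest()


def get_sql(
    conn: sqlite3.Connection,
    verified_sql: str,
    verified_fingerprint: str,
    generate_sql: Callable[[], str],
) -> str:
    """Return the verified query while the schema is unchanged, else ask the LLM."""
    if schema_fingerprint(conn) == verified_fingerprint:
        return verified_sql  # island is valid: no LLM call, no hallucinated SQL
    return generate_sql()    # schema changed: the island is invalidated


# Usage: verify the query once, then reuse it until the schema changes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, arr REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 100.0), (2, 300.0)])
VERIFIED_SQL = "SELECT COUNT(*), AVG(arr) FROM customers"
VERIFIED_FP = schema_fingerprint(conn)
```

Inserting or updating rows does not change the fingerprint, so the island stays valid as the data grows; only a schema change such as adding a column sends the question back to the LLM.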

In a future post, I’ll discuss how islands of confidence can work with more complex variations of a question (not just paraphrases), as well as tools. Let me know what you think!
