April 7, 2024 - last updated

Enhancing RAG performance using Hypothetical Document Embeddings (HyDE)

Gon Rappaport

Solutions Architect

12 min read Feb 15, 2024


Discovering information on the internet is like a treasure hunt, and the key to success lies in search engines. One innovative technique in this quest is “HyDE” – Hypothetical Document Embeddings. It’s like a clever assistant for search engines, enhancing their ability to find information with precision. Think of it as a helper that crafts imaginary documents based on your questions, turning them into special codes to uncover real, relevant information. 

The magic of HyDE is that it doesn’t need a massive list of examples; rather it adapts and refines searches on the fly. It’s like a superhero sidekick ensuring your internet searches are not just efficient but also spot-on! 

What is the challenge of dense retrieval? 

Dense retrieval, a method that search engines use to identify pertinent content by comparing the semantic similarity of queries and documents, has demonstrated considerable potential across a wide range of tasks and languages. Nonetheless, building fully zero-shot dense retrieval systems, with no relevance labels at all, has proven extremely difficult. Traditional methods rely on supervised learning and need a sizable dataset of labeled examples to train the model properly.

What is HyDE? 

HyDE, or Hypothetical Document Embeddings, is an approach that uses a large language model (LLM), similar to the one behind ChatGPT, to generate a hypothetical document that answers a question. Instead of searching the database directly with the question, it goes a step further: an unsupervised encoder (an embedding model trained without relevance labels) turns the hypothetical document into an embedding vector. This vector is then used to find similar documents in the database, so the search compares answers with answers rather than questions with answers.

HyDE doesn’t just look for similar questions and answers. It understands the content it generates and finds answers based on how similar they are to other answers, making it effective in tasks like web search, question answering, and fact-checking.

You may not have realized it, but you may already have relied on HyDE in search and RAG applications. The strategy focuses specifically on improving the retrieval of relevant documents.

HyDE’s ability to generate a synthetic response to a user query makes it a unique offering. To help with retrieval, this synthetic response is then transformed into a vector embedding. This kind of approach works especially well in fields like qualitative research, medical transcripts, and case studies, where responses are open-ended. In these settings it can increase retrieval coverage and improve retrieval accuracy.

Note, however, that only the original user query and the retrieved documents are passed to the final LLM step. The synthetic response is used for retrieval only.

How does HyDE work?

Instead of using the query and its embedding vector to search the vector database directly, HyDE responds to a query by having a large language model (LLM), similar to the one behind ChatGPT, generate a hypothetical document.

It goes one step further by using an unsupervised encoder that was trained using contrastive techniques. In order to find similar documents in a vector database, this encoder converts the theoretical document into an embedding vector.

It focuses on answer-to-answer embedding similarity rather than searching for embedding similarity for questions or queries. Its performance matches highly-tuned retrievers and is robust in a variety of tasks, including verification of facts, QA, and online search.

The HyDE method acknowledges that encoding relevance is hard without labeled data, which is exactly the zero-shot setting. Instead of learning relevance from labels, it makes use of hypothetical documents and language models. Here’s how it works:

1. Generating hypothetical documents

HyDE directs a language model, such as GPT-3, to create a hypothetical document in response to a user’s query. The purpose of this text is to capture the patterns of a relevant answer; it is not a real document and may contain factual errors.

2. Unsupervised encoding

An unsupervised contrastive encoder is then used to encode the generated hypothetical document into an embedding vector. Based on vector similarity, this vector designates a region in the corpus embedding space where similar real documents are retrieved.

3. Retrieval process

During retrieval, HyDE searches the corpus for the actual documents that are most similar to the encoded hypothetical document.
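
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under loud assumptions, not the reference implementation: `fake_llm` is a hypothetical stub that returns a canned answer where a real system would call an LLM, and `embed` is a toy bag-of-words counter standing in for a contrastively trained dense encoder. The corpus sentences are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fake_llm(query):
    """Hypothetical stub for step 1; a real system would call an LLM here."""
    return (
        "EcoShield is known to have environmental effects, such as changes "
        "in soil microbial activity and potential harm to non-target organisms."
    )


def hyde_retrieve(query, corpus, generate=fake_llm, top_k=1):
    hypothetical_doc = generate(query)       # step 1: generate a hypothetical document
    query_vector = embed(hypothetical_doc)   # step 2: encode it (not the raw query)
    ranked = sorted(                         # step 3: rank real documents by similarity
        corpus, key=lambda d: cosine(query_vector, embed(d)), reverse=True
    )
    return ranked[:top_k]


# Invented corpus for illustration only.
corpus = [
    "Field trials measured soil microbial activity after pesticide application "
    "and found harm to non-target organisms.",
    "Quarterly sales of garden furniture rose in the northern region.",
    "A guide to installing solar panels on residential roofs.",
]

top = hyde_retrieve("What are the environmental effects of EcoShield?", corpus)
print(top[0])
```

Swapping `embed` for a real embedding model and `fake_llm` for a real LLM call keeps the structure unchanged; only the hypothetical document is embedded for the search.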

Brief illustration of HyDE

Consider a scenario in a vast scientific database containing numerous articles, studies, and patient records related to environmental issues. A user is interested in understanding the impact of a specific pesticide known as “EcoShield.”

User Query: “What are the environmental effects of EcoShield?”

The system initially generates a hypothetical response: “EcoShield is known to have environmental effects, such as changes in soil microbial activity and potential harm to non-target organisms.” This generated response is then transformed into a vector embedding. Using this embedding, the system searches the environmental database to identify the most relevant documents. The top_k most pertinent documents are then sent to the language model, along with the original user query, resulting in the final answer: “EcoShield exhibits environmental effects, including alterations in soil microbial activity. Studies suggest potential risks to non-target organisms. It is recommended to consult scientific sources for comprehensive information.”

By employing the HyDE approach, which involves generating a synthetic response and utilizing its embedding for retrieval, the system aims to improve the precision and relevance of the retrieved documents, ultimately providing a more insightful and accurate response to the user.
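
The last step of this flow, where only the original query and the retrieved documents reach the language model, can be sketched as follows. The function name `build_final_prompt`, the prompt wording, and the placeholder passages are all assumptions made for illustration; the source does not specify a prompt format.

```python
def build_final_prompt(query, retrieved_docs, top_k=3):
    """Assemble the final LLM prompt from the user query and real documents only."""
    context = "\n\n".join(
        f"[{i + 1}] {doc}" for i, doc in enumerate(retrieved_docs[:top_k])
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


query = "What are the environmental effects of EcoShield?"
# The hypothetical response is used for retrieval only and never enters the prompt.
hypothetical = (
    "EcoShield is known to have environmental effects, such as changes in soil "
    "microbial activity and potential harm to non-target organisms."
)
# Placeholder strings standing in for the top_k documents returned by the search.
retrieved = ["retrieved passage 1", "retrieved passage 2", "retrieved passage 3"]

prompt = build_final_prompt(query, retrieved)
print(prompt)
```

Keeping the hypothetical response out of the final prompt matters because it may contain errors; the final answer should be grounded in the real documents.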

Figure: RAG with HyDE

Advantages of HyDE

HyDE is interesting since it can function well even in the absence of pertinent labels. It transfers the burden of modeling relevance from conventional retrieval models to a language model that is versatile enough to handle a variety of activities and queries. This strategy offers several benefits:

1. Zero-shot retrieval: HyDE doesn’t need a sizable dataset of labeled samples to function “out of the box.”

2. Cross-lingual: It is appropriate for multilingual search applications since it functions well in a variety of languages.

3. Flexibility: HyDE’s methodology enables it to adjust to various jobs without requiring a great deal of fine-tuning.

Steps for implementation with code

1. Install and import the necessary dependencies

# install first if needed: pip install langchain openai lancedb pypdf
from langchain.llms import OpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import LLMChain, HypotheticalDocumentEmbedder
from langchain.prompts import PromptTemplate

2. Initialize the embedding model and LLM

# instantiate the LLM and the base embedding model
llm = OpenAI()
base_embeddings = OpenAIEmbeddings()
# wrap both in a HyDE embedder that uses the built-in "web_search" prompt
embeddings = HypotheticalDocumentEmbedder.from_llm(llm, base_embeddings, "web_search")
# Now we can use it as any embedding class!
result = embeddings.embed_query("What does the Bhagavad Gita tell us?")

Furthermore, we can generate several hypothetical documents and then merge their embeddings. By default, the embeddings are aggregated by taking their average. To do this, configure the LLM to return multiple completions for each prompt.

# n=3 asks the LLM for three completions per prompt
multi_llm = OpenAI(n=3, best_of=3)
embeddings = HypotheticalDocumentEmbedder.from_llm(
    multi_llm, OpenAIEmbeddings(), "web_search"
)
result = embeddings.embed_query("What does the Bhagavad Gita tell us?")
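
The averaging step can be illustrated on its own, without any API calls. The sketch below is a plain-Python stand-in for the aggregation, not the library's code: `mean_embedding` is a hypothetical helper, and the three short vectors are made-up values playing the role of the embeddings of three generated documents.

```python
def mean_embedding(vectors):
    """Element-wise average of equally sized embedding vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Made-up 3-dimensional embeddings of three hypothetical documents.
doc_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
]

combined = mean_embedding(doc_vectors)
print(combined)  # [2.0, 2.0, 1.0]
```

Averaging several generations tends to smooth out the quirks of any single hypothetical document, at the cost of more LLM calls per query.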

Note that embed_query returns only an embedding vector. HypotheticalDocumentEmbedder generates the hypothetical document internally with the LLM, embeds it with the base embedding model, and does not return the generated text. Because it exposes the same interface as any other embedding class, you can pass it to a vector store index as the embedding function.

The vector store itself holds your real documents, so you can keep adding documents over time; hypothetical documents are generated only when a query is embedded.

3. Use your prompts

Moreover, you can create and use a custom prompt for the LLMChain that generates the hypothetical documents. This is useful if you know what subject you’re asking about: with a custom prompt, the generated text matches your topic more closely.

Let’s give it a go. We are going to create a prompt for the following example.

prompt_template = """
As a knowledgeable and helpful research assistant, your task is to provide informative answers based on the given context.
Use your extensive knowledge base to offer clear, concise, and accurate responses to the user's inquiries.

Question: {question}

Answer:"""

prompt = PromptTemplate(input_variables=["question"], template=prompt_template)
llm_chain = LLMChain(llm=llm, prompt=prompt)
# use the custom chain to generate the hypothetical documents
embeddings = HypotheticalDocumentEmbedder(
    llm_chain=llm_chain, base_embeddings=OpenAIEmbeddings()
)

4. Load the PDFs

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFDirectoryLoader

# Load all PDFs in the folder
pdf_folder_path = '/content/book'
loader = PyPDFDirectoryLoader(pdf_folder_path)
docs = loader.load()
# The splitter arguments are cut off in the original; library defaults are used here
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)

5. Create a vector store to facilitate information retrieval

from langchain.vectorstores import LanceDB
import lancedb
# lancedb as vectorstore
db = lancedb.connect('/tmp/lancedb')
table = db.create_table("documentsai", data=[
    {"vector": embeddings.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")
vector_store = LanceDB.from_documents(documents, embeddings, connection=table)

6. The result of vector_store retrieving some pertinent data from the document

[Document(page_content='gaged in work, such a Karma -yogi is not bound by Karma. (4.22) \nThe one who is free f rom attachment, whose mind is fixed in Self -\nknowledge, who does work as a service (Sev a) to the Lord, all K ar-\nmic bonds of such a philanthropic person ( Karma -yogi) dissolve \naway. (4.23) God shall be realized by the one who considers eve-\nrything as a manifest ation or an act of God. (Also see 9.16) (4.24)  \nDifferent types of spiritual practices', metadata={'vector': array([-0.00890432, -0.01419295,  0.00024622, ..., -0.0255662 ,
         0.01837529, -0.0352935 ], dtype=float32), 'id': '849b3475-6bf5-4a6a-955c-aa9c1426cdbb', '_distance': 0.2407873421907425}),
 Document(page_content='renunciation (Samny asa) is also known as Karma -yoga . No one \nbecomes a Karma -yogi who has not renounced the selfish motive \nbehind an action. (6.02)  \nA definition of yoga and yogi  \nFor the wise who seeks to attain yoga of meditation or calm-\nness of mind, Karma -yoga  is said to be the means. For the one \nwho has attained yoga, the calmness becomes the means of Self -\nrealization. A person is said to have attained yogic perfection when', metadata={'vector': array([ 0.00463139, -0.02188308,  0.01836756, ...,  0.00026087,
         0.01343005, -0.02467442], dtype=float32), 'id': 'f560dd78-48b8-419b-8576-978e6afee272', '_distance': 0.24962666630744934}),
 Document(page_content='one should know the nature of attached or selfish action, the nature \nof detached or selfless action, and also the nature of forbidden ac-\ntion. (4.17)  \nA Karma -yogi is not subject  to the K armic laws  \nThe one who sees inaction in action, and action in inaction, is \na wise person. Such a person is a yogi and has accomplished eve-\nrything. (4.18)  \nTo see inaction in action and vice versa is to understand that \nthe Lord does  all the work indirectly through His power by using us.', metadata={'vector': array([-0.01086397, -0.01465061,  0.00732531, ..., -0.00368611,
         0.01414126, -0.0371828 ], dtype=float32), 'id': 'a2088f52-eb0e-43bc-a93d-1023541dff9d', '_distance': 0.26249048113822937}),
 Document(page_content='the best of your ability, O Arjuna, with your mind attached to the \nLord, abandoning worry and attachment to the results, and remain-\ning calm in both success and failure. The calmness of mind  is \ncalled Karma -yoga . (2.48) Work done with selfish motives is infe-\nrior by far to selfless service or Karma -yoga . Therefore, be a \nKarma -yogi, O Arjuna. Those who work only to enjoy the fruits of \ntheir labor are, in truth, unhappy. Because , one has no control over \nthe results. (2.49)', metadata={'vector': array([ 0.00598168, -0.01145132,  0.01744962, ..., -0.01556102,
         0.00799331, -0.03753265], dtype=float32), 'id': 'b3e30fff-f3a5-4665-9569-b285f8cf9c76', '_distance': 0.2726559340953827})]

A screenshot of a PDF file containing all the query-related data is shown below.

7. Passing the string query to obtain some examples

# pass the string query to the chain to generate a hypothetical answer
query = "which factors appear to be the major nutritional limitations of fast-food meals"

llm_chain.run(query)
The major nutritional limitations of fast-food meals
are typically high levels of saturated fat, trans fat, Sodium,
and added sugar. These ingredients can lead to an increased risk of obesity,
 type 2 diabetes, cardiovascular disease, and other health issues.
 Additionally, fast-food meals often lack essential vitamins, minerals,
 and fiber, which are important for optimal nutrition.

8. HyDE response: the generated answer is embedded and used to retrieve the matching passages

Because a vanilla RAG pipeline searches the database directly with the raw query, it can fail to find the correct passage in this example.

Normal RAG System

1. In a normal RAG system, the retrieval phase involves using conventional keyword-based or semantic matching techniques to look for pertinent material from a corpus.

2. The context and data provided by the retrieved documents are then used to enhance the generating process and provide responses or answers.

3. The effectiveness of the retrieval techniques has a major impact on the quality of the documents that are retrieved, and some highly relevant information may be missed.
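
To make the contrast with HyDE concrete, here is a toy comparison of searching with the raw query versus searching with a hypothetical answer. Everything in it is an assumption for illustration: `embed` is a bag-of-words counter rather than a real dense encoder, the two corpus sentences are invented, and the hypothetical answer is shortened from the LLM output shown in step 7. A real semantic encoder would narrow the gap, so treat this as a sketch of the idea, not a benchmark.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(search_text, corpus):
    """Return the corpus document most similar to search_text."""
    vector = embed(search_text)
    return max(corpus, key=lambda d: cosine(vector, embed(d)))


# Invented documents: one relevant passage and one that merely echoes the question.
relevant = (
    "Burgers and fries are high in saturated fat, sodium, and added sugar, "
    "and lack fiber, vitamins, and minerals."
)
distractor = "Several factors appear to be major limitations of the new rail timetable."
corpus = [distractor, relevant]

query = "which factors appear to be the major nutritional limitations of fast-food meals"
# Shortened from the generated answer in step 7; a real system would call the LLM.
hypothetical_answer = (
    "The major nutritional limitations of fast-food meals are typically high levels "
    "of saturated fat, trans fat, sodium, and added sugar. Additionally, fast-food "
    "meals often lack essential vitamins, minerals, and fiber."
)

vanilla_top = best_match(query, corpus)              # search with the raw query
hyde_top = best_match(hypothetical_answer, corpus)   # search with the hypothetical answer

print("vanilla:", vanilla_top)
print("HyDE:   ", hyde_top)
```

In this toy setup the raw query matches the document that shares the question's wording, while the hypothetical answer matches the passage that shares the answer's wording, which is the answer-to-answer similarity HyDE relies on.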

An illustrative output

Final thoughts on HyDE

HyDE, our advanced search companion, uses large language models (LLMs) to create hypothetical documents, which makes information retrieval more precise. The brilliance of HyDE lies in its ability to perform well without relying on vast amounts of labeled data, making it invaluable when training data is scarce.

What’s even more impressive is HyDE’s capability to assist in constructing the necessary training data for specific retrieval tasks. Its proficiency in retrieving relevant information chunks in the Retrieval Augmented Generation (RAG) pipeline showcases its technical prowess.

Improving RAG performance means nothing if hallucinations discredit your chatbot’s output. Aporia gives you full control over your AI, mitigating hallucinations and preventing risks from impacting your users.
