July 23, 2024
Build an Advanced RAG App: Query Rewriting

In the last article, I established the basic architecture for a basic RAG app. If you missed it, I recommend that you read that article first. It sets the base from which we can improve our RAG system. Also in that last article, I listed some common pitfalls that RAG applications tend to fail on. We will be tackling some of them with advanced techniques in this article.

To recap, a basic RAG app uses a separate knowledge base that aids the LLM in answering the user’s questions by providing it with more context. This is also called a retrieve-then-read approach.

The Problem

To answer the user’s question, our RAG app will retrieve appropriate context based on the query itself. It will find chunks of text in the vector DB with content similar to whatever the user is asking. Other knowledge bases (search engines, etc.) also apply. The problem is that the chunk of information where the answer lies might not be similar to what the user is asking. The question can be badly written, or expressed differently from what we expect. And if our RAG app can’t find the information needed to answer the question, it won’t answer correctly.

There are many ways to solve this problem, but for this article, we will look at query rewriting.

What Is Query Rewriting?

Simply put, query rewriting means we will rewrite the user query in our own words so that our RAG app knows best how to answer it. Instead of just doing retrieve-then-read, our app will follow a rewrite-retrieve-read approach.

We use a generative AI model to rewrite the question. This model can be a large model, like (or the same as) the one we use to answer the question in the final step. It can also be a smaller model, specially trained to perform this task.
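The rewrite-retrieve-read flow can be sketched in a few lines. This is only a minimal illustration, not the pipeline from the notebook: `rewriter`, `retriever`, and `reader` are hypothetical callables standing in for whatever rewriting model, knowledge base, and answering model you use, and the prompt wording is an assumption.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not a real library API):
LLM = Callable[[str], str]              # prompt in, completion out
Retriever = Callable[[str], List[str]]  # query in, context chunks out


def rewrite_retrieve_read(
    question: str, rewriter: LLM, retriever: Retriever, reader: LLM
) -> str:
    """Answer a question with a rewrite-retrieve-read pipeline."""
    # 1. Rewrite: turn the raw user question into a cleaner search query.
    rewritten = rewriter(
        "Rewrite this question so it is clear and easy to search for. "
        f"Reply with the rewritten question only.\n\nQuestion: {question}"
    ).strip()

    # 2. Retrieve: look up context using the rewritten query, not the raw one.
    chunks = retriever(rewritten)

    # 3. Read: answer the user's original question using the retrieved context.
    context = "\n\n".join(chunks)
    return reader(
        "Answer the question using only this context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The `rewriter` and `reader` can be the same model or two different ones, since both are passed in as plain callables.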

Additionally, query rewriting can take many different forms depending on the needs of the app. Most of the time, basic query rewriting will be enough. But, depending on the complexity of the questions we need to answer, we might need more advanced techniques like HyDE, multi-querying, or step-back questions. More information on these is in the following sections.

Why Does It Work?

Query rewriting usually gives better performance in any RAG app that is knowledge-intensive. This is because RAG applications are sensitive to the phrasing and specific keywords of the query. Paraphrasing the query helps in the following ways:

  1. It restructures oddly written questions so they can be better understood by our system.
  2. It removes context given by the user that is irrelevant to the query.
  3. It can introduce common keywords, which give the query a better chance of matching the correct context.
  4. It can split complex questions into different sub-questions, which can be answered more easily on their own, each with its corresponding context.
  5. It can answer questions that require multiple levels of thinking by generating a step-back question, which is a higher-level concept question than the one from the user. It then uses both the original and the step-back questions to retrieve context.
  6. It can use more advanced query rewriting techniques like HyDE to generate hypothetical documents that answer the question. These hypothetical documents better capture the intent of the question and match the embeddings that contain the answer in the vector DB.

How To Implement Query Rewriting

We have established that there are different strategies for query rewriting depending on the complexity of the questions. We will briefly discuss how to implement each of them. Afterwards, we will see a real example to compare the result of a basic RAG app versus a RAG app with query rewriting. You can also follow all the examples in the article’s Google Colab notebook.

Zero-Shot Query Rewriting

This is simple query rewriting. Zero-shot refers to the prompt engineering technique of describing the task to the LLM without giving it any examples of that task.
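A zero-shot rewriting prompt could look something like the sketch below. The prompt wording and the `llm` callable (prompt in, text out) are assumptions made for illustration, so adapt them to your own model client.

```python
from typing import Callable

# Illustrative prompt wording (an assumption, not the notebook's prompt).
# It contains instructions only and no examples, which makes it zero-shot.
ZERO_SHOT_TEMPLATE = (
    "Rewrite the following question so that a search system can find the "
    "most relevant documents. Keep the original meaning and reply with the "
    "rewritten question only.\n\n"
    "Question: {question}\n"
    "Rewritten question:"
)


def build_zero_shot_prompt(question: str) -> str:
    """Fill the template with the user's question."""
    return ZERO_SHOT_TEMPLATE.format(question=question.strip())


def rewrite_zero_shot(question: str, llm: Callable[[str], str]) -> str:
    """Ask the (hypothetical) `llm` callable for a rewritten query."""
    return llm(build_zero_shot_prompt(question)).strip()
```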


Few-Shot Query Rewriting

For a slightly better result at the cost of using a few more tokens per rewrite, we can give some examples of how we want the rewrite to be done.
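As a rough sketch, a few-shot prompt places example rewrites before the question we actually want rewritten. The example pairs and the prompt wording below are invented for illustration only.

```python
from typing import List, Tuple

# Illustrative (original, rewritten) pairs. These are made up for this sketch;
# in practice, use examples taken from your own domain.
EXAMPLES: List[Tuple[str, str]] = [
    (
        "hey so i was wondering, whats the best way 2 split docs for rag??",
        "What is the best way to split documents into chunks for RAG?",
    ),
    (
        "embeddings model pick how",
        "How do I choose an embedding model?",
    ),
]


def build_few_shot_prompt(
    question: str, examples: List[Tuple[str, str]] = EXAMPLES
) -> str:
    """Build a prompt that shows example rewrites before the real question."""
    lines = [
        "Rewrite each question so that a search system can find the most "
        "relevant documents.",
        "",
    ]
    for original, rewritten in examples:
        lines += [f"Question: {original}", f"Rewritten question: {rewritten}", ""]
    # The final question is left open for the model to complete.
    lines += [f"Question: {question.strip()}", "Rewritten question:"]
    return "\n".join(lines)
```

Each extra example adds tokens to every rewrite, so a handful of short, representative pairs is usually a sensible starting point.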


Trainable Rewriter

We can fine-tune a pre-trained model to perform the query rewriting task. Instead of relying on examples, we can teach it how query rewriting should be done to achieve the best results in context retrieval. We can also train it further using reinforcement learning so it learns to recognize problematic queries and avoid toxic and harmful phrases. Alternatively, we can use an open-source model that somebody else has already trained on the task of query rewriting.


Sub-Queries

If the user query contains multiple questions, this can make context retrieval tricky. Each question probably needs different information, and we are not going to get all of it using all the questions together as the basis for information retrieval. To solve this problem, we can decompose the input into multiple sub-queries and perform retrieval for each of them.
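The decomposition described above could be sketched as follows. The prompt wording, the `llm` callable, and the `retriever` callable are all assumptions, and the sketch expects the model to return one sub-question per line.

```python
import re
from typing import Callable, List

# Matches a leading bullet ("-", "*", "•") or list number ("1.", "2)").
_LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s*")


def decompose(question: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the (hypothetical) `llm` to split a query into sub-queries."""
    raw = llm(
        "Split the following input into independent questions, one per line. "
        f"If it is already a single question, return it unchanged.\n\n{question}"
    )
    sub_queries = []
    for line in raw.splitlines():
        cleaned = _LIST_MARKER.sub("", line).strip()
        if cleaned:
            sub_queries.append(cleaned)
    # Fall back to the original question if the model returned nothing usable.
    return sub_queries or [question]


def retrieve_for_sub_queries(
    question: str,
    llm: Callable[[str], str],
    retriever: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve context for every sub-query, dropping duplicate chunks."""
    seen, chunks = set(), []
    for sub_query in decompose(question, llm):
        for chunk in retriever(sub_query):
            if chunk not in seen:
                seen.add(chunk)
                chunks.append(chunk)
    return chunks
```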


Step-Back Prompt

Many questions can be a bit too complex for the RAG pipeline’s retrieval to grasp the multiple levels of information needed to answer them. For these cases, it can be helpful to generate additional queries to use for retrieval. These queries are more generic than the original query, which enables the RAG pipeline to retrieve relevant information on multiple levels.
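One possible sketch of this idea generates a single, more general step-back question and retrieves with both it and the original. The prompt wording and the `llm` and `retriever` callables are hypothetical.

```python
from typing import Callable, List

# Illustrative prompt wording (an assumption, not the notebook's prompt).
STEP_BACK_TEMPLATE = (
    "You are given a question. Write one more general question about the "
    "underlying concept that would help answer it. Reply with the question "
    "only.\n\n"
    "Question: {question}\n"
    "Step-back question:"
)


def step_back_queries(question: str, llm: Callable[[str], str]) -> List[str]:
    """Return the original question plus a more generic step-back question."""
    step_back = llm(STEP_BACK_TEMPLATE.format(question=question)).strip()
    return [question, step_back] if step_back else [question]


def retrieve_multi_level(
    question: str,
    llm: Callable[[str], str],
    retriever: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve with both the specific and the generic query, without duplicates."""
    seen, chunks = set(), []
    for query in step_back_queries(question, llm):
        for chunk in retriever(query):
            if chunk not in seen:
                seen.add(chunk)
                chunks.append(chunk)
    return chunks
```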



HyDE

Another method to improve how queries are matched with context chunks is Hypothetical Document Embeddings, or HyDE. Sometimes, questions and answers are not that semantically similar, which can cause the RAG pipeline to miss important context chunks in the retrieval stage. However, even when the query is semantically different from the answer, one response to the query should be semantically similar to another response to the same query. The HyDE method consists of creating hypothetical context chunks that answer the query and using them to match the real context that will help the LLM answer.
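To make the idea concrete, here is a minimal sketch of HyDE-style retrieval. It makes several assumptions: `llm` is a hypothetical prompt-in, text-out callable, and `toy_embed` is a bag-of-words stand-in for a real embedding model, used only so the example runs without extra dependencies.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def toy_embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; swap in a real embedding model in practice."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_retrieve(
    question: str,
    llm: Callable[[str], str],
    chunks: List[str],
    top_k: int = 1,
    embed: Callable[[str], Counter] = toy_embed,
) -> List[str]:
    """Rank chunks by similarity to a hypothetical answer, not to the question."""
    # 1. Generate a hypothetical passage that answers the question.
    hypothetical = llm(
        "Write a short passage that answers the question.\n\n"
        f"Question: {question}\nPassage:"
    )
    # 2. Embed the hypothetical passage and use it as the search vector.
    query_vec = embed(hypothetical)
    # 3. Return the real chunks closest to the hypothetical passage.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]
```

The hypothetical passage does not need to be factually right. It only needs to look like the kind of text that contains the answer, so that it lands near the real chunk.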


Example: RAG With vs Without Query Rewriting

Taking the RAG pipeline from the last article, “How To Build a Basic RAG App,” we will introduce query rewriting into it. We will ask it a question a bit more advanced than last time and observe whether the response improves with query rewriting. First, let’s build the same RAG pipeline. Only this time, I’ll use just the top document returned from the vector database, to be less forgiving of missed documents.

Query: "Which evaluation tools are useful for evaluating RAG pipeline?"

The response is good and based on the context, but it got caught up in me asking about evaluation and missed that I was specifically asking for tools. As a result, the context used does have information on some benchmarks, but it misses the next chunk of information, which talks about tools.

Now, let’s implement the same RAG pipeline, this time with query rewriting. As well as the query rewriting prompts we have already seen in the previous examples, I’ll be using a Pydantic parser to extract and iterate over the generated alternative queries.
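The parsing step can be illustrated with a small sketch. To keep it dependency-free, it uses a standard-library dataclass and `json` as a stand-in for the Pydantic parser, so it is not the code from the notebook. The `AlternativeQueries` name and the `{"queries": [...]}` output format are assumptions.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class AlternativeQueries:
    """Hypothetical schema: the rewritten queries produced by the LLM."""
    queries: List[str]


def parse_alternative_queries(raw: str) -> AlternativeQueries:
    """Extract and validate a {"queries": [...]} object from raw LLM output."""
    # Models often wrap JSON in extra text, so keep only the outermost braces.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start : end + 1])
    queries = data.get("queries")
    if not isinstance(queries, list) or not all(isinstance(q, str) for q in queries):
        raise ValueError('expected a "queries" list of strings')
    # Drop empty entries and surrounding whitespace.
    return AlternativeQueries([q.strip() for q in queries if q.strip()])
```

Once parsed, the queries can be iterated over with `for query in parsed.queries:` and each one sent to the retriever.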


The new query now matches the chunk of information I wanted to get my answer from, giving the LLM a better chance of producing a much better response to my question.


Conclusion

We have taken our first step out of basic RAG pipelines and into Advanced RAG. Query rewriting is a very simple Advanced RAG technique, but a powerful one for improving the results of a RAG pipeline. We have gone over different ways to implement it depending on what kind of questions we need to improve. In future articles, we will go over other Advanced RAG techniques that tackle different RAG issues than those seen in this article.