Retrieval-augmented generation, also known as RAG, is a powerful method for finding relevant documents in a corpus of knowledge, which you then provide to an LLM so it can answer user questions.
Traditionally, RAG first uses vector similarity to find relevant chunks of documents in the corpus, then feeds the most relevant chunks into the LLM to produce a response.
This works very well in a lot of scenarios, since semantic similarity is a powerful way to find the most relevant chunks. However, semantic similarity struggles in some scenarios, for example when a user inputs specific keywords or IDs that must be matched explicitly for a chunk to be relevant. In these cases, vector similarity is not that effective, and you need a better approach to find the most relevant chunks.
This is where keyword search comes in: you find relevant chunks using both keyword search and vector similarity, also known as hybrid search, which is the topic I’ll be discussing today.
Why use hybrid search
Vector similarity is very powerful. It can effectively find relevant chunks in a corpus of documents, even when the input prompt has typos or uses synonyms, such as the word lift instead of the word elevator.
However, vector similarity falls short in other scenarios, specifically when searching for specific keywords or identification numbers. The reason is that vector similarity doesn’t weigh individual words or IDs especially highly compared to other words. Keywords and key identifiers therefore tend to get drowned out by other relevant words, which makes it hard for semantic similarity to find the most relevant chunks.
Keyword search, on the other hand, is extremely good at keywords and specific identifiers, as the name suggests. With BM25, for example, if a word exists in only one document and no others, and that word appears in the user query, that document will be weighted very highly and will most likely be included in the search results.
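To make that behavior concrete, here is a minimal, self-contained sketch of the BM25 scoring formula over a few toy documents. The documents and the ID `ab-1234` are made up for illustration; a production system would use a tested library rather than this hand-rolled version:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query using BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequencies within this document
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            length_norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / length_norm
        scores.append(score)
    return scores

docs = [
    "the elevator is out of service today".split(),
    "invoice ab-1234 was shipped yesterday".split(),
    "take the lift to the ground floor".split(),
]
# The rare ID "ab-1234" appears in exactly one document, so only
# that document receives a positive score.
print(bm25_scores("ab-1234".split(), docs))
```

Because the ID occurs in a single document, its inverse document frequency is high and that document dominates the ranking, which is exactly the behavior pure vector similarity struggles to reproduce.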
This is the main reason to use hybrid search: you are simply able to find more relevant documents when the user includes keywords in their query.
How to implement hybrid search
There are many ways to implement hybrid search. If you want to implement it yourself, you can do the following.
- Implement vector retrieval via semantic similarity as you normally would. I won’t cover the exact details in this article, since they’re out of scope; the main point of this article is the keyword search part of hybrid search.
- Implement BM25 or another keyword search algorithm you prefer. BM25 is the standard because it builds on TF-IDF with a better scoring formula. The exact keyword search algorithm you use doesn’t matter all that much, but I recommend BM25 as the default choice.
- Apply a weighting between the semantic similarity score and the keyword search score. You can decide this weighting yourself depending on what you consider most important. If an agent performs the hybrid search, you can also let the agent decide the weighting, since agents typically have good intuition for when to weigh vector similarity more and when to weigh keyword search more.
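The weighting step above can be sketched as follows. The scores are hypothetical, and min-max normalization is just one common way to put cosine similarities and BM25 scores on a comparable scale before blending:

```python
def hybrid_scores(vec_scores, kw_scores, alpha=0.5):
    """Blend semantic and keyword scores; alpha is the weight on the vector side."""
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    v, k = minmax(vec_scores), minmax(kw_scores)
    return [alpha * vi + (1 - alpha) * ki for vi, ki in zip(v, k)]

# Hypothetical scores for three chunks: cosine similarities and BM25 scores.
vec = [0.82, 0.40, 0.78]
kw = [0.0, 9.3, 0.0]

# A keyword-leaning weighting (alpha below 0.5) surfaces the chunk that
# holds the exact keyword, even though its semantic score is lowest.
blended = hybrid_scores(vec, kw, alpha=0.4)
best = max(range(len(blended)), key=blended.__getitem__)
print(best, [round(s, 3) for s in blended])
```

With `alpha=0.4` the keyword match wins; with a semantic-leaning `alpha` the first chunk would rank highest instead, which is exactly the trade-off the weighting controls.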
There are also packages you can use to achieve this, such as the TurboPuffer vector store, which has keyword search built in. To learn how the system really works, however, it’s also valuable to implement it yourself so you can test the system and see whether it works.
Overall, hybrid search isn’t that difficult to implement and can provide a lot of benefits. If you’re looking into hybrid search, you usually already understand how vector search works, and you simply need to add the keyword search element on top. Keyword search itself isn’t that complicated either, which makes hybrid search a relatively easy addition with a potentially large payoff.
Agentic hybrid search
Implementing hybrid search is great, and it will probably improve how well your RAG system works right off the bat. However, I believe that if you really want to get the most out of a hybrid search RAG system, you need to make it agentic.
By making it agentic, I mean the following. A typical RAG system first fetches the relevant document chunks, feeds those chunks into an LLM, and has it answer the user’s query.
An agentic RAG system, however, does it a bit differently. Instead of performing the chunk retrieval before invoking the LLM, you expose the chunk retrieval function as a tool the LLM can call. This, of course, makes the LLM agentic, since it has access to a tool, and it has several major benefits:
- The agent can itself decide the prompt to use for the vector search. Instead of using only the exact user prompt, it can rewrite the prompt to get even better vector search results. Query rewriting is a well-known technique for improving RAG performance.
- The agent can fetch information iteratively: it can make one vector search call, check whether it has enough information to answer the question, and if not, fetch more. The agent can thus review what it has retrieved and keep searching until it is able to answer the question well.
- The agent can decide the weighting between keyword search and vector similarity itself. This is incredibly powerful, because the agent typically knows whether it’s searching for a keyword or for semantically similar content. For example, if the user included a keyword in their query, the agent will likely weigh the keyword search component of hybrid search higher and get even better results. This works a lot better than a static weighting between keyword search and vector similarity.
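As a sketch of what exposing retrieval as a tool could look like: the schema shape, tool name, and parameters below are purely illustrative (not any specific framework’s API), and the scorers are stubs standing in for real embedding and BM25 scoring:

```python
# Hypothetical tool schema an agent framework could expose; names are illustrative.
RETRIEVE_TOOL = {
    "name": "retrieve_chunks",
    "description": "Hybrid search over the corpus. Set alpha high for semantic "
                   "queries, low when the query contains exact keywords or IDs.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Possibly rewritten search query"},
            "alpha": {"type": "number", "description": "Weight on vector similarity, 0..1"},
            "top_k": {"type": "integer", "description": "Number of chunks to return"},
        },
        "required": ["query", "alpha"],
    },
}

def retrieve_chunks(query, alpha, top_k=3, corpus=None):
    """Stubbed hybrid retrieval the agent can call repeatedly with its own weighting."""
    corpus = corpus or [
        "chunk about elevators",
        "chunk mentioning AB-1234",
        "chunk about lifts",
    ]
    # Placeholder scorers; a real system would use embeddings and BM25 here.
    def vec_score(c):
        return 0.8 if "elevator" in c or "lift" in c else 0.1
    def kw_score(c):
        return 1.0 if query.lower() in c.lower() else 0.0
    ranked = sorted(
        corpus,
        key=lambda c: alpha * vec_score(c) + (1 - alpha) * kw_score(c),
        reverse=True,
    )
    return ranked[:top_k]

# An agent spotting the exact ID "AB-1234" in the user query would call
# the tool with a low alpha, favoring the keyword side:
print(retrieve_chunks("AB-1234", alpha=0.1, top_k=1))
```

The key design point is that `alpha` is a tool parameter rather than a constant in your pipeline, so the agent picks the weighting per call, and it can call the tool again with a rewritten query or a different weighting if the first results look insufficient.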
Today’s frontier LLMs are incredibly powerful and are able to make all of these judgments themselves. Just a few months ago, I would have doubted whether you should give the agent as much freedom as described in the bullet points above: choosing the search prompt, fetching information iteratively, and setting the weighting between keyword search and semantic similarity. Today, however, the latest frontier LLMs have become so capable that this is very doable, and even something I recommend implementing.
Thus, by both implementing hybrid search and making it agentic, you can really supercharge your RAG system and achieve much better results than you would with a static, vector-similarity-only RAG system.
Conclusion
In this article, I’ve discussed how to implement hybrid search in your RAG system. Additionally, I described how to make your RAG system agentic to achieve even better results. Combining these two techniques will lead to a significant performance increase in your information retrieval system, and it can in fact be implemented quite easily using coding agents such as Claude Code. I believe agentic systems are the future of information retrieval, and I urge you to provide effective information retrieval tools, such as hybrid search, to your agents and let them perform the rest of the work.
👉 My free eBook and Webinar:
🚀 10x Your Engineering with LLMs (Free 3-Day Email Course)
📚 Get my free Vision Language Models ebook
💻 My webinar on Vision Language Models
👉 Find me on socials:
💌 Substack
