Information retrieval is a crucial task, given the vast amount of content available today. You perform an information retrieval task, for instance, every time you Google something or ask ChatGPT for the solution to a problem. The data you're searching through could be a closed dataset of documents or the entire web.
In this article, I'll discuss agentic information finding, covering how information retrieval has changed with the release of LLMs, and particularly with the rise of AI agents, which are far more capable of finding information than anything we've had until now. I'll first discuss RAG, since it is a foundational block in agentic information finding. I'll then continue by discussing, at a high level, how AI agents can be used to find information.
Why do we need agentic information finding?
Information retrieval is a relatively old task. TF-IDF is one of the earliest algorithms used to find information in a large corpus of documents. It works by indexing your documents based on how frequently each word occurs within a specific document and how common that word is across all documents.
If a user searches for a word, and that word occurs frequently in a few documents but rarely across the corpus as a whole, that indicates strong relevance for those few documents.
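As a quick illustration, here is a minimal sketch of TF-IDF retrieval, for instance with scikit-learn (the toy corpus and query are made up for the example):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus; in practice this would be your document collection
documents = [
    "How to cook a simple pasta dinner",
    "Implementing binary search in Python",
    "Directions from the airport to the city centre",
]

# Build the TF-IDF index over the corpus
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

# Score each document against the query and rank by similarity
query_vector = vectorizer.transform(["how do I cook pasta"])
scores = cosine_similarity(query_vector, doc_vectors)[0]
print(documents[scores.argmax()])  # -> the pasta document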
Information retrieval is such a critical task because, as humans, we rely on quickly finding information to solve all kinds of problems, such as:
- How to cook a particular meal
- How to implement a certain algorithm
- How to get from location A to B
TF-IDF still works surprisingly well, though we have since developed far more powerful approaches to finding information. Retrieval-augmented generation (RAG) is one strong technique, relying on semantic similarity to find useful documents.
Agentic information finding combines different techniques, such as keyword search (TF-IDF, for instance, though typically a modernised version of the algorithm such as BM25) and RAG, to find relevant documents, search through them, and return results to the user.
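If you want to try BM25 yourself, one lightweight option is the rank_bm25 package (the corpus here is made up, and the whitespace tokenisation is deliberately simplistic):

from rank_bm25 import BM25Okapi

corpus = [
    "how to cook a simple pasta dinner",
    "implementing binary search in python",
    "directions from the airport to the city centre",
]

# BM25 works on tokenised text; a simple whitespace split is enough for a demo
tokenized_corpus = [doc.split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

# Rank the corpus against the query and return the best match
query = "cook pasta".split()
print(bm25.get_top_n(query, corpus, n=1))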
Build your own RAG

Building your own RAG is surprisingly easy with all the technology and tools available today. There are plenty of packages out there that help you implement RAG. All of them, however, rely on the same, relatively simple underlying steps:
- Embed your document corpus (you also typically chunk up the documents first)
- Store the embeddings in a vector database
- The user inputs a search query
- Embed the search query
- Compute the embedding similarity between the document corpus and the user query, and return the most similar documents
This can be implemented in just a few hours if you know what you're doing. To embed your data and user queries, you can, for instance, use:
- Managed services such as
  - OpenAI's text-embedding-3-large
  - Google's gemini-embedding-001
- Open-source options such as
  - Alibaba's Qwen3-Embedding-8B
  - Linq-Embed-Mistral (built on Mistral)
After you've embedded your documents, you can store them in a vector database.
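To make the pipeline concrete, here is a minimal sketch of the embed-and-search loop, using OpenAI's embedding API with a plain NumPy array standing in for the vector database (the corpus and query are just examples, and a real system would chunk the documents and use an actual vector store):

import numpy as np
from openai import OpenAI

client = OpenAI()

documents = [
    "Our refund policy allows returns within 30 days.",
    "Shipping usually takes 3-5 business days.",
]

def embed(texts):
    # Embed a list of strings with OpenAI's embedding endpoint
    response = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([item.embedding for item in response.data])

# "Index" the corpus: one embedding per document, kept in memory
doc_embeddings = embed(documents)

# Embed the user query and rank the documents by cosine similarity
query_embedding = embed(["How long does shipping take?"])[0]
scores = doc_embeddings @ query_embedding / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
print(documents[int(scores.argmax())])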
After that, you're basically ready to perform RAG. In the next section, I'll also cover fully managed RAG solutions, where you simply upload a document, and all the chunking, embedding, and searching is handled for you.
Managed RAG services
If you want a simpler approach, you can also use fully managed RAG solutions. Here are a few options:
- Ragie.ai
- Gemini File Search Tool
- OpenAI File Search tool
These services simplify the RAG process significantly. You can upload documents to any of them, and the service automatically handles the chunking, embedding, and inference for you. All you have to do is upload your raw documents and provide the search query you want to run. The service will then return the documents relevant to your queries, which you can feed into an LLM to answer user questions.
Even though managed RAG simplifies the process significantly, I would also like to highlight some downsides:
If you only have PDFs, you can upload them directly. However, some file types are currently not supported by the managed RAG services. Some of them don't support PNG/JPG files, for example, which complicates the process. One solution is to perform OCR on the image and upload the resulting txt file (which is supported), but this, of course, complicates your application, which is exactly what you want to avoid when using managed RAG.
Another downside, of course, is that you have to upload raw documents to the services. When doing this, you need to make sure you stay compliant, for example with GDPR regulations in the EU. This can be a challenge for some managed RAG services, though I know OpenAI at least supports EU data residency.
I'll also provide an example of using OpenAI's File Search tool, which is, of course, quite simple to use.
First, you create a vector store and upload documents:
from openai import OpenAI

client = OpenAI()

# Create a vector store
vector_store = client.vector_stores.create(
    name="",
)

# Upload a file and add it to the vector store
client.vector_stores.files.upload_and_poll(
    vector_store_id=vector_store.id,
    file=open("filename.txt", "rb"),
)
After uploading and processing the documents, you can query them with:
user_query = "What is the meaning of life?"

results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query=user_query,
)
As you may notice, this code is a lot simpler than setting up embedding models and vector databases to build RAG yourself.
Information retrieval tools
Now that we have the information retrieval tools available, we can start performing agentic information retrieval. I'll start with the initial approach to using LLMs for information finding, before continuing with the better and more up-to-date approach.
Retrieval, then answering
The first approach is to start by retrieving relevant documents and feeding that information to an LLM before it answers the user's query. This can be done by running both a keyword search and a RAG search, finding the top X relevant documents, and feeding those documents into an LLM.
First, find some documents with RAG:
user_query = "What is the meaning of life?"

results_rag = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query=user_query,
)
Then, find some documents with a keyword search:
def keyword_search(query):
    # keyword search logic (BM25, full-text search, etc.) ...
    return results

results_keyword_search = keyword_search(user_query)
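Then you add the two result sets together and remove duplicate documents. Below is a minimal sketch of that merge step; build_context is a hypothetical helper, and the dictionary shape of the results is an assumption, so adapt it to whatever your RAG and keyword searches actually return:

# Hypothetical helper: merge both result sets into one context string
def build_context(rag_results, keyword_results):
    # Collect the text of every retrieved document from both searches
    # (assumes each result is a dict with a "text" field)
    texts = [r["text"] for r in rag_results] + [r["text"] for r in keyword_results]
    # Drop duplicates while preserving order, then join into one string
    unique_texts = list(dict.fromkeys(texts))
    return "\n\n".join(unique_texts)

document_context = build_context(results_rag, results_keyword_search)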
Finally, feed the combined document context to an LLM for answering:
def llm_completion(prompt):
    # llm completion logic (call your LLM of choice)
    return response

prompt = f"""
Given the following context: {document_context}

Answer the user query: {user_query}
"""

response = llm_completion(prompt)
In a lot of cases, this works very well and will provide high-quality responses. However, there is a better approach to performing agentic information finding.
Information retrieval functions as a tool
The latest frontier LLMs are all trained with agentic behaviour in mind. This means the LLMs are very good at utilising tools to answer queries. You can provide an LLM with a list of tools, and it decides for itself when to use them to answer user queries.
The better approach is thus to provide RAG and keyword search as tools to your LLM. With GPT-5, you can, for example, do it like below:
# Define a custom keyword search function, and provide GPT-5 with both
# keyword search and RAG (the file search tool)
def keyword_search(keywords):
    # perform keyword search
    return results

user_input = "What is the meaning of life?"

tools = [
    {
        "type": "function",
        "name": "keyword_search",
        "description": "Search for keywords and return relevant results",
        "parameters": {
            "type": "object",
            "properties": {
                "keywords": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Keywords to search for"
                }
            },
            "required": ["keywords"]
        }
    },
    {
        "type": "file_search",
        "vector_store_ids": [""],
    }
]

response = client.responses.create(
    model="gpt-5",
    input=user_input,
    tools=tools,
)
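One thing to be aware of: the built-in file_search tool is executed on OpenAI's side, but your custom keyword_search function is not. When the model decides to call it, you have to run the function yourself and send the result back. Here is a rough sketch of that loop, based on my reading of the Responses API (treat the exact field names as assumptions):

import json

# Check whether the model requested our custom keyword_search function
for item in response.output:
    if item.type == "function_call" and item.name == "keyword_search":
        arguments = json.loads(item.arguments)
        result = keyword_search(arguments["keywords"])
        # Return the function output so the model can finish its answer
        follow_up = client.responses.create(
            model="gpt-5",
            previous_response_id=response.id,
            input=[{
                "type": "function_call_output",
                "call_id": item.call_id,
                "output": json.dumps(result),
            }],
            tools=tools,
        )
        print(follow_up.output_text)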
This works much better because you're not just running a one-time information-finding step with RAG/keyword search and then answering the user query. It works well because:
- The agent can decide for itself when to use the tools. Some queries, for example, don't require vector search
- OpenAI automatically performs query rewriting, meaning it runs parallel RAG queries with different versions of the user query (which it writes itself, based on the user query)
- The agent can decide to run more RAG queries/keyword searches if it believes it doesn't have enough information
The last point in the list above is the most important one for agentic information finding. Sometimes, you don't find the information you're looking for with the initial query. The agent (GPT-5) can detect that this is the case and choose to fire off more RAG/keyword search queries if it thinks that's needed. This often leads to much better results and makes the agent more likely to find the information you're looking for.
Conclusion
In this article, I covered the basics of agentic information retrieval. I started by discussing why agentic information finding is so important, highlighting how dependent we are on quick access to information. Furthermore, I covered the tools you can use for information retrieval with keyword search and RAG. I then highlighted that you can run these tools statically before feeding the results to an LLM, but that the better approach is to provide these tools to an LLM, making it an agent capable of finding information. I think agentic information finding will become increasingly important in the future, and understanding how to use AI agents will be a crucial skill for creating powerful AI applications in the coming years.
👉 Find me on socials:
💻 My webinar on Vision Language Models
🧑‍💻 Get in touch
✍️ Medium
You can also read my other articles:
