How to Train a Chatbot Using RAG and Custom Data


RAG, which stands for Retrieval-Augmented Generation, describes a process by which an LLM (Large Language Model) can be optimized by having it pull from a smaller, more specific knowledge base rather than relying only on its huge original training data. Typically, LLMs like ChatGPT are trained on the entire web (billions of data points). This makes them prone to small errors and hallucinations.

Here is an example of a situation where RAG might be used and be helpful: suppose you want a chatbot that answers questions about U.S. states. Rather than relying on whatever the base model memorized during training, you can have it retrieve answers directly from the states' Wikipedia pages, which is exactly what we'll build below.

Creating your RAG LLM

One of the most popular tools for building RAG systems is LlamaIndex, which:

  • Simplifies the integration between LLMs and external data sources
  • Allows developers to structure, index, and query their data in a way that's optimized for LLM consumption
  • Works with many types of data, such as PDFs and text files
  • Helps build a RAG pipeline that retrieves and injects relevant chunks of information into a prompt before passing it to the LLM for generation (a minimal sketch follows this list)
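
To get a feel for how little code that pipeline requires, here is a minimal local sketch (an illustrative assumption on my part: it reads files from a ./data folder and uses OpenAI's default models via an OPENAI_API_KEY environment variable, whereas the rest of this tutorial uses LlamaCloud):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load every file in ./data, then chunk, embed, and index it locally
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Retrieve relevant chunks and generate an answer in a single call
response = index.as_query_engine().query("What state has the highest population?")
print(response)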

Download your data

Start by getting the data you want to train your model with. To download PDFs from Wikipedia (CC BY 4.0) in the correct format, make sure you click Print and then "Save as PDF."

For the purposes of this article and to keep things simple, I'll only download the pages for the following five popular states (one of which, Washington D.C., is technically a federal district):

  • Florida
  • California
  • Washington D.C.
  • New York
  • Texas

Make sure to save these all in a folder where your project can easily access them. I saved them in one called "data".

Get the necessary API keys

Before you create your custom states database, there are two API keys you'll need to generate.

  • One from OpenAI, to access a base LLM
  • One from LlamaCloud, to access the index database you upload custom data to

Once you have these API keys, store them in a .env file in your project.

# .env file
LLAMA_API_KEY=""
OPENAI_API_KEY=""
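
To read these values in your Python code, one common approach (an assumption here; any environment loader works) is the python-dotenv package:

import os
from dotenv import load_dotenv

load_dotenv()  # Reads key/value pairs from .env into environment variables

llama_key = os.getenv("LLAMA_API_KEY")
openai_key = os.getenv("OPENAI_API_KEY")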

Create an Index and Upload your data 

Create a LlamaCloud account. Once you're in, find the Index section and click "Create" to create a new index.

Screenshot by author

An index stores and manages document indexes remotely so that they can be queried via an API without needing to rebuild or store them locally.

Here's how it works:

  1. When you create your index, there will be a place where you can upload files to feed into the model's database. Upload your PDFs here.
  2. LlamaIndex parses and chunks the documents.
  3. It creates an index (e.g., vector index, keyword index).
  4. This index is stored in LlamaCloud.
  5. You can then query it using an LLM through the API (step 5 is sketched conceptually below).
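
Conceptually, that last step boils down to something like the following (a hypothetical sketch with made-up helper names, not LlamaCloud's actual internals):

# Hypothetical sketch of what a RAG query does under the hood
def rag_answer(question, retriever, llm):
    chunks = retriever.retrieve(question)          # 1. find the most relevant document chunks
    context = "\n\n".join(c.text for c in chunks)  # 2. stitch them together
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the context above."
    )
    return llm.complete(prompt)                    # 3. let the LLM generate from the retrieved context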

The next thing you need to do is configure an embedding model. An embedding model converts your documents (and, later, your queries) into numerical vectors so the most relevant chunks can be found at query time; the LLM you query then uses those retrieved chunks to generate its answer.
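
For instance, here is what producing an embedding looks like directly (assuming OpenAI's text-embedding-3-small model, chosen purely for illustration):

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
vector = embed_model.get_text_embedding("Florida is known as the Sunshine State.")
print(len(vector))  # 1536 dimensions for this model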

When you're creating a new index, you should select "Create a new OpenAI embedding":

Screenshot by author

When you create your new embedding, you'll need to supply your OpenAI API key and name your model.

Screenshot by author

Once you have created your model, leave the other index settings as their defaults and hit "Create Index" at the bottom.

It may take a few minutes to parse and store all of the documents, so make sure that all of the documents have been processed before you try to run a query. The status should show on the right side of the screen once you create your index, in a box that says "Index Files Summary".

Accessing your model via code

Once you've created your index, you'll also get an Organization ID. For cleaner code, add your Organization ID and Index Name to your .env file. Then, retrieve all the necessary variables to initialize your index in your code:

import os
from dotenv import load_dotenv
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

load_dotenv()  # Load the keys and IDs stored in the .env file

index = LlamaCloudIndex(
  name=os.getenv("INDEX_NAME"),
  project_name="Default",
  organization_id=os.getenv("ORG_ID"),
  api_key=os.getenv("LLAMA_API_KEY")
)

Query your index and ask a question

To do this, you'll need to define a query (prompt) and then generate a response by calling the index, like so:

query = "What state has the highest population?"
response = index.as_query_engine().query(query)

# Print out just the text part of the response
print(response.response)
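
If you want to check which document chunks the answer was drawn from, the response object also exposes them (an optional sanity check; source_nodes is part of LlamaIndex's standard response object):

# Inspect the retrieved chunks behind the answer
for source in response.source_nodes:
    print(f"score={source.score}  text={source.node.get_content()[:100]!r}")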

Having a longer conversation with your bot

By querying a response from the LLM the way we just did above, you're able to easily access information from the documents you loaded. However, if you ask a follow-up question, like "Which one has the least?" without context, the model won't remember what your original question was. This is because we haven't programmed it to keep track of the chat history.

In order to do this, you need to:

  • Create memory using ChatMemoryBuffer
  • Create a chat engine and add the created memory using ContextChatEngine

To create a chat engine:

from llama_index.core.chat_engine import ContextChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.llms.openai import OpenAI

# Create a retriever from the index
retriever = index.as_retriever()

# Set up memory to hold the running chat history
memory = ChatMemoryBuffer.from_defaults(token_limit=2000)

# Create chat engine with memory
chat_engine = ContextChatEngine.from_defaults(
    retriever=retriever,
    memory=memory,
    llm=OpenAI(model="gpt-4o"),
)

Next, feed your query into your chat engine:

# To query:
response = chat_engine.chat("What is the population of New York?")
print(response.response)

This gives the response: "As of 2024, the estimated population of New York is 19,867,248."

I can then ask a follow-up question:

response = chat_engine.chat("What about California?")
print(response.response)

This gives the following response: "As of 2024, the population of California is 39,431,263." As you can see, the model remembered that we were previously asking about population and responded accordingly.
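
To keep the conversation going indefinitely, the engine can be wrapped in a simple loop (a minimal sketch; type "quit" to stop):

# A simple REPL loop for an ongoing conversation with memory
while True:
    user_input = input("You: ")
    if user_input.lower() in ("quit", "exit"):
        break
    response = chat_engine.chat(user_input)
    print("Bot:", response.response)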

Streamlit UI chatbot app for US state RAG. Screenshot by author

Conclusion

Retrieval-Augmented Generation is an effective way to ground an LLM in specific data. LlamaCloud offers a simple and straightforward way to build your own RAG framework and query the model that lies underneath.

The code I used for this tutorial was written in a notebook, but it can also be wrapped in a Streamlit app to create a more natural back-and-forth conversation with a chatbot. I've included the Streamlit code here on my Github.
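
As a rough illustration of what that wrapping can look like, here is a simplified sketch using Streamlit's chat elements (a condensed variant; the full version is on my Github and may differ):

import os
import streamlit as st
from dotenv import load_dotenv
from llama_index.core.chat_engine import ContextChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex
from llama_index.llms.openai import OpenAI

load_dotenv()
st.title("US State RAG Chatbot")

@st.cache_resource  # Build the engine once and reuse it across Streamlit reruns
def get_chat_engine():
    index = LlamaCloudIndex(
        name=os.getenv("INDEX_NAME"),
        project_name="Default",
        organization_id=os.getenv("ORG_ID"),
        api_key=os.getenv("LLAMA_API_KEY"),
    )
    return ContextChatEngine.from_defaults(
        retriever=index.as_retriever(),
        memory=ChatMemoryBuffer.from_defaults(token_limit=2000),
        llm=OpenAI(model="gpt-4o"),
    )

chat_engine = get_chat_engine()

# Replay earlier turns so the history stays visible across reruns
if "messages" not in st.session_state:
    st.session_state.messages = []
for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input("Ask about a US state"):
    st.chat_message("user").write(prompt)
    answer = chat_engine.chat(prompt).response
    st.chat_message("assistant").write(answer)
    st.session_state.messages += [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": answer},
    ]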

Thanks for reading

  • Connect with me on LinkedIn
  • Buy me a coffee to support my work!
  • I offer 1:1 data science tutoring, career coaching/mentoring, writing advice, resume reviews & more on Topmate!