generate tons of words and responses based on general knowledge, but what happens when we need answers that require accurate and specific knowledge? Purely generative models often struggle with domain-specific questions for a bunch of reasons: maybe the data they were trained on is outdated by now, maybe what we're asking for is really specific and specialized, maybe we want responses that take into account personal or corporate data that just isn't public… 🤷♀️ the list goes on.
So, how can we leverage generative AI while keeping our responses accurate, relevant, and grounded? A great answer to this question is the Retrieval-Augmented Generation (RAG) framework. RAG is a framework that consists of two key components: retrieval and generation (duh!). Unlike purely generative models, which rely only on the data they were pre-trained on, RAG incorporates an additional retrieval step that lets us feed extra information into the model from an external source, such as a database or a document. To put it differently, a RAG pipeline provides coherent and natural responses (thanks to the generation step) that are also factually accurate and grounded in a knowledge base of our choice (thanks to the retrieval step).
In this way, RAG can be an extremely valuable tool for applications where highly specialized data is needed, such as customer support, legal advice, or technical documentation. One typical example of a RAG application is a customer support chatbot that answers customer issues based on a company's database of support documents and FAQs. Another example is complex software or technical products with extensive troubleshooting guides. Yet another example is legal advice, where a RAG model would access and retrieve custom data from law libraries, previous cases, or firm guidelines. The examples are really endless; in all these cases, though, access to external, specific, and context-relevant data enables the model to provide more precise and accurate responses.
So, in this post, I walk you through building a simple RAG pipeline in Python, using the ChatGPT API, LangChain, and FAISS.
What about RAG?
From a more technical perspective, RAG is a technique used to enhance an LLM's responses by injecting additional, domain-specific information into it. In essence, RAG allows a model to also take into account additional external information (like a recipe book, a technical manual, or a company's internal knowledge base) while forming its responses.
This is quite important because it helps us mitigate a bunch of problems inherent to LLMs, such as:
- Hallucinations: making things up
- Outdated information: if the model wasn't trained on recent data
- Lack of transparency: not knowing where responses are coming from
To make this work, the external documents are first processed into vector embeddings and stored in a vector database. Then, when we submit a prompt to the LLM, any relevant data is retrieved from the vector database and passed to the LLM along with our prompt. As a result, the LLM forms its response by considering both our prompt and any relevant information found in the vector database behind the scenes. Such a vector database can be hosted locally or in the cloud, using a service like Pinecone or Weaviate.
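To make the retrieve-then-prompt flow a bit more concrete, here is a minimal, library-free sketch. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words counter standing in for a real embedding model, the three "documents" are made up, and a plain Python list plays the role of the vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (NOT a real embedding model)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, embed(doc)),
        reverse=True,
    )
    return ranked[:k]

# hypothetical mini knowledge base
documents = [
    "the office is closed on public holidays",
    "refunds are processed within five business days",
    "the support team answers emails within one day",
]

query = "when are refunds processed"
context = retrieve(query, documents, k=1)

# the retrieved text is passed to the LLM together with the original prompt
prompt = f"Context:\n{context[0]}\n\nQuestion: {query}"
```

A real pipeline swaps the toy pieces for a learned embedding model and a proper vector index, but the shape of the flow stays the same: embed, rank by similarity, and prepend the best matches to the prompt.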
What about ChatGPT API, LangChain, and FAISS?
The first component for building a RAG pipeline is the LLM that will generate the responses. This can be any LLM, like Gemini or Claude, but in this post, I will be using OpenAI's ChatGPT models via their API platform. In order to use their API, we need to sign up and obtain an API key. We also need to make sure that the respective Python libraries are installed.
pip install openai
The other major component of building a RAG pipeline is processing external data: generating embeddings from documents and storing them in a vector database. One of the most popular frameworks for performing such a task is LangChain. In particular, LangChain allows us to:
- Load and extract text from various document types (PDFs, DOCX, TXT, etc.)
- Split the text into chunks suitable for generating the embeddings
- Generate vector embeddings (in this post, with the help of OpenAI's API)
- Store and search embeddings via vector databases like FAISS, Chroma, and Pinecone
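The chunking step in the list above is easy to picture with a small sketch. The function below is a simplified, plain-Python stand-in and not LangChain's own splitter (LangChain ships more sophisticated ones, such as `RecursiveCharacterTextSplitter`). The name `split_into_chunks`, its parameters, and the sample text are all hypothetical.

```python
def split_into_chunks(text, chunk_size=200, chunk_overlap=50):
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once the current chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample_text = "abcdefghij" * 3  # 30 characters of hypothetical text
chunks = split_into_chunks(sample_text, chunk_size=12, chunk_overlap=4)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.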
We can easily install the required LangChain libraries with:
pip install langchain langchain-community langchain-openai
In this post, I'll be using LangChain together with FAISS, a local vector store developed by Facebook AI Research. FAISS is a very lightweight package, and is thus appropriate for building a simple/small RAG pipeline. It can be easily installed with:
pip install faiss-cpu
Putting everything together
So, in summary, I’ll use:
- ChatGPT models via OpenAI's API as the LLM
- LangChain, together with OpenAI's API, to load the external files, process them, and generate the vector embeddings
- FAISS to build a local vector database
The file that I will be feeding into the RAG pipeline for this post is a text file with some facts about me. This text file is located in the folder ‘RAG files’.

Now we're all set up, and we can start by specifying our API key and initializing our model:
import os
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS

# ChatGPT API key
api_key = "your key"

# initialize LLM
llm = ChatOpenAI(openai_api_key=api_key, model="gpt-4o-mini", temperature=0.3)

Then we can load the files we want to use for the RAG, generate the embeddings, and store them as a vector database as follows:
# loading documents for use for RAG 
text_folder = "rag_files"  
all_documents = []
for filename in os.listdir(text_folder):
    if filename.lower().endswith(".txt"):
        file_path = os.path.join(text_folder, filename)
        loader = TextLoader(file_path)
        all_documents.extend(loader.load())
# generate embeddings
embeddings = OpenAIEmbeddings(openai_api_key=api_key)
# create vector database with FAISS from the documents loaded above
vector_store = FAISS.from_documents(all_documents, embeddings)
retriever = vector_store.as_retriever()

Finally, we can wrap everything in a simple executable Python file:
def main():
    print("Welcome to the RAG Assistant. Type 'exit' to quit.\n")

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() == "exit":
            print("Exiting…")
            break
        # get relevant documents (retriever.invoke replaces the deprecated get_relevant_documents)
        relevant_docs = retriever.invoke(user_input)
        retrieved_context = "\n\n".join([doc.page_content for doc in relevant_docs])
        # system prompt
        system_prompt = (
            "You are a helpful assistant. "
            "Use ONLY the following knowledge base context to answer the user. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{retrieved_context}"
        )
        # messages for LLM
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input}
        ]
        # generate response
        response = llm.invoke(messages)
        assistant_message = response.content.strip()
        print(f"\nAssistant: {assistant_message}\n")

if __name__ == "__main__":
    main()

Notice how the system prompt is defined. Essentially, a system prompt is an instruction given to the LLM that sets the behavior, tone, or constraints of the assistant before the user interacts with it. For instance, we could set the system prompt to make the LLM respond as if talking to a 4-year-old or a rocket scientist. Here, we ask it to respond based only on the external data we provided.
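To play with different system prompts without touching the chat loop, the prompt assembly can be pulled out into a small helper. This is just a sketch under my own assumptions: `build_messages` and its `persona` parameter are hypothetical names, and the snippets and question are placeholders rather than real retrieved data.

```python
def build_messages(context_chunks, question, persona="You are a helpful assistant."):
    """Assemble system and user messages for a chat model.

    context_chunks: list of retrieved text snippets (strings)
    question: the user's question
    persona: opening sentence of the system prompt (hypothetical parameter)
    """
    context = "\n\n".join(context_chunks)
    system_prompt = (
        f"{persona} "
        "Use ONLY the following knowledge base context to answer the user. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

# hypothetical retrieved snippets and question
messages = build_messages(
    ["Snippet one.", "Snippet two."],
    "What does snippet two say?",
    persona="You explain things as if talking to a 4-year-old.",
)
```

The returned list has the same role/content shape as the `messages` list in the script, so it could be passed to the model in the same way.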
So, let’s see what we’ve cooked! 🍳
First, I ask a question that's irrelevant to the provided external data source, to make sure that the model uses only the provided data source when forming its responses and not general knowledge.

… and then I asked some questions specifically about the file I provided…

✨✨✨✨
On my mind
Obviously, this is a very simplistic example of a RAG setup. There's much more to consider when implementing it in a real business environment, such as security concerns around how data is handled, or performance issues when dealing with a larger, more realistic knowledge corpus and increased token usage. Nonetheless, I believe OpenAI's API is really impressive and offers immense, untapped potential for building custom, context-specific AI applications.


