Building Context-Aware Query-Answering Systems With LLMs


Large language models are advanced machine learning models trained on massive amounts of text data to understand and generate human-like language. These models, such as GPT-4, LLaMA, Chinchilla, and LaMDA, are designed to perform various NLP tasks, including text generation, translation, summarization, sentiment analysis, and question-answering, among others.

LLMs have numerous use cases, with some of the most common being:

  • Customer Support
  • Content Generation
  • Sentiment Analysis
  • Text Summarization
  • Machine Translation
  • Information Extraction
  • Data Analysis and Insights
  • Email Automation
  • Personalized Marketing

These use cases demonstrate the versatility of large language models, making them essential resources for businesses aiming to improve efficiency and gain a competitive advantage.

However, LLMs have shortcomings, including the following:

  • producing plausible-sounding but incorrect responses
  • struggling with ambiguity
  • exhibiting biased behavior
  • demanding computational resources for training and deployment
  • possessing limited comprehension of real-world concepts or reasoning capabilities beyond the textual patterns of their training data

We complement the LLM with our domain-specific corpus to mitigate the shortcomings of using a general model like ChatGPT. Rather than leveraging the entirety of its knowledge base, our model focuses its answers on a specific body of text to limit its propensity to misinform.

We'll set up a Python API that, at a high level, integrates a large language model (LLM) for question-answering, using an unstructured document as the data source.

The application processes user input and generates appropriate responses based on the document's content. It uses the LangChain library for document loading, text splitting, embeddings, vector storage, and question-answering, with GPT-3.5-turbo under the hood, delivering the bot's responses via JSON to our UI.

FastAPI is a modern, high-performance web framework for building APIs with Python based on standard Python type hints. It's designed to be easy to use and provides automatic validation, documentation, and code completion.

LangChain, created by Harrison Chase, is a framework for developing applications powered by language models; it's a Python library providing out-of-the-box support for building NLP applications with LLMs. You can connect to various data and computation sources and build applications that perform NLP tasks on domain-specific data sources.

Chroma is an open-source embedding database designed to efficiently store and retrieve high-dimensional vector data, particularly in NLP and machine learning.

React is a popular open-source JavaScript library for building user interfaces, particularly web applications.

  • git clone the monorepo repository called llm-gpt-demo
  • Get an API key from OpenAI and place it in the .env file in the backend repo.
  • Make sure you also install the necessary packages in the backend and frontend repos:
cd backend/ 
pip install -r requirements.txt
cd frontend/
npm i

Laying the foundation

By cleaning and transforming raw data into a consistent, structured, and easily interpretable dataset, we can ensure that irrelevant information, noise, and inconsistencies are removed.

For this demo, we'll skip this piece since we use a single document as an example. First, we leverage LangChain's document_loaders.unstructured package with this import:

from langchain.document_loaders.unstructured import UnstructuredFileLoader

Then we load in the unstructured data like so:

loader = UnstructuredFileLoader('./docs/document.txt')
documents = loader.load()

Note that we're using Ethereum's Whitepaper (2014) as our text, loading in the raw text from the original PDF.

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
split_texts = text_splitter.split_documents(documents)

We split the text into chunks of 1,000 characters at a time for several reasons:

  • Splitting the text into smaller chunks makes it easier to work with and process during various stages of the NLP pipeline, such as generating embeddings or performing similarity searches. Smaller chunks also make it more efficient for the large language model to understand and respond to queries.
  • Large language models often have a maximum token limit for processing text. By breaking the text down into smaller pieces, you can ensure that each segment fits within this limit, allowing the model to analyze and process the text effectively.
  • When working with smaller chunks of text, the similarity search and response generation processes are often faster and more accurate. This is because the model can focus on a more limited context to generate a relevant response rather than trying to understand the entire document.

CharacterTextSplitter in LangChain takes two arguments: chunk_size and chunk_overlap. The chunk_size parameter determines the size of each text chunk, while chunk_overlap specifies the number of overlapping characters between two adjacent chunks. By setting these parameters, you can control the granularity of the text splitting and tailor it to your specific application's requirements.
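To make the splitting behavior concrete, here is a simplified, dependency-free sketch of character-based chunking with overlap. This is an illustration only, not LangChain's actual implementation, which also tries to split on separators such as newlines:

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Naive character splitter: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so adjacent chunks share
    chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 2,500-character document with no overlap yields chunks of 1000, 1000, 500.
chunks = split_text("a" * 2500, chunk_size=1000, chunk_overlap=0)
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

With a nonzero chunk_overlap, the tail of one chunk repeats at the head of the next, which helps preserve context that would otherwise be cut mid-sentence at a chunk boundary.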


Representing text numerically

For our model to leverage what's in the text, we must first convert the textual data into numerical representations called embeddings. These embeddings capture the semantic meaning of the text and allow for efficient and meaningful comparisons between text segments.

from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

In LangChain, embeddings = OpenAIEmbeddings() creates an instance of the OpenAIEmbeddings class, which generates vector embeddings of the text data. Vector embeddings are numerical representations of the text that capture its semantic meaning. These embeddings are used in various stages of the NLP pipeline, such as similarity search and response generation.

The generated embeddings enable the application to efficiently compare and identify relevant text segments based on their semantic meaning.
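As a toy illustration of what "text as numbers" means, here is a bag-of-words encoding. Real embeddings from OpenAIEmbeddings are dense, learned vectors that capture meaning rather than raw word counts, but the idea of mapping text to a fixed-length vector is the same:

```python
from collections import Counter

def bag_of_words(text: str, vocabulary: list[str]) -> list[int]:
    """Represent text as word counts over a fixed vocabulary -- a crude
    stand-in for the dense semantic embeddings a real model produces."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["ethereum", "contract", "blockchain"]
print(bag_of_words("Ethereum is a blockchain with smart contract support", vocab))
# [1, 1, 1]
```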

Efficient organization of embeddings

A vector database, also called a vector store or vector search engine, is a data storage and retrieval system designed to handle high-dimensional vector data. In the context of natural language processing (NLP) and machine learning, vector databases are used to store and efficiently query embeddings or other vector representations of data.

LangChain leverages ChromaDB under the hood, as you can see from this import:

from langchain.vectorstores import Chroma

The purpose of the Chroma vector database is to efficiently store and query the vector embeddings generated from the text data.

Finding relevant matches

With the query embedding generated, a similarity search is performed in the vector database to identify the most relevant matches. The search compares the query embedding to the stored embeddings, ranking them based on similarity metrics like cosine similarity or Euclidean distance. The top matches are the text passages most relevant to the user's query.
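The ranking step can be sketched in plain Python with cosine similarity. Chroma handles this internally (with approximate-nearest-neighbor indexing for scale), so you never write it by hand; this sketch only shows what "rank by similarity" means:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_matches(query_vec, stored, k=2):
    """Rank stored (doc_id, vector) pairs by similarity to the query embedding."""
    ranked = sorted(stored, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

stored = [("chunk-a", [1.0, 0.0]), ("chunk-b", [0.9, 0.1]), ("chunk-c", [0.0, 1.0])]
print(top_matches([1.0, 0.05], stored))  # ['chunk-a', 'chunk-b']
```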

vector_db = Chroma.from_documents(documents=split_texts, embedding=embeddings, persist_directory=persist_directory)

This line of code creates an instance of the Chroma vector database using the from_documents() method. By default, Chroma uses an in-memory database that gets persisted on exit and loaded on start, but for our project, we're persisting the database locally via the persist_directory option, passing in the name with a variable of the same name.

Producing informative and contextual answers

Finally, the crafted prompt is fed to ChatGPT, which generates an answer based on the input. The generated response is then returned to the user, completing the process of retrieving and delivering relevant information based on the user's query. The language model produces a coherent, informative, and contextually appropriate response by leveraging its deep understanding of language patterns and the provided context.
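The "crafted prompt" is essentially the retrieved chunks stitched into a template around the user's question. A minimal sketch of what LangChain's question-answering chain does internally might look like this (the exact template wording is an assumption for illustration, not LangChain's actual prompt):

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved context and the user's question into one prompt
    that instructs the model to answer only from the provided text."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What was the purpose of creating Ethereum?",
    ["Ethereum builds a blockchain with a built-in Turing-complete language...",
     "...allowing developers to create smart contracts and decentralized applications."],
)
```

Constraining the model to the supplied context is what limits the hallucination risk discussed earlier: the instruction gives it an explicit way to decline rather than invent an answer.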

{
  "created": 1681098257,
  "model": "llm-gpt-demo-v1",
  "content": "The purpose of creating Ethereum was to merge and improve upon the concepts of scripting, altcoins, and on-chain meta-protocols, and to allow developers to create different kinds of decentralized applications that are more scalable, standardized, feature-complete, easy to develop, and interoperable. Ethereum achieves this by providing a blockchain with a built-in Turing-complete programming language that lets developers create arbitrary rules for ownership, transaction formats, and state transition functions. With Ethereum, developers can create smart contracts and decentralized applications with more power and flexibility than Bitcoin scripting allows."
}

This JSON representation is then displayed in our React chat UI.

We've explored one approach to building a full-stack application that uses FastAPI and LangChain to generate embeddings, organize them in a vector database like Chroma, perform similarity searches, and craft prompts that yield informative, contextual answers by supplementing the model with a domain-specific corpus.

This approach aims to overcome the limitations of LLMs, providing quick and accurate information retrieval while limiting some of the risks stated earlier.

For a look at the finished code, please reference the GitHub repo. I hope you enjoyed this article.

Follow me on LinkedIn.

