A New Document Summary Index for LLM-powered QA Systems

LlamaIndex Blog

In this blog post, we introduce a brand new LlamaIndex data structure: a Document Summary Index. We describe how it can help offer better retrieval performance compared to traditional semantic search, and also walk through an example.

Background

One of the core use cases of Large Language Models (LLMs) is question-answering over your own data. To do this, we pair the LLM with a “retrieval” model that can perform information retrieval over a knowledge corpus, and perform response synthesis over the retrieved texts using the LLM. This overall framework is called Retrieval-Augmented Generation.

Most users building LLM-powered QA systems today tend to do some form of the following (a minimal sketch in code follows the list):

  1. Take source documents, split each one into text chunks
  2. Store text chunks in a vector db
  3. During query-time, retrieve text chunks by embedding similarity and/or keyword filters.
  4. Perform response synthesis
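
For concreteness, here is a minimal sketch of that pipeline in the same legacy llama_index API used in the rest of this post. The "data" directory and the query string are placeholder assumptions, not part of the original post:

from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex

# 1. load source documents (they are split into text chunks when the index is built)
documents = SimpleDirectoryReader("data").load_data()

# 2. embed the chunks and store them in a vector index
index = GPTVectorStoreIndex.from_documents(documents)

# 3 + 4. retrieve top-k chunks by embedding similarity, then synthesize a response
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query("What are the sports teams in Toronto?")
print(response)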

For a variety of reasons, this approach provides limited retrieval performance.

Limitations of Existing Approaches

There are a few limitations of embedding retrieval using text chunks.

  • Text chunks lack global context: oftentimes the question requires context beyond what’s indexed in a specific chunk.
  • Top-k / similarity thresholds require careful tuning: make the value too small and you’ll miss context; make it too big and cost/latency might increase with more irrelevant context.
  • Embeddings are inherently computed separately for the text and the query, so they don’t always select the most relevant context for a question.

Adding keyword filters is one way to enhance the retrieval results, but that comes with its own set of challenges. We would need to adequately determine the proper keywords for each document, either manually or through an NLP keyword extraction/topic tagging model, and we would also need to adequately infer the proper keywords from the query.
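
To illustrate the brittleness, here is a toy, pure-Python sketch with made-up tags (not LlamaIndex code): keyword filters only help when the document tags and the query keywords happen to line up.

# toy example: keyword-filtered retrieval over hypothetical document tags
doc_tags = {
    "toronto_article": {"toronto", "ontario", "canada"},
    "boston_article": {"boston", "massachusetts"},
}

def keyword_filter(query_keywords: set) -> list:
    """Return doc ids whose tags overlap the query keywords."""
    return [doc_id for doc_id, tags in doc_tags.items() if tags & query_keywords]

print(keyword_filter({"toronto", "sports"}))  # ['toronto_article']
print(keyword_filter({"raptors"}))            # [] -- misses unless 'raptors' was tagged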

Document Summary Index

[Diagram: the Document Summary Index]

We propose a new index in LlamaIndex that will extract/index an unstructured text summary for each document. This index can help enhance retrieval performance beyond existing retrieval approaches: a summary indexes more information than a single text chunk, and carries more semantic meaning than keyword tags. It also allows for a more flexible form of retrieval: we can do both LLM-based retrieval and embedding-based retrieval.

How It Works

During build-time, we ingest each document, and use an LLM to extract a summary from each document. We also split the document up into text chunks (nodes). Both the summary and the nodes are stored within our Document Store abstraction. We maintain a mapping from the summary to the source document/nodes.
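
Conceptually, the document store ends up holding something like the following. This is an illustrative sketch of the mapping, with hypothetical ids, not the actual internal schema:

# hypothetical sketch of what build-time produces for one document
doc_store = {
    "summary_node/Toronto": "Toronto is the capital of Ontario ...",  # LLM-extracted summary
    "node/Toronto/0": "first text chunk ...",
    "node/Toronto/1": "second text chunk ...",
}
# mapping from the summary to the source document's nodes
summary_to_nodes = {
    "summary_node/Toronto": ["node/Toronto/0", "node/Toronto/1"],
}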

During query-time, we retrieve documents relevant to the query based on their summaries, using one of the following approaches:

  • LLM-based retrieval: we present sets of document summaries to the LLM, and ask the LLM to determine which documents are relevant, along with a relevance score.
  • Embedding-based retrieval: we retrieve relevant documents based on summary embedding similarity (with a top-k cutoff). Both are demonstrated in code in the Example section below.

Note that this approach of retrieval over document summaries (even the embedding-based approach) differs from embedding-based retrieval over text chunks: the retrieval classes for the document summary index retrieve all nodes for any selected document, instead of returning relevant chunks at the node-level.

Storing summaries for a document also enables LLM-based retrieval. Instead of feeding the entire document to the LLM at the beginning, we can first have the LLM inspect the concise document summary to see if it’s relevant to the query at all. This leverages the reasoning capabilities of LLMs, which are more advanced than embedding-based lookup, but avoids the cost/latency of feeding the entire document to the LLM.

Additional Insights

Document retrieval with summaries can be regarded as a “middle ground” between semantic search and brute-force summarization across all docs. We look up documents based on summary relevance to the given query, and then return all *nodes* corresponding to the retrieved docs.

Why should we do this? This retrieval method gives the user more context than top-k over text chunks, by retrieving context at a document-level. But it’s also a more flexible/automatic approach than topic modeling; no more worrying about whether your text has the right keyword tags!

Example

Let’s walk through an example that showcases the document summary index, over Wikipedia articles about different cities.

The remainder of this guide showcases the relevant code snippets. You can find the full walkthrough here (and here’s the notebook link).

We can build the GPTDocumentSummaryIndex over a set of documents, and pass in a ResponseSynthesizer object to synthesize summaries for the documents.

from llama_index import (
    SimpleDirectoryReader,
    LLMPredictor,
    ServiceContext,
    ResponseSynthesizer
)
from llama_index.indices.document_summary import GPTDocumentSummaryIndex
from langchain.chat_models import ChatOpenAI

# load docs, define service context
...

# build the index
response_synthesizer = ResponseSynthesizer.from_args(response_mode="tree_summarize", use_async=True)
doc_summary_index = GPTDocumentSummaryIndex.from_documents(
    city_docs,
    service_context=service_context,
    response_synthesizer=response_synthesizer
)

Once the index is built, we can get the summary for any given document:

summary = doc_summary_index.get_document_summary("Boston")
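
Note that get_document_summary looks up a document by its doc_id. This assumes each article was ingested with its city name as the doc_id, e.g. something like the following (the exact setup is in the linked notebook; the city list here is an assumption):

# assumed ingestion detail: give each document a doc_id matching its city name
city_names = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
for doc, city in zip(city_docs, city_names):
    doc.doc_id = city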

Next, let’s walk through an example of LLM-based retrieval over the index.

from llama_index.indices.document_summary import DocumentSummaryIndexRetriever

retriever = DocumentSummaryIndexRetriever(
    doc_summary_index,
    # choice_select_prompt=choice_select_prompt,
    # choice_batch_size=choice_batch_size,
    # format_node_batch_fn=format_node_batch_fn,
    # parse_choice_select_answer_fn=parse_choice_select_answer_fn,
    # service_context=service_context
)
retrieved_nodes = retriever.retrieve("What are the sports teams in Toronto?")
print(retrieved_nodes[0].score)
print(retrieved_nodes[0].node.get_text())

The retriever will retrieve a set of relevant nodes for a given query.

Note that the LLM returns relevance scores along with the document text:

8.0
Toronto ( (listen) tə-RON-toh; locally [təˈɹɒɾ̃ə] or [ˈtɹɒɾ̃ə]) is the capital city of the Canadian province of Ontario. With a recorded population of 2,794,356 in 2021, it is the most populous city in Canada...
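
For comparison, here is a sketch of embedding-based retrieval over the summaries, assuming the DocumentSummaryIndexEmbeddingRetriever class from the same module (summaries are matched by embedding similarity, with a top-k cutoff):

from llama_index.indices.document_summary import DocumentSummaryIndexEmbeddingRetriever

# retrieve nodes for the top-1 document whose summary embedding best matches the query
embedding_retriever = DocumentSummaryIndexEmbeddingRetriever(
    doc_summary_index,
    similarity_top_k=1,
)
retrieved_nodes = embedding_retriever.retrieve("What are the sports teams in Toronto?")
print(len(retrieved_nodes))  # all nodes of the selected document are returned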

We can also use the index as part of an overall query engine, to not only retrieve the relevant context, but also synthesize a response to a given question. We can do this through both the high-level API as well as the lower-level API.

query_engine = doc_summary_index.as_query_engine(
    response_mode="tree_summarize", use_async=True
)
response = query_engine.query("What are the sports teams in Toronto?")
print(response)

# use retriever as part of a query engine
from llama_index.query_engine import RetrieverQueryEngine

# configure response synthesizer
response_synthesizer = ResponseSynthesizer.from_args()

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

# query
response = query_engine.query("What are the sports teams in Toronto?")
print(response)

Next Steps

The approach of automatic summarization over any piece of text is really exciting. We’re excited to develop extensions in two areas:

  • Continue exploring automatic summarization in different layers. Currently it’s at the doc-level, but what about summarizing a big text chunk into a smaller one (e.g. a one-liner)?
  • Continue exploring LLM-based retrieval, which summarization helps to unlock.

Also, we’re sharing the example guide/notebook below in case you missed it above:

Document Summary Guide

Notebook Link
