Building Retrieval-Augmented Generation systems, or RAGs, is easy. With tools like LlamaIndex or LangChain, you can get your RAG-based Large Language Model up and running very quickly. Sure, some engineering effort is needed to make the system efficient and scalable, but in principle, building the RAG is the easy part. What's much more difficult is designing it well.
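To illustrate just how quickly: with recent versions of LlamaIndex, a basic pipeline is a handful of lines. A minimal sketch, assuming your documents sit in a local `data/` folder and an OpenAI API key is set in the environment (which the default embedding and LLM models use); the folder name and question are placeholders:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load local files, embed and index them, then ask a question over them.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What do these documents say about pricing?"))
```

Five lines, and every one of them hides a design decision: how the documents are chunked, which embedding model is used, where the vectors live, how many chunks are retrieved, and how the answer is synthesized.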
Having recently gone through the process myself, I discovered how many big and small design choices need to be made for a Retrieval-Augmented Generation system. Each of them can potentially impact the performance, behavior, and cost of your RAG-based LLM, sometimes in non-obvious ways.
Without further ado, let me present this by no means exhaustive, yet hopefully useful, list of RAG design choices. Let it guide your design efforts.
Retrieval-Augmented Generation gives a chatbot access to external data so that it can answer users' questions based on this data rather than on general knowledge or its own dreamed-up hallucinations.
As such, RAG systems can become complex: we need to get the data, parse it into a chatbot-friendly format, make it available to and searchable by the LLM, and finally ensure the chatbot makes proper use of the data it has been given access to.
I like to think about RAG systems in terms of the components they are made of. There are five main pieces to the puzzle (a toy sketch wiring them together follows the list):
- Indexing: Embedding external data into a vector representation.
- Storing: Persisting the indexed embeddings in a database.
- Retrieval: Finding relevant pieces within the stored data.
- Synthesis: Generating answers to users' queries.
- Evaluation: Quantifying how good the RAG system is.
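To make the division concrete, here is a deliberately toy, framework-free sketch of how the first four components wire together. The bag-of-words `embed()` and the prompt string are stand-ins for a real embedding model and LLM, and the in-memory list stands in for a vector database; evaluation is only noted in a comment, since it needs reference answers to score against:

```python
import math
from collections import Counter

# 1. Indexing: a bag-of-words vector stands in for a neural embedding.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

documents = [
    "RAG retrieves external data before the model answers.",
    "Evaluation quantifies how good the RAG system is.",
]

# 2. Storing: an in-memory list plays the role of a vector database.
store = [(doc, embed(doc)) for doc in documents]

# 3. Retrieval: rank stored documents by similarity to the query.
def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 4. Synthesis: a real system would send this prompt to an LLM.
def synthesize(query: str) -> str:
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\nQuestion: {query}"

print(synthesize("What does evaluation do?"))
# 5. Evaluation: score the generated answers against reference answers,
#    e.g., for faithfulness to the retrieved context and relevance to the query.
```

Every function above is a seam where real systems diverge: swap the embedding, the store, the ranking, or the prompt, and you get a different RAG.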
In the rest of this article, we will go through the five RAG components one by one, discussing the design choices, their implications and trade-offs, and some useful resources to help you make the decision.