RAG

Agentic RAG Failure Modes: Retrieval Thrash, Tool Storms, and Context Bloat (and How to Spot Them Early)

fails in predictable ways. Retrieval returns bad chunks; the model hallucinates. You fix your chunking and move on. The debugging surface is small since the architecture is straightforward: retrieve once, generate once, done. Agentic...

Beyond Prompt Caching: 5 More Things You Should Cache in RAG Pipelines

, we talked in detail about what Prompt Caching is in LLMs and how it can save a lot of time and money when running AI-powered apps with high traffic. But other than Prompt Caching,...

Introducing Gemini Embeddings 2 Preview

a preview version of its latest embedding model. This model is notable for one important reason. It can embed text, PDFs, images, audio, and video, making it a one-stop shop for embedding...

Why Care About Prompt Caching in LLMs?

, we’ve talked a lot about what an incredible tool RAG is for leveraging the power of AI on custom data. But, whether we're talking about plain LLM API requests, RAG applications, or more complex...

How to Build Agentic RAG with Hybrid Search

, also known as RAG, is a powerful method for finding relevant documents in a corpus of knowledge, which you then provide to an LLM to answer user questions. Traditionally, RAG...

Scaling Vector Search: Comparing Quantization and Matryoshka Embeddings for 80% Cost Reduction

is at the core of AI infrastructure, powering multiple AI features from Retrieval-Augmented Generation (RAG) to agentic skills and long-term memory. Consequently, the demand for indexing large datasets is growing rapidly. For engineering...

Understanding Context and Contextual Retrieval in RAG

In my latest post, I showed how hybrid search can be used to significantly improve the effectiveness of a RAG pipeline. RAG, in its basic version, using just semantic search on embeddings, will be...

RAG with Hybrid Search: How Does Keyword Search Work?

, I’ve talked quite a bit about Retrieval Augmented Generation (RAG). Specifically, I’ve covered the fundamentals of the RAG methodology, as well as a number of relevant concepts, like chunking, embeddings, reranking, and retrieval...
