Is RAG Dead? The Rise of Context Engineering and Semantic Layers for Agentic AI


Introduction

Retrieval-Augmented Generation (RAG) may have been necessary for the first wave of enterprise AI, but it is quickly evolving into something much larger. Over the past two years, organizations have realized that simply retrieving text snippets using vector search isn't enough. Context needs to be governed, explainable, and adaptive to an agent's purpose.

This post explores how that evolution is taking shape and what it means for data and AI leaders building systems that can reason responsibly.

You'll come away with answers to a few key questions:

How do knowledge graphs improve RAG?

They provide structure and meaning to enterprise data, linking entities and relationships across documents and databases to make retrieval more accurate and explainable for both humans and machines.

How do semantic layers help LLMs retrieve better answers?

Semantic layers standardize data definitions and governance policies so AI agents can understand, retrieve, and reason over all kinds of data, as well as AI tools, memories, and other agents.

How is RAG evolving in the age of agentic AI?

Retrieval is becoming one step in a broader reasoning loop (increasingly being called "context engineering") where agents dynamically write, compress, isolate, and select context across data and tools.

TL;DR

Retrieval-Augmented Generation (RAG) rose to prominence following the launch of ChatGPT and the realization that there is a limit on the context window: you can't just copy all of your data into the chat interface. Teams used RAG and its variants, like GraphRAG (RAG using a graph database), to bring additional context into prompts at query time. RAG's popularity soon exposed its weaknesses: putting incorrect, irrelevant, or simply too much information into the context window can actually degrade rather than improve results. New techniques like re-rankers were developed to overcome those limitations, but RAG wasn't built to survive in the new agentic world.

As AI shifts from single prompts to autonomous agents, retrieval and its variants are just one tool in an agent's toolbelt, alongside writing, compressing, and isolating context. As the complexity of workflows and the knowledge required to complete those workflows grows, retrieval will continue to evolve (though it may be called context engineering, RAG 2.0, or agentic retrieval). The next era of retrieval (or context engineering) will require metadata management across data structures (not just relational) as well as tools, memories, and agents themselves. We will evaluate retrieval not just for accuracy but also for relevance, groundedness, provenance, coverage, and recency. Knowledge graphs will be key for retrieval that is context-aware, policy-aware, and semantically grounded.

The Rise of RAG

What’s RAG?

Shortly after ChatGPT went mainstream in November 2022, users realized that LLMs weren't (hopefully) trained on their own data. To bridge that gap, teams began developing ways to retrieve relevant data at query time to augment the prompt, an approach known as retrieval-augmented generation (RAG). The term came from a 2020 Meta paper, but the popularity of the GPT models brought the term and the practice into the limelight.

Tools like LangChain and LlamaIndex helped developers build these retrieval pipelines. LangChain was launched at around the same time as ChatGPT as a way of chaining different components like prompt templates, LLMs, agents, and memory together for generative AI applications. LlamaIndex was also launched around the same time as a way to address the limited context window in GPT-3, thus enabling RAG. As developers experimented, they realized that vector databases provide a fast and scalable way to power the retrieval part of RAG, and vector databases like Weaviate, Pinecone, and Chroma became standard parts of the RAG architecture.
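
The classic pipeline these tools assemble can be sketched in a few lines. This is an illustrative toy, not any particular framework's API: the "embedding" below is a simple bag-of-words vector, where a real pipeline would call an embedding model and a vector database.

```python
# A minimal sketch of the classic RAG pattern: embed documents, retrieve the
# nearest ones at query time, and prepend them to the prompt.
import math
import re
from collections import Counter

STOPWORDS = {"the", "is", "a", "an", "to", "on", "our", "are", "of", "what"}

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words term-frequency vector (minus stopwords).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank every document by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Augment the prompt with the retrieved context before calling the LLM.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Refunds are issued to the original payment method.",
]
print(build_prompt("What is the refund policy?", docs))
```

Everything after `build_prompt` is the same as a plain LLM call; the only change RAG introduced was the retrieval step before it.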

What’s GraphRAG?

One variation of RAG became especially popular: GraphRAG. The idea here is that the underlying data used to enrich LLM prompts is stored in a knowledge graph. This allows the model to reason over entities and relationships rather than flat text chunks. In early 2023, researchers began publishing papers exploring how knowledge graphs and LLMs could complement one another. In late 2023, Juan Sequeda, Dean Allemang, and Bryon Jacob from data.world released a paper demonstrating how knowledge graphs can improve LLM accuracy and explainability. In July 2024, Microsoft open-sourced its GraphRAG framework, which made graph-based retrieval accessible to a wider developer audience and solidified GraphRAG as a recognizable category within RAG.
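
The difference from flat-chunk retrieval can be sketched with a toy triple store. This is a deliberately naive illustration (exact-match entity linking, one-hop expansion); real systems like Microsoft's GraphRAG add community detection, summarization, and far richer entity resolution.

```python
# A minimal GraphRAG-style sketch: facts live in a knowledge graph as
# (subject, predicate, object) triples. At query time we link entities in the
# question to graph nodes and pull their one-hop neighborhood into context,
# rather than retrieving flat text chunks.

triples = [
    ("Acme Corp", "acquired", "DataCo"),
    ("DataCo", "builds", "a data catalog"),
    ("Acme Corp", "headquartered_in", "Berlin"),
    ("DataCo", "founded_in", "2015"),
]

def linked_entities(question: str) -> set[str]:
    # Naive entity linking: exact substring match against graph nodes.
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return {n for n in nodes if n.lower() in question.lower()}

def neighborhood(entities: set[str]) -> list[str]:
    # One-hop expansion: keep every triple touching a linked entity.
    facts = [(s, p, o) for s, p, o in triples if s in entities or o in entities]
    return [f"{s} {p.replace('_', ' ')} {o}." for s, p, o in facts]

context = neighborhood(linked_entities("Who acquired DataCo?"))
print("\n".join(context))
```

Because the graph knows which facts are *about* DataCo, the retrieved context includes the acquisition fact and DataCo's other relationships, while unrelated facts (Acme Corp's headquarters) stay out.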

The rise of GraphRAG reignited interest in knowledge graphs, much as when Google launched its Knowledge Graph in 2012. The sudden demand for structured context and explainable retrieval gave them new relevance.

From 2023–2025, the market responded quickly:

  • January 23, 2023 – Digital Science acquired metaphacts, creators of the metaphactory platform: "a platform that supports customers in accelerating their adoption of knowledge graphs and driving knowledge democratization."
  • February 7, 2023 – Progress acquired MarkLogic, a multimodal NoSQL database with a particular strength in managing RDF data, the core data format for graph technology.
  • July 18, 2024 – Samsung acquired Oxford Semantic Technologies, makers of the RDFox graph database, to power on-device reasoning and private knowledge capabilities.  
  • October 23, 2024 – Ontotext and Semantic Web Company merged to form Graphwise, explicitly positioning around GraphRAG. "The announcement is significant for the graph industry, as it elevates Graphwise as the most comprehensive knowledge graph AI organization and establishes a clear path towards democratizing the evolution of Graph RAG as a category."
  • May 7, 2025 – ServiceNow announced its acquisition of data.world, integrating a graph-based data catalog and semantic layer into its enterprise workflow platform.

These are just the events related to knowledge graphs and related semantic technology. If we expand this to include metadata management and/or semantic layers more broadly, there are more deals, most notably the $8 billion acquisition of metadata leader Informatica by Salesforce.

These moves mark a clear shift: knowledge graphs are no longer just metadata management tools. They have become the semantic backbone for AI, closer to their origins as expert systems. GraphRAG made knowledge graphs relevant again by giving them a critical role in retrieval, reasoning, and explainability.

In my day job as the product lead for a semantic data and AI company, we work to close the gap between data and its actual meaning for some of the world's biggest firms. Making their data AI-ready is a combination of making it interoperable, discoverable, and usable so it can feed LLMs contextually relevant information in order to produce safe, accurate results. This is no small order for large, highly regulated, and complex enterprises managing exponentially growing amounts of data.

The fall of RAG and the rise of context engineering

Is RAG dead? No, but it has evolved. The original version of RAG relied on a single dense vector search and took the top results to feed directly into an LLM. GraphRAG built on this by adding graph analytics and entity and/or relationship filters. Those implementations almost immediately ran into constraints around relevance, scalability, and noise. These constraints pushed RAG forward into new evolutions known by many names: agentic retrieval, RAG 2.0, and most recently, context engineering. The original, naive implementation is largely dead, but its descendants are thriving and the term itself is still incredibly popular.

Following the RAG hype cycle in 2024, there was inevitable disillusionment. While it is possible to build a RAG demo in minutes, and many people did, getting your app to scale in an enterprise becomes a lot dicier. "People think that RAG is easy because you can build a nice RAG demo on a single document very quickly now and it will be pretty nice. But getting this to actually work at scale on real-world data where you have enterprise constraints is a very different problem," said Douwe Kiela of Contextual AI, one of the authors of the original RAG paper from Meta in 2020.

One issue with scaling a RAG app is the amount of data needed at retrieval time. "I think the issue that people get into with it is scaling it up. It's great on 100 documents, but now all of a sudden I have to go to 100,000 or 1,000,000 documents," says Rajiv Shah. But as LLMs matured, their context windows grew. The size of the context window was the original pain point that RAG was built to address, raising the question of whether RAG is still necessary or useful. As Dr. Sebastian Gehrmann from Bloomberg points out, "If I'm able to just paste in more documents or more context, I don't need to rely on as many tricks to narrow down the context window. I can just rely on the large language model. There's a tradeoff here though," he notes, "where longer context usually comes at a cost of significantly increased latency and cost."

It isn't just cost and latency that you risk by arbitrarily dumping more information into the context window; you can also degrade performance. RAG can improve responses from LLMs, but if the context is not relevant, you can worsen results, something called "context poisoning" or "context clash," where misleading or contradictory information contaminates the reasoning process. Even if you are retrieving relevant context, you can overwhelm the model with sheer volume, leading to "context confusion" or "context distraction." While terminology varies, multiple studies show that model accuracy tends to decline beyond a certain context size. This was found in a Databricks paper back in August 2024 and reinforced by recent research from Chroma, something they termed "context rot." Drew Breunig's post usefully categorizes these issues as distinct "context fails."

To address the problem of overwhelming the model, or providing incorrect or irrelevant information, re-rankers have grown in popularity. As Nikolaos Vasiloglou from RelationalAI states, "a re-ranker is, after you bring the facts, how do you decide what to keep and what to throw away, [and that] has a big effect." Popular re-rankers include Cohere Rerank, Voyage AI Rerank, Jina Reranker, and BGE Reranker. But re-ranking alone is not enough in today's agentic world. The latest generation of RAG has become embedded into agents, something increasingly referred to as context engineering.
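
The retrieve-then-rerank pattern is simple to sketch: a fast first-pass retriever over-fetches candidates, then a second stage rescores them against the query and keeps only the best few. Products like Cohere Rerank use a trained model for the scoring step; the scorer below is a stand-in lexical-overlap function, purely for illustration.

```python
# A sketch of two-stage retrieval: over-fetch candidates, then re-rank them
# against the query and keep only the most relevant.

def rerank(query: str, candidates: list[str], keep: int = 2) -> list[str]:
    q_terms = set(query.lower().split())

    def score(passage: str) -> float:
        # Jaccard overlap between query and passage terms; a real re-ranker
        # would use a cross-encoder model here instead.
        p_terms = set(passage.lower().split())
        return len(q_terms & p_terms) / len(q_terms | p_terms)

    return sorted(candidates, key=score, reverse=True)[:keep]

candidates = [
    "the cafeteria menu changes weekly",
    "password resets are handled by the it helpdesk",
    "contact the it helpdesk to reset your password",
]
print(rerank("how do i reset my password", candidates, keep=1))
```

The point of the second stage is that throwing away low-scoring candidates before they reach the context window directly counters the "context distraction" failure mode described above.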

What’s Context Engineering?

I want to focus on context engineering for two reasons: the originators of the terms RAG 2.0 and Agentic Retrieval (Contextual AI and LlamaIndex, respectively) have started using the term context engineering, and it's a much more popular term based on Google search trends. Context engineering can also be thought of as an evolution of prompt engineering. Prompt engineering is about crafting a prompt in a way that gets you the results you want; context engineering is about supplementing that prompt with the right context.

RAG grew to prominence in 2023, eons ago in the timeline of AI. Since then, everything has become 'agentic'. RAG was created under the assumption that the prompt would be generated by a human and the response would be read by a human. With agents, we need to rethink how this works. Lance Martin breaks down context engineering into four categories: write, compress, isolate, and select. Agents need to write (or persist, or remember) information from task to task, just like humans. Agents will often have too much context as they go from task to task and need to compress or condense it somehow, usually through summarization or 'pruning'. Rather than giving all of the context to the model, we can isolate it, or split it across agents so they can, as Anthropic describes it, "explore different parts of the problem simultaneously". Rather than risk context rot and degraded results, the idea here is not to give the LLM enough rope to hang itself.
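
The four operations can be sketched as tiny stand-ins. Each function below is a deliberately trivial placeholder for what real agents do: persist to memory stores, summarize with an LLM, fan work out to sub-agents, and retrieve via tools.

```python
# A toy sketch of the four context-engineering operations: write, compress,
# isolate, and select.

scratchpad: list[str] = []

def write(note: str) -> None:
    # Write: persist information between tasks, like a human taking notes.
    scratchpad.append(note)

def compress(notes: list[str], budget: int = 2) -> list[str]:
    # Compress: prune context down to a budget (a real system would summarize).
    return notes[-budget:]

def isolate(subtasks: list[str]) -> dict[str, list[str]]:
    # Isolate: give each sub-agent only the slice of context it needs.
    return {sub: [n for n in scratchpad if sub in n] for sub in subtasks}

def select(query: str) -> list[str]:
    # Select: retrieve only the notes relevant to the current step.
    return [n for n in scratchpad if any(w in n for w in query.split())]

write("billing: invoice 123 is overdue")
write("billing: customer asked for a refund")
write("shipping: package delayed in customs")
print(select("billing"))      # only the billing notes
print(compress(scratchpad))   # only the last 2 notes survive the budget
```

Traditional RAG lives almost entirely inside `select`; the other three operations are what the agentic setting adds.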

Agents must use their memories when needed or call upon tools to retrieve additional information; that is, they need to select (retrieve) what context to use. One of those tools could be vector-based retrieval, i.e. traditional RAG. But that is just one tool in the agent's toolbox. As Mark Brooker from AWS put it, "I do expect what we're going to see is some of the flashy newness around vector sort of calm down and us go to a world where we have this new tool in our toolbox, but a lot of the agents we're building are using relational interfaces. They're using those document interfaces. They're using lookup by primary key, lookup by secondary index. They're using lookup by geo. All of these things that have existed in the database space for decades, now we also have this one more, which is kind of lookup by semantic meaning, which is very exciting and new and powerful."

Those at the forefront are already doing this. Martin quotes Varun Mohan of Windsurf, who says, "we […] rely on a combination of techniques like grep/file search, knowledge graph based retrieval, and … a re-ranking step where [context] is ranked in order of relevance."

Naive RAG may be dead, and we're still figuring out what to call the modern implementations, but one thing seems certain: the future of retrieval is bright. How can we ensure agents are able to retrieve different datasets across an enterprise, from relational data to documents? The answer is increasingly being called the semantic layer.

Context engineering needs a semantic layer

What’s a Semantic Layer?

There's a recent push from those in the relational data world to build a semantic layer over relational data. Snowflake even created an Open Semantic Interchange (OSI) initiative to try to standardize the way companies document their data to make it ready for AI.

But focusing solely on relational data is a narrow view of semantics. What about unstructured and semi-structured data? That's the kind of data that large language models excel at, and what started all the RAG rage. If only there were a precedent for retrieving relevant search results across a ton of unstructured data 🤔.

Google has been retrieving relevant information across the entire web for decades using structured data. By structured data, here, I mean machine-readable metadata, or as Google describes it, "a standardized format for providing information about a page and classifying the page content." Librarians, information scientists, and SEO practitioners have been tackling the unstructured data retrieval problem through knowledge organization, information retrieval, structured metadata, and Semantic Web technologies. Their methods for describing, linking, and governing unstructured data underpin today's search and discovery systems, both publicly and in the enterprise. The future of the semantic layer will bridge the relational and unstructured data worlds by combining the rigor of relational data management with the contextual richness of library science and knowledge graphs.
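
Structured data of this kind is typically JSON-LD using the schema.org vocabulary. Below is a sketch of what such machine-readable metadata looks like for an article page; the field values (date, topics) are hypothetical placeholders for illustration.

```python
# Emitting schema.org-style JSON-LD metadata for a page, so retrieval systems
# can work from typed entities and relationships instead of raw text.
import json

page_metadata = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Is RAG Dead?",
    "author": {"@type": "Person", "name": "Steve Hedden"},
    "about": ["retrieval-augmented generation", "knowledge graphs"],
    "datePublished": "2025-01-01",  # hypothetical date, for illustration only
}
print(json.dumps(page_metadata, indent=2))
```

The value for retrieval is that `author`, `about`, and `datePublished` become queryable, governable fields rather than strings buried somewhere in the page body.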

Image by Author

The future of RAG

Here are my predictions for the future of RAG.

RAG will continue to evolve into more agentic patterns. This means retrieval of context is just one part of a reasoning loop that also includes writing, compressing, and isolating context. Retrieval becomes an iterative process rather than a one-shot step. Anthropic's Model Context Protocol (MCP) treats retrieval as a tool that can be given via MCP to an agent. OpenAI offers File Search as a tool that agents can call. LangChain's agent framework LangGraph lets you build agents using a node-and-edge pattern (like a graph). In their quickstart guide, you can see that retrieval (in this case a web search) is just one of the tools the agent can be given to do its job, and they list retrieval as one of the actions an agent or workflow can take. Wikidata also has an MCP server that allows users to interact directly with public data.
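
Retrieval-as-one-tool-among-many can be sketched as a loop. This is framework-agnostic pseudostructure, not LangGraph or MCP code: the tool bodies return canned strings, and `choose_tool` is a hard-coded stand-in for the LLM's tool-selection step.

```python
# Retrieval as just one tool in an agent's loop: the agent picks a tool per
# reasoning step instead of doing a single one-shot retrieval up front.

def vector_search(q: str) -> str:
    return f"[3 chunks about '{q}' from the vector store]"

def sql_lookup(q: str) -> str:
    return f"[rows matching '{q}' from the warehouse]"

def web_search(q: str) -> str:
    return f"[top web results for '{q}']"

TOOLS = {"vector_search": vector_search, "sql_lookup": sql_lookup, "web_search": web_search}

def choose_tool(step: str) -> str:
    # Stand-in for the LLM deciding which tool fits this reasoning step.
    if "revenue" in step or "rows" in step:
        return "sql_lookup"
    if "latest" in step or "news" in step:
        return "web_search"
    return "vector_search"

context: list[str] = []
for step in ["find our refund policy", "get revenue by region", "latest news on MCP"]:
    tool = choose_tool(step)
    context.append(TOOLS[tool](step))  # one retrieval call per step, iteratively
print(context)
```

The `TOOLS` dict plays the role an MCP registry or agent-framework tool list plays in real systems: a typed catalog the agent selects from at each step.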

Retrieval will broaden to include all kinds of data (aka multimodal retrieval): relational, content, and then images, audio, geodata, and video. LlamaIndex offers four 'retrieval modes': chunks, files_via_metadata, files_via_content, and auto_routed. They also offer composite retrieval, allowing you to retrieve from multiple sources at once. Snowflake offers Cortex Search for content and Cortex Analyst for relational data. LangChain offers retrievers over relational data, graph data (Neo4j), lexical indexes, and vectors.

Retrieval will broaden to include metadata about tools themselves, as well as "memories". Anthropic's MCP standardized how agents call tools using a registry of tools, i.e. tool metadata. OpenAI, LangChain, LlamaIndex, AWS Bedrock, Azure, Snowflake, and Databricks all have capabilities for managing tools, some via MCP directly, others via their own registries. On the memory side, both LlamaIndex and LangChain treat memories as retrievable data (short-term and long-term) that agents can query during workflows. Projects like Cognee push this further with dedicated, queryable agent memory.

Knowledge graphs will play a key role as a metadata layer between relational and unstructured data, replacing the narrow definition of semantic layer currently in use with a more robust metadata management framework. The market consolidation we've seen over the past couple of years, described above, is, I believe, a sign of the market's growing acknowledgement that knowledge graphs and metadata management are going to be crucial as agents are asked to do more complicated tasks across enterprise data. Gartner's May 2025 report recommends data engineering teams adopt semantic techniques (such as ontologies and knowledge graphs) to support AI use cases. Knowledge graphs, metadata management, and reference data management are already ubiquitous in large life sciences and financial services firms, largely because they're highly regulated and require fact-based, grounded data to power their AI initiatives. Other industries will start adopting the tried-and-true methods of semantic technology as their use cases become more mature and require explainable answers.

Evaluation metrics on context retrieval will gain popularity. Ragas, Databricks Mosaic AI Agent Evaluation, and TruLens all provide frameworks for evaluating RAG. Evidently offers open-source libraries and instructional material on RAG evaluation. LangChain's evaluation product LangSmith has a module focused on RAG. What is important is that these frameworks aren't just evaluating the accuracy of the answer given the prompt; they evaluate context relevance and groundedness (how well the response is supported by the context). Some vendors are building out metrics to evaluate provenance (citations and sourcing) of the retrieved context, coverage (did we retrieve enough?), and freshness or recency.
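
The definitions of two of these signals can be illustrated with plain token overlap. Frameworks like Ragas typically use LLM judges for this; the functions below are crude stand-ins meant only to make the definitions concrete.

```python
# Two retrieval-evaluation signals, illustrated with token overlap:
# context relevance (does the retrieved context address the question?) and
# groundedness (is the answer supported by the context?).

def _tokens(text: str) -> set[str]:
    return set(text.lower().replace("?", "").replace(".", "").split())

def context_relevance(question: str, context: str) -> float:
    # Fraction of question terms covered by the retrieved context.
    q = _tokens(question)
    return len(q & _tokens(context)) / len(q)

def groundedness(answer: str, context: str) -> float:
    # Fraction of answer terms supported by the retrieved context.
    a = _tokens(answer)
    return len(a & _tokens(context)) / len(a)

ctx = "Returns are accepted within 30 days of purchase."
print(context_relevance("Are returns accepted?", ctx))            # 1.0
print(groundedness("Returns are accepted within 30 days.", ctx))  # 1.0
print(groundedness("We ship worldwide.", ctx))                    # 0.0: ungrounded claim
```

Note that both metrics are computed without knowing the "right" answer, which is exactly why they complement plain answer-accuracy evaluation.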

Policy-as-code guardrails will ensure retrieval respects access control, policies, regulations, and best practices. Snowflake and Databricks already enable row-level access control and column masking. Policy engines like Open Policy Agent (OPA) and Oso are embedding access control into agentic workflows. As Dr. Sebastian Gehrmann of Bloomberg has found, "RAG is not necessarily safer," and can introduce new governance risks. I expect the need for guardrails to grow to include more complicated governance rules (beyond access control), policy requirements, and best practices.
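
The shape of policy-aware retrieval can be sketched as a filter between the retriever and the context window. The `allowed` function below plays the role a real policy engine like OPA or Oso would; the chunks, classifications, and roles are invented for illustration.

```python
# Policy-as-code applied to retrieval: every chunk carries governance
# metadata, and a policy check filters what the agent may see before
# anything reaches the context window.

CHUNKS = [
    {"text": "Q3 revenue grew 12%.", "classification": "public"},
    {"text": "Layoffs planned for Q4.", "classification": "restricted"},
    {"text": "Customer SSNs: ...", "classification": "pii"},
]

def allowed(user_role: str, chunk: dict) -> bool:
    # Minimal policy: PII never leaves; restricted data needs the right role.
    if chunk["classification"] == "pii":
        return False
    if chunk["classification"] == "restricted":
        return user_role in {"executive", "hr"}
    return True

def governed_retrieve(user_role: str) -> list[str]:
    return [c["text"] for c in CHUNKS if allowed(user_role, c)]

print(governed_retrieve("analyst"))    # public chunks only
print(governed_retrieve("executive"))  # public + restricted, never PII
```

Keeping the policy in one declarative function (rather than scattered through retrieval code) is the core of the policy-as-code idea: the rules can be audited, versioned, and changed without touching the retriever.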

Conclusion

RAG was never the end goal, just the starting point. As we move into the agentic era, retrieval is evolving into part of a fuller discipline: context engineering. Agents don't just need to find documents; they need to understand which data, tools, and memories are relevant for each step of their reasoning. This understanding requires a semantic layer: a way to understand, retrieve, and govern context over the entire enterprise. Knowledge graphs, ontologies, and semantic models will provide that connective tissue. The next generation of retrieval won't just be about speed and accuracy; it will also be about explainability and trust. The future of RAG is not retrieval alone, but retrieval that is context-aware, policy-aware, and semantically grounded.

About the author: Steve Hedden is the Head of Product Management at TopQuadrant, where he leads the strategy for TopBraid EDG, a platform for knowledge graph and metadata management. His work focuses on bridging enterprise data governance and AI through ontologies, taxonomies, and semantic technologies. Steve writes and speaks regularly about knowledge graphs and the evolving role of semantics in AI systems.
