Enterprise data is inherently complex: real-world documents are multimodal, spanning text, tables, charts and graphs, images, diagrams, scanned pages, forms, and embedded metadata. Financial reports carry critical insights in tables, engineering manuals depend on diagrams, and legal documents often include annotated or scanned content.
Retrieval-augmented generation (RAG) was created to ground LLMs in trusted enterprise knowledge—retrieving relevant source data at query time to reduce hallucinations and improve accuracy. But when a RAG system processes only surrounding text, it misses key signals embedded in tables, charts, and diagrams—leading to incomplete or incorrect answers.
An intelligent agent is only as good as the data foundation it's built on. Modern RAG must therefore be inherently multimodal, able to understand both visual and textual context to achieve enterprise-grade accuracy. The NVIDIA Enterprise RAG Blueprint is built for this, providing a modular reference architecture that connects unstructured enterprise data to the intelligent systems built on top of it.
The blueprint also serves as a foundational layer for the NVIDIA AI Data Platform, helping to bridge the traditional gap between compute and data. By enabling retrieval and reasoning closer to the data layer, it preserves governance, reduces operational friction, and makes enterprise knowledge immediately usable by intelligent systems. The result is a modern AI data stack: storage that can retrieve, enrich, and reason alongside your models.
While the Enterprise RAG Blueprint provides many configurable options, this post highlights the following five key configurations that most directly improve accuracy and contextual relevance across enterprise use cases:
- Baseline multimodal RAG pipeline
- Reasoning
- Query decomposition
- Metadata filtering for faster, more precise retrieval
- Visual reasoning for multimodal data
The post also explains how the blueprint can be embedded into AI data platforms to transform traditional repositories into AI-ready knowledge systems.
Accuracy metrics in this post are measured using the RAGAS framework on well-known public datasets. Learn more about evaluating your NVIDIA RAG Blueprint system.
1. Document ingestion and understanding
Before an agent can deliver insights, it must be firmly grounded in your data. This foundational configuration focuses on intelligent document ingestion and core RAG functionality.
The Enterprise RAG Blueprint uses NVIDIA Nemotron RAG models to extract multimodal enterprise content—text, tables, charts and graphs, and infographics—then converts that content to text and embeds it for indexing in a vector database. At query time, the blueprint runs semantic retrieval and reranking, then uses a Nemotron LLM to generate a grounded answer.
To maximize performance, this baseline intentionally avoids image captioning and heavy reasoning, making it the ideal starting point for production deployments. Deploy this baseline with Docker.
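To make the retrieval step concrete, the following is a minimal, self-contained sketch of the embed-index-retrieve loop. It uses a toy bag-of-words "embedding" and cosine similarity purely for illustration; the blueprint itself uses Nemotron RAG embedding and reranking models, not this code.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the blueprint uses Nemotron RAG
    # embedding models instead of word counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Index: chunks extracted at ingestion time (text, tables, chart captions)
chunks = [
    "Q3 revenue grew 12 percent year over year",
    "The cooling diagram shows airflow through the chassis",
    "Table: operating expenses by quarter",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    # Rank all indexed chunks by similarity to the query embedding.
    qv = embed(query)
    ranked = sorted(index, key=lambda e: cosine(qv, e[1]), reverse=True)
    return [c for c, _ in ranked[:top_k]]

print(retrieve("how much did revenue grow"))
```

The retrieved chunks would then be passed to the reranker and LLM; the key point is that answer quality is bounded by what ingestion managed to extract and index.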
Advantages of document ingestion and understanding
This foundational configuration is the blueprint's highest-efficiency pipeline, optimized for accuracy and throughput while keeping GPU cost and time to first token (TTFT) low. It establishes your baseline performance for retrieval quality and LLM grounding.


Table 1 summarizes the overall impact across several datasets.
2. Reasoning
When you activate reasoning in the RAG blueprint, you enable the LLM to interpret the retrieved evidence and synthesize logically grounded answers. This is the simplest change to get an accuracy boost for many applications. Enable reasoning for the NVIDIA Enterprise RAG Blueprint.
Table 2 summarizes the overall impact across several sample datasets.
| Dataset | Type | Reasoning on | Default |
|---|---|---|---|
| RAG Battle | MM | 0.85 | 0.809 |
| KG RAG | MM | 0.58 | 0.565 |
| FinanceBench | MM | 0.69 | 0.633 |
| BO767 | MM | 0.88 | 0.91 |

*Table 2. Accuracy with reasoning enabled vs. the v2.3 default (MM = multimodal, TO = text-only)*
Advantages of reasoning
For any use case involving mathematical operations or complex data comparison, a simple similarity or hybrid search alone won't suffice. Reasoning is required to correct errors and ensure precise contextual understanding. Accuracy improvements across datasets averaged ~5%, with several cases demonstrating dramatic reasoning-driven corrections.
Examples
In the FinanceBench dataset, the baseline configuration incorrectly computed the Adobe FY2017 operating cash flow ratio as 2.91. With reasoning enabled, the model produced the correct answer, 0.83. In addition, the Ragbattle dataset demonstrates the accuracy improvement from enabling a VLM.
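The underlying calculation illustrates why reasoning matters here: the operating cash flow ratio is cash flow from operations divided by current liabilities, and getting it right requires pulling both figures and applying the formula rather than quoting a single number. The figures below are illustrative stand-ins, not values taken from the actual 10-K:

```python
# Operating cash flow ratio = cash flow from operations / current liabilities.
# Illustrative figures in $ millions chosen to reproduce the correct
# answer from the FinanceBench example; not sourced from the 10-K.
cash_flow_from_operations = 2_913
current_liabilities = 3_527

ratio = cash_flow_from_operations / current_liabilities
print(round(ratio, 2))  # 0.83
```

A baseline pipeline that surfaces only one of the two figures, or skips the division entirely, can produce a confident but wrong number; reasoning lets the model verify each step of the computation against the retrieved evidence.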
3. Query decomposition
Answering complex user questions often requires pulling facts from multiple places in the data foundation. Query decomposition breaks a single query into smaller subqueries, retrieves evidence for each, and recombines the results into a complete, grounded response. Activate query decomposition for the NVIDIA Enterprise RAG Blueprint.
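The decompose-retrieve-recombine flow can be sketched as follows. This is a conceptual illustration: in the blueprint, an LLM generates the subqueries and synthesizes the final answer, whereas here both steps are stubbed with hand-written logic.

```python
# Minimal sketch of query decomposition; decompose() and the final
# recombination are stand-ins for LLM calls in the real pipeline.
def decompose(query: str) -> list[str]:
    # An LLM would produce these subqueries; stubbed for illustration.
    return [
        "What was revenue in FY2023?",
        "What was revenue in FY2022?",
    ]

def retrieve(subquery: str) -> str:
    # Stub retriever over a toy corpus keyed by fiscal year.
    corpus = {
        "FY2023": "FY2023 revenue was $60M",
        "FY2022": "FY2022 revenue was $50M",
    }
    for year, fact in corpus.items():
        if year in subquery:
            return fact
    return ""

def answer(query: str) -> str:
    # Retrieve evidence per subquery, then recombine; a real pipeline
    # would pass the evidence to an LLM for final synthesis.
    evidence = [retrieve(sq) for sq in decompose(query)]
    return " | ".join(e for e in evidence if e)

print(answer("How much did revenue grow from FY2022 to FY2023?"))
```

Each subquery hits a different part of the corpus, which is exactly what lets multihop questions succeed where a single retrieval pass would miss half the evidence.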


Advantages of query decomposition
Query decomposition significantly improves accuracy for multihop and context-rich questions that span multiple paragraphs or documents. It does add extra LLM calls (increasing latency and cost), but the accuracy gains are often worth it for mission-critical enterprise use cases. Query decomposition can also be paired with reasoning for an additional boost when needed.
Example
As NVIDIA AI Data Platform partners evolve to offer more relevant and accurate retrieval, some level of query processing can either be included as part of the data platform or left to the agent. Learn more about how query decomposition can be applied in some use cases.
Table 3 shows the overall impact across several datasets.
4. Metadata filtering for faster, more precise retrieval
Metadata, such as author, date, category, and security tags, has always been integral to enterprise data. In RAG pipelines, metadata filters can be leveraged to narrow the search space and align retrieved content with the right context, significantly improving retrieval precision and speed.
The RAG blueprint supports custom metadata ingestion and automatic query generation based on that data. To leverage your custom metadata, see Advanced Metadata Filtering with Natural Language Generation. To learn more about what's possible with this feature set, check out the example notebook in the NVIDIA-AI-Blueprints/rag GitHub repo.
Advantages of metadata filtering
Metadata filtering narrows the search space for faster retrieval and improves precision by aligning retrieved content with context. This enables developers to leverage metadata without manual filter logic to achieve higher throughput and contextual relevance. When metadata filtering capabilities are embedded directly into AI data platforms, they can make your storage smarter, leading to faster retrieval and lower latency.
Example
For example, consider two documents ingested with the following metadata:
```python
custom_metadata = [
    {
        "filename": "ai_guide.pdf",
        "metadata": {
            "category": "AI",
            "priority": 8,
            "rating": 4.5,
            "tags": ["machine-learning", "neural-networks"],
            "created_date": "2024-01-15T10:30:00"
        }
    },
    {
        "filename": "engineering_manual.pdf",
        "metadata": {
            "category": "engineering",
            "priority": 5,
            "rating": 3.8,
            "tags": ["hardware", "design"],
            "created_date": "2023-12-20T14:00:00"
        }
    }
]
```
With dynamic filter expressions enabled, a query such as "Show me high-rated AI documents with machine learning tags created after January 2024" is automatically translated into a filter expression such as:
```python
filter_expression = 'content_metadata["category"] == "AI" and content_metadata["rating"] >= 4.0 and array_contains(content_metadata["tags"], "machine-learning") and content_metadata["created_date"] >= "2024-01-01"'
```
With metadata filtering enabled, the system retrieved 10 focused citations from one document, ai_guide.pdf, achieving 100% precision on the target domain while reducing the search space by 50%.
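As a sanity check, the generated expression maps onto an ordinary predicate over the example documents. This sketch reimplements the same filter logic in plain Python, outside the vector database (`array_contains` becomes a membership test, and the ISO-8601 date strings compare correctly as strings):

```python
# The two example documents from the ingestion step above.
documents = [
    {"filename": "ai_guide.pdf",
     "metadata": {"category": "AI", "priority": 8, "rating": 4.5,
                  "tags": ["machine-learning", "neural-networks"],
                  "created_date": "2024-01-15T10:30:00"}},
    {"filename": "engineering_manual.pdf",
     "metadata": {"category": "engineering", "priority": 5, "rating": 3.8,
                  "tags": ["hardware", "design"],
                  "created_date": "2023-12-20T14:00:00"}},
]

def matches(meta: dict) -> bool:
    # Plain-Python equivalent of the generated filter expression.
    return (meta["category"] == "AI"
            and meta["rating"] >= 4.0
            and "machine-learning" in meta["tags"]
            and meta["created_date"] >= "2024-01-01")

hits = [d["filename"] for d in documents if matches(d["metadata"])]
print(hits)  # ['ai_guide.pdf']
```

Only ai_guide.pdf satisfies all four clauses, so every subsequent similarity search runs against half the corpus, which is where the precision and latency gains come from.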
5. Visual reasoning for multimodal data
Enterprise data is visually rich. Where traditional text-only embeddings fall short, vision language models (VLMs) such as NVIDIA Nemotron Nano 2 VL (12B) introduce visual reasoning into the pipeline. Learn more about how to leverage a VLM for generation in the RAG Blueprint.


Advantages of visual reasoning
Visual reasoning is crucial for handling real-world enterprise documents. Integrating a VLM in the generation pathway enables the RAG system to interpret images, charts, and infographics, making it possible to accurately answer queries where the information lies in a structured visual element rather than just the surrounding text.
Example
A significant accuracy improvement was observed when a VLM was enabled for the Ragbattle dataset in the RAG Blueprint, especially when the answer was in a visual element. Note that enabling VLM inference can increase response latency due to additional image processing. Consider this tradeoff between accuracy and speed based on your requirements. Learn more about the accuracy improvements with VLM for the Ragbattle dataset.
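One common way to manage the latency tradeoff is to invoke the VLM only when retrieved evidence actually contains a visual element. The routing sketch below illustrates that idea; the modality labels and model names are placeholders, not the blueprint's actual API:

```python
# Sketch of routing generation between an LLM and a VLM based on the
# modality of the retrieved evidence. Returns the routing decision;
# a real pipeline would invoke the chosen model here.
def route_generation(query: str, evidence: list[dict]) -> str:
    has_visual = any(e["modality"] in ("chart", "image", "infographic")
                     for e in evidence)
    return "vlm" if has_visual else "llm"

evidence = [
    {"modality": "text", "content": "Revenue details are in the chart."},
    {"modality": "chart", "content": "<bar chart: revenue by region>"},
]
print(route_generation("Which region had the highest revenue?", evidence))  # vlm
```

Text-only questions keep the fast LLM path, so the image-processing cost is paid only on the queries that actually need visual reasoning.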
Transforming enterprise storage into an active knowledge system
The Enterprise RAG Blueprint demonstrates how the progressive adoption of these five capabilities, from reasoning and metadata-driven retrieval to multimodal understanding, directly enhances the accuracy and groundedness of your intelligent agents. Each capability offers a distinct balance between latency, token cost, and contextual precision, providing a flexible, tunable framework that can be adapted to diverse enterprise use cases.
This accelerates the evolution of the data foundation itself. The NVIDIA AI Data Platform transforms enterprise data into AI-searchable knowledge. As NVIDIA partners evolve their storage offerings, this blueprint serves as a reference for delivering embedded RAG capabilities that leverage metadata to enforce permissions, track changes, and provide highly accurate retrieval directly at the storage layer.
NVIDIA storage partners are building AI data platforms based on the NVIDIA reference design that are transforming enterprise storage from a passive repository into an active, intelligent system in the AI workflow. The result is a next-generation enterprise data infrastructure: faster, smarter, and purpose-built for the age of generative AI.
What's new with the NVIDIA Enterprise RAG Blueprint
The latest release of the NVIDIA Enterprise RAG Blueprint deepens its focus on serving agentic workflows. It introduces first-class document-level summarization with both shallow and deep strategies, enabling agents to quickly assess relevance, narrow search space, and balance accuracy with latency. A new data catalog improves discoverability and governance across large corpora, while upgrades to the best-in-class Nemotron RAG models further enhance retrieval quality, reasoning, and generation performance—making RAG a more efficient, agent-ready foundation for enterprise-scale knowledge systems.
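The shallow-versus-deep distinction can be pictured with a small sketch. This is one plausible interpretation of the two strategies, not the blueprint's actual implementation: a shallow summary makes a single cheap pass, while a deep summary summarizes each chunk and then summarizes the summaries. The `summarize()` stub stands in for an LLM call.

```python
# Conceptual sketch of shallow vs. deep document summarization;
# summarize() is a stub standing in for an LLM summarization call.
def summarize(text: str) -> str:
    return text[:40]  # stub: truncate instead of calling an LLM

def shallow_summary(doc: str) -> str:
    # One pass over the head of the document: fast, cheap, coarse.
    return summarize(doc)

def deep_summary(doc: str, chunk_size: int = 200) -> str:
    # Map-reduce: summarize each chunk, then summarize the summaries.
    chunks = [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]
    partials = [summarize(c) for c in chunks]
    return summarize(" ".join(partials))
```

An agent can use the shallow pass to decide whether a document is worth the deeper, costlier pass, which is the accuracy-versus-latency balance the release notes describe.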
Start with enterprise-grade RAG
Ready to integrate these five capabilities into your RAG use cases? Access the modular code, documentation, and evaluation notebooks for free within the NVIDIA Enterprise RAG Blueprint.
Make your enterprise data AI-ready and transform your production data into an intelligent knowledge system with embedded RAG capabilities through the NVIDIA AI Data Platform. Contact an NVIDIA AI storage partner to get started with your own NVIDIA-powered AI data platform.
