Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost


The recent launch of PageIndex is part of a broader shift in AI architecture toward “Vectorless RAG” or “Reasoning-Based Retrieval.” Instead of the usual approach of splitting documents into arbitrary chunks and searching via mathematical similarity, PageIndex builds a “Smart Table of Contents” (a hierarchical tree) that lets LLMs navigate documents the way a human expert would. Several blogs (including one from Microsoft) outline the working principles (no vector database, no chunking, improved explainability) along with the 98.7% accuracy achieved on a financial benchmark. However, they are also careful to note that Vectorless RAG is better suited to deep-dive queries on complex structured or semi-structured documents (such as financial statements) than to searching across many independent documents, such as customer-support knowledge bases, where we should continue to use Vector RAG.

Why is that?

If Vectorless RAG using PageIndex delivers better (or at least as good) results on almost any query, why not use it for a large collection of documents? The primary reason is that PageIndex’s tree-based approach cannot practically scale to multi-document scenarios. The hierarchical tree index that is a prerequisite ingestion step is slow and expensive to build with an LLM. Moreover, retrieval is a two-step process: use an LLM to walk the tree and locate the most relevant nodes, then use the content of those nodes as context for the response-synthesis step, again using the LLM.

By comparison, building a vector index is fast and cheap, and the retrieval step uses an LLM only once, during synthesis. Ingestion with an embedding model also costs far less than having an LLM summarize the full document.

What if you could get the excellent structure-aware reasoning accuracy of Vectorless RAG, together with the low latency and cost of Vector RAG, in a way that scales across an enterprise knowledge base? In this article, I’ll walk through a real use case on a large, complex document to build Proxy-Pointer RAG: an ingestion and retrieval pipeline that achieves this through a set of novel engineering steps. Along the way, we’ll explore and demonstrate the following:

  • Why exactly is PageIndex so accurate, and why is it difficult to practically scale the idea to multi-document knowledge bases?
  • A quick comparison of Vectorless RAG using PageIndex vs. flat Vector RAG to establish a baseline.
  • How can we incorporate the principles of PageIndex into a vector index with none of the associated latency and cost?
  • A comparison across a wide range of queries using PageIndex and Proxy-Pointer to judge retrieval quality.

Use Case Setup

We will use a World Bank report titled South Asia Development Update, April 2024: Jobs for Resilience (License: CC BY 3.0 IGO). This is a 131-page report comprising multiple chapters, complex charts, tables, boxed content, etc., and is a good candidate for PageIndex to prove its capability. I used gemini-3-flash as the LLM to build the PageIndex tree and gemini-3.1-flash-lite for retrievals. I extracted the report PDF to a Markdown file using the Adobe PDF Extract API, but any other method that preserves the integrity of tables and charts (such as a VLM) would work just as well. For the vector database, is used.

How does PageIndex work?

PageIndex takes a radically different approach to document retrieval. Instead of treating a document as a flat sequence of chunks, it builds a semantic skeleton tree: a hierarchical map of every section, sub-section, and content block in the document. An LLM then navigates that tree at query time.

Phase 1: Indexing (Once per Document)

PageIndex parses the document’s heading structure (Markdown headers, PDF outlines, etc.) into a nested tree. Each node gets:

  • a title (extracted from the heading)
  • a node ID (a unique identifier like 0012)
  • line boundaries (start and end line in the source document)
  • a summary (generated by an LLM; this is the expensive, time-consuming part)

The result is a JSON structure that looks like this:

{
  "node_id": "0011",
  "title": "Chapter 1. Deceptive Strength",
  "summary": "Covers South Asia's growth outlook, inflation trends, financial vulnerabilities, climate risks, and policy challenges...",
  "line_num": 621,
  "nodes": [
    {
      "node_id": "0012",
      "title": "Introduction",
      "summary": "Summarizes the chapter's key themes including regional growth driven by India...",
      "line_num": 625
    },
    ...
  ]
}

Phase 2: Retrieval (Per Query)

When a user asks a question, PageIndex hands the tree of titles and summaries to an LLM and asks, “Which nodes contain the answer?” This is unlike Vector RAG, which relies on mathematical similarity between query and chunk embeddings to build the relevant context.

The LLM reads the summaries, not the full text, and returns a short list of node IDs. PageIndex then uses the line boundaries to slice the full section text from the original Markdown file and passes it to the synthesis LLM.
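In code, that two-step loop might look like the sketch below. This is not PageIndex’s actual API: the `llm` callable, the prompt, and the `end_line` field are illustrative assumptions; the other node fields mirror the JSON shown earlier.

```python
# Sketch of PageIndex-style retrieval (names are illustrative, not the real API).
import json

def retrieve(query: str, tree: dict, doc_lines: list, llm) -> str:
    """Ask an LLM to pick relevant node IDs from the summary tree,
    then slice the full sections out of the original document."""
    prompt = (
        "Given this document tree of titles and summaries, return a JSON "
        f"list of node_ids that answer the question.\n\nTree:\n{json.dumps(tree)}\n\n"
        f"Question: {query}"
    )
    node_ids = json.loads(llm(prompt))  # e.g. ["0011", "0012"]

    # Flatten the tree so we can look up line boundaries per node.
    index = {}
    def walk(node):
        index[node["node_id"]] = node
        for child in node.get("nodes", []):
            walk(child)
    walk(tree)

    # Slice the contiguous section between each node's line boundaries.
    sections = []
    for nid in node_ids:
        node = index[nid]
        start = node["line_num"]
        end = node.get("end_line", start + 1)
        sections.append("\n".join(doc_lines[start:end]))
    return "\n\n".join(sections)
```

The LLM call and the line slicing are the two costs this article later eliminates.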

Why does this work so well?

PageIndex excels due to three architectural advantages:

1. Structural Navigation, Not Pattern Matching

When we ask a question about, say, the growth outlook, PageIndex doesn’t search for chunks containing those words. It reads the summary of node 0011 (“Chapter 1. Deceptive Strength”), which covers South Asia’s growth outlook, and immediately knows this is the right node. It reasons about what each section is about, not which words it contains.

2. Contiguous Context Extraction

Once the right nodes are identified, PageIndex extracts the full, unbroken section the node represents from the original Markdown: headers, sub-headers, bullet points, figure references, and all. The synthesis LLM receives context that reads like a properly authored document section, not a fragmented chunk with arbitrary boundaries.

3. Zero Chunk Boundary Artifacts

There are no overlapping chunks, no split sentences, no context windows that start mid-paragraph. Each piece of context has a natural beginning (the section header) and a natural end (the next section’s start). This dramatically reduces hallucination from ambiguous context.

However, this strength comes at a price, and this is where the approach starts to break down.

Why is this difficult to scale?

The reasons are cost and latency (at ingestion as well as retrieval time). For our 131-page report, the tree contains 137 structural nodes, so PageIndex needs 137 LLM calls just for the indexing phase: one summary per node. For a corpus of 50 documents, that’s ~7,000 LLM calls before we’ve answered a single query.
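That ~7,000 figure is simple arithmetic (the 50-document corpus is a hypothetical):

```python
# Summaries cost one LLM call per structural node, per document.
nodes_per_doc = 137   # nodes in our 131-page report
num_docs = 50         # hypothetical corpus size
total_summary_calls = nodes_per_doc * num_docs
print(total_summary_calls)  # 6850, i.e. ~7,000 calls before the first query
```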

Here’s a comparative cost of PageIndex vs Vector Index:

| Metric | PageIndex | Standard Vector RAG |
|---|---|---|
| Indexing LLM calls | ~137 per document | 0 |
| Indexing time | 5-10 minutes per document | 30 seconds per document |
| Indexing cost | $$$ (LLM API fees) | $ (embedding API only) |
| Multi-document support | Requires per-document tree traversal during retrieval | Single unified index, fast and efficient |
| Retrieval quality | ★★★★★ | ★★★☆☆ |

For a single critical document such as a financial report, this cost is easily justified. For a 500-document enterprise knowledge base of customer-support SOPs that updates weekly? It’s a non-starter.

This is where PageIndex becomes impractical for real systems at scale.

Before proposing an alternative, let’s establish a baseline: how much better is PageIndex compared to standard Vector RAG?

Comparison of Vectorless vs Flat Vector RAG

To establish a baseline, I compared the results of a diverse range of queries using Vectorless and flat Vector RAG. As expected, the quality of Vectorless RAG was better in all cases. Here are two examples:

Query about table content

Here is the response from Vectorless RAG:

=== VECTORLESS RAG INTERACTION ===
Query: what does annexure table a2.1.1 say
Nodes Retrieved: 0125, 0131
Response:
Annex Table A2.1.1, titled "Studies included in the literature review," provides a comprehensive list of the academic and policy papers used to estimate the growth elasticity of employment (Annex Figure A2.1.1; Annex 2.1).

The table catalogs 25 specific studies, detailing the following information for each:
*   **Paper:** The author(s) and publication source.
*   **Period of main sample:** The timeframe covered by the study (ranging from the 1960s through 2019).
*   **Sample coverage:** The geographical or organizational scope (e.g., Worldwide, Europe, South Asia, Africa, Asia).
*   **Region:** The specific region analyzed.
*   **Economies:** The specific countries included in the study sample.

According to the provided text, the articles listed in this table were selected based on specific criteria: they focus on the growth elasticity of employment using country-level estimates. The selection excludes articles that present exclusively subnational, gender-specific, or industry/sector-specific estimates, as well as those focused solely on Okun's law and the unemployment elasticity of growth (Annex 2.1).

And here is the same query using Vector RAG, which completely misses the content:

=== VECTOR RAG INTERACTION ===
Query: what does annexure table a2.1.1 say
Retrieved 5 chunks.
Response:
Based on the provided context, there is no information available describing the contents of a table labeled "A2.1.1."

The documents list "A2.1.1" in the table of contents as **"Studies included in the literature review"** (page 93). However, the detailed content provided in your search results pertains to other tables, specifically:

*   **Annex 2.1:** Describes a summary of the literature on the growth elasticity of employment, noting that "a detailed literature search was performed of studies on the elasticity of employment with respect to output growth."
*   **Annex Figure A2.1.1 (referenced as A2..1.11.1 in the text):** Titled "Estimates of output elasticity of employment," which states, "The literature provides a wide range of estimates on the growth elasticity of total employment."
*   **Annex Table A2.3.3:** Provides specific regression data regarding the robustness tests for the female employment ratio.

There is no specific data or content provided for Annex Table A2.1.1 beyond its title in the table of contents.

Query requiring Structural knowledge of the document

The query is: what are the questions answered by chapter 2? Here is the response from Vectorless RAG. It precisely located Chapter 2 and the “Questions” section within it, and reproduced the questions verbatim, along with the grounding information for explainability.

=== VECTORLESS RAG INTERACTION ===
Query: what are the questions answered by chapter 2
Nodes Retrieved: 0098
Response:
Based on the provided excerpts, Chapter 2 addresses the following questions:

*   **How does South Asia’s employment compare with that of other emerging market and developing economies (EMDEs)?**
*   **What are the correlates of long-run employment ratios among EMDEs?**
*   **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?**

These questions are explicitly listed in the section titled **"Questions."**

The chunks in Vector RAG don’t preserve the structural hierarchy and section integrity, so while it locates the “Questions” section, it is not able to find the actual questions in it.

=== VECTOR RAG INTERACTION ===
Query: what are the questions answered by chapter 2
Retrieved 5 chunks.
Response:
Based on the provided document, Chapter 2, titled "Jobless Growth," is structured to address specific inquiries regarding labor markets and economic growth.

The chapter explicitly identifies a "Questions" section (page 65) to frame its analysis. While the document does not list the exact text of those questions, it indicates that the chapter covers the following thematic areas and analytical components:

*   **Trends and Productivity:** The chapter examines "Trends in employment and labor productivity" (page 67) and addresses "Labor productivity growth" (page 73).
*   **Employment Ratios:** A major portion of the chapter is devoted to "Long-run employment ratios" (page 75) and their various "Correlates," including:
    *   **Policy and Structural Factors:** Tax regimes, policy uncertainty (page 77), and trade openness (page 79).
    *   **Market and Financial Factors:** Access to finance (page 81), as well as product, labor, and land markets (page 82).
    *   **Human Capital and Firm Characteristics:** Education (page 83) and firm size (page 78).
*   **Growth Dynamics:** The chapter explores the "Engines of output growth in EMDEs" (page 73) and the "Sectoral employment composition" (page 72).
*   **Policy Implications:** The analysis concludes by addressing "Policy options to boost employment" (page 85).

So the question becomes: can we retain these advantages without paying the cost?

Engineering a Better Retriever: Proxy-Pointer RAG

Let’s build Proxy-Pointer RAG to answer that question. The core insight is intuitive:

You don’t need LLM summaries to give a vector database structural awareness. You only need to encode the structure into the embeddings themselves.

The system uses the same structural tree as PageIndex, but without the LLM-generated summaries. Building this skeletal tree requires no expensive LLM calls during indexing; the tree is built purely from rule-based heading detection, which runs in milliseconds.

Then, instead of asking an LLM to navigate the tree, we let the vector index do the retrieval, but we engineer the chunks so that the embedding space “understands” where each chunk lives in the document’s hierarchy.

Here’s a view of the ingestion pipeline:

Ingestion Pipeline

Build a Skeleton Tree

PageIndex’s tree parser doesn’t actually need an LLM to build the structural hierarchy. The heading detection is rule-based: it finds Markdown headers (#, ##, ###, …) and builds the nesting from the heading levels. The LLM is only used to summarize each node.

We call the LLM-free version a Skeleton Tree: same structure, same node IDs, same line boundaries, but no summaries.

# Build the skeleton tree: no LLM, runs in milliseconds
pageindex = PageIndex(doc_path, enable_ai=False)
tree = pageindex.build_structure()  # Pure rule-based parsing

The skeleton tree and the summarized tree produced for the earlier Vectorless RAG have identical structures: same 137 nodes, same nesting depths, same line numbers, same titles. The only difference is the missing summary field.

Cost: $0. Time: < 1 second.

Structural Metadata Pointers (The Core Differentiator)

This is the heart of why PageIndex works so well, and the trick we’ll adopt.

In standard vector RAG, a retrieved chunk is the context. Whatever 500 words the retriever returns, that’s what the LLM sees. If the chunk starts mid-sentence or ends before the key data point, the response will miss the intent of the query entirely (as illustrated in the previous section on the Vectorless vs Vector RAG comparison).

PageIndex does something fundamentally different: the chunk is never the context. Each node in the tree knows its exact position in the original document: its title, its node ID, and, crucially, the start and end line numbers of the complete section it represents. When retrieval selects a node, PageIndex goes back to the original Markdown file and slices out the full, contiguous section between those line boundaries.

We replicate this exactly. Every chunk we embed into the vector index carries rich structural metadata from its tree node:

metadata = {
    "doc_id": "SADU",           # Which document
    "node_id": "0012",          # Which structural node
    "title": "Introduction",    # Section heading
    "start_line": 624,          # Where the section starts in the original file
    "end_line": 672             # Where the section ends
}

At retrieval time, we don’t feed the matched chunks to the LLM. Instead, we:

  1. Use the chunks as proxies: they’re only there to identify which nodes are relevant. Remove duplicate (doc_id, node_id) combinations to get the unique top-k.
  2. Follow the metadata pointers: open the original Markdown and slice the nodes’ line ranges, e.g., 624 to 672.
  3. Deliver the full sections: the LLM receives the complete, pristine, structurally intact text.
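Assuming a generic `vector_index.search()` that returns chunks carrying the metadata shown above (the names here are illustrative, not a specific library’s API), the three steps can be sketched as:

```python
# Sketch of the proxy-pointer retrieval step; `vector_index.search` and
# the file layout are assumptions, not a particular library's API.
def retrieve_context(query, vector_index, docs, top_k=3):
    """Chunks are proxies: we only use their metadata pointers, never their text."""
    hits = vector_index.search(query)  # chunks, each with a "metadata" dict

    # 1. Deduplicate on (doc_id, node_id) to get the unique top-k nodes.
    seen, nodes = set(), []
    for hit in hits:
        key = (hit["metadata"]["doc_id"], hit["metadata"]["node_id"])
        if key not in seen:
            seen.add(key)
            nodes.append(hit["metadata"])
        if len(nodes) == top_k:
            break

    # 2-3. Follow the pointers back to the source and slice full sections.
    sections = []
    for meta in nodes:
        lines = docs[meta["doc_id"]]  # original document as a list of lines
        section = "\n".join(lines[meta["start_line"]:meta["end_line"]])
        sections.append(f"[{meta['title']}]\n{section}")
    return sections
```

Note that the chunk text itself is never returned; only the pointers it carries are used.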

Here’s a view of the retrieval pipeline:

Retrieval Pipeline

This means that even if a chunk matched only on a single sentence deep inside a section, the synthesis LLM gets the entire section, with its header, its context, its figures, its conclusions. The chunk was disposable; the pointer is what matters.

That is why I call it Proxy-Pointer RAG: the vectors are proxies for location, and the metadata are pointers to the actual content.

Cost: $0. Impact: Transforms context quality from fragmented chunks to complete document sections.

Breadcrumb Injection (Structural Context)

This is crucial for answering queries tied to a specific section of the document (such as Chapter 2). Standard vector RAG embeds raw text:

"While private investment growth has slowed in each South Asia and other EMDEs..."

The embedding model has no idea this chunk comes from Chapter 1, under Economic Activity, inside Box 1.1. So when a user asks “what are the main messages of Chapter 1,” this chunk won’t rank highly: it doesn’t contain the words “Chapter 1” or “main messages.”

Breadcrumb injection prepends the full ancestry path from the skeleton tree to every chunk before embedding:

"[Chapter 1. Deceptive Strength > Economic activity > Regional developments > BOX 1.1 Accelerating Private Investment]
While private investment growth has slowed in each South Asia and other EMDEs..."

Now the embedding vector encodes both the content AND its structural location. When someone asks about “Chapter 1,” the retriever knows which chunks belong to Chapter 1, because the words “Chapter 1. Deceptive Strength” are present in the embedding.

# Build the breadcrumb from the node's ancestry
current_crumb = f"{parent_breadcrumb} > {node_title}"

# Prepend to the chunk text before embedding
enriched_text = f"[{current_crumb}]\n{section_text}"
chunks = text_splitter.split_text(enriched_text)

This is a zero-cost encoding of the tree structure into the vector space. We’re using the same embeddings API, the same index, the same retrieval code. The only difference is what we feed into the embedder.

Cost: $0 extra. Impact: Transforms retrieval quality for structural queries.

Structure-Guided Chunking (No Blind Sliding Windows)

Standard vector RAG applies a sliding window across the entire document: a 2000-character window that moves forward with some overlap, completely oblivious to the document’s structure. A chunk might start mid-paragraph in the Introduction and end mid-sentence in a figure caption. The boundaries are arbitrary, and each chunk is an island, with no knowledge of its position in the overall document structure.

Proxy-Pointer does something fundamentally different: we walk the tree, not the text.

For every node in the skeleton tree, we extract the node’s own text, from start_line to end_line, and then apply the text splitter to that isolated section. If a section is short enough, it becomes a single chunk. If it’s longer, the splitter divides it, but strictly within that section’s boundaries.

Standard RAG:  Blind sliding window across entire document
[====chunk1====][====chunk2====][====chunk3====]...
    ↑ might start in Introduction, end in Figure caption

Proxy-Pointer: Chunk inside each node's boundaries
Introduction (lines 624-672)     → [chunk A] [chunk B]
Economic Activity (lines 672-676) → [chunk C]
BOX 1.1 (lines 746-749)          → skipped (< 100 chars)
Inflation (lines 938-941)        → [chunk D]

This guarantees three things:

  1. Chunks never cross section boundaries: a chunk from Introduction will never overlap with Economic Activity.
  2. Each chunk belongs to exactly one node, so the node_id metadata is always precise.
  3. Breadcrumbs are accurate per chunk: they reflect the actual structural container, not a guess.

Importantly, when a node is skipped (because its text is too short, e.g., a “BOX 1.1” heading with no body content), the tree walk still recurses into its children. The actual content lives in child nodes like “Introduction,” “Features,” and “Figures,” all of which get embedded with the parent’s title in their breadcrumb (e.g., BOX 1.1 Accelerating Private Investment > Introduction, BOX 1.1 Accelerating Private Investment > Features of...). No content is ever lost; only empty structural headers are excluded.

Cost: $0. Impact: Every chunk is structurally traceable — enabling precise metadata pointers.
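The tree walk described above can be sketched as follows. The node fields (start_line/end_line spanning each node’s own text), the `splitter` callable, and the `min_chars` threshold are assumptions for illustration, not the exact implementation.

```python
# Sketch of the structure-guided chunking walk (node fields and splitter assumed).
def walk_and_chunk(node, doc_lines, splitter, breadcrumb="", doc_id="SADU", min_chars=100):
    """Chunk each node's own text within its line boundaries, tagging every
    chunk with its breadcrumb and metadata pointer; recurse into children."""
    crumb = f"{breadcrumb} > {node['title']}" if breadcrumb else node["title"]
    section = "\n".join(doc_lines[node["start_line"]:node["end_line"]])

    chunks = []
    if len(section.strip()) >= min_chars:  # skip empty structural headers
        for text in splitter(f"[{crumb}]\n{section}"):
            chunks.append({
                "text": text,
                "metadata": {
                    "doc_id": doc_id,
                    "node_id": node["node_id"],
                    "title": node["title"],
                    "start_line": node["start_line"],
                    "end_line": node["end_line"],
                },
            })
    # Even if the node itself was skipped, its children still get chunked.
    for child in node.get("nodes", []):
        chunks.extend(walk_and_chunk(child, doc_lines, splitter, crumb, doc_id, min_chars))
    return chunks
```

Because the breadcrumb is threaded through the recursion, every chunk’s prefix reflects its true ancestry even when a parent node contributed no chunks of its own.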

Noise Filtering (Removing Distractions)

Some sections of any document are irrelevant for retrieval and contaminate the context: they contain keywords that match almost every query but provide no useful content.

  • Table of Contents: mentions every chapter title (matches “Chapter 1,” “Chapter 2,” everything)
  • Executive Summary: paraphrases every key finding (matches every topic query)
  • Abbreviations: lists every acronym used in the document
  • Acknowledgments: mentions organizations, countries, and themes

These sections act as distractors in vector space. They sit near the centroid of the document’s embedding distribution, pulling retrieval toward shallow overview text instead of deep analytical content.

We apply a simple title-based filter during indexing:

NOISE_TITLES = {
    "contents", "summary of contents", "table of contents",
    "abbreviations", "acknowledgments", "foreword",
    "executive summary", "references"
}

if title.strip().lower() in NOISE_TITLES:
    continue  # Skip this node entirely

This filter removed 7 nodes from our test document. The impact was immediate: for a representative test query, the system stopped retrieving the Executive Summary and Table of Contents and began retrieving the actual Introduction section.

Cost: $0. Impact: Eliminates the most common retrieval failure mode.

That’s it. Proxy-Pointer RAG is now ready to test.

Test the Proxy-Pointer Pipeline

A detailed comparison follows in the next section, but here are the results of the same two queries cited in the “Vectorless vs Flat Vector RAG” section. As you can see, the answers are as accurate and detailed as those from Vectorless RAG using PageIndex.

Query about table content

=== PROXY-POINTER RAG INTERACTION ===
Query: what does annexure table a2.1.1 say
Unique Context Nodes Extracted:
  - [SADU] Node 0131: 'ANNEX TABLE A2.1.1 Studies included in the literature review' | Lines: 3952-4157
  - [SADU] Node 0092: 'ANNEX TABLE SL.B1.1.1 Marginal probability of study documenting above-average effect for poor-average households' | Lines: 2537-2745
  - [SADU] Node 0052: 'TABLE B1.1.1 Correlates of the likelihood of private investment accelerations starting' | Lines: 1663-1899

Response:
Based on the provided documents, **Annex Table A2.1.1** is titled **"Studies included in the literature review."**

This table provides a summary of various research studies, detailing the following information for each:
*   **Paper:** The author(s) and year of the study (e.g., Kapsos (2005), Saget (2000), Papola and Sahu (2012), etc.).
*   **Period of main sample:** The timeframe covered by the study (ranging from the 1960s to 2019).
*   **Sample coverage:** The scale or number of entities analyzed (e.g., "160 countries," "90 developing countries," "47 African countries").
*   **Region:** The geographic scope of the research (e.g., Worldwide, South Asia, Africa, Europe, Middle East, Asia).
*   **Economies:** The specific countries or economic groupings included in the research (e.g., "India," "Bulgaria, Czech Republic, Estonia...", "G20 countries").

The table documents a diverse set of studies spanning 1960 through 2019, covering regions and economies across the globe, including specific data on transition economies, various OECD countries, and specific regions such as South Asia and Africa.

Query requiring Structural knowledge

=== PROXY-POINTER RAG INTERACTION ===
Query: what are the questions answered by chapter 2
Unique Context Nodes Extracted:
  - [SADU] Node 0099: 'FIGURE 2.3 Growth models' | Lines: 3127-3155
  - [SADU] Node 0098: 'Questions' | Lines: 3115-3127
  - [SADU] Node 0124: 'ANNEX 2.1 Summary of literature on the growth elasticity of employment' | Lines: 3807-3812

Response:
Based on the provided text, Chapter 2 addresses the following questions regarding South Asia’s economic growth and labor market:

*   **How does South Asia’s employment compare with that of other emerging market and developing economies (EMDEs)?** (Source: *Chapter 2, Questions section*)
*   **What are the correlates of long-run employment ratios amongst EMDEs?** (Source: *Chapter 2, Questions section*)
*   **How can South Asia create the jobs needed to absorb its growing working-age population, while also boosting productivity?** (Source: *Chapter 2, Questions section*)

Additionally, the chapter explores these issues with the goal of understanding how the region can harness its working-age population to accelerate convergence toward the income levels of advanced economies.

While we have demonstrated the cost advantage so far, does the architecture overcome the scalability challenge of PageIndex?

Is Proxy-Pointer Scalable?

Here is why the architecture scales across an enterprise knowledge base. PageIndex pays a scalability penalty at both ends: ~137 LLM calls per document during indexing, and an extra LLM reasoning step per query for tree navigation. Proxy-Pointer eliminates both.

  • No LLM at indexing. The skeleton tree is rule-built in milliseconds. The only API calls are to the embedding model, identical to standard vector RAG.
  • No tree navigation at retrieval. Queries go straight to the vector index. No LLM reading summaries, no per-document traversal.

Proxy-Pointer is standard vector RAG with intelligent metadata baked in. The structural awareness lives inside the embeddings (via breadcrumbs) and the chunk metadata (via node pointers), not in an LLM reasoning loop. It inherits all of vector RAG’s scalability: unified multi-document indexes, sub-linear search, incremental updates, and zero per-query LLM overhead beyond the final synthesis.

Fail-safe for unstructured documents: If a document has no headings, or the skeleton tree produces only a single root node, the system detects this during chunking and falls back to a standard sliding window. Such chunks are flagged with empty node_id and line boundaries. At retrieval time, flagged chunks are used directly as LLM context instead of following pointers back to the source. The system gracefully degrades to standard vector RAG: no errors, no special handling required.
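A minimal sketch of that fail-safe check (the function names, the degenerate-tree test, and the empty-metadata flag are assumptions, not the exact implementation):

```python
# Sketch of the unstructured-document fail-safe (names and thresholds assumed).
def chunk_document(tree, doc_lines, structured_splitter, sliding_splitter):
    """If the skeleton tree is degenerate (single root, no children), fall back
    to a plain sliding window and flag chunks so retrieval skips the pointers."""
    if not tree.get("nodes"):  # no headings detected under the root
        return [
            {"text": text, "metadata": {"node_id": "", "start_line": None, "end_line": None}}
            for text in sliding_splitter("\n".join(doc_lines))
        ]
    return structured_splitter(tree, doc_lines)  # normal proxy-pointer path
```

At retrieval time, any chunk whose node_id is empty is passed to the LLM as-is rather than being resolved through a pointer.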

Let’s compare Vectorless RAG and Proxy-Pointer head-to-head.

Vectorless vs Proxy-Pointer RAG

I ran a variety of queries (broad structural, cross-reference, specific factual, figure-specific, etc.) and let Claude judge the responses for a comprehensive comparison. You can find the detailed responses from Vectorless and Proxy-Pointer along with the full Quality Comparison report here.

The following table encapsulates the verdict. The final score: PageIndex 2, Proxy-Pointer 4, Ties 4. In other words, Proxy-Pointer matches or beats PageIndex on 8 out of 10 queries, all at the scalability and cost of flat Vector RAG.

Here is the summary verdict:

| # | Query Type | Winner |
|---|---|---|
| 1 | Broad structural (Ch.1 messages) | 🔴 PageIndex |
| 2 | Broad structural (Ch.2 messages) | 🔴 PageIndex (narrow) |
| 3 | Specific factual (Box 1.1 features) | 🟡 Tie |
| 4 | Cross-reference (inflation tables) | 🟢 Proxy-Pointer |
| 5 | Comparative (India vs region) | 🟢 Proxy-Pointer |
| 6 | Figure-specific (B1.1.1 trends) | 🟢 Proxy-Pointer |
| 7 | Direct lookup (Annexure A2.1.1) | 🟡 Tie |
| 8 | Entity-specific (currency crisis countries) | 🟡 Tie |
| 9 | Navigational (Ch.2 questions) | 🟡 Tie |
| 10 | Inferential/policy (govt vs shocks) | 🟢 Proxy-Pointer |

And here is the cost comparison:

| Metric | PageIndex | Proxy-Pointer | Standard Vector RAG |
|---|---|---|---|
| Indexing LLM calls | ~137 per doc | 0 | 0 |
| Indexing time | 5-10 min/doc | < 30 sec/doc | < 30 sec/doc |
| Retrieval quality | ★★★★★ | ★★★★★ (8/10 vs PageIndex) | ★★★☆☆ |
| Multi-doc scalability | Poor (per-doc tree nav) | Excellent (unified vector index) | Excellent |
| Structural awareness | Full (LLM-navigated) | High (breadcrumb-encoded) | None |
| Index rebuild on update | Expensive (re-summarize) | Cheap (re-embed affected nodes) | Cheap |
| Explainability | High (section titles + doc IDs) | High (section titles + doc IDs) | Low (opaque chunks) |

Key Takeaways

  1. Structure is the missing ingredient in RAG. The quality gap between naive vector RAG and PageIndex isn’t about better embeddings; it’s about preserving hierarchy.
  2. You don’t need an LLM to encode structure. Breadcrumb injection and structural metadata give the vector index structural awareness at no extra cost.
  3. Noise filtering beats better embeddings. Removing 7 low-value nodes from the index had more impact on retrieval quality than any model swap could.
  4. Pointers beat chunks. Chunks act as proxies for the full section, which is what the synthesizer LLM sees.

Conclusion

Proxy-Pointer RAG proves a simple thesis: you don’t need an expensive LLM to make a retriever structurally aware; you only need to be clever about what you embed.

Five zero-cost engineering techniques (skeleton trees, metadata pointers, breadcrumbs, structure-guided chunking, and noise filtering) close the quality gap with a full LLM-navigated system, while keeping the speed and scalability of standard vector RAG. On our 10-query benchmark, Proxy-Pointer matched or beat PageIndex on 8 out of 10 queries, at the cost of standard Vector RAG.

The next time you’re building RAG for a structured (or unstructured) document repository, don’t reach for a bigger model. Reach for a Proxy-Pointer index.

Reference

World Bank. 2024. South Asia Development Update, April 2024: Jobs for Resilience —  License: CC BY 3.0 IGO.

Images used in this article were generated with Google Gemini. Code created by me.
