In my previous article, I outlined the core principles of GraphRAG design and introduced an augmented retrieval-and-generation pipeline that mixes graph search with vector search. I also discussed why constructing a perfectly complete graph—one that captures every entity and relation in the corpus—can be prohibitively complex, especially at scale.
In this article, I expand on those ideas with concrete examples and code, demonstrating the practical constraints encountered when building and querying real GraphRAG systems. I also illustrate how the retrieval pipeline helps balance cost and implementation complexity without sacrificing accuracy. Specifically, we'll cover:
- Constructing the graph: Should entity extraction occur on chunks or full documents, and how much does this choice actually matter?
- Querying relations without a dense graph: Can we infer meaningful relations using iterative search-space optimisation instead of encoding every relationship in the graph explicitly?
- Handling weak embeddings: Why alphanumeric entities break vector search, and how graph context fixes it.
GraphRAG pipeline
To recall from the previous article, the GraphRAG embedding pipeline used is as follows. The graph nodes and relations, together with their embeddings, are stored in a graph database. The document chunks and their embeddings are also stored in the database.
The proposed retrieval and response generation pipeline is as follows:

As can be seen, the graph result is not used directly to answer the user query. Instead, it is used in the following ways:
- Node metadata (particularly the doc_id) acts as a powerful classifier, helping identify the relevant documents before vector search. This is crucial for large corpora, where naive vector similarity can be noisy.
- Context enrichment of the user query to retrieve the most relevant chunks. This is crucial for certain types of queries with weak vector semantics, such as IDs, vehicle numbers, dates, and numeric strings.
- Iterative search-space optimisation, first by selecting the most relevant documents and, within those, the most relevant chunks (using context enrichment). This allows us to keep the graph simple: not every relation between entities needs to be extracted into the graph for queries about them to be answered accurately. A minimal sketch of this flow follows.
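To make the flow concrete, here is a minimal sketch of the retrieval loop in Python. The helper callables (graph_lookup, chunk_search, synthesize) are hypothetical stand-ins for the graph query, the document-filtered vector search, and the response-synthesis steps; they are not part of the original pipeline code.
from typing import Callable, Sequence

def graph_rag_answer(
    query: str,
    graph_lookup: Callable[[str], dict],                 # runs a generated Cypher query; returns context + doc ids
    chunk_search: Callable[[str, Sequence[str]], list],  # vector search restricted to the given doc ids
    synthesize: Callable[[str, str, list], str],         # LLM call combining query, graph context and chunks
) -> str:
    # 1. Query the graph first; its node metadata (doc_id) shortlists the relevant documents.
    graph_result = graph_lookup(query)
    candidate_docs = graph_result.get("doc_ids", [])

    # 2. Enrich the query with graph-derived context (entities, places, dates, amounts).
    graph_context = graph_result.get("context", "")
    enriched_query = f"{query}\nRelated entities from the graph: {graph_context}"

    # 3. Vector search only within the shortlisted documents.
    chunks = chunk_search(enriched_query, candidate_docs)

    # 4. Synthesize the final answer from the query, graph context and retrieved chunks.
    return synthesize(query, graph_context, chunks)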
To demonstrate these ideas, we'll use a dataset of 10 synthetically generated police reports, GPT-4o as the LLM, and Neo4j as the graph database.
Constructing the Graph
We will construct a simple star graph with the Report Id as the central node and all other entities connected to it. The prompt to construct it would be as follows:
custom_prompt = ChatPromptTemplate.from_template("""
You are an information extraction assistant.
Read the text below and identify important entities.
**Extraction rules:**
- Always extract the **Report Id** (this is the central node).
- Extract **people**, **institutions**, **places**, **dates**, **monetary amounts**, and **vehicle registration numbers** (e.g., MH12AB1234, PK-02-4567, KA05MG2020).
- Do not ignore any person names; extract all people mentioned in the document, even if they seem minor or their role is not clear.
- Treat all types of vehicles (e.g., cars, bikes) as the same type of entity, called "Vehicle".
**Output format:**
1. List all nodes (unique entities).
2. Identify the central node (Report Id).
3. Create relationships of the form:
(Report Id)-[HAS_ENTITY]->(Entity),
4. Do not create any other types of relationships.
Text:
{input}
Return only structured data like:
Nodes:
- Report SYN-REP-2024
- Honda bike ABCD1234
- XYZ College, Chennai
- NNN College, Mumbai
- 1434800
- Mr. John
Relationships:
- (Report SYN-REP-2024)-[HAS_ENTITY]->(Honda bike ABCD1234)
- (Report SYN-REP-2024)-[HAS_ENTITY]->(XYZ college, Chennai)
- ...
""")
Note that in this prompt, we are not extracting any relations such as accused, witness, etc. into the graph. All nodes have a uniform relation with the central node, which is the Report Id. I have designed this as an extreme case to illustrate the fact that we can answer queries about relations between entities even with this minimal graph, based on the retrieval pipeline depicted in the previous section. If you wish to include a few important relations, the prompt can be modified to include clauses such as the following:
3. For person entities, the relation should be based on their role in the Report (e.g., complainant, accused, witness, investigator, etc).
eg: (Report Id) -[Accused]-> (Person Name)
4. For all others, create relationships of the form:
(Report Id)-[HAS_ENTITY]->(Entity),
llm_transformer = LLMGraphTransformer(
    llm=llm,
    # allowed_relationships=["HAS_ENTITY"],
    prompt=custom_prompt,
)
Next, we'll create the graph for each document by building a LangChain Document from the full text and then providing it to Neo4j.
from langchain_core.documents import Document

# Read entire file (no chunking)
with open(file_path, "r", encoding="utf-8") as f:
    text_content = f.read()

# Create LangChain Document
document = Document(
    page_content=text_content,
    metadata={
        "doc_id": doc_id,
        "source": filename,
        "file_path": file_path,
    },
)

try:
    # Convert to graph (entire document)
    graph_docs = llm_transformer.convert_to_graph_documents([document])
    print(f"✅ Extracted {len(graph_docs[0].nodes)} nodes and {len(graph_docs[0].relationships)} relationships.")
    for gdoc in graph_docs:
        for node in gdoc.nodes:
            # Tag every node with its source document so the retriever can filter by document later
            node.properties["doc_id"] = doc_id
            original_id = node.properties.get("id") or getattr(node, "id", None)
            if original_id:
                node.properties["entity_id"] = original_id
    # Add to Neo4j
    graph.add_graph_documents(
        graph_docs,
        baseEntityLabel=True,
        include_source=False,
    )
except Exception as e:
    print(f"❌ Graph extraction failed for {filename}: {e}")
This creates a graph comprising 10 clusters as follows:

Key Observations
- The number of nodes extracted varies with the LLM used, and even across runs of the same LLM. With GPT-4o, each execution extracts between 15 and 30 nodes per document (depending on its size), for a total of 200 to 250 nodes. Since each is a star graph, the number of relations is one fewer than the number of nodes for each document (a quick way to inspect these counts is sketched below).
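If you want to verify these numbers on your own graph, a per-document count can be run directly against Neo4j. This is a small sketch assuming the Neo4jGraph instance `graph` used during ingestion and the doc_id property set on every node above.
# Count extracted entity nodes per source document
# (the relation count per star graph is node_count - 1).
counts = graph.query(
    """
    MATCH (e:`__Entity__`)
    WHERE e.doc_id IS NOT NULL
    RETURN e.doc_id AS doc_id, count(e) AS node_count
    ORDER BY doc_id
    """
)
for row in counts:
    print(row["doc_id"], row["node_count"])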
To see how severe this effect is, let's look at the graph of one of the documents (SYN-REPORT-0008). The document has about 4,000 words, and the resulting graph has 22 nodes and looks like the following:

Now, let's try generating the graph for this document by chunking it, extracting entities from each chunk, and merging them using the following logic:
- The entity extraction prompt stays the same as before, except that we ask it to extract entities other than the Report Id.
- First, extract the Report Id from the document using this prompt:
report_id_prompt = ChatPromptTemplate.from_template("""
Extract ONLY the Report Id from the text.
Report Ids typically look like:
- SYN-REP-2024
Return strictly one line:
Report:
Text:
{input}
""")
- Then, extract entities from each chunk using the entities prompt:
import re

from langchain_text_splitters import RecursiveCharacterTextSplitter

def extract_entities_by_chunk(llm, text, chunk_size=2000, overlap=200):
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=overlap
    )
    chunks = splitter.split_text(text)
    all_entities = []
    for i, chunk in enumerate(chunks):
        print(f"🔍 Processing chunk {i+1}/{len(chunks)}")
        raw = run_prompt(llm, entities_prompt, chunk)
        # Parse lines of the form "- <entity> | <type>" from the LLM output
        pairs = re.findall(r"- (.*?)\s*\|\s*(\w+)", raw)
        all_entities.extend([(e.strip(), t.strip()) for e, t in pairs])
    return all_entities
- Next, de-duplicate the entities.
- Finally, construct the graph by connecting all the entities to the central Report Id node (a sketch of these last two steps follows).
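The de-duplication and graph-construction steps are not shown above; the following is a minimal sketch, assuming `graph` is the Neo4jGraph instance from earlier and `entities` is the list of (entity, type) pairs returned by extract_entities_by_chunk. For simplicity, the entity type is stored here as a node property rather than as a label, unlike the LLMGraphTransformer output.
def merge_star_graph(graph, report_id: str, doc_id: str, entities):
    # De-duplicate on a normalised (lower-cased, stripped) form of the entity text.
    unique = {e.strip().lower(): (e.strip(), t.strip()) for e, t in entities if e.strip()}

    for name, etype in unique.values():
        # Connect every entity to the central Report Id node with a uniform HAS_ENTITY relation.
        graph.query(
            """
            MERGE (r:`__Entity__`:Report {id: $report_id})
              SET r.doc_id = $doc_id
            MERGE (e:`__Entity__` {id: $name})
              SET e.doc_id = $doc_id, e.type = $etype
            MERGE (r)-[:HAS_ENTITY]->(e)
            """,
            params={
                "report_id": report_id,
                "doc_id": doc_id,
                "name": name,
                "etype": etype,
            },
        )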
The effect is quite remarkable. The graph of SYN-REPORT-0008 now looks like the following. It has 78 nodes, over 3X the count before. The trade-off in constructing this denser graph is the additional time and LLM usage incurred by iterating over the chunks.

What are the implications?
The impact of the variation in graph density is on the ability to answer questions about the entities directly and accurately; i.e., if an entity or relation is not present in the graph, a question about it cannot be answered from the graph alone.
An approach to minimise this effect with our sparse star graph is to phrase the query so that it references a prominent related entity that is likely to be present in the graph.
For instance, the investigating officer is mentioned far fewer times than the city in a police report, so the city has a higher probability of being present in the graph than the officer. Therefore, to find the investigating officer, one can phrase the query around the city, for example Mumbai, if it is known that the officer is from the Mumbai office. Our retrieval pipeline will then extract the reports related to Mumbai from the graph and, within those documents, accurately locate the chunks containing the officer's name. This is demonstrated in the following sections.
Handling weak embeddings
Consider the following similar queries, which are likely to be frequently asked of this data.
The details about the incident in the report cannot be found in the graph, which holds only the entities and relations; therefore, the response must be derived from the vector similarity search.
So, can the graph be ignored in this case?
No. The reason is that LLMs have an inherent understanding of person names and words due to their training, but find it hard to attach any semantic meaning to alphanumeric strings such as report IDs, vehicle numbers, amounts, and dates. Therefore, the embedding of a person's name is far stronger than that of an alphanumeric string, and the chunks retrieved for alphanumeric strings using vector similarity have a weak correlation with the user query, leading to an incorrect reply.
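You can probe this effect directly by comparing embedding similarities. The snippet below is an illustrative sketch, assuming OpenAI embeddings via LangChain; the two candidate chunk texts are made-up examples, not taken from the dataset.
import numpy as np
from langchain_openai import OpenAIEmbeddings

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = OpenAIEmbeddings(model="text-embedding-3-small")

query = "What are the details of the incident in report SYN-REPORT-0008?"
candidates = [
    "Report SYN-REPORT-0008: complaint filed by Mr. Person_12 in Mumbai",   # correct report
    "Report SYN-REPORT-00010: complaint filed by Mr. Person_2 in Mumbai",   # distractor report
]

q_vec = emb.embed_query(query)
for text, vec in zip(candidates, emb.embed_documents(candidates)):
    print(f"{cosine(q_vec, vec):.3f}  {text}")
# The two scores tend to be nearly identical: the report ids differ only in a few
# characters that carry little semantic weight, so vector similarity alone cannot
# reliably pick the correct document.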
This is where context enrichment using the graph helps. For a query referencing the report ID, we get all the details from the star graph of the central node SYN-REPORT-0008 using a generated Cypher query, then have the LLM use this to generate a context (i.e., interpret the JSON response in natural language). The context also contains the sources for the nodes, which in this case returns 2 documents, one of which is the correct document, SYN-REPORT-0008. The other one, SYN-REPORT-00010, appears because one of the attached nodes, the city (Mumbai), is common to both reports.
Now that the search space is refined to only 2 documents, chunks are extracted from each using this context together with the user query. And since the context from the graph mentions individuals, places, amounts, and other details present in the first report but not in the second, the LLM can easily recognise during response synthesis that the correct chunks are those extracted from SYN-REPORT-0008 and not from 0010, and the answer is formed accurately. Here is the log of the graph query, the JSON response, and the natural language context depicting this; a code sketch of the enrichment-and-filtering step follows the log.
Processing log
Generated Cypher:
MATCH (r:`__Entity__`:Report)
WHERE toLower(r.id) CONTAINS toLower("SYN-REPORT-0008")
OPTIONAL MATCH (r)-[]-(e)
RETURN DISTINCT
r.id AS report_id,
r.doc_id AS report_doc_id,
labels(e) AS entity_labels,
e.id AS entity_id,
e.doc_id AS entity_doc_id
JSON Response:
[{'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels': ['__Entity__', 'Person'], 'entity_id': 'Mr. Person_12', 'entity_doc_id': 'SYN-REPORT-0008'}, {'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels': ['__Entity__', 'Place'], 'entity_id': 'New Delhi', 'entity_doc_id': 'SYN-REPORT-0008'}, {'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels': ['__Entity__', 'Place'], 'entity_id': 'Kottayam', 'entity_doc_id': 'SYN-REPORT-0008'}, {'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels': ['__Entity__', 'Person'], 'entity_id': 'Person_4', 'entity_doc_id': 'SYN-REPORT-0008'}, {'report_id': 'Syn-Report-0008', 'report_doc_id': 'SYN-REPORT-0008', 'entity_labels':… truncated
Natural language context:
The context describes an incident involving multiple entities, including individuals, places, monetary amounts, and dates. The following details are extracted:
1. **Individuals Involved**: Several individuals are mentioned, including "Mr. Person_12," "Person_4," "Person_11," "Person_8," "Person_5," "Person_6," "Person_3," "Person_7," "Person_10," and "Person_9."
2. **Places Referenced**: The places mentioned include "New Delhi," "Kottayam," "Delhi," and "Mumbai."
3. **Monetary Amounts**: Two monetary amounts are noted: "0.5 Million" and "43 Thousand."
4. **Dates**: Two specific dates are mentioned: "07/11/2024" and "04/02/2025."
Sources: [SYN-REPORT-0008, SYN-REPORT-00010]
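A minimal sketch of this enrichment-and-filtering step is shown below. It assumes `graph` is the Neo4jGraph instance, `vector_store` is the chunk vector index (with doc_id in the chunk metadata), `llm` is a chat model, and `cypher` is the generated query from the log above; the helper name is hypothetical.
def enrich_and_retrieve(llm, graph, vector_store, user_query: str, cypher: str, k: int = 8):
    # 1. Pull the star graph of the referenced report as JSON-like rows.
    rows = graph.query(cypher)

    # 2. Have the LLM interpret the rows as a natural-language context,
    #    and collect the candidate source documents from the rows.
    graph_context = llm.invoke(
        "Summarise the following graph query result in natural language, "
        "listing the people, places, monetary amounts and dates, and the source reports:\n"
        f"{rows}"
    ).content
    candidate_docs = {r.get("entity_doc_id") or r.get("report_doc_id") for r in rows}

    # 3. Vector search with the enriched query, keeping only chunks from the shortlisted documents.
    hits = vector_store.similarity_search(f"{user_query}\n{graph_context}", k=k)
    chunks = [h for h in hits if h.metadata.get("doc_id") in candidate_docs]
    return chunks, graph_context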
Can relations be successfully found?
What about finding relations between entities? We have ignored all specific relations in our graph and simplified it such that there is just one relation type between the central report_id node and the rest of the entities. This would imply that querying for entities not present in the graph, or for relations between entities, should not be possible. Let's test our iterative search optimisation pipeline against a variety of such queries. We will consider two reports from Kolkata, and the following queries for this test.

- Where the referred relation is not present in the graph.
- Relation between two entities present in the graph.
- Relation between any entities related to a third entity.
- Multi-hop relations, e.g., finding the investigating officer in the reports where brothers from Kolkata are accused.
Using our pipeline, all of the above queries yield accurate results. Let's look at the process for the last multi-hop query, which is the most complex one. Here the Cypher query does not yield any result, so the flow falls back to semantic matching of nodes (a sketch of this fallback follows the log). The entities are extracted from the user query (Place: Kolkata), then matched to get references to all the reports connected to Kolkata, which are SYN-REPORT-0005 and SYN-REPORT-0006 in this case. Based on the context that the user query is asking about brothers and investigating officers, the most relevant chunks are extracted from both documents. The resulting answer successfully retrieves the investigating officers for both reports.
Here is the response:
You can view the processing log here:
> Entering new GraphCypherQAChain chain...
2025-12-05 17:08:27 - HTTP Request: ... LLM called
Generated Cypher:
MATCH (p:`__Entity__`:Person)-[:HAS_ENTITY]-(r:`__Entity__`:Report)-[:HAS_ENTITY]-(pl:`__Entity__`:Place)
WHERE toLower(pl.id) CONTAINS toLower("kolkata") AND toLower(p.id) CONTAINS toLower("brother")
OPTIONAL MATCH (r)-[:HAS_ENTITY]-(officer:`__Entity__`:Person)
WHERE toLower(officer.id) CONTAINS toLower("investigating officer")
RETURN DISTINCT
r.id AS report_id,
r.doc_id AS report_doc_id,
officer.id AS officer_id,
officer.doc_id AS officer_doc_id
Cypher Response:
[]
2025-12-05 17:08:27 - HTTP Request: ...LLM called
> Finished chain.
is_empty: True
❌ Cypher didn't produce a confident result.
🔎 Running semantic node search...
📋 Detected labels: ['Place', 'Person', 'Institution', 'Date', 'Vehicle', 'Monetary amount', 'Chunk', 'GraphNode', 'Report']
User query for node search: investigating officer in the reports where brothers from Kolkata are accused
2025-12-05 17:08:29 - HTTP Request: ...LLM called
🔍 Extracted entities: ['Kolkata']
2025-12-05 17:08:30 - HTTP Request: ...LLM called
📌 Hits for entity 'Kolkata': [Document(metadata={'labels': ['Place'], 'node_id': '4:5b11b2a8-045c-4499-9df0-7834359d3713:41'}, page_content='TYPE: Place\nCONTENT: Kolkata\nDOC: SYN-REPORT-0006')]
📚 Retrieved node hits: [Document(metadata={'labels': ['Place'], 'node_id': '4:5b11b2a8-045c-4499-9df0-7834359d3713:41'}, page_content='TYPE: Place\nCONTENT: Kolkata\nDOC: SYN-REPORT-0006')]
Expanded node context:
[Node] This is a __Place__ node. It represents 'TYPE: Place
CONTENT: Kolkata
DOC: SYN-REPORT-0006' (doc_id=N/A).
[Report Syn-Report-0005 (doc_id=SYN-REPORT-0005)] --(HAS_ENTITY)--> __Entity__, Institution: Mrs.Sri Balaji Forest Product Private Limited (doc_id=SYN-REPORT-0005)
[Report Syn-Report-0005 (doc_id=SYN-REPORT-0005)] --(HAS_ENTITY)--> __Entity__, Date: 2014 (doc_id=SYN-REPORT-0005)
[Report Syn-Report-0005 (doc_id=SYN-REPORT-0005)] --(HAS_ENTITY)--> __Entity__, Person: Mr. Pallab Biswas (doc_id=SYN-REPORT-0005)
[Report Syn-Report-0005 (doc_id=SYN-REPORT-0005)] --(HAS_ENTITY)--> __Entity__, Date: 2005 (doc_id=SYN-REPORT-0005).. truncated
[Report Syn-Report-0006 (doc_id=SYN-REPORT-0006)] --(HAS_ENTITY)--> __Entity__, Institution: M/S Jkjs & Co. (doc_id=SYN-REPORT-0006)
[Report Syn-Report-0006 (doc_id=SYN-REPORT-0006)] --(HAS_ENTITY)--> __Entity__, Person: B Mishra (doc_id=SYN-REPORT-0006)
[Report Syn-Report-0006 (doc_id=SYN-REPORT-0006)] --(HAS_ENTITY)--> __Entity__, Institution: Vishal Engineering Pvt. Ltd. (doc_id=SYN-REPORT-0006).. truncated
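For reference, the fallback seen in this log can be approximated with the sketch below. It assumes `node_index` is a vector index built over the graph nodes (with page content of the form TYPE/CONTENT/DOC, as in the hits above), `graph` is the Neo4jGraph instance, and `llm` is a chat model; the helper name is hypothetical.
def semantic_node_fallback(llm, graph, node_index, user_query: str):
    # 1. Ask the LLM for the concrete entities mentioned in the query.
    raw = llm.invoke(
        "List the named entities (people, places, institutions, ids) in this query, "
        f"one per line:\n{user_query}"
    ).content
    entities = [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

    # 2. Vector-match each entity against the node index to find anchor nodes in the graph.
    anchors = set()
    for ent in entities:
        for hit in node_index.similarity_search(ent, k=3):
            for line in hit.page_content.splitlines():
                if line.startswith("CONTENT:"):
                    anchors.add(line.split("CONTENT:", 1)[1].strip().lower())

    # 3. Expand each anchor to the reports it is attached to and their sibling entities;
    #    the resulting doc ids and expanded context drive the subsequent chunk retrieval.
    expanded = graph.query(
        """
        MATCH (a:`__Entity__`)-[:HAS_ENTITY]-(r:`__Entity__`:Report)-[:HAS_ENTITY]-(e:`__Entity__`)
        WHERE toLower(a.id) IN $anchors
        RETURN DISTINCT r.doc_id AS doc_id, labels(e) AS labels, e.id AS entity
        """,
        params={"anchors": list(anchors)},
    )
    doc_ids = {row["doc_id"] for row in expanded}
    return doc_ids, expanded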
Key Takeaways
- You don't need a perfect graph. A minimally structured graph, even a star graph, can still support complex queries when combined with iterative search-space refinement.
- Chunking boosts recall, but increases cost. Chunk-level extraction captures far more entities than whole-document extraction, but requires more LLM calls. Use it selectively, based on document length and importance.
- Graph context fixes weak embeddings. Entity types like IDs, dates, and numbers have poor semantic embeddings; enriching the vector search with graph-derived context is crucial for accurate retrieval.
- Semantic node search is a powerful fallback, to be exercised with caution. Even when Cypher queries fail (due to missing relations), semantic matching can identify relevant nodes and shrink the search space reliably.
- Hybrid retrieval delivers accurate responses about relations, without a dense graph. Combining graph-based document filtering with vector chunk retrieval allows accurate answers even when the graph lacks explicit relations.
Conclusion
Building a GraphRAG system that is both accurate and cost-efficient requires acknowledging the practical limitations of LLM-based graph construction. Large documents dilute attention, entity extraction is rarely perfect, and encoding every relationship quickly becomes expensive and brittle.
However, as shown throughout this article, we can achieve highly accurate retrieval without a fully detailed knowledge graph. A simple graph structure, paired with iterative search-space optimisation, semantic node search, and context-enriched vector retrieval, can outperform more complex and expensive designs.
This approach shifts the focus from extracting everything upfront into a graph to extracting what is cost-effective, quick to extract, and essential, and letting the retrieval pipeline fill the gaps. The pipeline balances functionality, scalability, and cost, while still enabling sophisticated multi-hop queries across messy, real-world data.
You can read more about the GraphRAG design principles underpinning the concepts demonstrated here in my previous article.
All images and data used in this article are synthetically generated. Figures and code were created by me.
