Agentic GraphRAG for Business Contracts


Legal contracts are foundational documents that outline the relationships, obligations, and responsibilities between parties. Whether it's a partnership agreement, an NDA, or a supplier contract, these documents often contain critical information that drives decision-making, risk management, and compliance. However, navigating and extracting insights from these contracts is usually a complex and time-consuming process.

In this post, we'll explore how we can streamline the process of understanding and working with legal contracts by implementing an end-to-end solution using agentic GraphRAG. I see GraphRAG as an umbrella term for any method that retrieves or reasons over information stored in a knowledge graph, enabling more structured and context-aware responses.

By structuring legal contracts into a knowledge graph in Neo4j, we can create a powerful repository of information that's easy to query and analyze. From there, we'll build a LangGraph agent that allows users to ask specific questions about the contracts, making it possible to rapidly uncover new insights.

The code is available in this GitHub repository.

Why structuring data matters

Some domains work well with naive RAG, but legal contracts present unique challenges.

Pulling information from irrelevant contracts using naive vector RAG

As shown in the image, relying solely on a vector index to retrieve relevant chunks can introduce risks, such as pulling information from irrelevant contracts. This is because legal language is highly structured, and similar wording across different agreements can lead to incorrect or misleading retrieval. These limitations highlight the need for a more structured approach, such as GraphRAG, to ensure precise and context-aware retrieval.

To implement GraphRAG, we first need to construct a knowledge graph.

Legal knowledge graph containing both structured and unstructured information.

To construct a knowledge graph for legal contracts, we need a way to extract structured information from documents and store it alongside the raw text. An LLM can help by reading through contracts and identifying key details such as parties, dates, contract types, and important clauses. Instead of treating the contract as just a block of text, we break it down into structured components that reflect its underlying legal meaning. For example, an LLM can recognize that "ACME Inc. agrees to pay $10,000 per month starting January 1, 2024" contains both a payment obligation and a start date, which we can then store in a structured format.

Once we have this structured data, we store it in a knowledge graph, where entities like companies, agreements, and clauses are represented as nodes along with their relationships. The unstructured text remains available, but now we can use the structured layer to refine our searches and make retrieval much more precise. Instead of just fetching the most relevant text chunks, we can filter contracts based on their attributes. This means we can answer questions that naive RAG would struggle with, such as how many contracts were signed last month or whether we have any active agreements with a particular company. These questions require aggregation and filtering, which isn't possible with standard vector-based retrieval alone.
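
To make this concrete, here's a minimal sketch of how such an aggregation question becomes a direct graph query instead of a similarity search. It assumes a running Neo4j instance with the schema built later in this post; connection details are placeholders.

from neo4j import GraphDatabase

# Placeholder connection details for illustration
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

# "How many contracts were signed last month?" becomes a simple aggregation
query = """
MATCH (c:Contract)
WHERE c.effective_date >= date($start) AND c.effective_date < date($end)
RETURN count(c) AS contracts_signed
"""

with driver.session() as session:
    result = session.run(query, start="2024-01-01", end="2024-02-01")
    print(result.single()["contracts_signed"])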

By combining structured and unstructured data, we also make retrieval more context-aware. If a user asks about a contract's payment terms, we ensure that the search is constrained to the correct agreement rather than relying on text similarity, which could pull in terms from unrelated contracts. This hybrid approach overcomes the limitations of naive RAG and allows for a much deeper and more reliable analysis of legal documents.

Graph construction

We'll leverage an LLM to extract structured information from legal documents, using CUAD (Contract Understanding Atticus Dataset), a widely used benchmark dataset for contract analysis licensed under CC BY 4.0. The CUAD dataset contains over 500 contracts, making it an ideal dataset for evaluating our structured extraction pipeline.

The token count distribution for the contracts is visualized below.

Most contracts in this dataset are relatively short, with token counts below 10,000. However, there are some much longer contracts, with a few reaching up to 80,000 tokens. These long contracts are rare, while shorter ones make up the majority. The distribution shows a steep drop-off, meaning long contracts are the exception rather than the rule.
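
For reference, a token count distribution like this can be computed with a few lines of Python. This is a sketch, assuming the contracts are already loaded as a list of strings and using tiktoken's cl100k_base as a stand-in tokenizer (not necessarily the tokenizer used for the plot above).

import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # stand-in tokenizer

contracts = ["..."]  # raw contract texts loaded from CUAD
token_counts = [len(encoding.encode(text)) for text in contracts]

print(f"max tokens: {max(token_counts)}")
print(f"contracts under 10,000 tokens: {sum(c < 10_000 for c in token_counts)}")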

We're using Gemini-2.0-Flash for extraction, which has a 1 million token input limit, so handling these contracts isn't an issue. Even the longest contracts in our dataset (around 80,000 tokens) fit well within the model's capacity. Since most contracts are much shorter, we don't need to worry about truncation or breaking documents into smaller chunks for processing.

Structured data extraction

Most commercial LLMs have the option to use Pydantic objects to define the schema of the output. An example for a location:

from typing import Optional

from pydantic import BaseModel, Field


class Location(BaseModel):
    """
    Represents a physical location including address, city, state, and country.
    """

    address: Optional[str] = Field(
        ..., description="The street address of the location. Use None if not provided"
    )
    city: Optional[str] = Field(
        ..., description="The city of the location. Use None if not provided"
    )
    state: Optional[str] = Field(
        ..., description="The state or region of the location. Use None if not provided"
    )
    country: str = Field(
        ...,
        description="The country of the location. Use the two-letter ISO standard.",
    )

When using LLMs for structured output, Pydantic helps define a clear schema by specifying the types of attributes and providing descriptions that guide the model's responses. Each field has a type, such as str or Optional[str], and a description that tells the LLM exactly how to format the output.

For example, in a Location model, we define key attributes like address, city, state, and country, specifying what data is expected and how it should be structured. The country field, for instance, follows the two-letter country code standard, like "US", "FR", or "JP", instead of inconsistent variations like "United States" or "USA." This principle applies to other structured data as well: ISO 8601 keeps dates in a standard format (YYYY-MM-DD), and so on.

By defining structured output with Pydantic, we make LLM responses more reliable, machine-readable, and easier to integrate into databases or APIs. Clear field descriptions further help the model generate correctly formatted data, reducing the need for post-processing.

Pydantic schema models can be more sophisticated, like the Contract model below, which captures the key details of a legal agreement and ensures the extracted data follows a standardized structure.

class Contract(BaseModel):
    """
    Represents the key details of the contract.
    """

    summary: str = Field(
        ...,
        description=(
            "High level summary of the contract with relevant facts and details. "
            "Include all relevant information to provide a full picture. "
            "Do not use any pronouns"
        ),
    )
    contract_type: str = Field(
        ...,
        description="The type of contract being entered into.",
        enum=CONTRACT_TYPES,
    )
    parties: List[Organization] = Field(
        ...,
        description="List of parties involved in the contract, with details of each party's role.",
    )
    effective_date: str = Field(
        ...,
        description=(
            "Enter the date when the contract becomes effective in yyyy-MM-dd format. "
            "If only the year (e.g., 2015) is known, use 2015-01-01 as the default date. "
            "Always fill in the full date"
        ),
    )
    contract_scope: str = Field(
        ...,
        description="Description of the scope of the contract, including rights, duties, and any limitations.",
    )
    duration: Optional[str] = Field(
        None,
        description=(
            "The duration of the agreement, including provisions for renewal or termination. "
            "Use ISO 8601 durations standard"
        ),
    )

    end_date: Optional[str] = Field(
        None,
        description=(
            "The date when the contract expires. Use yyyy-MM-dd format. "
            "If only the year (e.g., 2015) is known, use 2015-01-01 as the default date. "
            "Always fill in the full date"
        ),
    )
    total_amount: Optional[float] = Field(
        None, description="Total value of the contract."
    )
    governing_law: Optional[Location] = Field(
        None, description="The jurisdiction's laws governing the contract."
    )
    clauses: Optional[List[Clause]] = Field(
        None, description=f"""Relevant summaries of clause types. Allowed clause types are {CLAUSE_TYPES}"""
    )

This contract schema organizes the key details of legal agreements in a structured way, making them easier to analyze with LLMs. It includes different types of clauses, such as confidentiality or termination, each with a short summary. The parties involved are listed with their names, locations, and roles, while contract details cover things like start and end dates, total value, and governing law. Some attributes, such as governing law, can be defined using nested models, enabling more detailed and complex outputs.

We can test our approach using the following example. We're using the LangChain framework to orchestrate LLMs.

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
llm.with_structured_output(Contract).invoke(
    "Tomaz works with Neo4j since 2017 and will make a billion dollar until 2030."
    " The contract was signed in Las Vegas"
)

which outputs

Contract(
    summary="Tomaz works with Neo4j since 2017 and can make a billion dollar until 2030.",
    contract_type="Service",
    parties=[
        Organization(
            name="Tomaz",
            location=Location(
                address=None,
                city="Las Vegas",
                state=None,
                country="US"
            ),
            role="employee"
        ),
        Organization(
            name="Neo4j",
            location=Location(
                address=None,
                city=None,
                state=None,
                country="US"
            ),
            role="employer"
        )
    ],
    effective_date="2017-01-01",
    contract_scope="Tomaz will work with Neo4j",
    duration=None,
    end_date="2030-01-01",
    total_amount=1_000_000_000.0,
    governing_law=None,
    clauses=None
)

Now that our contract data is in a structured format, we can define the Cypher query needed to import it into Neo4j, mapping entities, relationships, and key clauses into a graph structure. This step transforms raw extracted data into a queryable knowledge graph, enabling efficient traversal and retrieval of contract insights.

UNWIND $data AS row
MERGE (c:Contract {file_id: row.file_id})
SET c.summary = row.summary,
    c.contract_type = row.contract_type,
    c.effective_date = date(row.effective_date),
    c.contract_scope = row.contract_scope,
    c.duration = row.duration,
    c.end_date = CASE WHEN row.end_date IS NOT NULL THEN date(row.end_date) ELSE NULL END,
    c.total_amount = row.total_amount
WITH c, row
CALL (c, row) {
    WITH c, row
    WHERE row.governing_law IS NOT NULL
    MERGE (c)-[:HAS_GOVERNING_LAW]->(l:Location)
    SET l += row.governing_law
}
FOREACH (party IN row.parties |
    MERGE (p:Party {name: party.name})
    MERGE (p)-[:HAS_LOCATION]->(pl:Location)
    SET pl += party.location
    MERGE (p)-[pr:PARTY_TO]->(c)
    SET pr.role = party.role
)
FOREACH (clause IN row.clauses |
    MERGE (c)-[:HAS_CLAUSE]->(cl:Clause {type: clause.clause_type})
    SET cl.summary = clause.summary
)

This Cypher query imports structured contract data into Neo4j by creating Contract nodes with attributes such as summary, contract_type, effective_date, duration, and total_amount. If a governing law is specified, it links the contract to a Location node. Parties involved in the contract are stored as Party nodes, with each party connected to a Location and assigned a role in relation to the contract. The query also processes clauses, creating Clause nodes and linking them to the contract while storing their type and summary.

After processing and importing the contracts, the resulting graph follows the graph schema below.

Imported legal graph schema

Let's also take a look at a single contract.

This graph represents a contract structure where a contract (orange node) connects to various clauses (red nodes), parties (blue nodes), and locations (violet nodes). The contract has three clauses, and two parties are involved, each linked to its respective location, the Netherlands (NL) and the United States (US). The contract is governed by U.S. law. The contract node also contains additional metadata, including dates and other relevant details.

A public read-only instance containing the CUAD legal contracts is available with the following credentials.

URI: neo4j+s://demo.neo4jlabs.com
username: legalcontracts
password: legalcontracts
database: legalcontracts

Entity resolution

Entity resolution in legal contracts is challenging due to variations in how companies, individuals, and locations are referenced. A company might appear as "Acme Inc." in one contract and "Acme Corporation" in another, requiring a process to determine whether they refer to the same entity.

One approach is to generate candidate matches using text embeddings or string distance metrics like Levenshtein distance. Embeddings capture semantic similarity, while string distance measures character-level differences. Once candidates are identified, additional evaluation is required: comparing metadata such as addresses or tax IDs, analyzing shared relationships in the graph, or incorporating human review for critical cases.
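
As a minimal sketch of the candidate-generation step, using the standard library's SequenceMatcher as a stand-in for a proper string distance metric (names and threshold are illustrative):

from difflib import SequenceMatcher
from itertools import combinations

party_names = ["Acme Inc.", "Acme Corporation", "ACME, Inc.", "Neo4j"]

def string_similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Pairs above an illustrative threshold become candidates for further review
candidates = [
    (a, b, round(string_similarity(a, b), 2))
    for a, b in combinations(party_names, 2)
    if string_similarity(a, b) > 0.7
]
print(candidates)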

For resolving entities at scale, both open-source solutions like Dedupe and commercial tools like Senzing offer automated methods. Choosing the right approach depends on data quality, accuracy requirements, and whether manual oversight is feasible.

With the legal graph constructed, we can move on to the agentic GraphRAG implementation.

Agentic GraphRAG

Agentic architectures vary widely in complexity, modularity, and reasoning capabilities. At their core, these architectures involve an LLM acting as a central reasoning engine, often supplemented with tools, memory, and orchestration mechanisms. The key differentiator is how much autonomy the LLM has in making decisions and how interactions with external systems are structured.

One of the simplest yet most effective designs, particularly for chatbot-like implementations, is a direct LLM-with-tools approach. In this setup, the LLM serves as the decision-maker, dynamically selecting which tools to invoke (if any), retrying operations when necessary, and executing multiple tools in sequence to fulfill complex requests.

The diagram represents a simple LangGraph agent workflow. It begins at __start__, moving to the assistant node, where the LLM processes user input. From there, the assistant can either call tools to fetch relevant information or transition directly to __end__ to complete the interaction. If a tool is used, the assistant processes the response before deciding whether to call another tool or end the session. This structure allows the agent to autonomously determine when external information is needed before responding.
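
A minimal sketch of this workflow using LangGraph's prebuilt ReAct-style agent; it assumes the ContractSearchTool defined later in this post:

from langgraph.prebuilt import create_react_agent
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
# ContractSearchTool is the contract retrieval tool built in the Tools section
agent = create_react_agent(llm, tools=[ContractSearchTool()])

result = agent.invoke(
    {"messages": [("user", "Do we have any active agreements with Neo4j?")]}
)
print(result["messages"][-1].content)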

This approach is particularly well-suited to stronger commercial models like Gemini or GPT-4o, which excel at reasoning and self-correction.

Tools

LLMs are powerful reasoning engines, but their effectiveness often depends on how well they're equipped with external tools. These tools, whether database queries, APIs, or search functions, extend an LLM's ability to retrieve facts, perform calculations, or interact with structured data.

Designing tools that are both general enough to handle diverse queries and precise enough to return meaningful results is more art than science. What we're really building is a semantic layer between the LLM and the underlying data. Rather than requiring the LLM to understand the exact structure of a Neo4j knowledge graph or a database schema, we define tools that abstract away these complexities.

With this approach, the LLM doesn't need to know whether contract information is stored as graph nodes and relationships or as raw text in a document store. It only needs to invoke the right tool to fetch relevant data based on a user's question.

In our case, the contract retrieval tool serves as this semantic interface. When a user asks about contract terms, obligations, or parties, the LLM calls a structured query tool that translates the request into a database query, retrieves relevant information, and presents it in a format the LLM can interpret and summarize. This enables a flexible, model-agnostic system where different LLMs can interact with contract data without needing direct knowledge of its storage or structure.

There's no one-size-fits-all standard for designing an optimal toolset. What works well for one model may fail for another. Some models handle ambiguous tool instructions gracefully, while others struggle with complex parameters or require explicit prompting. The trade-off between generality and task-specific efficiency means tool design requires iteration, testing, and fine-tuning for the LLM in use.

For contract analysis, an effective tool should retrieve contracts and summarize key terms without requiring users to phrase queries rigidly. Achieving this flexibility depends on thoughtful prompt engineering, robust schema design, and adaptation to different LLM capabilities. As models evolve, so do strategies for making tools more intuitive and effective.

In this section, we'll explore different approaches to tool implementation, comparing their flexibility, effectiveness, and compatibility with various LLMs.

My preferred approach is to dynamically and deterministically construct a Cypher query and execute it against the database. This method ensures consistent and predictable query generation while maintaining implementation flexibility. By structuring queries this way, we reinforce the semantic layer, allowing user inputs to be seamlessly translated into database retrievals. This keeps the LLM focused on retrieving relevant information rather than understanding the underlying data model.

Our tool is intended to identify relevant contracts, so we need to provide the LLM with options to search contracts based on various attributes. The input description is again provided as a Pydantic object.

class ContractInput(BaseModel):
    min_effective_date: Optional[str] = Field(
        None, description="Earliest contract effective date (YYYY-MM-DD)"
    )
    max_effective_date: Optional[str] = Field(
        None, description="Latest contract effective date (YYYY-MM-DD)"
    )
    min_end_date: Optional[str] = Field(
        None, description="Earliest contract end date (YYYY-MM-DD)"
    )
    max_end_date: Optional[str] = Field(
        None, description="Latest contract end date (YYYY-MM-DD)"
    )
    contract_type: Optional[str] = Field(
        None, description=f"Contract type; valid types: {CONTRACT_TYPES}"
    )
    parties: Optional[List[str]] = Field(
        None, description="List of parties involved within the contract"
    )
    summary_search: Optional[str] = Field(
        None, description="Search term matched against the contract summary"
    )
    country: Optional[str] = Field(
        None, description="Country where the contract applies. Use the two-letter ISO standard."
    )
    active: Optional[bool] = Field(None, description="Whether the contract is active")
    monetary_value: Optional[MonetaryValue] = Field(
        None, description="The total amount or value of a contract"
    )

With LLM tools, attributes can take various forms depending on their purpose. Some fields are simple strings, such as contract_type and country, which store single values. Others, like parties, are lists of strings, allowing multiple entries (e.g., multiple entities involved in a contract).

Beyond basic data types, attributes can also represent complex objects. For example, monetary_value uses a MonetaryValue object, which includes structured data such as currency type and the operator. While attributes with nested objects offer a clear and structured representation of data, models tend to struggle to handle them effectively, so we should keep them simple.
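
If a target model struggles with the nested object, one alternative (a sketch, not the approach used in this project) is to flatten it into two primitive fields:

from typing import Optional
from pydantic import BaseModel, Field

class FlatMonetaryInput(BaseModel):
    # Flattened alternative to the nested MonetaryValue object
    monetary_value: Optional[float] = Field(
        None, description="Numeric contract value to compare against"
    )
    monetary_operator: Optional[str] = Field(
        None, description="Comparison operator; one of '=', '>', '<'"
    )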

As part of this project, we're experimenting with an additional cypher_aggregation attribute, providing the LLM with greater flexibility for scenarios that require specific filtering or aggregation.

cypher_aggregation: Optional[str] = Field(
    None,
    description="""Custom Cypher statement for advanced aggregations and analytics.

    This will be appended to the base query:
    ```
    MATCH (c:Contract)
    
    WITH c, summary, contract_type, contract_scope, effective_date, end_date, parties, active, monetary_value, contract_id, countries
    
    ```
    
    Examples:
    
    1. Count contracts by type:
    ```
    RETURN contract_type, count(*) AS count ORDER BY count DESC
    ```
    
    2. Calculate average contract duration by type:
    ```
    WITH contract_type, effective_date, end_date
    WHERE effective_date IS NOT NULL AND end_date IS NOT NULL
    WITH contract_type, duration.between(effective_date, end_date).days AS duration
    RETURN contract_type, avg(duration) AS avg_duration ORDER BY avg_duration DESC
    ```
    
    3. Calculate contracts per effective date year:
    ```
    RETURN effective_date.year AS year, count(*) AS count ORDER BY year
    ```
    
    4. Count the party with the highest number of active contracts:
    ```
    UNWIND parties AS party
    WITH party.name AS party_name, active, count(*) AS contract_count
    WHERE active = true
    RETURN party_name, contract_count
    ORDER BY contract_count DESC
    LIMIT 1
    ```
    """,
)

The cypher_aggregation attribute allows LLMs to define custom Cypher statements for advanced aggregations and analytics. It extends the base query by appending the specified aggregation logic, enabling flexible filtering and computation.

This feature supports use cases such as counting contracts by type, calculating average contract duration, analyzing contract distributions over time, and identifying key parties based on contract activity. By leveraging this attribute, the LLM can dynamically generate insights tailored to specific analytical needs without requiring predefined query structures.

While this flexibility is useful, it should be carefully evaluated, as increased adaptability comes at the cost of reduced consistency and robustness due to the added complexity of the operation.

We must clearly define the function's name and description when presenting it to the LLM. A well-structured description helps guide the model in using the function correctly, ensuring it understands its purpose, expected inputs, and outputs. This reduces ambiguity and improves the LLM's ability to generate meaningful and reliable queries.

class ContractSearchTool(BaseTool):
    name: str = "ContractSearch"
    description: str = (
        "useful for when you need to answer questions related to any contracts"
    )
    args_schema: Type[BaseModel] = ContractInput

Finally, we need to implement a function that processes the given inputs, constructs the corresponding Cypher statement, and executes it efficiently.

The core logic of the function centers on constructing the Cypher statement. We start by matching the contract as the foundation of the query.

cypher_statement = "MATCH (c:Contract) "

Next, we need to implement the function that processes the input parameters. In this example, we primarily use attributes to filter contracts based on the given criteria.
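
The snippets that follow are fragments of a single search function. As a rough skeleton of how they fit together (abbreviated; graph.query assumes a LangChain Neo4jGraph instance):

def contract_search(contract_type=None, parties=None, active=None,
                    min_effective_date=None, max_effective_date=None, **kwargs):
    cypher_statement = "MATCH (c:Contract) "
    filters = []  # Cypher WHERE conditions collected per attribute
    params = {}   # query parameters, used to prevent injection

    # ... each attribute appends to `filters` and `params` as shown below ...

    if filters:
        cypher_statement += "WHERE " + " AND ".join(filters) + " "

    # ... ordering, optional aggregation, and the final RETURN follow ...
    return graph.query(cypher_statement, params=params)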

Simple property filtering
For example, the contract_type attribute is used to perform simple node property filtering.

if contract_type:
    filters.append("c.contract_type = $contract_type")
    params["contract_type"] = contract_type

This code adds a Cypher filter for contract_type while using query parameters for values to prevent query injection security issues.

Since the possible contract type values are presented in the attribute description

contract_type: Optional[str] = Field(
    None, description=f"Contract type; valid types: {CONTRACT_TYPES}"
)

we don’t need to worry about mapping values from input to valid contract types because the LLM will handle that.

Inferred property filtering

We’re constructing tools for an LLM to interact with a knowledge graph, where the tools function an abstraction layer over structured queries. A key feature is the power to make use of inferred properties at runtime, just like an ontology but dynamically computed.

if active is not None:
    operator = ">=" if active else "<"
    filters.append(f"c.end_date {operator} date()")

Here, active acts as a runtime classification, determining whether a contract is ongoing (>= date()) or expired (< date()). This logic extends structured KG queries by computing properties only when needed, enabling more flexible LLM reasoning. By handling logic like this within tools, we ensure the LLM interacts with simplified, intuitive operations, keeping it focused on reasoning rather than query formulation.

Neighbor filtering

Sometimes filtering depends on neighboring nodes, such as restricting results to contracts involving specific parties. The parties attribute is an optional list, and when provided, it ensures only contracts linked to those entities are considered:

if parties:
    parties_filter = []
    for i, party in enumerate(parties):
        party_param_name = f"party_{i}"
        parties_filter.append(
            f"""EXISTS {{
            MATCH (c)<-[:PARTY_TO]-(party)
            WHERE toLower(party.name) CONTAINS ${party_param_name}
        }}"""
        )
        params[party_param_name] = party.lower()
    filters.extend(parties_filter)  # all party conditions must hold (AND)

This code filters contracts based on their associated parties, treating the logic as AND, meaning all specified conditions must be met for a contract to be included. It iterates through the provided parties list and constructs a query where each party condition must hold.

For each party, a unique parameter name is generated to avoid conflicts. The EXISTS clause ensures that the contract has a PARTY_TO relationship to a party whose name contains the specified value. The name is converted to lowercase to allow case-insensitive matching. Each party condition is added individually, enforcing an implicit AND between them.

If more complex logic were needed, such as supporting OR conditions or allowing different matching criteria, the input would need to change. Instead of a simple list of party names, a structured input format specifying operators would be required.
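
A hypothetical sketch of such a structured input (not implemented in this project) might look like:

from enum import Enum
from typing import List
from pydantic import BaseModel

class LogicalOperator(str, Enum):
    AND = "AND"
    OR = "OR"

class PartyFilter(BaseModel):
    """Hypothetical party filter supporting AND/OR semantics."""
    names: List[str]
    operator: LogicalOperator = LogicalOperator.AND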

Custom operator filtering

To add more flexibility, we can introduce an operator object as a nested attribute, allowing more control over filtering logic. Instead of hardcoding comparisons, we define an enumeration of operators and use it dynamically.

For example, with monetary values, a contract might need to be filtered based on whether its total amount is greater than, less than, or exactly equal to a specified value. Instead of assuming a fixed comparison logic, we define an enum that represents the possible operators:

from enum import Enum

class NumberOperator(str, Enum):
    EQUALS = "="
    GREATER_THAN = ">"
    LESS_THAN = "<"

class MonetaryValue(BaseModel):
    """The total amount or value of a contract"""
    value: float
    operator: NumberOperator

if monetary_value:
    filters.append(f"c.total_amount {monetary_value.operator.value} $total_value")
    params["total_value"] = monetary_value.value

This approach makes the system more expressive. Instead of rigid filtering rules, the tool interface allows the LLM to specify not only a value but how it should be compared, making it easier to handle a broader range of queries while keeping the LLM's interaction simple and declarative.

Some LLMs struggle with nested objects as inputs, making it harder to handle structured operator-based filtering. Adding an operator introduces additional complexity, since it requires two separate values, which can lead to ambiguity in parsing and input validation.

Min and Max attributes

To keep things simpler, I tend to gravitate toward using min and max attributes for dates, as this naturally supports range filtering and keeps the logic straightforward.

if min_effective_date:
    filters.append("c.effective_date >= date($min_effective_date)")
    params["min_effective_date"] = min_effective_date
if max_effective_date:
    filters.append("c.effective_date <= date($max_effective_date)")
    params["max_effective_date"] = max_effective_date

This function filters contracts by an effective date range, adding an optional lower and upper bound condition when min_effective_date and max_effective_date are provided, ensuring that only contracts within the specified date range are included.

Semantic search

The summary_search attribute is used for semantic search, where instead of relying on a vector index upfront, we use a post-filtering approach to metadata filtering. First, structured filters, like date ranges, monetary values, or parties, are applied to narrow down the candidate set. Then, vector search is performed over this filtered subset to rank results based on semantic similarity.

if summary_search:
    cypher_statement += (
        "WITH c, vector.similarity.cosine(c.embedding, $embedding) "
        "AS score ORDER BY score DESC WITH c, score WHERE score > 0.9 "
    )  # Define a threshold limit
    params["embedding"] = embeddings.embed_query(summary_search)
else:  # Otherwise, sort by the most recent contracts
    cypher_statement += "WITH c ORDER BY c.effective_date DESC "

This code applies semantic search when summary_search is provided by computing cosine similarity between the contract's embedding and the query embedding, ordering results by relevance, and filtering out low-scoring matches with a threshold of 0.9. Otherwise, it defaults to sorting contracts by the most recent effective_date.

Dynamic queries

The cypher_aggregation attribute is an experiment I wanted to test; it gives the LLM a degree of partial text2cypher capability, allowing it to dynamically generate aggregations after the initial structured filtering. Instead of predefining every possible aggregation, this approach lets the LLM specify calculations like counts, averages, or grouped summaries on demand, making queries more flexible and expressive. However, since this shifts more query logic to the LLM, ensuring all generated queries work correctly becomes difficult, as malformed or incompatible Cypher statements can break execution. This trade-off between flexibility and reliability is a key consideration in designing the system.

if cypher_aggregation:
    cypher_statement += """WITH c, c.summary AS summary, c.contract_type AS contract_type, 
      c.contract_scope AS contract_scope, c.effective_date AS effective_date, c.end_date AS end_date,
      [(c)<-[r:PARTY_TO]-(party) | {party: party.name, role: r.role}] AS parties, c.end_date >= date() AS active, c.total_amount AS monetary_value, c.file_id AS contract_id,
      apoc.coll.toSet([(c)<-[:PARTY_TO]-(party)-[:LOCATED_IN]->(country) | country.name]) AS countries """
    cypher_statement += cypher_aggregation

If no Cypher aggregation is provided, we return the total count of identified contracts along with only five example contracts to avoid overwhelming the prompt. Handling excessive rows is crucial, as an LLM struggling with a massive result set isn't useful. Moreover, an LLM producing answers with 100 contract titles isn't a good user experience either.

cypher_statement += """WITH collect(c) AS nodes
RETURN {
    total_count_of_contracts: size(nodes),
    example_values: [
      el in nodes[..5] |
      {summary:el.summary, contract_type:el.contract_type, 
       contract_scope: el.contract_scope, file_id: el.file_id, 
        effective_date: el.effective_date, end_date: el.end_date,
        monetary_value: el.total_amount, contract_id: el.file_id, 
        parties: [(el)<-[r:PARTY_TO]-(party) | {name: party.name, role: r.role}], 
        countries: apoc.coll.toSet([(el)<-[:PARTY_TO]-()-[:LOCATED_IN]->(country) | country.name])}
    ]
} AS output"""

This Cypher statement collects all matching contracts into a list, returning the total count and up to five example contracts with key attributes, including summary, type, scope, dates, monetary value, associated parties with roles, and unique country locations.

Now that our contract search tool is built, we hand it off to the LLM, and just like that, we have agentic GraphRAG implemented.
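
In LangChain, that hand-off is a one-liner; a minimal sketch:

llm_with_tools = llm.bind_tools([ContractSearchTool()])
response = llm_with_tools.invoke("How many active contracts do we have with Neo4j?")
print(response.tool_calls)  # the ContractInput arguments the LLM chose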

Agent Benchmark

If you're serious about implementing agentic GraphRAG, you need an evaluation dataset, not only as a benchmark but as a foundation for the entire project. A well-constructed dataset helps define the scope of what the system should handle, ensuring that initial development aligns with real-world use cases. Beyond that, it becomes a useful tool for evaluating performance, allowing you to measure how well the LLM interacts with the graph, retrieves information, and applies reasoning. It's also essential for prompt engineering optimizations, letting you iteratively refine queries, tool use, and response formatting with clear feedback rather than guesswork. Without a structured dataset, you're flying blind, making improvements harder to quantify and inconsistencies tougher to catch.

The code for the benchmark is available on GitHub.

I have compiled a list of 22 questions that we'll use to evaluate the system. Additionally, we're going to introduce a new metric called answer_satisfaction, for which we will provide a custom prompt.

answer_satisfaction = AspectCritic(
    name="answer_satisfaction",
    definition="""You will evaluate an ANSWER to a legal QUESTION based on a provided SOLUTION.

Rate the answer on a scale from 0 to 1, where:
- 0 = incorrect, substantially incomplete, or misleading
- 1 = correct and sufficiently complete

Consider these evaluation criteria:
1. Factual correctness is paramount - the answer must not contradict the solution
2. The answer must address the core elements of the solution
3. Additional relevant information beyond the solution is acceptable and may enhance the answer
4. Technical legal terminology should be used appropriately if present in the solution
5. For quantitative legal analyses, accurate figures must be provided

+ fewshots
""",
)

Many questions can return a large amount of information. For example, asking for contracts signed before 2020 might yield hundreds of results. Since the LLM receives both the total count and a few example entries, our evaluation should focus on the total count, rather than which specific examples the LLM chooses to show.
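
A sketch of scoring a single benchmark entry with the metric above, assuming Ragas's single-turn sample API (the question, answer, and reference values are illustrative):

from ragas import evaluate
from ragas.dataset_schema import EvaluationDataset, SingleTurnSample

sample = SingleTurnSample(
    user_input="How many contracts were signed before 2020?",  # illustrative
    response="There are 150 contracts with an effective date before 2020.",
    reference="150 contracts",  # the SOLUTION the metric compares against
)
dataset = EvaluationDataset(samples=[sample])

# answer_satisfaction is the AspectCritic metric defined above;
# evaluate() may also need an evaluator LLM, e.g. evaluate(..., llm=llm)
results = evaluate(dataset=dataset, metrics=[answer_satisfaction])
print(results)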

Benchmark results.

The provided results indicate that all evaluated models (Gemini 1.5 Pro, Gemini 2.0 Flash, and GPT-4o) perform similarly well for most tool calls, with GPT-4o slightly outperforming the Gemini models (0.82 vs. 0.77). The noticeable difference emerges primarily when partial text2cypher is used, particularly for various aggregation operations.

Moreover, I've seen projects where accuracy can be improved significantly by leveraging Python for aggregations, as LLMs typically handle Python code generation and execution better than generating complex Cypher queries directly.
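
As a sketch of that pattern: retrieve the filtered rows with simple Cypher and let Python (here, pandas) handle the aggregation; graph.query again assumes a LangChain Neo4jGraph instance.

import pandas as pd

# Rows fetched with a simple, reliable Cypher query (no aggregation in Cypher)
rows = graph.query(
    "MATCH (c:Contract) RETURN c.contract_type AS type, c.total_amount AS amount"
)
df = pd.DataFrame(rows)

# Aggregation handled in Python instead of generated Cypher
print(df.groupby("type")["amount"].mean().sort_values(ascending=False))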

Web Application

We've also built a simple React web application, powered by LangGraph hosted on FastAPI, which streams responses directly to the frontend. Special thanks to Anej Gorkic for creating the web app.

You can launch the entire stack with the following command:

docker compose up

Then navigate to localhost:5173 in your browser.

Summary

As LLMs gain stronger reasoning capabilities, they can, when paired with the right tools, become powerful agents for navigating complex domains like legal contracts. In this post, we've only scratched the surface, focusing on core contract attributes while barely touching the rich variety of clauses found in real-world agreements. There's significant room for growth, from expanding clause coverage to refining tool design and interaction strategies.

The code is available on GitHub.

Images

All images in this post were created by the author.
