Constructing Cost-Efficient Agentic RAG on Long-Text Documents in SQL Tables

a reliable, low-latency, cost-efficient RAG system on a SQL table that stores large documents in long-text fields — without changing the prevailing schema?

This just isn’t a theoretical problem.

In most enterprises, critical business knowledge already lives inside traditional relational databases. Proposals, reports, contracts, articles — all stored in TEXT or LONGTEXT columns — designed for keyword matching and aggregations, not semantic retrieval.

With the appearance of LLMs, business demands have evolved to structured computation, deep semantic understanding and contextual insights in a natural, conversational manner.

For instance:

They require a retrieval strategy that may determine when to compute, when to semantically search, and when to mix each. In this text, I’ll display an Agentic RAG architecture that operates directly on top of a standard SQL database — without schema changes — and discuss the design principles required to make it reliable in production.

System setup

For this illustration, I actually have used a subset of the Social Animal 10K Articles with NLP dataset, which has numerous news articles and blog posts together with metadata. The SQL database created has the next columns — .

The title may be considered to be a novel identifier (primary key) for the content. The article categories are technology, business, sports, travel, health, entertainment, politics and fashion. The articles are distributed roughly evenly across the categories. The LLM used is gemini-2.5-flash and to index and store the vector embeddings. The design is applicable for any alternative of LLM or vector database.

Architecture

Besides embedding the raw text, we mirrored the vector store metadata with the identical fields present in SQL (except the total content). This permits for Filtering, as we are going to see in the outcomes. For long documents, a sliding window chunking and embedding strategy may be adopted with the metadata attached to every embedding.

The metadata code snippet is attached

for idx, row in df_sql.iterrows():
    content = str(row['full_content']).strip()
    if not content:
        proceed
        
    metadata = {
        "source": row.get('url', ''),
        "title": row.get('title', ''),
        "authors": str(row.get('authors', '')),
        "article_category": str(row.get('article_category', 'unknown')),
        "published_date": str(row.get('published_date', '')),
        "word_count": int(row.get('content_word_count', 0))
    }
    
    doc = Document(page_content=content, metadata=metadata)
    documents.append(doc)

We built two specialised, intelligent tools that the ReAct agent can invoke using the next architecture. The ReAct (router) agent orchestrates all the query pipeline by intelligently deciding which tool to invoke based on the character of the query. It uses the metadata and query context to find out whether the SQL tool, the vector tool, or a hybrid approach is most appropriate. The next figure depicts the query decision flow:

Query decision flow

The tools are as follows:

search_database (SQL tool): Handles questions that require computation, aggregation or complex logic. It executes SQL queries
search_articles (Vector tool): Handles questions on content, topic or specific entities. Accepts a natural language query, and optionally, metadata filters to execute a worldwide semantic search (eg: ““) or search a subset of the information (eg: “filter_authors=’XYZ’, “query”=”articles”).

As may be seen within the above figure, a question can take the next pathways:

For computations (eg; what number of articles..), inequalities/range (eg:) or aggregations (eg; ), use the SQL tool only.
Semantic search, with or without filters, use the Vector tool as explained above.
Hybrid query: Hybrid queries are essential while you need each structured data (e.g., filtering by date) and unstructured content (e.g., semantic search on articles). The query has a metadata filter criteria (normally category or a date range), for which the SQL tool is used to fetch articles. Then the list of titles is passed to the Vector tool to perform a semantic search on only those articles. An example can be ““

Results

Listed below are the outcomes of a number of queries of every type. It will be interesting to notice the tool pathway followed in each case.

1. “” — This can be a complex computation query, so only the SQL tool will likely be used.

The result’s here

QUERY: what are the articles by yr and category ?
TOOL SELECTION:
  - search_database: {'query': "SELECT strftime('%Y', published_date) AS yr, article_category, COUNT(*) FROM articles GROUP BY yr, article_category"}
DOCUMENTS SELECTED:
  - Tool Output: [('1970', 'business', 1), ('2001', 'politics', 1), ('2013', 'politics', 1), ('2018', 'entertainment', 1), ('2020', 'entertainment', 1), ('2020', 'sports', 1), ('2022', 'fashion', 1), ('2023', 'business', 19), ('2023', 'entertainment', 16), ('2023', 'fashion', 16), ('2023', 'health', 18), ('2023', 'politics', 16), ('2023', 'sports', 16), ('2023', 'technology', 18), ('2023', 'travel', 18)]
RESULT:
Listed below are the variety of articles by yr and category:
* 1970: business (1)
* 2001: politics (1)
* 2013: politics (1)
* 2018: entertainment (1)
* 2020: entertainment (1), sports (1)
* 2022: fashion (1)
* 2023: business (19), entertainment (16), fashion (16), health (18), politics (16), sports (16), technology (18), travel (18)

2. “” — This doesn’t match any of metadata categories that the schema consists of, so the agent decides to make use of the Vector tool with a worldwide semantic search.

The result’s here

QUERY: what articles about children do you've
TOOL SELECTION:
  - search_articles: {'query': 'children'}
DOCUMENTS SELECTED:
  - Tool Output: --- Document 1 ---
Source: https://www.filmcompanion.in/features/indepth-stories/streaming/best-childrens-movies-on-amazon-prime-video-netflix-hotstar-klaus-kung-fu-panda-jagga-jasoos-childrens-day-2020
Title: 10 Movies That Children Should Watch
Category: entertainment
Date: 2020-11-14 02:30:36
Content: On this kid's day, here's a listing of stories designed to assist young ones understand the world around them, gain life lessons and be dazzled by vibrant imagination. It's an awesome time to be a...
  - https://www.filmcompanion.in/features/indepth-stories/streaming/best-childrens-movies-on-amazon-prime-video-netflix-hotstar-klaus-kung-fu-panda-jagga-jasoos-childrens-day-2020
  - https://africabusiness.com/2023/04/07/save-the-children-and-thinkmd-expand-partnership-to-improve-the-lives-of-children-globally/
  - https://www.tcpalm.com/story/news/education/st-lucie-county-schools/2023/04/11/books-stay-in-st-lucie-county-schools-but-most-move-to-high-school/70098338007/
RESULT:
Listed below are some articles about children:

1.  10 Movies That Children Should Watch (entertainment)
2.  Save the Children and THINKMD Expand Partnership to Improve the Lives of Children Globally (health)
3.  St. Lucie County School Board decides to maintain challenged books at school libraries (health)

3. “” — The agent finds the category = fashion, and runs the semantic match using the Vector tool with this filter criteria.

The result’s here

QUERY: what are the trends in fashion
TOOL SELECTION:
  - search_articles: {'query': 'trends', 'filter_category': 'fashion'}
DOCUMENTS SELECTED:
  - Tool Output: --- Document 1 ---
Source: https://www.sightunseen.com/2023/04/the-best-thing-we-saw-in-milan-today-india-mahdavi-for-gebruder-thonet-vienna/
Title: The Best Thing We Saw in Milan Today: India Mahdavi for Gebrüder Thonet Vienna - Sight Unseen
Category: fashion
Date: 2023-04-18 12:00:00
Content: How one can Live With Objects

The Best Thing We Saw in Milan Today: India Mahdavi for Gebrüder Thonet Vienna

Sight Unseen is on the bottom on the Milan Furniture Fair right away and we’ll be bringing you load...
  - https://www.sightunseen.com/2023/04/the-best-thing-we-saw-in-milan-today-india-mahdavi-for-gebruder-thonet-vienna/
  - https://themoderndaygirlfriend.com/clean-make-up-skincare-brand-in-2023/
  - https://poprazzi.com/the-80s-inspired-jewelry-trend-im-absolutely-fawning-over/
RESULT:
The search results mention the next trends in fashion: India Mahdavi for Gebrüder Thonet Vienna, clean make up & skincare, and 80's-inspired jewelry.

4. “” — This can be a hybrid query where the SQL tool will likely be used to get the titles in 2023 for category = technology, then the Vector tool will likely be invoked with the query = crypto and the title list. The result will likely be found inside that subset.

The result’s here

QUERY: tell me technology articles about crypto in 2023
TOOL SELECTION:
  - search_database: {'query': "SELECT title FROM articles WHERE article_category = 'technology' AND published_date LIKE '2023%'"}
  - search_articles:  IRIS', 'Could this foam-filled £7,250 track bike rival the Team GB Hope HB.T?', 'Aave, Chainlink, and Binance are concentrating on enhancing the Shapella, while Big Eyes Coin is aiming for a significant final push'], 'query': 'crypto'
DOCUMENTS SELECTED:
  - Tool Output: [('NPR quits Twitter, says Musk-led platform is “undermining our credibility”',), ('Crypto.com arena considers rebrand after FTX fallout reignites investor anger',), ("World's Biggest EV Battery Maker Unveils S",), ('What Checks Should You Perform Before Buying a Used Car? - Free Car Mag',), ('Preeclampsia Diagnostics Market 2023-2030 Overall Analysis Report | Fortune Business Insights',), ('Spanish startup on ‘mission to save planet’s beer’ from climate crisis',), ('Best third-party apps for Re...
  - Tool Output: --- Document 1 ---
Source: https://www.analyticsinsight.net/crypto-com-arena-considers-rebrand-after-ftx-fallout-reignites-investor-anger/
Title: Crypto.com arena considers rebrand after FTX fallout reignites investor anger
Category: technology
Date: 2023-04-19 01:15:52
Content: The Crypto.com Arena, formerly known as the Staples Center, is reportedly considering a rebrand after the recent controversy surrounding FTX, a cryptocurrency exchange. The FTX fallout reignited anger among investors, wh...
  - https://www.analyticsinsight.net/crypto-com-arena-considers-rebrand-after-ftx-fallout-reignites-investor-anger/
  - https://theprint.in/ani-press-releases/aave-chainlink-and-binance-are-concentrating-on-enhancing-the-shapella-while-big-eyes-coin-is-aiming-for-a-significant-final-push/1527613/
  - https://echojobs.io/job/stripe-software-engineer-service-platform-wndr6
RESULT:
Here are some technology articles from 2023 that mention crypto:

1.  **Crypto.com arena considers rebrand after FTX fallout reignites investor anger**: This article discusses the possibility of the Crypto.com Arena rebranding due to the controversy surrounding the FTX cryptocurrency exchange.
2.  **Aave, Chainlink, and Binance are concentrating on enhancing the Shapella, while Big Eyes Coin is aiming for a significant final push**: This article mentions the Shapella upgrade and its implications for cryptocurrencies on the Ethereum network, including Binance, Aave, and Chainlink.
3. **Software Engineer, Service Platform at Stripe | echojobs.io**: This article mentions Stripe, a financial infrastructure platform that allows businesses to accept payments.

Key Considerations

As with any architecture, there are design principles to be considered for a robust application. Here are a few of them:

Tool docstrings vs system prompt: These are two types of instructions that guide the agent behavior in different ways. It is important to use them for the intended purposes without any overlap or conflict for a reliable agent performance. Tool docstring, located inside the @tool decorator, describes what the tool does and how to use it. Besides the tool name, it defines the parameters, types and descriptions.

Here is the example of the search_articles tool docstring.

@tool
def search_articles(query: str, filter_category: Optional[str] = None, ...):
    """Useful for locating details about specific topics, summaries, or details inside articles.
    
    You may filter by metadata for precision:
    - `filter_category`: 'health', 'tech', etc.
    - `filter_titles`: List of actual titles to retrieve (BATCH MODE).
    - `filter_date`: Published date (YYYY-MM-DD) for EXACT or PARTIAL match only.
    ...
    """

However, the system prompt intelligently guides the routing strategy for the agent, enabling it to determine when to make use of the SQL tool, Vector tool or a mix. It’s also probably the most complex and fragile component of the applying. It defines how tools are combined in hybrid workflows, provides examples of correct tool usage, and specifies mandatory rules and constraints. To adequately design the system prompt, It’s crucial to start with a test case repository of expected user queries, provide examples within the system prompt, and proceed enriching it for deviations that come up for edge cases during operations.

Here’s a sample of the system prompt

system_prompt = (
    "1. **LISTING/BROWSING QUERIES** (e.g., 'what articles are in politics'):n"
    "   - **ALWAYS use [search_database] to list titlesn"
    "   - DO NOT use [search_articles] and not using a semantic queryn"
    ...
    "### MANDATORY RULESn"
    "1. **DATE RANGES & INEQUALITIES**: Use SQL first, then pass titles to vector tooln"
    ...
)

Pre and Post filtering vector databases: This can be a subtle point that may have unintended and hard-to-explain results for specific queries. Consider the next two queries where the one difference is the mis-spelt name: “” and ““. Each the queries follow similar paths, whereby the SQL tool successfully selects the titles for this category and date (there is just 1 article mentioning Judge Dooley), then the Vector tool is named on this title list with the query. Strangely, for the primary query, the Vector tool returns "Tool Output: No documents found matching the factors." for this minor spelling error even when the list has just one article to pick out from, whereas for the second query it returns the proper article.

Here is the results of the primary query

QUERY: QUERY: summarize articles about Doo ley in politics on seventeenth apr 2023
TOOL SELECTION:
  - search_database: {'query': "SELECT title FROM articles WHERE published_date LIKE '2023-04-17%' AND article_category = 'politics'"}
  - search_articles: {'query': 'Doo ley', 'filter_category': 'politics', 'filter_titles': ['Judge Dooley Ends Hartford Police Consent Decree Despite Concerns']}
DOCUMENTS SELECTED:
  - Tool Output: [('Judge Dooley Ends Hartford Police Consent Decree Despite Concerns',)]
  - Tool Output: No documents found matching the factors.

And the second query

QUERY: summarize articles about Dooley in politics on seventeenth apr 2023
TOOL SELECTION:
  - search_database: {'query': "SELECT title FROM articles WHERE published_date LIKE '2023-04-17%' AND article_category = 'politics'"}
  - search_articles: {'query': 'Dooley', 'filter_category': 'politics', 'filter_titles': ['Judge Dooley Ends Hartford Police Consent Decree Despite Concerns']}
DOCUMENTS SELECTED:
  - Tool Output: [('Judge Dooley Ends Hartford Police Consent Decree Despite Concerns',)]
  - Tool Output: --- Document 1 ---
Source: https://www.nbcconnecticut.com/news/local/judge-ends-hartford-police-consent-decree-despite-concerns/3015203/
Title: Judge Dooley Ends Hartford Police Consent Decree Despite Concerns
Category: politics
Date: 2023-04-17 05:36:24
Content: Judge Dooley has ended the nearly 50 years of federal oversight of police in Hartford, despite continued concerns the department still has not hired enough minority officers to reflect town's large Black and Hispanic populations.

And the explanation just isn’t only a weaker embedding as a consequence of incorrect spelling. It’s because (and Chroma etc) perform post-filtering — first do a worldwide seek for the query, after which filter the outcomes for the metadata (= the title list). On this case, the proper article doesn’t feature within the articles after semantic search. A pre-filtering database, however, would have done the semantic search only on the articles within the title list and located the proper article even with the wrong spelling.

Can all metadata filters be faraway from the Vector Tool?: Yes, it is feasible, but its a higher cost option, as easy semantic queries with a metadata filter (resembling category or creator), will turn into a hybrid query, requiring two tool calls, adding to token usage and latency. A practical middle ground can be to maintain dates (and possibly other numeric metadata resembling word counts on this case) within the SQL only, and mirror all text and categorical metadata within the vector database.

Conclusion

Constructing RAG on top of SQL just isn’t about adding embeddings. It’s about designing the proper retrieval strategy.

When structured metadata and long-form content live in the identical table, the true challenge is orchestration — deciding when to compute with SQL, when to semantically search, and when to mix each. Subtle details like metadata filtering and gear routing could make the difference between a reliable system and one which silently fails.

With a well-designed Agentic RAG layer, legacy SQL databases can power semantic applications without schema changes, costly migrations, or performance trade-offs.

Reference

Social Animal 10K Articles with NLP — Dataset by Alex P (Owner) (CC BY-SA 4.0)

_{Images utilized in this text are generated using Google Gemini. Dataset used under} _{CC-BY-SA 4.0 license. Figures and underlying code created by me.}

Constructing Cost-Efficient Agentic RAG on Long-Text Documents in SQL Tables

System setup

Architecture

Results

Key Considerations

Conclusion

Reference

What are your thoughts on this topic?
Let us know in the comments below.

Share this article

Recent posts

Speed up Token Production in AI Factories Using Unified Services and Real-Time AI

How Can A Model 10,000× Smaller Outsmart ChatGPT?

NVIDIA Extreme Co-Design Delivers Latest MLPerf Inference Records

The Inversion Error: Why Secure AGI Requires an Enactive Floor and State-Space Reversibility

CUDA Tile Programming Now Available for BASIC!

Constructing Cost-Efficient Agentic RAG on Long-Text Documents in SQL Tables

System setup

Architecture

Results

Key Considerations

Conclusion

Reference

What are your thoughts on this topic? Let us know in the comments below.

Share this article

Recent posts

What are your thoughts on this topic?
Let us know in the comments below.