It was 2 AM on a Tuesday (well, technically Wednesday, I suppose) when my phone buzzed with that familiar, dreaded PagerDuty notification.
I didn't even need to open my laptop to know that the daily_ingest.py script had failed. Again.
It keeps failing because our data provider constantly changes their file format without warning. They might randomly switch from commas to pipes, or even mess up the date formats overnight.
Normally, the actual fix takes me about thirty seconds: I open the script, swap sep=',' for sep='|', and hit run.
I know that sounds quick, but honestly, the true cost isn't the coding time; it's the interrupted sleep and how hard it is to get your brain working at 2 AM.
This routine got me thinking: if the fix is so obvious that I can figure it out just by glancing at the raw text, why couldn't a model do it?
We regularly hear hype about "Agentic AI" replacing software engineers, which honestly feels somewhat overblown to me.
But the idea of using a small, cost-effective LLM as an on-call junior developer to handle boring pandas exceptions?
Now that seemed like a project worth trying.
So I built a "Self-Healing" pipeline. It isn't magic, but it has already saved me from at least three late-night wake-up calls this month.
And personally, anything (no matter how small) that improves my sleep is unquestionably a huge win for me.
Here is the breakdown of how I did it, so you can build it yourself.
The Architecture: A “Try-Heal-Retry” Loop
The core concept is relatively simple. Most data pipelines are fragile because they assume the world is perfect, and when the input data changes even slightly, they fail.
Instead of accepting that crash, I designed my script to catch the exception, capture the "crime scene evidence" (basically the traceback and the first few lines of the file), and then pass it all down to an LLM.
Pretty neat, right?
The LLM acts as a diagnostic tool, analyzing the evidence and returning new parameters, which the script then uses to automatically retry the operation.
To make this system robust, I relied on three specific tools (a minimal sketch of the loop itself follows this list):
- Pandas: For the actual data loading (obviously).
- Pydantic: To make sure the LLM returns structured JSON rather than conversational filler.
- Tenacity: A Python library that makes writing complex retry logic incredibly clean.
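Here is that loop shape in plain Python, just as a conceptual sketch: try_heal_retry and the heal_fn callback are names I made up for illustration, and the real version later in this post uses tenacity instead of a manual loop.

import pandas as pd

# Conceptual sketch of the try-heal-retry loop. `heal_fn` is any callable that
# takes (file_path, error_text) and returns new read_csv kwargs, e.g. the LLM
# "doctor" built later in this post.
def try_heal_retry(fp, heal_fn, max_attempts=3):
    params = {"sep": ","}                       # optimistic defaults
    for attempt in range(max_attempts):
        try:
            return pd.read_csv(fp, **params)    # the "try"
        except Exception as exc:
            if attempt == max_attempts - 1:
                raise                           # out of strikes: give up loudly
            params = heal_fn(fp, str(exc))      # the "heal": ask for new params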
Step 1: Defining the “Fix”
The first challenge with using Large Language Models for code generation is their tendency to hallucinate. In my experience, if you ask for a simple parameter, you often receive a paragraph of conversational text in return.
To stop that, I leveraged structured outputs via Pydantic and OpenAI’s API.
This forces the model to fill out a strict form, acting as a filter between the messy AI reasoning and our clean Python code.

Here is the schema I settled on, focusing strictly on the arguments that most commonly cause read_csv to fail:
from pydantic import BaseModel, Field
from typing import Optional, Literal

# We want a strict schema so the LLM doesn't just yap at us.
# I'm only including the params that really cause crashes.
class CsvParams(BaseModel):
    sep: str = Field(description="The delimiter, e.g. ',' or '|' or ';'")
    encoding: str = Field(default="utf-8", description="File encoding")
    header: Optional[int | str] = Field(default="infer", description="Row for col names")
    # Sometimes the C engine chokes on separators, so we let the AI switch engines
    engine: Literal["python", "c"] = "python"
By defining this BaseModel, we're effectively telling the LLM exactly which fields it is allowed to return, and nothing else.
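Just to sanity-check the schema locally, you can instantiate it directly and peek at the JSON Schema we'll hand to the API in the next step (the values below are made up):

# Quick local sanity check of CsvParams (values are made up for illustration).
fix = CsvParams(sep="|", encoding="latin-1", header=0)
print(fix.model_dump())
# {'sep': '|', 'encoding': 'latin-1', 'header': 0, 'engine': 'python'}

# This is the JSON Schema the function-calling API will receive in Step 2.
print(CsvParams.model_json_schema())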
Step 2: The Healer Function
This function is the heart of the system, designed to run only when things have already gone wrong.
Getting the prompt right took some trial and error. Initially, I only provided the error message, which forced the model to guess blindly at the issue.
I quickly realized that to accurately diagnose issues like delimiter mismatches, the model needed to actually "see" a sample of the raw data.
Now here is the big catch: you cannot just send the entire file.
If you try to pass a 2GB CSV into the prompt, you'll blow up your context window, and probably your wallet.
Fortunately, I found that just pulling the first few lines gives the model enough information to fix the issue 99% of the time.
import openai
import json

client = openai.OpenAI()

def ask_the_doctor(fp, error_trace):
    """
    The 'On-Call Agent'. It looks at the file snippet and error,
    and suggests new parameters.
    """
    print(f"🔥 Crash detected on {fp}. Calling LLM...")

    # Hack: just grab the first 4 lines. No need to read 1GB.
    # We use errors='replace' so we don't crash while trying to fix a crash.
    try:
        with open(fp, "r", errors="replace") as f:
            head = "".join([f.readline() for _ in range(4)])
    except Exception:
        head = "<could not read file>"

    # Keep the prompt simple. No need for complex "persona" injection.
    prompt = f"""
    I'm trying to read a CSV with pandas and it failed.

    Error Trace: {error_trace}

    Data Snippet (First 4 lines):
    ---
    {head}
    ---

    Return the correct JSON params (sep, encoding, header, engine) to fix this.
    """

    # We force the model to use our Pydantic schema
    completion = client.chat.completions.create(
        model="gpt-4o",  # gpt-4o-mini is also fine here and cheaper
        messages=[{"role": "user", "content": prompt}],
        functions=[{
            "name": "propose_fix",
            "description": "Extracts valid pandas parameters",
            "parameters": CsvParams.model_json_schema()
        }],
        function_call={"name": "propose_fix"}
    )

    # Parse the result back to a dict
    args = json.loads(completion.choices[0].message.function_call.arguments)
    print(f"💊 Prescribed fix: {args}")
    return args
I'm kind of glossing over the API setup here, but you get the idea. It takes the "symptoms" and prescribes a "pill" (the arguments).
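One extra guard I'd suggest, though it isn't part of the function above: round-trip the returned dict through CsvParams before handing it to pandas, so a malformed answer fails loudly instead of sneaking into read_csv. The validated_fix helper below is just my own illustration.

# Optional hardening: validate the LLM's suggestion against the schema before use.
# `validated_fix` is a helper added for illustration, not part of the original flow.
def validated_fix(raw_args: dict) -> dict:
    return CsvParams(**raw_args).model_dump()

# Usage: args = validated_fix(ask_the_doctor("messy_data.csv", "ParserError: ..."))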
Step 3: The Retry Loop (Where the Magic Happens)
Now we need to wire this diagnostic tool into our actual data loader.
Previously, I wrote ugly while True loops with nested try/except blocks that were a nightmare to read.
Then I discovered tenacity, which lets you decorate a function with clean retry logic.
And the best part is that tenacity also lets you define a custom "callback" that runs between attempts.
This is exactly where we inject our Healer function.
import pandas as pd
from tenacity import retry, stop_after_attempt, retry_if_exception_type

# A dirty global dict to store the "fix" between retries.
# In a real class this would be self.state, but for a script, this works.
fix_state = {}

def apply_fix(retry_state):
    # This runs right after the crash, before the next attempt
    e = retry_state.outcome.exception()
    fp = retry_state.args[0]

    # Ask the LLM for new params
    suggestion = ask_the_doctor(fp, str(e))

    # Update the state so the next run uses the suggestion
    fix_state[fp] = suggestion

@retry(
    stop=stop_after_attempt(3),                 # Give it 3 strikes
    retry=retry_if_exception_type(Exception),   # Catch everything (dangerous, but fun)
    before_sleep=apply_fix                      # <--- This is the hook
)
def tough_loader(fp):
    # Check if we have a suggested fix for this file, otherwise default to comma
    params = fix_state.get(fp, {"sep": ","})
    print(f"🔄 Attempting to load with: {params}")
    df = pd.read_csv(fp, **params)
    return df
Does it actually work?
To test this, I created a purposely broken file called messy_data.csv. I made it pipe-delimited (|) but didn't tell the script.
When I ran tough_loader('messy_data.csv'), the script crashed, paused for a moment while it "thought," and then fixed itself automatically.

It feels surprisingly satisfying to watch the code fail, diagnose itself, and recover without any human intervention.
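If you want to reproduce it, something like this is enough. The rows are made up; the stray comma in the last row is what forces the default sep=',' parse to actually crash rather than silently load everything as one column.

# Build a deliberately pipe-delimited file; the comma inside "Smith, Bob"
# makes the default comma parse raise a ParserError instead of succeeding.
with open("messy_data.csv", "w") as f:
    f.write("id|name|signup_date\n")
    f.write("1|Alice|2024-01-03\n")
    f.write("2|Smith, Bob|2024-01-04\n")

df = tough_loader("messy_data.csv")
print(df.head())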
The “Gotchas” (Because Nothing is Perfect)
I don't want to oversell this solution, as there are definitely risks involved.
The Cost
First, remember that every time your pipeline breaks, you're making an API call.
That might be fine for a handful of errors, but if you have a huge job processing, say, 100,000 files, and a bad deployment causes all of them to break at once, you could wake up to a very nasty surprise on your OpenAI bill.
If you're running this at scale, I highly recommend implementing a circuit breaker or switching to a local model like Llama-3 via Ollama to keep your costs down.
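Here is one crude way to do the circuit breaker. The budget number and the in-process counter are just illustrative; in a real deployment you would probably track this in Redis or your metrics system.

# Crude circuit breaker: cap how many times one run is allowed to "phone the doctor".
# The budget of 20 is arbitrary; tune it to your tolerance for surprise bills.
MAX_HEAL_CALLS = 20
heal_calls = 0

def ask_the_doctor_with_budget(fp, error_trace):
    global heal_calls
    if heal_calls >= MAX_HEAL_CALLS:
        raise RuntimeError("Healing budget exhausted; failing fast to protect the bill.")
    heal_calls += 1
    return ask_the_doctor(fp, error_trace)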
Data Safety
While I'm only sending the first 4 lines of the file to the LLM, you need to be very careful about what's in those lines. If your data contains Personally Identifiable Information (PII), you're effectively sending that sensitive data to an external API.
If you work in a regulated industry like healthcare or finance, please use a local model.
Seriously.
Don't send patient data to GPT-4 simply to fix a comma error.
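Swapping in a local model is mostly a configuration change, since Ollama exposes an OpenAI-compatible endpoint. The sketch below assumes Ollama is running locally and you've already pulled a model; how well a local model handles the function-calling schema varies by model.

# Point the same OpenAI client at a local Ollama server instead of OpenAI's API.
local_client = openai.OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # the client requires a key; Ollama ignores it
)
# Then use local_client inside ask_the_doctor with model="llama3" (or whatever you pulled).
# If the local model struggles with function calling, ask for raw JSON in the prompt
# and validate it with CsvParams instead.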
The “Boy Who Cried Wolf”
Finally, there are times when data should fail.
If a file is empty or corrupt, you don't want the AI to hallucinate a way to load it anyway, potentially filling your DataFrame with garbage.
Pydantic keeps the LLM's output well-formed, but it isn't magic. You have to be careful not to hide real errors that you actually need to fix yourself.
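The simplest guardrail is to stop catching Exception and only retry the errors the LLM can plausibly fix. Here is a sketch that reuses fix_state and apply_fix from above; stricter_loader is just a renamed variant of the loader for illustration.

from pandas.errors import ParserError

# Only retry the errors an LLM can plausibly fix (bad delimiters, bad encodings).
# Empty or truly corrupt files should still fail immediately and page a human.
HEALABLE = (ParserError, UnicodeDecodeError)

@retry(
    stop=stop_after_attempt(3),
    retry=retry_if_exception_type(HEALABLE),
    before_sleep=apply_fix,
)
def stricter_loader(fp):
    params = fix_state.get(fp, {"sep": ","})
    return pd.read_csv(fp, **params)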
Conclusion and takeaway
You could argue that using an AI to fix CSVs is overkill, and technically, you might be right.
But in a field as fast-moving as data science, the best engineers aren't the ones clinging to the methods they learned five years ago; they're the ones constantly experimenting with new tools to solve old problems.
Honestly, this project was just a reminder to stay flexible.
We can't just keep guarding our old pipelines; we have to keep finding ways to improve them. In this industry, the most valuable skill isn't writing code faster; rather, it's having the curiosity to try a whole new way of working.
