feeling a relentless sense of AI FOMO. Every day, I see people sharing AI tips, new agents and skills they built, and vibe-coded apps. I’m increasingly realizing that adapting quickly to AI is becoming a requirement for staying competitive as a data scientist today.
But I’m not only talking about brainstorming with ChatGPT, generating code with Cursor, or polishing a report with Claude. The larger shift is that AI can now take part in a much more end-to-end data science workflow.
To make the idea concrete, I tried it out on a real project using my Apple Health data.
A Simple Example — Apple Health Analysis
Context
I have been wearing an Apple Watch every day since 2019 to track my health data, such as heart rate, energy burned, sleep quality, etc. This data contains years of behavioral signals about my daily life, but the Apple Health app mostly surfaces it with simple trend views.
I tried to analyze a two-year Apple Health export six years ago. But it ended up becoming one of those side projects that you never finish… My goal this time is to extract more insights from the raw data quickly with the help of AI.
What I needed to work with
Here are the relevant resources I have:
- Raw Apple Health export data: 1.85GB in XML, uploaded to my Google Drive.
- Sample code to parse the raw export into structured datasets, in my GitHub repo from six years ago. But the code could be outdated.
Workflow without AI
A typical workflow without AI would look a lot like what I tried six years ago: inspect the XML structure, write Python to parse it into structured local datasets, conduct EDA with Pandas and NumPy, and summarize the insights.
I’m sure every data scientist is familiar with this process — it is not rocket science, but it takes time to build. To get to a polished insights report, it would take at least a full day. That’s why that 6-year-old repo is still marked as WIP…
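As a concrete sketch of the parsing step, a minimal streaming parser for the Apple Health `export.xml` might look like the code below. It assumes the standard `<Record>` element layout and extracts one record type at a time for simplicity; attribute names can vary across export versions, so treat this as a starting point rather than my exact old code.

```python
import csv
import xml.etree.ElementTree as ET

def parse_health_export(xml_path, csv_path,
                        record_type="HKQuantityTypeIdentifierHeartRate"):
    """Stream-parse a (potentially multi-GB) export without loading it into memory."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["type", "startDate", "endDate", "value", "unit"])
        # iterparse yields elements as they are read, keeping memory use flat
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == "Record" and elem.get("type") == record_type:
                writer.writerow([elem.get("type"), elem.get("startDate"),
                                 elem.get("endDate"), elem.get("value"),
                                 elem.get("unit")])
            elem.clear()  # free each element once processed
```

The `iterparse` + `clear()` pattern is what makes a 1.85GB file tractable on a laptop: the tree never fully materializes in memory.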
AI end-to-end workflow
My updated workflow with AI is:
- AI locates the raw data in my Google Drive and downloads it.
- AI references my old GitHub code and writes a Python script to parse the raw data.
- AI uploads the parsed datasets to Google BigQuery. Of course, the analysis could also be done locally without BigQuery, but I set it up this way to better resemble a real work environment.
- AI runs SQL queries against BigQuery to conduct the analysis and compile an analysis report.
Essentially, AI handles nearly every step from data engineering to analysis, with me acting more as a reviewer and decision-maker.
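The analysis step itself is ordinary aggregation work. As a local stand-in for the kind of query AI ran in BigQuery, here is a pandas sketch; the column names (`startDate`, `value`) are my assumptions about the parsed dataset, not the exact schema AI produced.

```python
import pandas as pd

def daily_energy_summary(records: pd.DataFrame) -> pd.DataFrame:
    """Roll per-record energy values up to daily totals, then yearly averages.
    Expects columns: startDate (timestamp string), value (kcal per record)."""
    df = records.copy()
    df["date"] = pd.to_datetime(df["startDate"]).dt.date
    # One row per day: total energy burned that day
    daily = (df.groupby("date", as_index=False)["value"].sum()
               .rename(columns={"value": "kcal"}))
    # One row per year: average daily burn, for the annual trend view
    daily["year"] = pd.to_datetime(daily["date"]).dt.year
    return (daily.groupby("year", as_index=False)["kcal"].mean()
                 .rename(columns={"kcal": "avg_daily_kcal"}))
```

The same two-level aggregation translates directly into a `GROUP BY` query once the data lives in BigQuery.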
AI-generated report
Now, let’s see what Codex was able to generate with my guidance and some back-and-forth in 30 minutes, excluding the time to set up the environment and tooling.
I chose Codex because I mainly use Claude Code at work, so I wanted to explore a different tool. I used this chance to set up my Codex environment from scratch so I could better evaluate the full effort required.
You can see that this report is well structured and visually polished. It summarized valuable insights into annual trends, exercise consistency, and the impact of travel on activity levels. It also provided recommendations and stated limitations and assumptions. What impressed me most was not just the speed, but how quickly the output began to look like a stakeholder-facing analysis instead of a rough notebook.
Please note that the report is sanitized for my data privacy.



How I Actually Did It
Now that we have seen the impressive work AI can generate in 30 minutes, let me break it down and show you all the steps I took to make it happen. I used Codex for this experiment. Like Claude Code, it can run in the desktop app, an IDE, or the CLI.
1. Set up MCP
To enable Codex to access tools, including Google Drive, GitHub, and Google BigQuery, the next step was to set up Model Context Protocol (MCP) servers.
The easiest way to set up MCP is to ask Codex to do it for you. For example, when I asked it to set up the Google Drive MCP, it configured my local files quickly, with clear next steps on how to create an OAuth client in the Google Cloud Console.
It doesn’t always succeed on the first try, but persistence helps. When I asked it to set up the BigQuery MCP, it failed at least 10 times before the connection succeeded. But each time, it provided clear instructions on how to test it and what information was helpful for troubleshooting.
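For reference, the end state of all that troubleshooting is just an entry in Codex’s `~/.codex/config.toml`. It looks roughly like the fragment below; the server package name, flags, and env variable here are illustrative, not the exact ones I ended up with.

```toml
# ~/.codex/config.toml — register an MCP server that Codex can launch
[mcp_servers.bigquery]
command = "npx"
args = ["-y", "some-bigquery-mcp-server"]   # hypothetical package name
env = { "GOOGLE_APPLICATION_CREDENTIALS" = "/path/to/service-account.json" }
```

Knowing where the config lives also makes it much easier to inspect or undo whatever the agent set up for you.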


2. Make a plan with Plan Mode
After setting up the MCPs, I moved on to the real project. For a complicated project that involves multiple data sources/tools/questions, I usually start with Plan Mode to decide on the implementation steps. In both Claude Code and Codex, you can enable Plan Mode with /plan. It works like this: you outline the task and your rough plan, the model asks clarifying questions and proposes a more detailed implementation plan for you to review and refine. In the screenshots below, you can find my first iteration with it.



3. Execution and iteration
When I hit “Yes, implement this plan”, Codex began executing on its own, following the steps. It worked for 13 minutes and generated the first analysis below. It moved fast across different tools, but it did the analysis locally because it encountered more issues with the BigQuery MCP. After another round of troubleshooting, it was able to upload the datasets and run queries in BigQuery properly.

However, the first-pass output was still shallow, so I guided it to go deeper with follow-up questions. For example, I have flight tickets and travel plans from past trips in my Google Drive. I asked it to find them and analyze my activity patterns during trips. It successfully located those files, extracted my travel days, and ran the analysis.
After a few iterations, it was able to generate a much more comprehensive report, as I shared at the beginning, within 30 minutes. You can find its code here. That was probably one of the most important lessons from the exercise: AI moved fast, but depth still came from iteration and better questions.
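The core of that travel comparison is simple once the travel days are extracted. A pandas sketch, where the column names and the travel-day set are hypothetical stand-ins for what Codex actually produced:

```python
import pandas as pd

def travel_vs_home_activity(daily: pd.DataFrame, travel_days: set) -> pd.Series:
    """Compare average daily energy burned on travel days vs. home days.
    `daily` needs columns: date (matching the travel_days entries), kcal (numeric)."""
    df = daily.copy()
    # Tag each day, then average within each group
    df["day_type"] = df["date"].isin(travel_days).map({True: "travel", False: "home"})
    return df.groupby("day_type")["kcal"].mean()
```

All the hard work is in reliably extracting `travel_days` from unstructured tickets and itineraries, which is exactly the part the MCP-connected agent did well.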

Takeaways for Data Scientists
What AI Changes
Above is a small example of how I used Codex and MCPs to run an end-to-end analysis without manually writing a single line of code. What are the takeaways for data scientists at work?
- Think beyond coding assistance. Rather than using AI just for coding and writing, it’s worth expanding its role across the entire data science lifecycle. Here, I used AI to locate raw data in Google Drive and upload parsed datasets to BigQuery. There are many more AI use cases in data pipelining and model deployment.
- Context becomes a force multiplier. MCPs are what made this workflow much more powerful. Codex scanned my Google Drive to locate my travel dates and read my old GitHub code to find sample parsing code. Similarly, you can enable other company-approved MCPs to help your AI (and yourself) better understand the context. For example:
– Connect to the Slack MCP and Gmail MCP to search for past relevant conversations.
– Use the Atlassian MCP to access table documentation on Confluence.
– Set up the Snowflake MCP to explore the data schema and run queries.
- Rules and reusable skills matter. Although I didn’t demonstrate it explicitly in this example, you should customize rules and create skills to guide your AI and extend its capabilities. These topics are worth their own article next time 🙂
How the Role of Data Scientists Will Evolve
But does this mean AI will replace data scientists? This example also sheds light on how data scientists’ roles will pivot in the future.
- Less manual execution, more problem-solving. In the example above, the initial analysis Codex generated was very basic. The quality of AI-generated analysis depends heavily on the quality of your problem framing. You need to define the question clearly, break it into actionable tasks, identify the right approach, and push the analysis deeper.
- Domain knowledge is critical. Domain knowledge is still very much required to interpret results accurately and provide recommendations. For example, AI noticed my activity level had declined significantly since 2020. It couldn’t identify a convincing explanation, but said: “.” But the real reason behind it, as you may have realized, is the pandemic. I started working from home in early 2020, so naturally, I burned fewer calories. This is a very simple example of why domain knowledge still matters — even if AI can access all the past docs in your organization, it doesn’t mean it will understand all the business nuances, and that’s your competitive advantage.
- This example was relatively straightforward, but there are still many classes of work where I would not trust AI to operate independently today, especially projects that require stronger technical and statistical judgment, such as causal inference.
Important Caveats
Last but not least, there are some considerations you need to keep in mind while using AI:
- Data security. I’m sure you’ve heard this many times already, but let me repeat it once again. The data security risk of using AI is real. For a personal side project, I can set things up however I want and take my own risk (honestly, granting AI full access to Google Drive feels like a dangerous move, so this is more for illustration purposes). But at work, always follow your company’s guidance on which tools are safe to use and how. And make sure to read through each command before clicking “approve”.
- Double-check the code. For my simple project, AI could write accurate SQL without difficulty. But in more complicated business settings, I still see AI make mistakes in its code from time to time. Sometimes, it joins tables with different granularities, causing fan-out and double-counting. Other times, it misses critical filters and conditions.
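The fan-out mistake is worth seeing once. Here is a minimal pandas illustration (with toy data) of how joining a daily table to a finer-grained table silently inflates a sum:

```python
import pandas as pd

# One row per day (coarse granularity)
daily = pd.DataFrame({"date": ["2024-01-01", "2024-01-02"],
                      "kcal": [500, 600]})

# Multiple rows per day (fine granularity)
workouts = pd.DataFrame({"date": ["2024-01-01", "2024-01-01", "2024-01-02"],
                         "workout": ["run", "walk", "swim"]})

# The join fans out: Jan 1's kcal row is duplicated, once per workout
joined = daily.merge(workouts, on="date")

naive_total = joined["kcal"].sum()    # 1600 — Jan 1 counted twice
correct_total = daily["kcal"].sum()   # 1100 — aggregate before (or instead of) joining
```

The same bug in SQL looks like summing a measure after joining onto a one-to-many table; always check row counts before and after a join.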
- AI is convenient, but it might accomplish your ask with unexpected side effects… Let me tell you a funny story to end this article. This morning, I turned on my laptop and saw an alert that no disk storage was left — I have a 512GB SSD MacBook Pro, and I was pretty sure I had only used around half of the storage. Since I was playing with Codex last night, it became my first suspect. So I actually asked it, “”. It responded, “”. Then I dug through my files and found a 142GB “bigquery-mcp-wrapper.log”… Most likely, Codex set up this log when it was troubleshooting the BigQuery MCP setup. Later, during the actual analysis task, it exploded into a massive file. So yes, this magical wishing machine comes at a cost.
This experience summed up the tradeoff well for me: AI can dramatically compress the gap between raw data and useful analysis, but getting the most out of it still requires judgment, oversight, and a willingness to debug the workflow itself.
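In case you ever need to hunt down a similar disk hog yourself, a quick sketch of the search I did (any directory walker or `du`-style tool works just as well):

```python
import os

def largest_files(root, top_n=5):
    """Walk a directory tree and return the top-N (size_bytes, path) pairs."""
    sizes = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                pass  # skip files that vanish or can't be stat'ed
    return sorted(sizes, reverse=True)[:top_n]
```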
