Plan–Code–Execute: Designing Agents That Create Their Own Tools

Most agentic architectures today focus on how multiple agents coordinate while choosing tools from a predefined toolbox. While effective, this design quietly assumes that the tools required for a task are known in advance.

Let’s challenge that assumption in this article.

Modern LLMs can already decompose complex goals into multi-step plans, reason about execution order, and generate correct, executable code. In other words, they don’t just consume tools; they can create them.

This article explores using a plan–code–execute agentic architecture to construct an end-to-end Explainability Agent, where tools are generated on demand, illustrated through a real-world scenario.

The architecture proposed here is readily adaptable to other scenarios as well.

The Scenario

We will be explaining the outcomes of a Graph Neural Network (GNN) model for demand forecasting, which I have explored in my previous article. We have the following data signals to explain:

  • The spatial features of the nodes (SKUs).
  • The temporal features (Sales Order, Production, etc.) added to the graph nodes.
  • Results from a GNN trained to forecast demand for a SKU.
  • Results from a GNN trained to detect anomalies in production.

All of these generate a series of analyses, results and charts that should be explained holistically, from a planner’s perspective, grounded in facts and domain understanding.

Explainability Agent Architecture

We’ll create a plan-code-execute agentic workflow. To have granular control, interpretability and flexibility in tool generation, we’ll construct fully custom agents rather than working through the abstractions of one of the well-known orchestration frameworks. The following agents are created:

1. The Analyst Agent

We want an analyst agent that will be the eyes of the application. Its job is to scan the environment () and produce the context which the rest of the application will use. It prevents hallucinations such as the planner agent assuming one name for the sales file when the actual file name in the dataset is different.

Beyond just scanning what’s there, it also opens the files and creates a dynamic schema. For instance, one file has columns [Date, Amount] and demand_vs_production.csv has [Predicted, Actual]. This forms the context for the planning agent.
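The Analyst’s grounding step can be sketched roughly as below. This is a minimal version under my own assumptions: the function and variable names are illustrative, and the real agent may record more metadata (row counts, date ranges, and so on).

```python
import os
import pandas as pd

def build_dataset_context(root_dir: str) -> str:
    """Scan a dataset folder and summarize each CSV's exact name and columns.

    Minimal sketch of the Analyst agent's grounding step: the returned string
    becomes context for the Planner, so it never has to guess file names.
    """
    lines = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for fname in sorted(filenames):
            if not fname.lower().endswith(".csv"):
                continue
            path = os.path.join(dirpath, fname)
            try:
                # Read only the header row to build a lightweight schema.
                columns = list(pd.read_csv(path, nrows=0).columns)
            except Exception as exc:
                columns = [f"<unreadable: {exc}>"]
            rel = os.path.relpath(path, root_dir)
            lines.append(f"{rel}: columns={columns}")
    return "\n".join(lines)
```

The exact-filenames-plus-columns format is what makes the Planner’s “Use EXACT Filenames” instruction enforceable.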

2. The Planner

The Planner receives the user’s high-level goal and the Analyst’s schema. It functions as the Brain of the application. Its job is to decompose the request into a dependency graph of steps. To do this, it first infers intent from the user request. If the user asks for “Accuracy,” it knows to focus on the Forecast Results folder. If they ask for “Sales history,” it targets Raw Data. The output is a JSON plan whose steps clearly describe the task, along with a flag marking each step as CODE or TEXT. Tools are generated by the Coder agent only for the CODE steps.

Sample output: A JSON plan.

  • (CODE)
  • (CODE)
  • (CODE)

3. The Coder

The Coder is the hands of the application, and it does the bulk of the work. The Coder receives each step from the plan along with the schema. For each CODE step in the plan, it writes a standalone, executable Python script.

It handles library imports (pandas, networkx), absolute paths, and error catching. It creates self-contained software, not just code snippets.
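A minimal sketch of such a Coder, with the LLM injected as a plain callable. The prompt wording and method names here are my own illustrations, not the exact prompt used in the article:

```python
class Coder:
    """Sketch of the Coder agent. `llm` is any callable that takes a prompt
    string and returns generated text (the article uses a Gemini client)."""

    SYSTEM = (
        "You are a Senior Python Engineer. Write ONE standalone, executable "
        "Python script for the step below. Use ONLY the exact file names from "
        "the schema, absolute paths, and wrap I/O in try/except."
    )

    def __init__(self, llm):
        self.llm = llm

    def write_script(self, step: dict, dataset_context: str) -> str:
        prompt = (
            f"{self.SYSTEM}\n\n--- SCHEMA ---\n{dataset_context}\n\n"
            f"--- STEP ---\n{step['name']}: {step['description']}\nScript:"
        )
        code = self.llm(prompt)
        # Strip the markdown fences models often wrap around code.
        return code.replace("```python", "").replace("```", "").strip()
```

Passing the schema into every step’s prompt is what keeps the generated scripts grounded in the real file names.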

4. The Executor

The Executor runs the generated script in a controlled environment. To do this, it captures stdout and stderr. If a script fails (e.g., FileNotFound), the logs provide the feedback loop for potential self-correction (though, thanks to the grounding provided by the Analyst’s schema, this doesn’t occur).
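The Executor can be as small as a subprocess wrapper. A sketch (names are illustrative) that captures the streams which would feed any self-correction loop:

```python
import subprocess
import sys

def run_script(script_path: str, timeout: int = 120) -> dict:
    """Run a generated script in a subprocess and capture its output."""
    result = subprocess.run(
        [sys.executable, script_path],   # same interpreter, isolated process
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "ok": result.returncode == 0,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }
```

A non-zero return code plus the stderr text is exactly what a retry prompt to the Coder would need.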

5. The Reporter

Finally, the Reporter reads the logs and artifacts (CSVs, PNGs) produced by the scripts and answers the user’s original query, in the tone, depth and persona required by the user. It functions as the Voice of the application.

The agents are called in sequence using a simple orchestration script (). A configuration contains the path of the dataset folder with the nodes and edges information, and the paths to the results folders of the demand forecasting and production anomaly detection models. Each of the agents and the orchestration script is under 100 lines of code. The LLM used is .
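The sequential orchestration can be sketched as below. The agent method names are my assumptions for illustration; the article’s actual script may differ:

```python
def run_pipeline(goal, analyst, planner, coder, executor, reporter):
    """Minimal sketch of the orchestration script: the five agents above,
    called in sequence with the Analyst's context threaded through."""
    context = analyst.scan()                    # ground everything in real files
    plan = planner.create_plan(goal, context)   # JSON list of steps
    logs = []
    for step in plan:
        if step["type"] == "CODE":
            script = coder.write_script(step, context)
            logs.append(executor.run(script))
        else:
            # TEXT steps need no tool; they are handled at reporting time.
            logs.append({"step": step["name"], "type": "TEXT"})
    return reporter.summarize(goal, logs)
```

Because each agent only consumes the previous agent’s output, any one of them can be swapped or inspected in isolation.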

The main error encountered during testing was the coder agent referring to a data source that is not present (hallucination), for which I strengthened the coder prompt to strictly follow the plan step context. Also, the analyst agent was not there initially, and was added as a grounding step before the planner to prevent hallucination.

Here is the Planner code for reference
import json
from ..llm.client import GeminiClient

class Planner:
    def __init__(self):
        self.llm = GeminiClient()

    def create_plan(self, user_prompt, dataset_context=""):
        """Decomposes user prompt into execution steps."""
        system_prompt = f"""
        You are a Senior Project Manager for a Data Science team.
        Your goal is to break down a high-level user request into a list of specific, executable steps.
        
        --- DATASET AVAILABLE (Use EXACT Filenames) ---
        {dataset_context}
        -----------------------------------------------
        
        --- DATASET GUIDELINES ---
        - The dataset contains both 'Raw Data' (Historical) and 'Model Results' (Predictions/Anomalies).
        - CAREFULLY infer which source is required based on the user's goal. 
        - Example: "Sales Analysis" -> Raw Data. "Forecast Accuracy" -> Model Results.
        
        Output format: JSON list of objects with fields:
        - step_id: int
        - name: str (short title)
        - description: str (detailed instruction for the coder)
        - type: "CODE" (if it requires writing a script) or "TEXT" (if only a summary)
        
        --- SCOPE RESTRICTION ---
        - DO NOT USE Machine Learning (LinearRegression, Training, Random Forest).
        - USE ONLY Statistical Analysis (Mean, Median, Std Dev, Z-Score, Rolling Averages).
        - FOCUS on Data Visualization (Line plots, Histograms).
        - If the user asks for "forecast" or "accuracy", PREFER WAPE (Weighted MAPE) over standard MAPE.
        
        Example:
        User: "Analyze SKU001"
        Output:
        [
            {{"step_id": 1, "name": "Load Data", "description": "Load Nodes.csv and filter for SKU001 to get basic info.", "type": "CODE"}},
            {{"step_id": 2, "name": "Temporal Analysis", "description": "Load Production.csv and Sales.csv. Calculate monthly volume and trend for SKU001.", "type": "CODE"}}
        ]
        """
        
        full_prompt = f"{system_prompt}\n\nUser Request: {user_prompt}\nJSON Plan:"
        response = self.llm.generate_content(full_prompt)
        
        # basic cleanup for JSON markdown blocks
        clean_response = response.replace("```json", "").replace("```", "").strip()
        print(f"\n[DEBUG] Raw Planner Response:\n{response}\n") # Added for debugging
        try:
            return json.loads(clean_response)
        except json.JSONDecodeError:
            print(f"Failed to parse plan: {response}")
            return []
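One fragility in the cleanup above is that it assumes the response contains little besides the fenced JSON. A slightly more defensive variant (my sketch, not the article’s code) extracts the first JSON array wherever it appears in the response:

```python
import json
import re

def parse_plan(response: str) -> list:
    """Extract and parse the first JSON array in an LLM response."""
    match = re.search(r"\[.*\]", response, re.DOTALL)
    if not match:
        return []
    try:
        plan = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    return plan if isinstance(plan, list) else []
```

This tolerates models that add prose before or after the plan, which simple fence-stripping does not.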

Experiment Results

We’ll focus on the end-to-end explainability for SKU SOS001L12P. This is the node with the maximum sales volume. Planners and management will be interested in understanding issues early in order to fulfill demand and maintain service quality.

We’ll construct the explainability in 4 steps:

  • Analyse the shared relations of this SKU with its neighbors in the supply network.
  • Analyse the trend in sales volume.
  • Correlate with the production signal, note anomalies.
  • Create a comprehensive report combining the above with demand forecast quality to explain the health of this SKU.

Spatial Analysis

The goal set for the agent was

The planner agent created a 5-step plan for loading node and edge data, extracting neighbor and edge attributes, preparing a graph visualization with labels and creating a report on the spatial structure.

Here is the plan
[
    {
        "step_id": 1,
        "name": "Load Node and Edge Data",
        "description": "Load 'Nodes.csv' to confirm 'SOS001L12P' as a valid node. Load all edge datasets: 'Edges (Plant).csv', 'Edges (Product Group).csv', 'Edges (Product Sub-Group).csv', and 'Edges (Storage Location).csv'. These files contain the structural information needed for spatial analysis.",
        "type": "CODE"
    },
    {
        "step_id": 2,
        "name": "Extract Direct Neighbors and Edge Attributes",
        "description": "Iterate through each loaded edge dataset. For each dataset, filter rows where 'SOS001L12P' is present in either the 'node1' or 'node2' column. For each matching record, identify the neighboring node and extract the specific edge attribute that defines the relationship (e.g., 'Plant' from 'Edges (Plant).csv', 'GroupCode' from 'Edges (Product Group).csv', 'SubGroupCode' from 'Edges (Product Sub-Group).csv', 'Storage Location' from 'Edges (Storage Location).csv'). Consolidate all unique neighbors and their corresponding labeled relationships into a single structure.",
        "type": "CODE"
    },
    {
        "step_id": 3,
        "name": "Prepare Graph for Visualization",
        "description": "Construct a graph object (e.g., using NetworkX) with 'SOS001L12P' as the central node. Add all identified direct neighbors as nodes and create edges between 'SOS001L12P' and each neighbor. Assign the extracted edge attributes (e.g., 'Plant: Plant A', 'GroupCode: GRP1') as labels to these edges for clarity in the visualization.",
        "type": "CODE"
    },
    {
        "step_id": 4,
        "name": "Visualize Graph Edges with Labels",
        "description": "Generate a clear graph visualization of the spatial structure around 'SOS001L12P'. The plot should display 'SOS001L12P' prominently in the center, with its direct neighbors connected by edges. Each edge must be explicitly labeled with the type of relationship and its specific attribute value (e.g., 'Plant: P001', 'Product Group: PG002'). Ensure appropriate layout and styling for readability.",
        "type": "CODE"
    },
    {
        "step_id": 5,
        "name": "Summarize Spatial Connectivity",
        "description": "Provide a textual summary of the spatial structure around 'SOS001L12P'. Describe the types of entities it is connected to (Plants, Product Groups, Product Sub-Groups, Storage Locations), list the specific neighbors identified, and briefly explain the nature of these connections based on the edge labels.",
        "type": "TEXT"
    }
]
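Plan step 3 maps naturally onto a small NetworkX star graph. A sketch under the assumption that the consolidated relations from step 2 are (neighbor, relationship type, relationship value) tuples; these names are illustrative, not the agent’s actual column names:

```python
import networkx as nx

def build_neighbor_graph(center, relations):
    """Star graph around the SKU with one labeled edge per relationship."""
    g = nx.Graph()
    g.add_node(center)
    for neighbor, rel_type, rel_value in relations:
        g.add_edge(center, neighbor, label=f"{rel_type}: {rel_value}")
    return g

graph = build_neighbor_graph(
    "SOS001L12P",
    [("SOS008L02P", "GroupCode", "S"), ("SOS005L04P", "SubGroupCode", "SOS")],
)
```

For step 4’s plot, the edge `label` attributes can be collected with `nx.get_edge_attributes(graph, "label")` and passed to `nx.draw_networkx_edge_labels`.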

The coder agent created 4 tools, one for each of the CODE steps

The primary output is the graph of this SKU’s relations with its neighbors, with all of the relations labeled, such as shared plants, storage locations, product groups etc.

SKU and its relations

A brief report of the findings is also generated, which notes:

Full report here
A thorough analysis of the spatial structure around node `SOS001L12P` has been completed.

**Analysis Findings:**
1.  **Node Validation:** The node `SOS001L12P` was confirmed as a valid node within the master data.
2.  **Neighbor Relationships:** The analysis identified 39 distinct relationships connecting `SOS001L12P` to 39 unique neighboring nodes. These relationships are categorized by `RelationshipType` and `RelationshipValue`:
    *   **Plant:** `SOS001L12P` is connected to multiple neighbors via various 'Plant' codes (e.g., 2103, 2111, 2112, 2114, 2115, 2116, 2117, 2118, 2119, 2120, 2121, 2122).
    *   **GroupCode:** `SOS001L12P` shares 'GroupCode' 'S' with neighbors like `SOS008L02P`, `SOS005L04P`, `SOS003L04P`, `SOS002L09P`, `SOS500M24P`, `SOS250M48P`, `SOP001L12P`, `SO0005L04P`, `SO0002L09P`, `SO0001L12P`, `SO0500M24P`, `SE500G24P`, and `SE200G24P`.
    *   **SubGroupCode:** `SOS001L12P` is linked by 'SubGroupCode' 'SOS' to nodes such as `SOS008L02P`, `SOS005L04P`, `SOS003L04P`, `SOS002L09P`, `SOS500M24P`, and `SOS250M48P`.
    *   **Storage Location:** `SOS001L12P` is associated with its neighbors through various 'Storage Location' codes (e.g., 330.0, 1130.0, 1230.0, 1430.0, 1530.0, 1630.0, 1730.0, 1830.0, 1930.0, 2030.0, 2130.0, 2230.0).

The full list of neighbors and their specific relationships can be found in `C:/Auto-agent/workspace/output/SOS001L12P_neighbors_relationships.csv`.

**Visualization:**
A graph visualization has been generated, centered on `SOS001L12P` and displaying all its direct neighbors and their connecting edges with labels indicating the `RelationshipType` and `RelationshipValue`. The resulting graph contains 40 nodes (including `SOS001L12P`) and 39 edges.

The visualization is available at: `C:/Auto-agent/workspace/output/SOS001L12P_neighbor_graph.png`

Analysis of the Sales Order (Demand) Signal

Since we’re interested in the accuracy of the demand forecasting model, it is important to analyse the sales trends. The goal was set to:

The planner generated a 6-step plan to load and process sales data, create 7-day and 30-day rolling averages, calculate outliers, visualize trends and prepare a report.

The plan is here
[
    {
        "step_id": 1,
        "name": "Load Sales Order Data for SOS001L12P",
        "description": "Load the 'Sales Order.csv' file from C:\AnomalyGNN\dataset\Raw Dataset\Temporal Data\Unit\Sales Order.csv. Focus on the 'Date' and 'SOS001L12P' columns, treating 'SOS001L12P' as the sales volume.",
        "type": "CODE"
    },
    {
        "step_id": 2,
        "name": "Preprocess Temporal Data",
        "description": "Convert the 'Date' column to datetime objects and set it as the DataFrame index. Ensure the sales volume column (SOS001L12P) is numerical. Handle any missing values, potentially by filling with 0 or forward/backward fill if appropriate for sales data (specify chosen method).",
        "type": "CODE"
    },
    {
        "step_id": 3,
        "name": "Analyze Sales Volume Trends",
        "description": "Calculate 7-day and 30-day rolling averages for the 'SOS001L12P' sales volume. These will help identify short-term and long-term trends in the sales signal. Store these as new columns in the DataFrame.",
        "type": "CODE"
    },
    {
        "step_id": 4,
        "name": "Detect Outliers using Z-Score",
        "description": "Calculate the Z-score for the 'SOS001L12P' sales volume. Identify potential outliers by marking data points where the absolute Z-score exceeds a predefined threshold (e.g., 2 or 3 standard deviations). Store the Z-score and an 'Is_Outlier' boolean column.",
        "type": "CODE"
    },
    {
        "step_id": 5,
        "name": "Visualize Sales Order Temporal Signal",
        "description": "Create a time-series line plot showing the 'SOS001L12P' sales volume over time. Overlay the 7-day and 30-day rolling averages. Highlight the identified outliers on the plot to visually represent abnormalities in volume trends.",
        "type": "CODE"
    },
    {
        "step_id": 6,
        "name": "Summarize Trends and Abnormalities",
        "description": "Provide a textual summary of the observed trends (e.g., seasonality, growth/decline), the frequency and magnitude of identified outliers, and any other notable abnormalities or shifts in the sales volume patterns for SOS001L12P. This summary should interpret the visualizations and statistical measures.",
        "type": "TEXT"
    }
]
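Steps 3 and 4 of this plan reduce to a few lines of pandas. A sketch (the column and function names are illustrative) of what the generated tool plausibly computes:

```python
import pandas as pd

def analyze_sales(df: pd.DataFrame, col: str, z_threshold: float = 2.0) -> pd.DataFrame:
    """Rolling averages plus Z-score outlier flags for one sales column.

    Assumes `df` has a DatetimeIndex and a numeric sales column `col`,
    mirroring plan steps 3 (trends) and 4 (outliers).
    """
    out = df.copy()
    out["roll_7d"] = out[col].rolling(window=7, min_periods=1).mean()
    out["roll_30d"] = out[col].rolling(window=30, min_periods=1).mean()
    z = (out[col] - out[col].mean()) / out[col].std()
    out["z_score"] = z
    out["is_outlier"] = z.abs() > z_threshold
    return out
```

The `min_periods=1` choice lets the rolling averages start immediately instead of producing NaNs for the first week/month; the agent’s script may handle warm-up differently.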

The coder agent created 5 tools, and the primary output of the pipeline is the following trend. The notable observation is a downward trend in the second half of the duration, which could be a cause for concern and should be investigated.

Sales volume trend

The analysis report also notes this declining trend.

Here is the full report
**Analysis of SOS001L12P Sales Order Temporal Signal**

This analysis covers sales data for product SOS001L12P from January 1, 2023, to August 9, 2023.

**1. Volume Trend Over the Duration:**
The sales volume for SOS001L12P shows a clear and persistent declining trend throughout the analyzed period.
*   In early January 2023, daily sales volumes were high, with individual days reaching up to 21,394 units. The 30-day rolling average was roughly 12,629 by January 10th.
*   By early August 2023, the sales volume had significantly decreased. The 30-day rolling average plummeted to roughly 5,165 by August 9th, indicating a substantial drop in consistent sales activity over the eight-month period.

**2. Persistent Declines and Other Issues:**
*   **Persistent Decline:** The consistent decrease in both daily sales and the rolling averages (7-day and 30-day) from January to August clearly indicates a persistent decline in sales volume for SOS001L12P.
*   **Anomalies:** The analysis identified 14 specific days classified as anomalies. These were characterized by unusually low sales volumes, either due to a Z-score below -2.0 or three consecutive days of negative percentage change. Notable severe drops include:
    *   January 4, 2023: Sales dropped to 10,920.08.
    *   March 8, 2023: Sales recorded an exceptionally low value of 200.00.
*   **Zero Sales:** Towards the end of the analyzed period, there were instances of zero sales recorded (e.g., August 8th and 9th), further exacerbating the declining trend and highlighting critical issues in sales performance.

Analysis of the Production Signal

It’s important to correlate the production anomalies report to see if there is a causal indication as to why sales might be declining. To do this, we set the following goal for the agent:

A 7-step plan was generated, similar to the sales analysis, to calculate trends and outliers, but also to visualize the anomalies generated by the anomaly detection model.

Here is the plan
[
    {
        "step_id": 1,
        "name": "Load Production Data for SOS001L12P",
        "description": "Load the 'Production .csv' file from 'C:\AnomalyGNN\dataset\Raw Dataset\Temporal Data\Unit' into a pandas DataFrame. Select only the 'Date' and 'SOS001L12P' columns for analysis.",
        "type": "CODE"
    },
    {
        "step_id": 2,
        "name": "Clean and Prepare Production Data",
        "description": "Convert the 'Date' column to datetime objects and set it as the DataFrame index. Ensure the 'SOS001L12P' column is numeric. Handle any missing values in 'SOS001L12P' by filling with 0, assuming missing production data means zero units produced.",
        "type": "CODE"
    },
    {
        "step_id": 3,
        "name": "Calculate Rolling Averages for Trend",
        "description": "Compute a 7-day and a 30-day rolling average for the 'SOS001L12P' production volume. These rolling averages will help to visualize and identify short-term and long-term trends in the data.",
        "type": "CODE"
    },
    {
        "step_id": 4,
        "name": "Visualize Production Volume and Trends",
        "description": "Create a time series line plot showing the daily 'SOS001L12P' production volume. Overlay the 7-day and 30-day rolling averages on the same plot to visually represent the trend. Ensure proper labeling for axes and a clear title.",
        "type": "CODE"
    },
    {
        "step_id": 5,
        "name": "Detect Outliers using Z-Score",
        "description": "Calculate the Z-score for the 'SOS001L12P' production volume and flag data points whose absolute Z-score exceeds a predefined threshold as potential outliers.",
        "type": "CODE"
    },
    {
        "step_id": 6,
        "name": "Visualize Outliers and Abnormalities",
        "description": "Generate a time series line plot of the 'SOS001L12P' production volume. On this plot, distinctly mark or color the data points that were identified as outliers in the previous step to highlight abnormalities in volume trends.",
        "type": "CODE"
    },
    {
        "step_id": 7,
        "name": "Summarize Trend, Outliers, and Abnormalities",
        "description": "Provide a textual summary of the observed overall trend (e.g., increasing, decreasing, stable, seasonal), describe the characteristics of the identified outliers (e.g., magnitude, frequency, specific dates), and comment on any notable abnormalities in the production volume patterns based on the visualizations and statistical analysis.",
        "type": "TEXT"
    }
]

Appropriate tools were created, executed and outputs produced. The following are the production trend chart and, more importantly, the chart showing a large number of anomalies in the second half of the period, which might be impacting sales.

Production trend of SKU SOS001L12P
Anomaly chart showing significant anomalies in the second half

This is noted in the generated report:

Full report here
Here is an analysis of the production temporal signal for SOS001L12P:

**1. Data Overview:**
The production data for SOS001L12P was successfully loaded and preprocessed, covering 221 days from January 1, 2023, to August 9, 2023. The 'Date' column was accurately parsed, and the 'SOS001L12P' production volume column was confirmed to be numeric with no missing values.

**2. Trend Analysis:**
*   **Rolling Averages:** 7-day and 30-day rolling averages were computed to smooth out daily fluctuations and highlight underlying trends.
*   The plot `SOS001L12P_production_trends.png` (saved to `C:/Auto-agent/workspace/output`) visually represents the daily production volume alongside these rolling averages, allowing for observation of the overall temporal trend.
*   Looking at the tail of the data, the 7-day and 30-day rolling averages show a general decline in production volume towards the end of the observed period (early August), with the 30-day average dropping from 5582.97 on August 7th to 5075.73 on August 9th, and the 7-day average dropping from 3724.71 to 2911.00 in the same period.

**3. Outliers (Z-score Method):**
*   Using a Z-score threshold of |Z-score| > 2.5, **no potential outliers were identified** in the SOS001L12P production data. This implies that, while there may be variations, no individual data points significantly deviated from the mean (7480.90) by more than 2.5 standard deviations (4729.55).

**4. Abnormalities in Volume Trends:**
*   Despite the lack of Z-score outliers, a more sophisticated anomaly detection method **identified 17 anomalous data points** in the SOS001L12P production.
*   These anomalies are visualized in the plot `SOS001L12P_production_with_anomalies_plot.png` (saved to `C:/Auto-agent/workspace/output`), which highlights specific dates where production volume exhibited unusual behavior compared to its typical pattern. A significant drop to 0 on August 9th, for instance, is likely to be one such anomaly, suggesting a halt or severe reduction in production.

End-to-End Supply Planner’s report

Now that we have performed spatio-temporal analysis on the SKU SOS001L12P, the next logical step is to correlate the findings with the demand forecast model to see how it is adapting to the trends, and to prepare a full report on this SKU’s health that the planner can take to management.

To do this, we set the following goal:

The planner generated a plan to correlate production, sales and demand forecast, evaluate the forecasting model’s performance using the primary metric WAPE (Weighted Absolute Percentage Error), and then prepare a comprehensive report.

Here is the plan
[
    {
        "step_id": 1,
        "name": "Load and Consolidate SOS001L12P Data",
        "description": "Load the following datasets for SOS001L12P: \n1. Production rolling averages: C:\Auto-agent\workspace\output\production analysis - 2\production_rolling_averages_SOS001L12P.csv (columns: Date, SOS001L12P, SOS001L12P_7_day_avg, SOS001L12P_30_day_avg)\n2. Production Z-Scores: C:\Auto-agent\workspace\output\production analysis - 2\SOS001L12P_production_z_scores.csv (columns: Date, Actual_Production, Z_Score)\n3. Anomaly detection results: C:\AnomalyGNN\results\top_sku_SOS001L12P_predictions.csv (columns: Date, Is_Anomaly)\n4. Demand vs. Production Forecast: C:\GNN\results\demand_vs_production_SOS001L12P.csv (columns: Date, Predicted_Demand, Actual_Sales, Actual_Production)\n\nMerge all these dataframes on the 'Date' column, ensuring 'Date' is converted to a datetime format for accurate time-series analysis.",
        "type": "CODE"
    },
    {
        "step_id": 2,
        "name": "Visualize Production Trend and Anomalies",
        "description": "Plot the production volume with its rolling averages and mark anomaly points (e.g., |Z_Score| > 2 or 3) to visually confirm the user's observation of a downward trend and anomalies in the second half of the duration. Identify the start date of the 'second half' based on the data's time range.",
        "type": "CODE"
    },
    {
        "step_id": 3,
        "name": "Correlate Production, Sales, and Demand Forecast",
        "description": "Generate a single time-series line plot comparing 'Actual_Production', 'Actual_Sales', and 'Predicted_Demand' for SOS001L12P. Analyze the visual correlation and note periods of significant divergence or alignment, especially during the identified 'second half' where production anomalies and downward trends were observed.",
        "type": "CODE"
    },
    {
        "step_id": 4,
        "name": "Evaluate Forecasting Model Performance (WAPE)",
        "description": "Calculate the Weighted Average Percentage Error (WAPE) for the 'Predicted_Demand' against 'Actual_Sales' for SOS001L12P using the consolidated data. Calculate WAPE for the entire duration and separately for the 'second half' of the duration (as identified in Step 2) to assess how the model performed during the period of observed production issues. Present the WAPE values and interpret their meaning regarding forecast accuracy.",
        "type": "CODE"
    },
    {
        "step_id": 5,
        "name": "Generate Comprehensive End-to-End Report for SOS001L12P",
        "description": "Provide a narrative report summarizing all findings. \n1. Confirm the observed downward trend in production for SOS001L12P during the second half of the duration, citing specific dates or periods. \n2. Validate the user's observation of many anomalies, providing a count or highlighting key anomaly dates. \n3. Discuss the correlation between actual production, actual sales, and predicted demand, explaining how they interact, especially during the problematic second half. \n4. Comment on the forecasting model's performance based on the calculated WAPE values, noting any significant differences in accuracy between the overall duration and the 'second half'. \n5. Offer insights into potential reasons for observed trends or anomalies and their impact on supply chain planning.",
        "type": "TEXT"
    }
]
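Step 4’s metric is straightforward to compute. A sketch of WAPE, which the planner prefers over plain MAPE because it weights errors by actual volume, so near-zero sales days don’t dominate the score:

```python
import numpy as np

def wape(actual, predicted) -> float:
    """Weighted Absolute Percentage Error: total absolute error as a
    percentage of total actual volume."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.abs(actual - predicted).sum() / np.abs(actual).sum()

# e.g. errors of 10 and 20 units against 300 total actual units -> 10.0% WAPE
print(wape([100, 200], [110, 180]))
```

Computing it once over the whole duration and once over the "second half" slice, as the plan specifies, is just two calls on the filtered DataFrame columns.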

The combined analysis produces a planner-ready assessment, which notes in its conclusion:

The full report is here
Here is an end-to-end report for SKU SOS001L12P, considering your observations and the recent analyses.

---

### **End-to-End Report: SKU SOS001L12P Analysis (2023-01-01 to 2023-08-09)**

**1. Overview**
This report synthesizes recent production analysis, anomaly detection, and demand forecasting data for SKU SOS001L12P, covering the period from January 1, 2023, to August 9, 2023. The analysis specifically focuses on trends and model performance during the second half of this duration, starting roughly April 21, 2023.

**2. Production Trends and Anomalies**
The analysis confirms your observation regarding production trends in the second half of the duration (from 2023-04-21).

*   **Downward Trend and Volatility**: While the time series generally shows a positive visual correlation between Actual Production, Actual Sales, and Predicted Demand, Actual Production appears to be more volatile and exhibits sharper drops compared to Actual Sales and Predicted Demand in certain segments of the second half.
*   **Production vs. Sales**: In the second half, Actual Production (average 4866 units) was notably lower than Actual Sales (average 5136 units).
*   **Anomalies**: A total of 17 combined anomaly points (based on 'Is_Anomaly=True' or |Z_Score| > 2.5) were identified across the entire dataset. Several of these production anomalies were detected in the second half, specifically on dates such as 2023-04-21, 2023-05-05, 2023-05-21, 2023-05-22, and 2023-05-23. These anomaly periods often coincide with significant divergences where Actual Production deviates sharply from Actual Sales and Predicted Demand.

**3. Correlation: Production, Demand Forecast, and Sales**
The correlation analysis in the second half of the duration (from 2023-04-21) reveals distinct relationships:

*   **Strong Sales-Demand Correlation**: Actual Sales and Predicted Demand show a strong positive correlation of **0.85**. This implies the forecasting model is generally effective in tracking actual sales patterns.
*   **Weak Production Correlation**: Actual Production's correlation with both Actual Sales and Predicted Demand is significantly weaker:
    *   Actual Production vs. Actual Sales: **0.35**
    *   Actual Production vs. Predicted Demand: **0.38**
*   **Divergence**: Despite a general visual correlation across the entire period, the second half exhibits noticeable periods where Actual Production diverges from Actual Sales and Predicted Demand. This is particularly evident during the observed anomalies, indicating that production output did not consistently align with market demand or sales in these instances.

**4. Demand Forecasting Model Performance**
The forecasting model's performance was assessed using the Weighted Average Percentage Error (WAPE):

*   **Overall Performance**: The overall WAPE for SOS001L12P across the entire duration is **35.44%**.
*   **Second Half Performance**: For the 'second half' of the duration (from 2023-04-21), the WAPE is **29.70%**.
*   **Interpretation**: The WAPE in the second half is lower than the overall WAPE. This indicates that, despite the observed production issues and anomalies during this period, the demand forecasting model's accuracy actually **improved** in the second half. The model's Predicted Demand consistently tracks Actual Sales closely, suggesting it is reasonably effective in capturing demand patterns for planning purposes.

**Conclusion**
For SKU SOS001L12P, the second half of the analyzed duration (April 21, 2023 - August 9, 2023) was characterized by increased volatility and a downward trend in Actual Production, accompanied by numerous anomalies. Actual Production during this period was also lower than Actual Sales. While the demand forecasting model effectively captured Actual Sales trends (a WAPE of 29.70% in the second half, an improvement over the overall 35.44%), production struggled to align with these demands, resulting in weak correlations between production and both sales and predicted demand. These divergences, often coinciding with identified production anomalies, suggest potential supply constraints or operational disruptions impacting our ability to fulfill demand consistently.

This is a production-grade report consisting of facts, figures and KPIs to support the interpretations derived.

And all of this was achieved with a fully flexible agentic framework that can adapt to any explainability goal desired.

Conclusion

The future of agentic systems is not larger tool catalogs or more elaborate orchestration frameworks. It’s agents that can determine what must exist in the first place.

When planning, coding, and execution are treated as reasoning steps, tools stop being static assets and become disposable artifacts: created, used, and discarded as the problem demands. This shifts agent design away from prompt engineering that binds agents and tools to an orchestration framework, toward software generation as a core capability.

The plan–code–execute architecture demonstrates that production-grade, evidence-based analysis can emerge from small, transparent agents with minimal glue code and minimal cost. As LLMs continue to improve, the most valuable agents will be those that can autonomously turn intent into working software, without being told in advance how to do it.

In that sense, auto-planning and auto-coding agents aren’t just a convenience. They are a natural evolution of agentic systems.

For more on the GNN model for demand forecasting referred to in this article, read my article .

Reference

SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. Authors: Azmine Toushik Wasi, MD Shafikul Islam, Adipto Raihan Akib.

Images used in this article were generated using Google Gemini. Charts and underlying code created by me.
