Demand forecasting in supply-chain planning has traditionally been treated as a time-series problem:
- Each SKU is modeled independently.
- A rolling time window (say, last 14 days) is used to predict tomorrow’s sales.
- Seasonality is captured, promotions are added, and forecasts are reconciled downstream.
And yet, despite increasingly sophisticated models, the standard problems persist:
- Chronic over- and under-stocking
- Emergency production changes
- Excess inventory sitting in the wrong place
- High forecast accuracy on paper, but poor planning outcomes in practice
The difficulty is that demand in a supply chain is simply not independent. It is networked. For example, this is what just 12 SKUs from a typical supply chain look like once you map their shared plants, product groups, subgroups, and storage locations.
So when demand shifts in a single corner of the network, the consequences are felt throughout the network.
In this article, we step outside model-first thinking and look at the problem the way a supply chain actually behaves: as a connected operational system. Using a real FMCG dataset, we show why even a simple graph neural network (GNN) fundamentally outperforms traditional approaches, and what that means for both business leaders and data scientists.
A real supply-chain experiment
We tested this idea on a real FMCG dataset (SupplyGraph) that combines two views of the business:
Static supply-chain relationships
The dataset has 40 active SKUs, 9 plants, 21 product groups, 36 sub-groups, and 13 storage locations. On average, each SKU has ~41 edge connections, implying a densely connected graph where most SKUs are linked to many others through shared plants or product groups.
From a planning standpoint, this network encodes institutional knowledge that often lives only in planners' heads.
Temporal operational signals and sales outcomes
The dataset has temporal data for 221 days. For every SKU and every day, the dataset includes:
- Sales orders (the demand signal)
- Deliveries to distributors
- Factory goods issues
- Production volumes
Here is a summary of the four temporal signals driving the supply-chain model:
| Feature | Total Volume (Units) | Daily Avg | Sparsity (Zero-Activity Days) | Max Single Day |
| --- | --- | --- | --- | --- |
| Sales Order | 7,753,184 | 35,082 | 46.14% | 115,424 |
| Delivery To Distributor | 7,653,465 | 34,631 | 35.79% | 66,470 |
| Factory Issue | 7,655,962 | 34,642 | 43.94% | 75,302 |
| Production | 7,660,572 | 34,663 | 61.96% | 74,082 |
As can be observed, almost half of the SKU-day combinations have zero sales, implying that a small fraction of SKUs drives most of the volume. This is a classic intermittent-demand problem.
Also, production occurs in infrequent, large batches (lumpy production). Downstream delivery is far smoother and more frequent (low sparsity), implying the supply chain uses significant inventory buffers.
To stabilize GNN learning and handle extreme skew, all values are log-transformed, a common practice in intermittent demand forecasting.
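As a concrete sketch (the numbers below are illustrative, not from the dataset), the standard choice for intermittent demand is `log1p`, because it maps zero-sales days to exactly zero while compressing extreme spikes:

```python
import numpy as np

# Hypothetical daily sales for one SKU: intermittent (many zeros) and skewed.
sales = np.array([0, 0, 120, 0, 35_000, 0, 800, 115_424], dtype=float)

# log1p maps 0 -> 0 and compresses large spikes, stabilizing training targets.
log_sales = np.log1p(sales)

# The transform is exactly invertible, so forecasts can be mapped back to units.
recovered = np.expm1(log_sales)
```

Because the transform is invertible with `expm1`, model outputs can always be reported back in physical units.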
Key Business Metrics
What does a good demand forecast look like? We evaluate the model on two metrics: WAPE and Bias.
WAPE: Weighted Absolute Percentage Error
WAPE measures how much of your total demand volume is being misallocated. Instead of averaging the percentage error of each individual forecast, WAPE asks the question supply-chain planners actually care about in an intermittent-demand scenario: what share of total volume did the forecast get wrong?
This matters because errors on high-volume SKUs cost far more than errors on long-tail items. A 10% miss on a top seller is more expensive than a 50% miss on a slow mover. So WAPE weights the SKU-days by volume sold and aligns more naturally with revenue impact, inventory exposure, and plant and logistics utilization (and can be further weighted by price per SKU if required).
That’s why WAPE is widely preferred over MAPE for intermittent, high-skew demand.
\[
\text{WAPE} =
\frac{\sum_{s=1}^{S}\sum_{t=1}^{T} \left| \text{Actual}_{s,t} - \text{Forecast}_{s,t} \right|}
{\sum_{s=1}^{S}\sum_{t=1}^{T} \text{Actual}_{s,t}}
\]
WAPE can be calculated at different levels (product group, region, or total business) and over different durations, such as weekly or monthly.
It is important to note that here, WAPE is computed at the hardest possible level (per-SKU, per-day, on intermittent demand), not after aggregating volumes across products or time. In FMCG planning practice, a micro-level SKU-daily WAPE of 60–70% is often considered acceptable for intermittent demand, whereas <60% is considered production-grade forecasting.
Forecast Bias: Directional Error
Bias measures whether your forecasts systematically push inventory up or down. While WAPE tells you how wrong the forecast is, Bias tells you in which direction it is wrong. As we'll see in the next section, it is possible to have zero bias while being wrong most of the time. In practice, positive bias leads to excess inventory, higher holding costs, and write-offs, whereas negative bias results in stock-outs, lost sales, and service penalties. A small positive bias (2–5%) is generally considered production-safe.
\[ \text{Bias} = \frac{1}{S} \sum_{s=1}^{S} (\text{Forecast}_s - \text{Actual}_s) \]
Together, WAPE and Bias determine whether a model is not just accurate, but whether its forecasts are operationally and financially usable.
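Both metrics follow directly from the formulas above. A minimal NumPy sketch (toy numbers, names mine) also shows why they must be read together — the second example is clearly wrong per-cell, yet its errors cancel:

```python
import numpy as np

def wape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Weighted Absolute Percentage Error over all SKU-day cells."""
    return np.abs(actual - forecast).sum() / actual.sum()

def bias(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean signed error: positive = over-forecasting, negative = under."""
    return (forecast - actual).mean()

# Toy example: two SKUs over three days (rows = SKUs, columns = days).
actual   = np.array([[100., 0., 50.], [10.,  0., 40.]])
forecast = np.array([[ 90., 0., 60.], [ 0., 10., 40.]])

print(wape(actual, forecast))  # 0.2 (40 units misallocated out of 200)
print(bias(actual, forecast))  # 0.0 (errors cancel despite being wrong)
```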
The Baseline: Forecasting Without Structure
To establish a floor, we start with a naïve baseline: "tomorrow's sales equal today's sales".
\[ \hat{y}_{t+1} = y_t \]
This approach has:
- Zero bias
- No network awareness
- No understanding of operational context
Despite its simplicity, it is a strong benchmark, especially over the short term. If a model cannot beat this baseline, it is not learning anything meaningful.
In our experiments, the naïve approach produces a WAPE of 0.86, meaning nearly 86% of total volume is misallocated.
A bias of zero is not a good indicator in this case, since errors cancel out statistically while creating chaos operationally.
This results in:
- Firefighting
- Emergency production changes
- Expediting costs
This aligns with what many practitioners experience: simple forecasts are stable, but wrong where it matters.
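A minimal sketch (toy series, not the dataset) makes the failure mode concrete: on intermittent demand, the lag-1 forecast is always one day late, so it can show exactly zero bias while misplacing twice the total volume:

```python
import numpy as np

def naive_forecast(sales: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Tomorrow's forecast = today's sales; returns (actual, forecast) pairs."""
    return sales[1:], sales[:-1]

# Intermittent toy series: every spike is forecast one day too late.
sales = np.array([0., 100., 0., 0., 80., 0.])
actual, forecast = naive_forecast(sales)

wape = np.abs(actual - forecast).sum() / actual.sum()
bias = (forecast - actual).mean()
print(wape, bias)  # 2.0 0.0 -> 200% of volume misallocated, zero bias
```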
Adding the Network: Spatio-Temporal GraphSAGE
We use GraphSAGE, a graph neural network that allows each SKU to aggregate information from its neighbors.
Key characteristics:
- All relationships are treated uniformly.
- Information is shared across connected SKUs.
- Temporal dynamics are captured using a time series encoder.
This model doesn't yet distinguish between plants, product groups, or storage locations. It simply tests whether network structure by itself improves the forecast.
Implementation
While I'll dive deeper into the data science behind the feature engineering, training, and evaluation of GraphSAGE in a subsequent article, here are some of the key principles:
- The graph with its nodes and edges forms the static spatial features.
- The spatial encoder component of GraphSAGE, with its convolutional layers, generates spatial embeddings of the graph.
- The temporal encoder (LSTM) processes the sequence of spatial embeddings, capturing the evolution of the graph over the past 14 days (using a sliding window approach).
- Finally, a regressor predicts the log-transformed sales for the next day.
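The sliding-window step in particular is easy to get off by one. A small sketch (function and variable names are mine) of how 221 days of one SKU's history become (14-day window, next-day target) training pairs:

```python
import numpy as np

def make_windows(series: np.ndarray, window: int = 14):
    """Build (window, target) pairs: `window` days of history -> next-day value."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# 221 days of (already log-transformed) sales for one SKU, as in the dataset;
# the values here are random placeholders.
rng = np.random.default_rng(0)
log_sales = np.log1p(rng.integers(0, 1000, size=221).astype(float))

X, y = make_windows(log_sales, window=14)
print(X.shape, y.shape)  # (207, 14) (207,)
```

With a 14-day window, 221 days yield 207 supervised examples per SKU; in the full model the windows feed the temporal encoder rather than a plain regressor.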
An intuitive analogy
Imagine you're trying to predict the value of your house next month. The value isn't just influenced by the history of your own house, like its age, maintenance, or ownership records. It's also influenced by what's happening in your neighborhood.
For instance:
- The condition and prices of homes similar to yours (similar construction quality),
- How well-maintained other houses in your area are,
- The availability and quality of shared services like schools, parks, or local law enforcement.
In this analogy:
- Your own house's history is like the temporal features of a particular SKU (e.g., sales, production, delivery history).
- Your neighborhood represents the graph structure (the edges connecting SKUs with shared attributes, like plants, product groups, etc.).
- The history of nearby houses is like the neighboring SKUs' features: it's how the behavior of other similar houses/SKUs influences yours.
The aim of training the GraphSAGE model is for it to learn a function f that can be applied to every SKU based on its own historical features (like sales, production, factory issues, etc.) and the historical behavior of its connected SKUs, as determined by the edge relationships (e.g., shared plant, product group, etc.). To state it more precisely:
embedding_i(t) =
f( own_features_i(t),
neighbors’ features(t),
relationships )
where those features come from the SKU’s own operational history and the history of its connected neighbors.
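To make the function f above concrete, here is a single simplified GraphSAGE-style mean-aggregation step in NumPy. This is a sketch, not the trained model: the weight matrices are random stand-ins for learned parameters, and the tiny 4-SKU graph is invented for illustration:

```python
import numpy as np

def sage_step(features, neighbors, W_self, W_neigh):
    """One GraphSAGE mean-aggregation step:
    embedding_i = ReLU(W_self @ x_i + W_neigh @ mean(x_j for j in N(i)))."""
    out = np.zeros((features.shape[0], W_self.shape[0]))
    for i, nbrs in neighbors.items():
        agg = features[list(nbrs)].mean(axis=0) if nbrs else np.zeros(features.shape[1])
        out[i] = np.maximum(0.0, W_self @ features[i] + W_neigh @ agg)
    return out

rng = np.random.default_rng(42)
features = rng.normal(size=(4, 3))                   # 4 SKUs, 3 features each
neighbors = {0: {1, 2}, 1: {0}, 2: {0, 3}, 3: {2}}   # e.g., shared-plant edges

# Random stand-ins for the weights that training would learn.
W_self, W_neigh = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))

emb = sage_step(features, neighbors, W_self, W_neigh)
print(emb.shape)  # (4, 5): one 5-dimensional embedding per SKU
```

In the real model this step is applied per day, and the resulting embedding sequence is what the LSTM consumes.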
The Result: A Structural Step-Change
The impact is quite remarkable:
| Model | WAPE |
| --- | --- |
| Naïve baseline | 0.86 |
| GraphSAGE | ~0.62 |
In practical terms:
- The naïve approach misallocates nearly 86% of total demand volume
- GraphSAGE reduces this error by ~27%
The following chart shows actual vs. predicted sales on the log scale. The diagonal red line depicts the perfect forecast, where predicted = actual. As can be seen, most of the high-volume SKUs are clustered around the diagonal, which indicates good accuracy.

From a business perspective, this translates into:
- Fewer emergency production changes
- Higher plant-level stability
- Less manual firefighting
- More predictable inventory positioning
Importantly, this improvement comes without any additional business rules; it arises only from allowing information to flow across the network.
And the bias comparison is as follows:
| Model | Mean Forecast | Bias (Units) | Bias % |
| --- | --- | --- | --- |
| GraphSAGE | ~733 | +31 | ~4.5% |
| Naïve | ~701 | 0 | 0% |
At under 5%, the mild forecasting bias GraphSAGE introduces is well within production-grade limits. The following chart depicts the error in the predictions.

It can be observed that:
- Error is negligible for most of the forecasts. Recall from the temporal analysis that sparsity in sales is 46%. This shows that the model has learned this pattern and accurately predicts zero (or very near it) for those SKU-days, creating the peak at the center.
- The shape of the bell curve is tall and narrow, which indicates high precision. Most errors are tiny and clustered around zero.
- There is little skew of the bell curve from the center line, confirming the low bias we calculated.
In practice, many organizations deliberately bias forecasts to protect service levels, rather than risk stock-outs.
Let's look at the impact at the SKU level. The following chart shows the forecasts for the top 4 SKUs by volume, denoted by red dotted lines, against the actuals.

A number of observations:
- The forecast is reactive in nature. As marked by the green circles in the first chart, the forecast follows the actual on the way up and on the way down, without anticipating the next peak well. This is because GraphSAGE considers all relations to be homogeneous (equally important), which is not true in reality.
- The model under-predicts extreme spikes and compresses the upper tail aggressively. GraphSAGE prefers stability and smoothing.
Here is a chart showing performance across SKUs with non-zero volumes. Two threshold lines are marked at WAPE of 60% and 75%. Three of the four highest-volume SKUs have a WAPE < 60%, with the fourth just above. From a planning perspective, this is a strong and balanced forecast.
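The per-SKU WAPE behind a chart like this is a simple groupby over SKU-day rows. A sketch with pandas (the frame, column names, and values are illustrative, not the dataset's):

```python
import pandas as pd

# Illustrative per-SKU-day records; column names are mine, not the dataset's.
df = pd.DataFrame({
    "sku":      ["A", "A", "B", "B", "C", "C"],
    "actual":   [100., 50., 10., 0., 200., 300.],
    "forecast": [ 90., 60.,  5., 5., 150., 350.],
})

# WAPE per SKU: total absolute error divided by total actual volume.
per_sku = df.groupby("sku")[["actual", "forecast"]].apply(
    lambda g: (g["actual"] - g["forecast"]).abs().sum() / g["actual"].sum()
)
print(per_sku)  # A: ~0.133, B: 1.0, C: 0.2
```

Note that the low-volume SKU "B" posts a terrible WAPE while barely affecting the aggregate number, which is exactly why the chart is read per SKU.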

Takeaway
Graph neural networks do more than improve forecasts; they change how demand is understood. While not perfect, GraphSAGE demonstrates that structure matters more than model complexity.
Instead of treating each SKU as an independent problem, it allows planners to reason about the supply chain as a connected system.
In manufacturing, that shift — from isolated accuracy to network-aware decision-making — is where forecasting begins to create real economic value.
What’s next? From Connections to Meaning
GraphSAGE showed us something powerful: SKUs don’t live in isolation — they live in networks.
But in our current model, every relationship is treated as equal.
In reality, that is not how supply chains work.
A shared plant creates very different dynamics than a shared product group. A shared warehouse matters differently from a shared brand family. Some relationships propagate demand shocks. Others dampen them.
GraphSAGE can see that SKUs are connected, but it cannot learn how or why they are connected.
That's where Heterogeneous Graph Transformers (HGT) come in.
HGT allows the model to learn different behaviors for different types of relationships, letting it weigh, for example, whether plant capacity, product substitution, or logistics constraints should matter more for a given forecast.
In the next article, I'll show how moving from "all edges are equal" to relationship-aware learning unlocks the next level of forecasting accuracy, improving the quality of the forecast by adding meaning to the network.
That’s where graph-based demand forecasting becomes truly operational.
Reference
SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks. Azmine Toushik Wasi, MD Shafikul Islam, Adipto Raihan Akib.
Images used in this article are synthetically generated. Charts and underlying code created by me.