Demand Forecasting with Darts: A Tutorial


Have you gathered all the relevant data?

Let’s assume your organization has provided you with a transactional database containing sales of different products across different locations. This kind of data is called panel data, which means you will be working with many time series simultaneously.

The transactional database will probably have the following format: the date of the sale, the location identifier where the sale took place, the product identifier, the quantity sold, and possibly the monetary value. Depending on how this data is collected, it can be aggregated in different ways: by time (daily, weekly, monthly) and by group (by customer, or by location and product).
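As a minimal sketch of such an aggregation, assuming hypothetical column names (date, store_id, product_id, quantity) for a generic transactional table:

import pandas as pd

# Hypothetical transactional table: one row per sale
transactions = pd.read_csv('transactions.csv', parse_dates=['date'])

# Aggregate to monthly quantities per location and product
monthly = (
    transactions
    .groupby(['store_id', 'product_id', pd.Grouper(key='date', freq='MS')])['quantity']
    .sum()
    .reset_index()
)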

But is this all the information you need for demand forecasting? Yes and no. You can certainly work with this data and make some predictions, and if the relations between the series are not complex, a simple model might work. But if you are reading this tutorial, you are probably interested in predicting demand when the data is not that simple. In this case, there is additional information that can be a game changer if you have access to it:

  • Historical stock data: It is crucial to know when stockouts occur, because demand could still be high even though the sales data doesn’t reflect it.
  • Promotions data: Discounts and promotions also affect demand, as they change customers’ shopping behavior.
  • Events data: As discussed later, time features can be extracted from the date index. However, holidays and other special dates also condition consumption.
  • Other domain data: Any other data that could affect the demand for the products you are working with can be relevant to the task.

Let’s code!

For this tutorial, we will work with monthly sales data aggregated by product and sale location. This example dataset comes from the Stallion Kaggle Competition and records beer products (SKUs) distributed to retailers through wholesalers (agencies). The first step is to format the dataset and select the columns that we want to use for training the models. As you can see in the code snippet, we combine all of the event columns into a single one called ‘Special_days’ for simplicity. As mentioned earlier, this dataset lacks stock data, so if stockouts occurred we could be misinterpreting the real demand.

import numpy as np
import pandas as pd

# Load data with pandas
sales_data = pd.read_csv(f'{local_path}/price_sales_promotion.csv')
volume_data = pd.read_csv(f'{local_path}/historical_volume.csv')
events_data = pd.read_csv(f'{local_path}/event_calendar.csv')

# Merge all data
dataset = pd.merge(volume_data, sales_data, on=['Agency','SKU','YearMonth'], how='left')
dataset = pd.merge(dataset, events_data, on='YearMonth', how='left')

# Datetime
dataset.rename(columns={'YearMonth': 'Date', 'SKU': 'Product'}, inplace=True)
dataset['Date'] = pd.to_datetime(dataset['Date'], format='%Y%m')

# Format discounts
dataset['Discount'] = dataset['Promotions']/dataset['Price']
dataset = dataset.drop(columns=['Promotions','Sales'])

# Format events
special_days_columns = ['Easter Day','Good Friday','New Year','Christmas','Labor Day','Independence Day','Revolution Day Memorial','Regional Games ','FIFA U-17 World Cup','Football Gold Cup','Beer Capital','Music Fest']
dataset['Special_days'] = dataset[special_days_columns].max(axis=1)
dataset = dataset.drop(columns=special_days_columns)

Image by author

Have you checked for wrong values?

While this step is more obvious, it is still worth mentioning, as it can prevent feeding wrong data into our models. In transactional data, look for zero-price transactions, sales volumes larger than the remaining stock, transactions of discontinued products, and similar issues.
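A few quick checks on the tutorial’s dataset could look like the sketch below; the stock-related check is only hypothetical, since this dataset has no stock column:

# Count suspicious rows before modeling
print((dataset['Price'] == 0).sum(), 'zero-price rows')
print((dataset['Volume'] < 0).sum(), 'negative-volume rows')
print(dataset.isna().sum())  # missing values per column

# If a 'Stock' column were available (it is not in this dataset), you could also flag
# rows where more units were sold than were in stock:
# suspicious = dataset[dataset['Volume'] > dataset['Stock']]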

Are you forecasting sales or demand?

This is a key distinction to make when forecasting demand, as the goal is to foresee the demand for products in order to optimize re-stocking. If we look at sales without observing the stock values, we could be underestimating demand when stockouts occur, thus introducing bias into our models. In this case, we can either ignore transactions after a stockout or try to fill those values correctly, for example with a moving average of the demand.
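A minimal sketch of the second option, assuming a hypothetical boolean ‘Stockout’ column that is not present in this tutorial’s dataset:

# Replace demand observed during stockouts with a 3-month moving average of past demand
# (assumes rows are sorted chronologically within each Agency/Product group)
stockout_mask = dataset['Stockout'] == True  # hypothetical column
past_demand_ma = (
    dataset.groupby(['Agency', 'Product'])['Volume']
    .transform(lambda s: s.shift(1).rolling(window=3, min_periods=1).mean())
)
dataset.loc[stockout_mask, 'Volume'] = past_demand_ma[stockout_mask]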

Let’s code!

In the case of the dataset chosen for this tutorial, the preprocessing is quite simple because we don’t have stock data. We need to correct zero-price transactions by filling them with the correct value and to fill the missing values of the discount column.

# Fill prices
dataset.Price = np.where(dataset.Price==0, np.nan, dataset.Price)
dataset.Price = dataset.groupby(['Agency', 'Product'])['Price'].ffill()
dataset.Price = dataset.groupby(['Agency', 'Product'])['Price'].bfill()

# Fill discounts
dataset.Discount = dataset.Discount.fillna(0)

# Sort
dataset = dataset.sort_values(by=['Agency','Product','Date']).reset_index(drop=True)

Do you need to forecast all products?

Depending on conditions such as budget, cost savings, and the models you are using, you might not want to forecast your whole catalog of products. Let’s say that, after experimenting, you decide to work with neural networks. These are usually costly to train and take more time and plenty of resources. If you choose to train and forecast the complete set of products, the costs of your solution will increase, possibly even to the point where it is not worth investing in for your company. In this case, an alternative is to segment the products based on specific criteria, for example using your model to forecast only the products that produce the highest volume of income. The demand for the remaining products could then be predicted with a simpler and cheaper model.
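The tutorial below segments by rotation instead, but as a sketch of the revenue-based alternative described here (approximating revenue as Volume times Price):

# Keep the (Agency, Product) series that generate the top 80% of revenue
dataset['Revenue'] = dataset['Volume'] * dataset['Price']
revenue_by_series = dataset.groupby(['Agency', 'Product'])['Revenue'].sum().sort_values(ascending=False)
cumulative_share = revenue_by_series.cumsum() / revenue_by_series.sum()
top_series = cumulative_share[cumulative_share <= 0.8].index

high_value = dataset.set_index(['Agency', 'Product']).loc[top_series].reset_index()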

Can you extract any more relevant information?

Feature extraction can be applied in any time series task, as you can extract interesting variables from the date index. In demand forecasting in particular, these features are interesting because some consumer habits are seasonal. Extracting the day of the week, the week of the month, or the month of the year can help your model identify these patterns. It is important to encode these features correctly, and I advise you to look into cyclical encoding, as it can be more suitable for time features in some situations.
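As a minimal sketch, a cyclical encoding of the month of the year can be built directly with pandas and NumPy; Darts can also generate similar features automatically at training time through its add_encoders argument (for example with a cyclic datetime-attribute encoder):

import numpy as np

# Map month 1-12 onto a circle so that December and January end up close together
month = dataset['Date'].dt.month
dataset['month_sin'] = np.sin(2 * np.pi * month / 12)
dataset['month_cos'] = np.cos(2 * np.pi * month / 12)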

Let’s code!

The first thing we do in this tutorial is segment our products and keep only those that are high-rotation. Doing this step before feature extraction helps reduce performance costs when you have too many low-rotation series that you are not going to use. To compute rotation, we only use training data, so we define the data splits beforehand. Notice that we have two dates for the validation set: VAL_DATE_IN indicates the dates that still belong to the training set but can be used as input for the validation set, while VAL_DATE_OUT indicates from which point on the timesteps are used to evaluate the models’ output. In this case, we tag as high-rotation all series that have sales during at least 75% of the year, but you can play around with the function implemented in the source code. After that, we perform a second segmentation to make sure that we have enough historical data to train the models.

# Split dates 
TEST_DATE = pd.Timestamp('2017-07-01')
VAL_DATE_OUT = pd.Timestamp('2017-01-01')
VAL_DATE_IN = pd.Timestamp('2016-01-01')
MIN_TRAIN_DATE = pd.Timestamp('2015-06-01')

# Rotation: tag series using training data only
# (rotation_tags is implemented in the source code; the call below is a reconstruction,
# as its exact arguments were cut off in the original snippet)
rotation_values = rotation_tags(dataset[dataset.Date < VAL_DATE_OUT])
dataset = dataset.merge(rotation_values, on=['Agency','Product'], how='left')
dataset = dataset[dataset.Rotation=='high'].reset_index(drop=True)
dataset = dataset.drop(columns=['Rotation'])

# History
first_transactions = dataset[dataset.Volume!=0].groupby(['Agency','Product'], as_index=False).agg(
    First_transaction=('Date', 'min'),
)
dataset = dataset.merge(first_transactions, on=['Agency','Product'], how='left')
dataset = dataset[dataset.Date>=dataset.First_transaction]
dataset = dataset[MIN_TRAIN_DATE>=dataset.First_transaction].reset_index(drop=True)
dataset = dataset.drop(columns=['First_transaction'])

As we are working with monthly aggregated data, there are not many time features to extract. In this case, we include the position, which is simply a numerical index of the order of the series. Time features can be computed at training time by specifying them to Darts via encoders. In addition, we also compute the moving average and the exponential moving average over the previous four months.

dataset['EMA_4'] = dataset.groupby(['Agency','Product'], group_keys=False).apply(lambda group: group.Volume.ewm(span=4, adjust=False).mean())
dataset['MA_4'] = dataset.groupby(['Agency','Product'], group_keys=False).apply(lambda group: group.Volume.rolling(window=4, min_periods=1).mean())

# Darts' encoders
encoders = {
    "position": {"past": ["relative"], "future": ["relative"]},
    "transformer": Scaler(),
}

Have you defined a baseline set of predictions?

As in other use cases, before training any fancy models you need to establish a baseline that you want to beat. Usually, when choosing a baseline model, you should aim for something simple that has hardly any cost. A common practice in this field is to use the moving average of demand over a time window as a baseline. This baseline can be computed without any model at all, but for code simplicity, in this tutorial we will use Darts’ baseline model NaiveMovingAverage.
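For a single series, the baseline can be as simple as the sketch below (here `series` stands for any Darts TimeSeries; the tutorial later evaluates all series with a helper from the source code):

from darts.models import NaiveMovingAverage

# Each forecast step is the average of the preceding input_chunk_length values
# (including earlier forecast steps)
baseline = NaiveMovingAverage(input_chunk_length=12)
baseline.fit(series)             # `series` is a darts.TimeSeries
forecast = baseline.predict(6)   # forecast the next 6 timesteps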

Is your model local or global?

You are working with multiple time series. Now, you can decide to train a local model for each time series or to train a single global model for all of them. There is no ‘right’ answer; both can work depending on your data. If you have data that you believe shows similar behavior when grouped by store, product type, or other categorical features, you might benefit from a global model. Moreover, if you have a very high volume of series and you want to use models that are costly to store once trained, you may also prefer a global model. However, if after analyzing your data you believe there are no common patterns between series, your volume of series is manageable, or you are not using complex models, choosing local models may be best.
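As a sketch of how this difference looks in Darts, a local model is fit per series while a global model accepts a whole list of series; the model choices and the `series_list` variable here are just illustrative assumptions:

from darts.models import ExponentialSmoothing, TiDEModel

# Local: one model per series (`series_list` is a list of darts.TimeSeries)
local_models = [ExponentialSmoothing().fit(s) for s in series_list]
local_preds = [m.predict(6) for m in local_models]

# Global: a single model trained on all series at once
global_model = TiDEModel(input_chunk_length=12, output_chunk_length=6)
global_model.fit(series_list)
global_preds = global_model.predict(n=6, series=series_list)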

What libraries and models did you select?

There are many options for working with time series. In this tutorial, I suggest using Darts. Assuming you are working with Python, this forecasting library is very easy to use. It provides tools for managing time series data, splitting data, handling grouped time series, and performing different analyses. It offers a wide variety of global and local models, so you can run experiments without switching libraries. Examples of the available options are baseline models, statistical models such as ARIMA or Prophet, scikit-learn-based models, PyTorch-based models, and ensemble models. Interesting options are models like the Temporal Fusion Transformer (TFT) or the Time-series Dense Encoder (TiDE), which can learn patterns between grouped series and support categorical covariates.

Let’s code!

The first step to start using the different Darts models is to turn the pandas DataFrames into Darts TimeSeries objects and split them correctly. To do so, I have implemented two functions that use Darts’ functionality to perform these operations. The price, discount, and event features will be known at forecasting time, while for the calculated features we will only know past values.

# Darts format
series_raw, series, past_cov, future_cov = to_darts_time_series_group(
    dataset=dataset,
    target='Volume',
    time_col='Date',
    group_cols=['Agency','Product'],
    past_cols=['EMA_4','MA_4'],
    future_cols=['Price','Discount','Special_days'],
    freq='MS',  # first day of each month
    encode_static_cov=True,  # so that the models can use the categorical variables (Agency & Product)
)
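The helper above is implemented in the source code; as a sketch of what Darts itself offers, grouped target series can also be built directly with TimeSeries.from_group_dataframe (this is an assumption about what the helper wraps, not the author’s exact implementation):

from darts import TimeSeries

# One TimeSeries per (Agency, Product) pair, with the group columns stored as static covariates
target_series = TimeSeries.from_group_dataframe(
    dataset,
    group_cols=['Agency', 'Product'],
    time_col='Date',
    value_cols='Volume',
    freq='MS',
)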

# Split
train_val, test = split_grouped_darts_time_series(
    series=series,
    split_date=TEST_DATE
)

train, _ = split_grouped_darts_time_series(
    series=train_val,
    split_date=VAL_DATE_OUT
)

_, val = split_grouped_darts_time_series(
    series=train_val,
    split_date=VAL_DATE_IN
)

The first model we are going to use is the NaiveMovingAverage baseline, to which we will compare the rest of our models. This model is extremely fast, as it doesn’t learn any patterns and simply performs a moving-average forecast given the input and output dimensions.

maes_baseline, time_baseline, preds_baseline = eval_local_model(train_val, test, NaiveMovingAverage, mae, prediction_horizon=6, input_chunk_length=12)

Normally, before jumping into deep learning, you would try simpler and cheaper models, but in this tutorial I wanted to focus on two deep learning models that have worked well for me. I have used both of them to forecast the demand for hundreds of products across multiple stores, using daily aggregated sales data together with different static and continuous covariates, as well as stock data. It is important to note that these models work better than others especially in long-term forecasting.

The first model is the Temporal Fusion Transformer (TFT). This model lets you work with a large number of time series concurrently (i.e., it is a global model) and is very flexible when it comes to covariates. It works with static covariates, past covariates (values known only in the past), and future covariates (values known in both the past and the future). It manages to learn complex patterns and supports probabilistic forecasting. The only drawback is that, while it is well optimized, it can be costly to tune and train. In my experience, it can give very good results, but tuning its hyperparameters takes too much time if you are short on resources. In this tutorial, we train the TFT with mostly the default parameters and the same input and output windows that we used for the baseline model.

# PyTorch Lightning Trainer arguments
early_stopping_args = {
    "monitor": "val_loss",
    "patience": 50,
    "min_delta": 1e-3,
    "mode": "min",
}

pl_trainer_kwargs = {
    "max_epochs": 200,
    #"accelerator": "gpu",  # uncomment for GPU use
    "callbacks": [EarlyStopping(**early_stopping_args)],
    "enable_progress_bar": True,
}

common_model_args = {
    "output_chunk_length": 6,
    "input_chunk_length": 12,
    "pl_trainer_kwargs": pl_trainer_kwargs,
    "save_checkpoints": True,  # checkpoint to retrieve the best performing model state
    "force_reset": True,
    "batch_size": 128,
    "random_state": 42,
}

# TFT params
best_hp = {
    'optimizer_kwargs': {'lr': 0.0001},
    'loss_fn': MAELoss(),
    'use_reversible_instance_norm': True,
    'add_encoders': encoders,
}

# Train
start = time.time()
## COMMENT TO LOAD PRE-TRAINED MODEL
fit_mixed_covariates_model(
    model_cls=TFTModel,
    common_model_args=common_model_args,
    specific_model_args=best_hp,
    model_name='TFT_model',
    past_cov=past_cov,
    future_cov=future_cov,
    train_series=train,
    val_series=val,
)
time_tft = time.time() - start

# Predict
best_tft = TFTModel.load_from_checkpoint(model_name='TFT_model', best=True)
preds_tft = best_tft.predict(
    series=train_val,
    past_covariates=past_cov,
    future_covariates=future_cov,
    n=6
)

The second model is the Time-series Dense Encoder (TiDE). This model is a little more recent than the TFT and is built with dense layers instead of LSTM layers, which makes training much less time-consuming. The Darts implementation also supports all types of covariates and probabilistic forecasting, as well as multiple time series. The paper introducing this model shows that it can match or outperform transformer-based models on forecasting benchmarks. In my case, because it was much cheaper to tune, I managed to obtain better results with TiDE than with the TFT in the same amount of time or less. Once again, for this tutorial we are just doing a first run with mostly default parameters. Note that for TiDE the number of epochs needed is usually smaller than for the TFT.

# PyTorch Lightning Trainer arguments
early_stopping_args = {
    "monitor": "val_loss",
    "patience": 10,
    "min_delta": 1e-3,
    "mode": "min",
}

pl_trainer_kwargs = {
    "max_epochs": 50,
    #"accelerator": "gpu",  # uncomment for GPU use
    "callbacks": [EarlyStopping(**early_stopping_args)],
    "enable_progress_bar": True,
}

common_model_args = {
    "output_chunk_length": 6,
    "input_chunk_length": 12,
    "pl_trainer_kwargs": pl_trainer_kwargs,
    "save_checkpoints": True,  # checkpoint to retrieve the best performing model state
    "force_reset": True,
    "batch_size": 128,
    "random_state": 42,
}

# TiDE params
best_hp = {
    'optimizer_kwargs': {'lr': 0.0001},
    'loss_fn': MAELoss(),
    'use_layer_norm': True,
    'use_reversible_instance_norm': True,
    'add_encoders': encoders,
}

# Train
start = time.time()
## COMMENT TO LOAD PRE-TRAINED MODEL
fit_mixed_covariates_model(
    model_cls=TiDEModel,
    common_model_args=common_model_args,
    specific_model_args=best_hp,
    model_name='TiDE_model',
    past_cov=past_cov,
    future_cov=future_cov,
    train_series=train,
    val_series=val,
)
time_tide = time.time() - start

# Predict
best_tide = TiDEModel.load_from_checkpoint(model_name='TiDE_model', best=True)
preds_tide = best_tide.predict(
    series=train_val,
    past_covariates=past_cov,
    future_covariates=future_cov,
    n=6
)

How are you evaluating the performance of your model?

While typical time series metrics are useful for evaluating how good your model is at forecasting, it is recommended to go a step further. First, when evaluating against a test set, you should discard all series with stockouts, as you would not be comparing your forecast against real data. Second, it is also interesting to incorporate domain knowledge or KPIs into your evaluation. One key metric could be how much money your model would earn by avoiding stockouts. Another could be how much money you would save by avoiding overstocking of short shelf-life products. Depending on the stability of your prices, you could even train your models with a custom loss function, such as a price-weighted Mean Absolute Error (MAE).
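A minimal sketch of such a price-weighted MAE, assuming aligned NumPy arrays of actuals, forecasts, and unit prices (the evaluation code in the source repository may differ):

import numpy as np

def price_weighted_mae(y_true, y_pred, prices):
    # Weight each absolute error by the relative price of its product
    weights = prices / prices.sum()
    return float(np.sum(weights * np.abs(y_true - y_pred)))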

Will your model’s predictions deteriorate with time?

Dividing your data into train, validation, and test splits is not enough to evaluate the performance of a model that could go into production. By evaluating only a short window of time with the test set, your model selection is biased by how well the model performs in one very specific predictive window. Darts provides an easy-to-use implementation of backtesting, which lets you simulate how your model would perform over time by forecasting on moving windows. With backtesting, you can also simulate retraining the model every N steps.

Let’s code!

If we look at our models’ results in terms of MAE across all series, we can see that the clear winner is TiDE, as it reduces the baseline’s error the most while keeping the time cost fairly low. However, let’s say that our beer company’s main interest is to reduce the monetary cost of stockouts and overstocking equally. In that case, we can evaluate the predictions using a price-weighted MAE.

Image by author

After computing the price-weighted MAE for all series, TiDE is still the best model, although it could have turned out differently. The improvement of TiDE over the baseline is 6.11% in terms of MAE, and in terms of monetary cost the improvement increases a little. Conversely, for the TFT the improvement is larger when considering only sales volume than when taking prices into account.

Image by author

For this dataset, we are not using backtesting to compare predictions because of the limited amount of data resulting from the monthly aggregation. Nevertheless, I encourage you to perform backtesting in your projects if possible. In the source code, I include the following function to easily perform backtesting with Darts:

def backtesting(model, series, past_cov, future_cov, start_date, horizon, stride):
    historical_backtest = model.historical_forecasts(
        series, past_cov, future_cov,
        start=start_date,
        forecast_horizon=horizon,
        stride=stride,        # predict every N months
        retrain=False,        # keep the model fixed (no retraining)
        overlap_end=False,
        last_points_only=False
    )
    maes = model.backtest(series, historical_forecasts=historical_backtest, metric=mae)

    return np.mean(maes)

How will you provide the predictions?

In this tutorial, it is assumed that you are already working with a predefined forecasting horizon and frequency. If these weren’t provided, defining them is a separate use case of its own, where delivery or supplier lead times must also be taken into account. Knowing how often your model’s forecasts are needed is important, as it may require a different level of automation. If your organization needs predictions every two months, investing time, money, and resources in automating this task may not be necessary. However, if your organization needs predictions twice a week and your model takes long to produce them, automating the process will save future effort.

Will you deploy the model in the company’s cloud services?

Following the previous point, if you and your organization decide to deploy the model and put it into production, it is a good idea to follow MLOps principles. This allows anyone to easily make changes in the future without disrupting the whole system. Moreover, it is also important to monitor the model’s performance once in production, as concept drift or data drift may occur. Nowadays, numerous cloud services offer tools to manage the development, deployment, and monitoring of machine learning models. Examples are Azure Machine Learning and Amazon Web Services.
