Model Evaluation in Time Series Forecasting


Photo by Lukas on Pexels

Time-series forecasting consists of making predictions based on historical time-stamped data to drive future strategic decision-making in a wide range of applications.

When evaluating a model, we split our data into a training and a test set. While the training set is used to train the model and determine the optimal hyperparameters, the test set is used to evaluate it. To obtain a more robust assessment of model performance, it is common to use cross-validation, a resampling method that trains and tests a model on different subsets of the data over several iterations.

However, straightforward cross-validation cannot be applied to time series data because it ignores the temporal relationship between observations. Therefore, this article presents the different methods used to evaluate time series models, known as backtesting.

Backtesting is a term used in modeling that refers to assessing a model using existing historical data. It involves selecting several training and test sets, stepping forward in time. The main idea behind backtesting is similar to the one behind cross-validation, except that backtesting takes the temporal component of the data into account. This method enables us to (1) assess and visualize how the model error develops over time and (2) estimate the variance of the model error.
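The stepping-forward idea can be sketched with a small helper that generates the train/test index pairs for each fold. This is an illustrative toy, not skforecast's implementation; the sizes mirror the figures below (initial training size of ten, horizon of three steps).

```python
def backtest_folds(n_obs, initial_train_size, steps):
    """Return (train_indices, test_indices) pairs for expanding-window backtesting."""
    folds = []
    train_end = initial_train_size
    while train_end + steps <= n_obs:
        folds.append((list(range(train_end)),
                      list(range(train_end, train_end + steps))))
        train_end += steps
    return folds

# 16 observations, initial training size of 10, horizon of 3 steps:
for train_idx, test_idx in backtest_folds(16, 10, 3):
    print(f"train on {len(train_idx)} obs -> predict {test_idx}")
# train on 10 obs -> predict [10, 11, 12]
# train on 13 obs -> predict [13, 14, 15]
```

Each fold's test window starts right where the previous one ended, so the model error can be tracked as it develops over time.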

In production, it is common practice to first determine the optimal parameters using a backtesting method and then retrain the model with the available data. However, this retraining does not necessarily have to use all the available data, or happen every time new data arrives. Depending on our strategy, we can choose a different backtesting method.

1. Backtesting with refit and increasing training size

The model is tested on a sequentially increasing training set, always keeping a fixed origin and using all the data available. With this method, the origin is fixed while the size of the training set grows at each iteration, as displayed in Figure 1.

Fig. 1. Time series backtesting diagram with an initial training size of ten observations, a prediction horizon of three steps, and retraining at each iteration. Ref: Skforecast [1].

2. Backtesting with refit and fixed training size

This method is similar to the previous one, except that it rolls the origin of the forecast forward. Therefore, the size of the training set remains constant, as displayed in Figure 2. This method can be considered a time series analogue of cross-validation techniques.

Compared to the previous method, this one is cheaper, as the size of the training set remains the same at each iteration. It also allows for distinct error distributions by lead time and desensitizes the error measures to special events at any single origin [2]. This approach is particularly interesting when the historical data contains events or “abnormal” periods, such as COVID.

Fig. 2. Time series backtesting diagram with an initial training size of ten observations, a prediction horizon of three steps, and a training set of constant size. Ref: Skforecast [1].

3. Backtesting without refit

The last backtesting approach consists of training the model with an initial training set and assessing it sequentially without updating it. This strategy has the advantage of being much faster, since the model is trained only once. However, the model does not incorporate the latest data available, so it may lose predictive capability over time.

This approach is interesting when predictions need to be made at a high frequency on new data coming into the system.

Fig. 3. Time series backtesting diagram with an initial training size of ten observations, a prediction horizon of three steps, and no retraining at each iteration. Ref: Skforecast [1].

Here is the implementation of backtesting using the Skforecast library. Skforecast is a Python library that eases the use of scikit-learn regressors as multi-step forecasters. It also works with any regressor compatible with the scikit-learn API (pipelines, CatBoost, LightGBM, XGBoost, Ranger, etc.).

For testing purposes, we have used the publicly available h2o dataset (distributed on GitHub under the MIT license), which contains monthly data from 1991-07-01 to 2008-06-01.
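To keep the snippets below runnable without network access, a synthetic monthly series spanning the same dates can stand in for the h2o data (the real dataset is loaded from the skforecast repository; the seasonal shape here is purely illustrative):

```python
import numpy as np
import pandas as pd

# Stand-in for the h2o dataset: a noisy seasonal monthly series
# covering the same span, 1991-07-01 to 2008-06-01.
idx = pd.date_range("1991-07-01", "2008-06-01", freq="MS")
rng = np.random.default_rng(42)
y = pd.Series(np.sin(np.arange(len(idx)) * 2 * np.pi / 12)
              + rng.normal(0, 0.1, len(idx)), index=idx, name="y")

print(len(y), y.index.min().date(), y.index.max().date())
# 204 1991-07-01 2008-06-01
```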

Fig. 4. Visualization of the dataset, where blue data is used for training and orange for testing. Ref: Image by author.

Below are the three described backtesting methods, using a random forest regressor as the autoregressive model.

Looking at the implementation, the difference between the backtesting methods lies in the following parameters:

  • initial_train_size: Number of samples in the initial training split.
  • fixed_train_size: If True, the training size does not increase but moves forward by ‘steps’ at each iteration.
  • refit: Whether to re-fit the forecaster at each iteration.
  • steps: Number of steps to predict.
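As a compact summary, the three methods differ only in two of these flags (parameter names as listed above):

```python
# Flag combinations distinguishing the three backtesting methods.
methods = {
    "refit, increasing training size": {"refit": True,  "fixed_train_size": False},
    "refit, fixed training size":      {"refit": True,  "fixed_train_size": True},
    "no refit":                        {"refit": False, "fixed_train_size": False},
}

for name, flags in methods.items():
    print(name, flags)
```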

1. Backtesting with refit and increasing training size

The model is first trained on the series up to 2002-01-01; ten new observations are then sequentially added to the training set at each iteration. This process is repeated until the whole series has been covered.

To configure this method, the fixed_train_size and refit parameters are set to False and True, respectively.
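Since the original code listing is not included here, the following is a self-contained sketch of this scheme using plain scikit-learn on a synthetic stand-in series (lag count, tree count, and all names are illustrative; skforecast's backtesting_forecaster with refit=True and fixed_train_size=False automates the same loop):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly stand-in for the h2o series (same date span as the post).
idx = pd.date_range("1991-07-01", "2008-06-01", freq="MS")
rng = np.random.default_rng(0)
y = pd.Series(np.sin(np.arange(len(idx)) * 2 * np.pi / 12)
              + rng.normal(0, 0.1, len(idx)), index=idx)

LAGS, STEPS = 12, 10
initial_train_size = y.index.get_loc(pd.Timestamp("2002-01-01")) + 1

def lag_matrix(series, lags):
    # Row t holds [y[t-1], ..., y[t-lags]] as features for target y[t].
    X = np.column_stack([series.shift(k).to_numpy() for k in range(1, lags + 1)])
    return X[lags:], series.to_numpy()[lags:]

errors = []
train_end = initial_train_size
while train_end + STEPS <= len(y):
    train = y.iloc[:train_end]                 # expanding window, fixed origin
    test = y.iloc[train_end:train_end + STEPS]
    X_tr, y_tr = lag_matrix(train, LAGS)
    model = RandomForestRegressor(n_estimators=25, random_state=123).fit(X_tr, y_tr)
    history = list(train.to_numpy())
    preds = []
    for _ in range(STEPS):                     # recursive multi-step forecast
        x = np.array(history[-LAGS:][::-1]).reshape(1, -1)
        p = model.predict(x)[0]
        preds.append(p)
        history.append(p)
    errors.append(float(np.mean((np.array(preds) - test.to_numpy()) ** 2)))
    train_end += STEPS

print(f"{len(errors)} folds, mean MSE: {np.mean(errors):.3f}")
```

The model is re-fitted at every fold, each time on all data observed so far.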

As observed, the training set increases over time while the test set keeps a constant size.

2. Backtesting with refit and fixed training size

As in backtesting with refit and an increasing training size, the model is first trained on the series up to 2002-01-01, and ten new observations are then added at each iteration. However, with this method the training size remains constant over time, which means both the training and test sets always have the same size.

To configure this method, both the fixed_train_size and refit parameters are set to True.
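A self-contained sketch of this scheme with plain scikit-learn follows (synthetic stand-in data, illustrative names; skforecast's backtesting_forecaster with refit=True and fixed_train_size=True automates the same loop). The only difference from the expanding-window variant is that the training window slides instead of growing:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly stand-in for the h2o series (same date span as the post).
idx = pd.date_range("1991-07-01", "2008-06-01", freq="MS")
rng = np.random.default_rng(0)
y = pd.Series(np.sin(np.arange(len(idx)) * 2 * np.pi / 12)
              + rng.normal(0, 0.1, len(idx)), index=idx)

LAGS, STEPS = 12, 10
initial_train_size = y.index.get_loc(pd.Timestamp("2002-01-01")) + 1

def lag_matrix(series, lags):
    # Row t holds [y[t-1], ..., y[t-lags]] as features for target y[t].
    X = np.column_stack([series.shift(k).to_numpy() for k in range(1, lags + 1)])
    return X[lags:], series.to_numpy()[lags:]

errors = []
train_end = initial_train_size
while train_end + STEPS <= len(y):
    # Sliding window of constant size: the origin rolls forward.
    train = y.iloc[train_end - initial_train_size:train_end]
    test = y.iloc[train_end:train_end + STEPS]
    X_tr, y_tr = lag_matrix(train, LAGS)
    model = RandomForestRegressor(n_estimators=25, random_state=123).fit(X_tr, y_tr)
    history = list(train.to_numpy())
    preds = []
    for _ in range(STEPS):                     # recursive multi-step forecast
        x = np.array(history[-LAGS:][::-1]).reshape(1, -1)
        p = model.predict(x)[0]
        preds.append(p)
        history.append(p)
    errors.append(float(np.mean((np.array(preds) - test.to_numpy()) ** 2)))
    train_end += STEPS

print(f"{len(errors)} folds, mean MSE: {np.mean(errors):.3f}")
```

Because every fold trains on the same number of observations, the cost per iteration stays constant and the error estimates are comparable across origins.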

3. Backtesting without refit

As in the previous methods, the model is first trained on the series up to 2002-01-01. However, the training set does not change over time, whereas the test set moves forward ten steps at each iteration.

To configure this method, both the fixed_train_size and refit parameters are set to False.
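A self-contained sketch with plain scikit-learn follows (synthetic stand-in data, illustrative names; skforecast's backtesting_forecaster with refit=False automates the same loop). The model is fitted once; each fold still conditions its forecasts on the actual observations available up to that fold's start:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic monthly stand-in for the h2o series (same date span as the post).
idx = pd.date_range("1991-07-01", "2008-06-01", freq="MS")
rng = np.random.default_rng(0)
y = pd.Series(np.sin(np.arange(len(idx)) * 2 * np.pi / 12)
              + rng.normal(0, 0.1, len(idx)), index=idx)

LAGS, STEPS = 12, 10
initial_train_size = y.index.get_loc(pd.Timestamp("2002-01-01")) + 1

def lag_matrix(series, lags):
    # Row t holds [y[t-1], ..., y[t-lags]] as features for target y[t].
    X = np.column_stack([series.shift(k).to_numpy() for k in range(1, lags + 1)])
    return X[lags:], series.to_numpy()[lags:]

# Fit once on the initial window; never refit afterwards.
X_tr, y_tr = lag_matrix(y.iloc[:initial_train_size], LAGS)
model = RandomForestRegressor(n_estimators=25, random_state=123).fit(X_tr, y_tr)

errors = []
train_end = initial_train_size
while train_end + STEPS <= len(y):
    history = list(y.to_numpy()[:train_end])   # observed values up to the fold start
    preds = []
    for _ in range(STEPS):                     # recursive multi-step forecast
        x = np.array(history[-LAGS:][::-1]).reshape(1, -1)
        p = model.predict(x)[0]
        preds.append(p)
        history.append(p)
    test = y.to_numpy()[train_end:train_end + STEPS]
    errors.append(float(np.mean((np.array(preds) - test) ** 2)))
    train_end += STEPS

print(f"{len(errors)} folds, mean MSE: {np.mean(errors):.3f}")
```

Skipping the refit makes each iteration much cheaper, at the cost of the model never learning from data observed after the initial window.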
