How to train deep neural networks using several time series

This article is a follow-up to a previous one. There, we learned how to transform a time series for deep learning.

We continue to explore deep neural networks for forecasting. In this post, we’ll:

  • Learn how to train a global forecasting model using deep learning, including basic preprocessing steps;
  • Explore keras callbacks to drive the training process of a neural network.

Deep neural networks tackle forecasting problems using auto-regression. Auto-regression is a modeling technique that involves using past observations to predict future ones.
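
As a minimal sketch of this idea, the snippet below turns a toy series into pairs of past values (inputs) and the value that follows them (target). The helper name `make_lagged_pairs` and the toy numbers are illustrative assumptions, not part of any library.

```python
import numpy as np

def make_lagged_pairs(series, n_lags):
    """Build (inputs, target) pairs: each row of X holds n_lags consecutive
    past values and y holds the value that immediately follows them."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

# toy series, for illustration only
X, y = make_lagged_pairs([10, 20, 30, 40, 50], n_lags=2)
print(X)  # rows hold the pairs (10, 20), (20, 30), (30, 40)
print(y)  # the values that follow each pair: 30, 40, 50
```

A forecasting model is then fit to map each row of `X` to the matching entry of `y`.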

Deep neural networks can be designed in different ways, such as recurrent or convolutional architectures. Recurrent neural networks are often preferred for time series data. Among other reasons, such networks excel at modeling long-term dependencies. This feature can have a strong impact on forecasting performance.

Here’s how to define a particular type of recurrent neural network called LSTM (Long Short-Term Memory). The comments provide a brief description of each model element.

from keras.models import Sequential
from keras.layers import (Dense,
                          LSTM,
                          TimeDistributed,
                          RepeatVector,
                          Dropout)

# Number of variables in the time series.
# 1 means the time series is univariate
N_FEATURES = 1
# Number of lags in the auto-regressive model
N_LAGS = 24
# Number of future steps to be predicted
# (value assumed for illustration)
HORIZON = 2

# 'Sequential' instance is used to create a linear stack of layers
# ... each layer feeds into the next one.
model = Sequential()
# Adding an LSTM layer with 32 units and relu activation
model.add(LSTM(32, activation='relu', input_shape=(N_LAGS, N_FEATURES)))
# Using dropout to avoid overfitting
model.add(Dropout(0.2))
# Repeating the input vector HORIZON times to match the shape of the output.
model.add(RepeatVector(HORIZON))
# Another LSTM layer, this time with 16 units
# Also returning the output of each time step (return_sequences=True)
model.add(LSTM(16, activation='relu', return_sequences=True))
# Using dropout again with 0.2 dropout rate
model.add(Dropout(0.2))
# Adding a standard fully connected neural network layer
# And distributing the layer to each time step
model.add(TimeDistributed(Dense(N_FEATURES)))

# Compiling the model using ADAM and setting the objective to minimize MSE
model.compile(optimizer='adam', loss='mse')

Before, we learned how to transform a time series to train this model. But sometimes you have several time series available.

How do you handle such cases?

The rise of global methods

Forecasting models are usually created with the historical data of a single time series. Such models can be referred to as local to that time series. In contrast, global methods pool the historical data of many time series to build a model.

The interest in global models surged when a method called ES-RNN won the M4 competition, a forecasting contest featuring 100,000 different time series.

When and why to use a global model

Global models can provide considerable value in forecasting problems involving many time series. For instance, in retail where the goal is to predict the sales of many products.

Another motivation for using this kind of approach is to have more data. Machine learning algorithms are likely to perform better with larger training sets. This is especially so for methods with many parameters, such as deep neural networks. These are known to be data-hungry.

Global forecasting models don’t assume that the underlying time series are dependent. That is, they don’t assume that the lags of one series can be used to forecast the future values of another series.

Rather, these techniques exploit information from many time series to estimate the parameters of the model. When forecasting the future of a time series, the main input to the model is the recent past lags of that series.

In the rest of this article, we’ll explore how to train a deep neural network using many time series.


We’ll use a data set about the power consumption in 8 regions across the USA:

Daily power consumption (log) in 8 regions across the USA. Data source in reference [1]. Image by author.

The goal is to forecast power consumption in the next few days. This problem is relevant for power system operators. Accurate predictions help balance the supply and demand of energy.

We can read the data as follows:

import pandas as pd

# assumption: the first column of the file holds the dates
data = pd.read_csv('data/daily_energy_demand.csv',
                   index_col=0, parse_dates=True)


Preprocessing steps

When training a deep neural network with multiple time series, you need to apply some preprocessing steps. Here, we’ll explore the following two:

  • Mean-scaling
  • Log transformation

The available set of time series may have different scales. Thus, it’s important to normalize each series into a common value range. For global forecasting models, this is usually done by dividing each observation by the mean value of the respective series.

from sklearn.model_selection import train_test_split

# leaving last 20% of observations for testing
train, test = train_test_split(data, test_size=0.2, shuffle=False)

# computing the average of each series in the training set
mean_by_series = train.mean()

# mean-scaling: dividing each series by its mean value
train_scaled = train / mean_by_series
test_scaled = test / mean_by_series

After mean-scaling, the log transformation can also be helpful.

In a previous article, we explored how taking the log of a time series is a useful transformation to handle heteroskedasticity. The log transformation can also help avoid saturation areas of the neural network. Saturation occurs when the neural network becomes insensitive to different inputs. This hampers the learning process, leading to a poor model.

import numpy as np

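class LogTransformation:

    @staticmethod
    def transform(x):
        # signed log: also defined for zero and negative values
        xt = np.sign(x) * np.log(np.abs(x) + 1)

        return xt

    @staticmethod
    def inverse_transform(xt):
        # undoes transform()
        x = np.sign(xt) * (np.exp(np.abs(xt)) - 1)

        return x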

# log transformation
train_scaled_log = LogTransformation.transform(train_scaled)
test_scaled_log = LogTransformation.transform(test_scaled)
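
The signed version of the log used above is defined for zero and negative values, and it can be inverted exactly. The self-contained sketch below checks both properties on a few toy values. The function names `signed_log` and `signed_log_inverse` are illustrative and simply mirror the two methods of the class.

```python
import numpy as np

def signed_log(x):
    # sign(x) * log(|x| + 1): defined at zero and for negative inputs
    return np.sign(x) * np.log(np.abs(x) + 1)

def signed_log_inverse(xt):
    # undoes signed_log: sign(xt) * (exp(|xt|) - 1)
    return np.sign(xt) * (np.exp(np.abs(xt)) - 1)

values = np.array([-5.0, 0.0, 0.5, 100.0])  # toy values
transformed = signed_log(values)
recovered = signed_log_inverse(transformed)

print(transformed)
print(np.allclose(recovered, values))  # True
```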


After pre-processing each time series, we need to transform them from sequences into a set of observations. For a single time series, you can check the previous article to learn the details of this process.

For several time series, the idea is similar. We create a set of observations for each series individually. Then, these are concatenated into a single data set.

Here’s how you can do that:

# src module here:
from src.tde import time_delay_embedding

N_FEATURES = 1  # time series is univariate
N_LAGS = 3  # number of lags
HORIZON = 2  # forecasting horizon

# transforming time series for supervised learning
train_by_series, test_by_series = {}, {}
# iterating over each time series
for col in data:
    train_series = train_scaled_log[col]
    test_series = test_scaled_log[col]

    train_series.name = 'Series'
    test_series.name = 'Series'

    # creating observations using a sliding window method
    train_df = time_delay_embedding(train_series, n_lags=N_LAGS, horizon=HORIZON)
    test_df = time_delay_embedding(test_series, n_lags=N_LAGS, horizon=HORIZON)

    train_by_series[col] = train_df
    test_by_series[col] = test_df
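
The `time_delay_embedding` function is imported from the `src` module, so its code is not shown here. As a rough sketch of what such a sliding-window transformation might look like, the hypothetical `sliding_window_embedding` below builds lagged and future columns with `shift`. The column naming (`Series(t-1)`, `Series(t)`, `Series(t+1)`) is an assumption, inferred from the patterns used further down to separate predictors from targets.

```python
import pandas as pd

def sliding_window_embedding(series, n_lags, horizon):
    """Hypothetical stand-in for a time delay embedding.

    Returns one row per time step with n_lags past values (the most
    recent one named '<name>(t)') and horizon future values
    ('<name>(t+1)', ...). Rows with incomplete windows are dropped.
    """
    name = series.name
    columns = {}
    # past values: (t-(n_lags-1)), ..., (t-1), (t)
    for lag in range(n_lags - 1, -1, -1):
        label = f'{name}(t-{lag})' if lag > 0 else f'{name}(t)'
        columns[label] = series.shift(lag)
    # future values: (t+1), ..., (t+horizon)
    for step in range(1, horizon + 1):
        columns[f'{name}(t+{step})'] = series.shift(-step)
    return pd.DataFrame(columns).dropna()

# toy series, for illustration only
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name='Series')
embedded = sliding_window_embedding(toy, n_lags=3, horizon=2)
print(embedded)
```

With 6 observations, 3 lags and a horizon of 2, only two complete windows exist, so the result has two rows.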

After that, you combine the data of each time series by a row-wise concatenation:

train_df = pd.concat(train_by_series, axis=0)
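
When `pd.concat` receives a dictionary, it uses the keys as an extra outer index level, so every row keeps a label with the series it came from. The toy example below illustrates this with two made-up regions and values.

```python
import pandas as pd

# two toy data sets standing in for the per-series observations
toy_by_series = {
    'REGION_A': pd.DataFrame({'Series(t)': [1.0, 2.0], 'Series(t+1)': [2.0, 3.0]}),
    'REGION_B': pd.DataFrame({'Series(t)': [7.0, 8.0], 'Series(t+1)': [8.0, 9.0]}),
}

# row-wise concatenation; dictionary keys become the outer index level
pooled = pd.concat(toy_by_series, axis=0)

print(pooled.shape)            # (4, 2)
print(pooled.loc['REGION_B'])  # only the rows that came from REGION_B
```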


Finally, we split the target variables from the explanatory ones as described before:

# helper assumed to live in the same src module as time_delay_embedding
from src.tde import from_matrix_to_3d

# defining target (Y) and explanatory variables (X)
predictor_variables = train_df.columns.str.contains(r'\(t-|\(t\)')
target_variables = train_df.columns.str.contains(r'\(t\+')
X_tr = train_df.iloc[:, predictor_variables]
Y_tr = train_df.iloc[:, target_variables]

# transforming the data from matrix into a 3-D format for deep learning
X_tr_3d = from_matrix_to_3d(X_tr)
Y_tr_3d = from_matrix_to_3d(Y_tr)

# defining the neural network
model = Sequential()
model.add(LSTM(32, activation='relu', input_shape=(N_LAGS, N_FEATURES)))
model.add(Dropout(0.2))
model.add(RepeatVector(HORIZON))
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(N_FEATURES)))
model.compile(optimizer='adam', loss='mse')

# splitting training into a development and validation set
X_train, X_valid, Y_train, Y_valid = \
    train_test_split(X_tr_3d, Y_tr_3d, test_size=.2, shuffle=False)

# training the neural network
model.fit(X_train, Y_train, validation_data=(X_valid, Y_valid), epochs=100)
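
Keras recurrent layers expect inputs with three dimensions: (samples, time steps, features). The code of `from_matrix_to_3d` is not shown in this post. For a univariate series, a minimal sketch of such a conversion could look like the hypothetical `matrix_to_3d` below, which simply adds a trailing feature axis.

```python
import numpy as np

def matrix_to_3d(matrix):
    """Hypothetical helper: (n_samples, n_steps) -> (n_samples, n_steps, 1)."""
    arr = np.asarray(matrix, dtype=float)
    return arr.reshape(arr.shape[0], arr.shape[1], 1)

# 4 toy observations with 3 lags each
X_toy = np.arange(12).reshape(4, 3)
X_toy_3d = matrix_to_3d(X_toy)

print(X_toy_3d.shape)  # (4, 3, 1)
```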

Deep neural networks are iterative methods. They go over the training dataset several times in cycles called epochs.

In the above example, we ran 100 epochs. But it’s not clear how many epochs one should run to train a network. Too few epochs can lead to underfitting; too many can lead to overfitting.

One way to handle this problem is by monitoring the performance of the neural network after each epoch. Every time the model improves its performance, you save it before continuing the training process. Then, after the training is over, you get the best model that was saved.
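
The logic behind this strategy can be sketched in a few lines of plain Python. The validation losses below are made-up numbers and `best_checkpoint` is a hypothetical helper used only to illustrate the bookkeeping.

```python
def best_checkpoint(val_losses):
    """Return (epoch, loss) of the model that would be kept when a copy is
    saved only after epochs that improve the validation loss."""
    best_epoch, best_loss = None, float('inf')
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # performance improved: overwrite the saved model
            best_epoch, best_loss = epoch, loss
    return best_epoch, best_loss

# made-up validation losses, one per epoch
val_losses = [0.90, 0.55, 0.40, 0.42, 0.38, 0.45, 0.50]
print(best_checkpoint(val_losses))  # (5, 0.38)
```

Even though training continues until epoch 7, the model that is kept is the one from epoch 5.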

In keras, you can use callbacks to handle this process for you. A callback is a function that performs some action during the training process. You can check the keras documentation for a complete list of the available callbacks, or learn how to write your own!

The callback that’s used to save the model during training is called ModelCheckpoint:

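from keras.callbacks import ModelCheckpoint

# the file name is an assumption; any writable path works
model_checkpoint = ModelCheckpoint(
    filepath='best_model.weights.h5', save_weights_only=True,
    monitor='val_loss', mode='min', save_best_only=True)

model = Sequential()
model.add(LSTM(32, activation='relu', input_shape=(N_LAGS, N_FEATURES)))
model.add(Dropout(0.2))
model.add(RepeatVector(HORIZON))
model.add(LSTM(16, activation='relu', return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(N_FEATURES)))
model.compile(optimizer='adam', loss='mse')

# epochs=100 is assumed here, matching the earlier run
history = model.fit(X_train, Y_train, epochs=100,
                    validation_data=(X_valid, Y_valid),
                    callbacks=[model_checkpoint])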

Another interesting callback you can use for training is EarlyStopping. It can be used to stop training when performance has stopped improving.
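
In keras, `EarlyStopping` is controlled mainly by the `monitor` and `patience` arguments: training stops once the monitored metric has not improved for `patience` consecutive epochs. The plain-Python sketch below mimics that rule on made-up validation losses. `early_stopping_epoch` is a hypothetical helper, not the keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the epoch at which training would stop, or None if the
    patience is never exhausted. Epochs are counted from 1."""
    best_loss = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # improvement: remember it and reset the counter
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# made-up validation losses, one per epoch
val_losses = [0.90, 0.55, 0.40, 0.42, 0.38, 0.45, 0.50, 0.47]
print(early_stopping_epoch(val_losses, patience=2))  # 7
print(early_stopping_epoch(val_losses, patience=3))  # 8
```

A larger patience tolerates longer plateaus at the cost of running more epochs.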

Making predictions

After training, we can retrieve the best model and make predictions on the test set.

# helper assumed to live in the same src module as time_delay_embedding
from src.tde import from_3d_to_matrix

# The best model weights are loaded into the model.
# (the file name is assumed; use the path given to ModelCheckpoint)
model.load_weights('best_model.weights.h5')

# Inference on DAYTON region
test_dayton = test_by_series['DAYTON']

# splitting target variables from explanatory ones
X_ts = test_dayton.iloc[:, predictor_variables]
Y_ts = test_dayton.iloc[:, target_variables]
X_ts_3d = from_matrix_to_3d(X_ts)

# predicting on normalized data
preds = model.predict_on_batch(X_ts_3d)
preds_df = from_3d_to_matrix(preds, Y_ts.columns)

# reverting log transformation
preds_df = LogTransformation.inverse_transform(preds_df)
# reverting mean scaling
preds_df *= mean_by_series['DAYTON']


