Neural Network (MLP) for Time Series Forecasting in Practice

Time series forecasting is a well-known data science problem among professionals and business users alike.

Several forecasting methods exist, and for a better overview they can be grouped into statistical and machine learning methods. Because the demand for forecasting is so high, the available options are abundant.

Machine learning methods are considered the state-of-the-art approach in time series forecasting and are growing in popularity, because they can capture complex non-linear relationships in the data and generally yield higher forecasting accuracy [1]. One popular machine learning field is neural networks. Specifically for time series analysis, recurrent neural networks have been developed and applied to solve forecasting problems [2].

Data science enthusiasts might find the complexity behind such models intimidating, and being one of you, I can tell you that I share that feeling. Nevertheless, this article aims to show that despite the latest developments in machine learning methods, it is not necessarily worth pursuing the most complex application when looking for a solution to a specific problem. Well-established methods enhanced with powerful feature engineering techniques can still provide satisfactory results.

More specifically, I apply a Multi-Layer Perceptron model and share the code and results, so you can get hands-on experience with engineering time series features and forecasting effectively.

More precisely, what I aim to provide for fellow self-taught professionals can be summarized in the following points:

  1. forecasting based on real-world problem / data
  2. how to engineer time series features for capturing temporal patterns
  3. construct an MLP model capable of using mixed variables: floats and integers (treated as categoricals via embedding)
  4. use MLP for point forecasting
  5. use MLP for multi-step forecasting
  6. assess feature importance using permutation feature importance method
  7. retrain model for a subset of grouped features (multiple groups, trained for individual groups) to refine the feature importance of grouped features
  8. evaluate the model by comparing to an UnobservedComponents model
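To make point 2 a bit more concrete, here is a minimal sketch of what engineering time series features can look like. It is only an illustration under my own assumptions: the synthetic `demo` frame, the `load` column, the helper name `add_time_features` and the chosen lags (1 and 24 hours) are hypothetical and are not the features used later in the analysis.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series with a clean daily cycle (illustrative data only).
idx = pd.date_range("2021-01-01", periods=72, freq="h")
demo = pd.DataFrame(
    {"load": np.sin(2 * np.pi * idx.hour.to_numpy() / 24) + 10.0},
    index=idx,
)

def add_time_features(frame, target, lags=(1, 24)):
    """Add lagged copies of the target plus simple calendar features."""
    out = frame.copy()
    for lag in lags:
        # value observed `lag` hours earlier
        out[f"{target}_lag_{lag}"] = out[target].shift(lag)
    # calendar features are integers, so they could be treated as categoricals
    out["hour"] = out.index.hour
    out["dayofweek"] = out.index.dayofweek
    # the first max(lags) rows have no complete lag history
    return out.dropna()

features = add_time_features(demo, "load")
```

Lagged values let the model see the recent past, while the integer calendar columns are natural candidates for the embedding treatment mentioned in point 3.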

Please note that this article assumes prior knowledge of some key technical terms and does not intend to explain them in detail. Find those key terms below, with references provided, which can be checked for clarity:

  1. Time Series [3]
  2. Prediction [4]: in this context, used for model outputs within the training period
  3. Forecast [4]: in this context, used for model outputs within the test period
  4. Feature Engineering [5]
  5. Autocorrelation [6]
  6. Partial Autocorrelation [6]
  7. MLP (Multi-Layer Perceptron) [7]
  8. Input Layer [7]
  9. Hidden Layer [7]
  10. Output Layer [7]
  11. Embedding [8]
  12. State Space Models [9]
  13. Unobserved Components Model [9]
  14. RMSE (Root Mean Squared Error) [10]
  15. Feature Importance [11]
  16. Permutation Feature Importance [11]

The main packages used throughout the analysis are numpy and pandas for data manipulation, plotly for interactive charts, statsmodels for statistics and state space modeling and, finally, tensorflow for the MLP architecture.

Note: due to technical limitations, I'll provide the code snippets for interactive plotting, but the figures presented here will be static.

import opendatasets as od
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import tensorflow as tf

from sklearn.preprocessing import StandardScaler
from sklearn.inspection import permutation_importance
import statsmodels.api as sm
from statsmodels.tsa.stattools import acf, pacf
import datetime

import warnings
warnings.filterwarnings('ignore')

The data is loaded automatically using opendatasets.

dataset_url = "https://www.kaggle.com/datasets/robikscube/hourly-energy-consumption/"
od.download(dataset_url)
# opendatasets saves the files into a folder named after the dataset
df = pd.read_csv("./hourly-energy-consumption/AEP_hourly.csv", index_col=0)
df.sort_index(inplace = True)

Keep in mind that data cleaning was an essential first step in order to progress with the analysis. If you are interested in the details and also in state space modeling, please refer to my previous article. In a nutshell, the following steps were conducted:

  1. Identify gaps where specific timestamps are missing (only single-step gaps were found)
  2. Impute the missing values (using the mean of the previous and next records)
  3. Identify and drop duplicates
  4. Set timestamp column as index for dataframe
  5. Set dataframe index frequency to hourly, since it is a requirement for further processing
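The steps above can be sketched roughly as follows. This is not the exact code from my previous article, just a minimal illustration on synthetic data: the `raw` frame, the `Datetime` column name and the helper `clean_hourly` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the raw data: 8 hourly records (illustrative only).
idx = pd.date_range("2020-01-01", periods=8, freq="h")
raw = pd.DataFrame({"Datetime": idx.astype(str),
                    "AEP_MW": np.arange(8, dtype=float) * 10})
raw = raw.drop(index=3)                                   # one missing timestamp
raw = pd.concat([raw, raw.iloc[[1]]], ignore_index=True)  # one duplicated record

def clean_hourly(frame, time_col, value_col):
    """Drop duplicates, index by timestamp, enforce hourly frequency, fill single gaps."""
    out = frame.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.drop_duplicates(subset=time_col, keep="first")
    out = out.set_index(time_col).sort_index()
    # asfreq inserts a NaN row for every missing hourly timestamp
    out = out.asfreq("h")
    # single-step gaps: mean of the previous and the next record
    neighbour_mean = (out[value_col].shift(1) + out[value_col].shift(-1)) / 2
    out[value_col] = out[value_col].fillna(neighbour_mean)
    return out

cleaned = clean_hourly(raw, "Datetime", "AEP_MW")
```

Note that the neighbour mean only fills gaps of a single step; longer gaps would remain NaN and need a different imputation strategy.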

After preparing the data, let's explore it by drawing 5 random timestamp samples and comparing the time series at different scales.

fig = make_subplots(rows=5, cols=4, shared_yaxes=True, horizontal_spacing=0.01, vertical_spacing=0.04)

# drawing a random sample of 5 indices without repetition
sample = sorted([x for x in np.random.choice(range(0, len(df), 1), 5, replace=False)])

# zoom x scales for plotting
periods = [9000, 3000, 720, 240]

colours = ["#E56399", "#F0B67F", "#DE6E4B", "#7FD1B9", "#7A6563"]

# s for sample datetime start
for si, s in enumerate(sample):

    # p for period length
    for pi, p in enumerate(periods):
        cdf = df.iloc[s:(s+p+1),:].copy()
        fig.add_trace(go.Scatter(x=cdf.index,
                                 y=cdf.AEP_MW.values,
                                 marker=dict(color=colours[si])),
                      row=si+1, col=pi+1)

fig.update_layout(
    font=dict(family="Arial"),
    margin=dict(b=8, l=8, r=8, t=8),
    showlegend=False,
    height=1000,
    paper_bgcolor="#FFFFFF",
    plot_bgcolor="#FFFFFF")
fig.update_xaxes(griddash="dot", gridcolor="#808080")
fig.update_yaxes(griddash="dot", gridcolor="#808080")
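One detail worth keeping in mind with the sampling above: a start index drawn close to the end of the series yields a window shorter than the requested period. If you want every panel to cover its full period, a possible variation is to restrict the start indices, as in the sketch below. The helper `sample_window_starts` and the row count used in the example are hypothetical and only serve as an illustration.

```python
import numpy as np

def sample_window_starts(n_rows, longest_window, n_samples, seed=None):
    """Draw sorted, unique start positions so that the longest window still fits."""
    rng = np.random.default_rng(seed)
    # the slice s:(s + p + 1) needs s + p + 1 <= n_rows
    last_valid_start = n_rows - longest_window - 1
    if last_valid_start + 1 < n_samples:
        raise ValueError("series too short for the requested windows")
    starts = rng.choice(last_valid_start + 1, size=n_samples, replace=False)
    return sorted(int(s) for s in starts)

# example with a made-up series length and the longest period used above
starts = sample_window_starts(n_rows=120_000, longest_window=9000, n_samples=5, seed=0)
```

Passing a seed also makes the drawn samples reproducible, which helps when comparing charts between runs.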
