Time Series Data Analysis with sARIMA and Dash

1.1 The building blocks of the model

To understand what sARIMA models are, let's first introduce the building blocks of these models.

sARIMA is a composition of different sub-models (i.e. polynomials that we use to represent our time series data), which form the acronym: seasonal (s) autoregressive (AR) integrated (I) moving average (MA):

  • AR: the autoregressive component, governed by the hyperparameter "p", assumes that the current value at time "t" can be expressed as a linear combination of the previous "p" values:
X_t = c + φ_1·X_(t-1) + φ_2·X_(t-2) + … + φ_p·X_(t-p) + ε_t
  • I: the integrated component is represented by the hyperparameter "d", the degree of the differencing transformation applied to the data. Differencing is a technique used to remove the trend from the data (i.e. make the data stationary with respect to the mean, as we will see later), which helps the model fit the data because it isolates the trend component (we use d=1 for a linear trend, d=2 for a quadratic trend, …). Differencing the data with d=1 means working with the difference between consecutive data points:
X'_t = X_t − X_(t-1)
  • MA: the moving average component, governed by the hyperparameter "q", assumes that the current value at time "t" can be expressed as a constant term (usually the mean) plus a linear combination of the errors of the previous "q" points:
X_t = μ + ε_t + θ_1·ε_(t-1) + … + θ_q·ε_(t-q)
  • If we consider the components so far, we get "ARIMA", the name of a family of models used to work with time series data with no seasonality. sARIMA models are a generalization that handles seasonal data through the addition of an S-component: the seasonal component, which consists of a new set of AR, I, MA components with a seasonal lag. In other words, once we have identified a seasonality and defined its lag (represented by the hyperparameter "m" — e.g. m=12 means that every 12 months, on a monthly dataset, we see the same behavior), we create a new set of AR (P), I (D), MA (Q) components with respect to the seasonal lag m (e.g. if D=1 and m=12, we apply a 1-degree differencing to the series with a lag of 12).

To sum up, the sARIMA model is defined by 7 hyperparameters: 3 for the non-seasonal part of the model, and 4 for the seasonal part. They are indicated as:

sARIMA (p,d,q) (P,D,Q)m
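As a reference, this notation maps directly to code. Below is a minimal sketch using the SARIMAX class from statsmodels; the series is a random placeholder and the orders are illustrative, not values fitted to any dataset:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Placeholder monthly series; replace it with your own data.
y = pd.Series(
    np.random.default_rng(0).normal(size=120),
    index=pd.date_range("1949-01-01", periods=120, freq="MS"),
)

# order = (p, d, q); seasonal_order = (P, D, Q, m)
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit(disp=False)
print(results.summary())
```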

Thanks to the model's flexibility, we can "switch off" the components that are not present in our data (i.e. if the data doesn't have a trend or doesn't have seasonality, the respective parameters can be set to 0) and still use the same model framework to fit the data.

On the other hand, one limitation of sARIMA is that these models can capture only one seasonality. If a daily dataset has both a yearly and a weekly seasonality, we will have to choose the strongest one.

1.2 How to choose the model hyperparameters: ACF and PACF

To identify the model hyperparameters, we usually look at the autocorrelation and partial autocorrelation of the time series; since all of the above components use past data to model present and future points, we should investigate how past and present data are correlated, and define how many past data points we need to model the present.

For this reason, autocorrelation and partial autocorrelation are two widely used functions:

  • ACF (autocorrelation): describes the correlation of the time series with its lags. All data points are compared with their previous lag 1, lag 2, lag 3, … The resulting correlations are plotted on a histogram. This chart (also called a "correlogram") is used to visualize how much information is retained throughout the time series. The ACF helps us in choosing the sARIMA model because:

The ACF helps to identify the MA(q) hyperparameter.

  • PACF (partial autocorrelation): describes the partial correlation of the time series with its lags. Unlike the ACF, the PACF shows the correlation between a point X_t and a lag that is not explained by common correlations with other lags at a lower order. In other words, the PACF isolates the direct correlation between two terms. The PACF helps us in choosing the sARIMA model because:

The PACF helps to identify the AR(p) hyperparameter.
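As a minimal sketch, both correlograms can be drawn with statsmodels (reusing the placeholder series `y` from the earlier snippet; in practice, this should be an already-stationary series, as explained next):

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# One correlogram per function; spikes outside the shaded confidence
# band are considered significant.
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(y, lags=36, ax=axes[0])   # hints for the MA(q) order
plot_pacf(y, lags=36, ax=axes[1])  # hints for the AR(p) order
plt.tight_layout()
plt.show()
```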

Before using these tools, however, we need to mention that the ACF and PACF can only be used on a "stationary" time series.

1.3 Stationarity

A (weakly) stationary time series is a time series where:

  • The mean is constant over time (i.e. the series fluctuates around a horizontal line without positive or negative trends)
  • The variance is constant over time (i.e. there is no seasonality or change in the deviation from the mean)

Of course, not all time series are natively stationary; however, we can transform them to make them stationary. The most common transformations used to make a time series stationary are:

  • The natural log: by applying the log to each data point, we usually manage to make the time series stationary with respect to the variance.
  • Differencing: by differencing a time series, we usually manage to remove the trend and make the time series stationary with respect to the mean.

After transforming the time series, we can use two tools to confirm that it is stationary (both are sketched in code after the list below):

  • The Box-Cox plot: this is a plot of the rolling mean (on the x-axis) vs the rolling standard deviation (on the y-axis) (or the mean vs the variance of grouped points). Our data is stationary if we don't observe any particular trend in the chart and we see little variation on both axes.
  • The Augmented Dickey–Fuller test (ADF): a statistical test in which we try to reject the null hypothesis stating that the time series is non-stationary.
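Here is a minimal sketch of both checks (the window of 12 is an assumption matching a monthly seasonality):

```python
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller

# Rolling mean vs rolling standard deviation: for a stationary series,
# the points cluster together without any visible trend.
roll_mean = y.rolling(window=12).mean()
roll_std = y.rolling(window=12).std()
plt.scatter(roll_mean, roll_std)
plt.xlabel("rolling mean")
plt.ylabel("rolling standard deviation")
plt.show()

# ADF test: a small p-value (e.g. < 0.05) lets us reject the null
# hypothesis that the series is non-stationary.
adf_stat, p_value = adfuller(y.dropna())[:2]
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
```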

Once a time series is stationary, we can analyze the ACF and PACF patterns and find the sARIMA model hyperparameters.

Identifying the sARIMA model that fits our data consists of a series of steps, which we will perform on the AirPassengers dataset (available here).

Each step roughly corresponds to a “page” of the Dash web app.
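The app itself is out of scope here, but the skeleton below gives the general idea: a minimal multi-page Dash layout where each pathname renders one step (the page names and content are illustrative, not the actual app):

```python
from dash import Dash, dcc, html, Input, Output

app = Dash(__name__)
app.layout = html.Div([
    dcc.Location(id="url", refresh=False),
    dcc.Link("1. Plot data", href="/plot"),
    " | ",
    dcc.Link("2. Transformations", href="/transform"),
    html.Div(id="page-content"),
])

# Swap the page content based on the current URL.
@app.callback(Output("page-content", "children"), Input("url", "pathname"))
def render_page(pathname):
    if pathname == "/transform":
        return html.H3("Transform the data to make it stationary")
    return html.H3("Plot your data")

if __name__ == "__main__":
    app.run(debug=True)
```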

2.1 Plot your data

Create a line chart of your raw data: some of the features described above can be seen with the naked eye, especially stationarity and seasonality.

Raw line chart | Image by creator

In the above chart, we see a positive linear trend and a recurrent seasonality pattern; considering that we have monthly data, we can assume the seasonality to be yearly (lag 12). The data is not stationary.

2.2 Transform the data to make it stationary

In order to find the model hyperparameters, we need to work with a stationary time series. So, if the data is not stationary, we will have to transform it (a code sketch follows the list below):

  • Start with the log transformation, to make the data stationary with respect to the variance (the log is defined over positive values, so if the data contains negative or 0 values, add a constant to each data point first).
  • Apply differencing to make the data stationary with respect to the mean. Usually, start with differencing of order 1 and lag 1. Then, if the data is still not stationary, try differencing with respect to the seasonal lag (e.g. 12 if we have monthly data). (Using the reverse order won't make a difference.)
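Here is a sketch of this transformation pipeline in pandas (assuming a positive-valued series `y`, such as the AirPassengers counts):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Log transform to stabilize the variance (add a constant first if the
# series contains zeros or negative values).
y_log = np.log(y)

# Lag-1 differencing to remove the trend...
y_diff = y_log.diff(1)

# ...then, if needed, seasonal differencing (lag 12 for monthly data).
y_sdiff = y_diff.diff(12)

# Re-check stationarity after each step.
for name, series in [("log", y_log), ("diff", y_diff), ("sdiff", y_sdiff)]:
    print(name, "ADF p-value:", adfuller(series.dropna())[1])
```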

With our dataset, we need to perform the following steps to make it fully stationary:

Stationary transformations | Image by creator

After each step, by looking at the ADF test p-value and the Box-Cox plot, we see that:

  • The Box-Cox plot gets progressively cleaned of any trend, and all points get closer and closer.
  • The p-value progressively drops, until we can finally reject the null hypothesis of the test.
Stationary transformations (2) | Image by creator

2.3 Identify suitable model hyperparameters with the ACF and PACF

While transforming the data to make it stationary, we have already identified 3 parameters:

  • Since we applied differencing, the model will include differencing components. We applied a differencing of lag 1 and lag 12: we can set d=1 and D=1 with m=12 (seasonality of 12).

For the remaining parameters, we can look at the ACF and PACF after the transformations.

Typically, we can apply the following rules:

  • We have an AR(p) process if: the PACF has a significant spike at a certain lag "p" (and no significant spikes after), and the ACF decays or shows a sinusoidal behavior (alternating positive and negative spikes).
  • We have a MA(q) process if: the ACF has a significant spike at a certain lag "q" (and no significant spikes after), and the PACF decays or shows a sinusoidal behavior (alternating positive and negative spikes).
  • In the case of seasonal AR(P) or MA(Q) processes, we will see that the significant spikes repeat at the seasonal lags.

By looking at our example, we see the following:

ACF and PACF after transformations | Image by creator
  • The closest rule to the above behavior suggests some MA(q) process with "q" between 1 and 3; the fact that we still have a significant spike at lag 12 might also suggest a seasonal MA(Q) with Q=1 (since m=12).

We use the ACF and PACF to get a range of hyperparameter values that can form model candidates. We can then compare these different model candidates against our data, and pick the top-performing one.

In the example, our model candidates appear to be:

  • SARIMA (p,d,q) (P,D,Q)m = (0, 1, 1) (0, 1, 1) 12
  • SARIMA (p,d,q) (P,D,Q)m = (0, 1, 3) (0, 1, 1) 12
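In code, these candidates are just tuples of orders, ready for the comparison step in the next section:

```python
# Candidate (p, d, q) and (P, D, Q, m) orders from the ACF/PACF analysis.
candidates = [
    ((0, 1, 1), (0, 1, 1, 12)),
    ((0, 1, 3), (0, 1, 1, 12)),
]
```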

2.4 Perform a model grid search to identify optimal hyperparameters

Grid search can be used to test several model candidates against one another: we fit each model to the data and pick the top-performing one.

To set up a grid search, we need to:

  • create a list with all possible combinations of model hyperparameters, given a range of values for each hyperparameter.
  • fit each model and measure its performance using a KPI of choice.
  • select the hyperparameters by looking at the top-performing models.

In our case, we will compare model performances using the AIC (Akaike information criterion) score. This KPI's formula consists of a trade-off between the fitting error (accuracy) and model complexity. Typically, when the complexity is too low, the error is high, because we over-simplify the model fitting task; on the contrary, when the complexity is too high, the error is still high due to overfitting. A trade-off between these two will allow us to identify the "top-performing" model.

Practical note: when fitting a sARIMA model, we need to use the original dataset with the log transformation (if we have applied it), but we don't want to use the data with the differencing transformations, since the model applies differencing internally through d and D.

We can choose to reserve part of the time series (usually the most recent 20% of observations) as a test set, as in the grid-search sketch below.
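Here is a minimal grid-search sketch under these assumptions: itertools enumerates the combinations, the log-transformed series `y_log` is the input, the most recent 20% of observations is held out, and the AIC is the KPI (the ranges below are illustrative):

```python
import itertools
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hold out the most recent 20% of observations as a test set.
split = int(len(y_log) * 0.8)
train = y_log.iloc[:split]

# Illustrative hyperparameter ranges; widen or narrow as needed.
p = q = P = Q = range(0, 3)
d, D, m = [1], [1], [12]

best_aic, best_order = float("inf"), None
for order in itertools.product(p, d, q):
    for seasonal_order in itertools.product(P, D, Q, m):
        try:
            res = SARIMAX(train, order=order,
                          seasonal_order=seasonal_order).fit(disp=False)
        except Exception:
            continue  # skip combinations that fail to converge
        if res.aic < best_aic:
            best_aic, best_order = res.aic, (order, seasonal_order)

print("best:", best_order, "AIC:", round(best_aic, 2))
```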

In our example, based on the below hyperparameter ranges, the best model is:

Model grid search | Image by creator

SARIMA (p,d,q) (P,D,Q)m = (0, 1, 1) (0, 1, 1) 12

2.5 Final model: fit and predictions

We can finally predict data for the train and test sets, and for any future out-of-sample observations. The final plot is:

Final model | Image by creator
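As a sketch of this final step, reusing `train`, `split`, and the winning orders from the grid search above (the 24-month horizon is an arbitrary choice):

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

best = SARIMAX(train, order=(0, 1, 1),
               seasonal_order=(0, 1, 1, 12)).fit(disp=False)

# Forecast over the test period plus 24 future months; np.exp undoes
# the log transformation applied earlier.
forecast = best.get_forecast(steps=len(y_log) - split + 24)
predicted = np.exp(forecast.predicted_mean)
conf_int = np.exp(forecast.conf_int())
print(predicted.tail())
```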

To verify that we captured all correlations, we can plot the ACF and PACF of the model residuals.

In this case, some signal from the strong seasonality component is still present, but most of the remaining lags show a correlation close to 0.
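A short sketch of this residual check, reusing the fitted `best` model and the plotting helpers from earlier:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Ideally, no significant spikes remain in the residual correlograms.
fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(best.resid.dropna(), lags=36, ax=axes[0])
plot_pacf(best.resid.dropna(), lags=36, ax=axes[1])
plt.tight_layout()
plt.show()
```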

The steps described above should work on any dataset that can be modeled through sARIMA. To recap:

1-Plot & explore your data

Dash live app | Image by creator

2-Apply transformations to make the data stationary (focus on the left-hand charts and the ADF test)

Dash live app | Image by creator

3-Identify suitable hyperparameters by looking at the ACF and PACF (right-hand charts)

Dash live app | Image by creator

4-Perform a grid search to pick optimal hyperparameters

Dash live app | Image by creator

5-Fit and predict using the best model

Dash live app | Image by creator

Download the app locally, upload your own datasets (by replacing the .csv file in the data folder) and try to fit the best model.

Thanks for reading!
