
Understanding Time Series Trend
Understanding Trend
How to Detect Trend
How to Deal with Trend
Key Takeaways

Deterministic trends vs stochastic trends, and how to deal with them

Photo by Ali Abdul Rahman on Unsplash

Detecting and dealing with trend is a key step in the modeling of time series.

In this article, we'll:

  • Describe what the trend of a time series is, and its different characteristics;
  • Explore how to detect it;
  • Discuss ways of dealing with trend.

Trend as a building block of time series

At any given time, a time series can be decomposed into three parts: trend, seasonality, and the remainder.

Y_t = T_t + S_t + R_t

Additive decomposition of a time series into trend (T), seasonality (S), and remainder (R).

The trend represents the long-term change in the level of a time series. This change can be either upward (increase in level) or downward (decrease in level). If the change is systematic in one direction, then the trend is monotonic.

USA GDP time series with an upward and monotonic trend. Data source in reference [1]. Image by author.

Trend as a cause of non-stationarity

A time series is stationary if its statistical properties don't change. This includes the level of the time series, which is constant under stationary conditions.

So, when a time series exhibits a trend, the stationarity assumption is not met. Modeling non-stationary time series is difficult. If left untreated, statistical tests and forecasts can be misleading. This is why it's important to detect and deal with the trend before modeling a time series.

A proper characterization of the trend affects modeling decisions. This, further down the line, impacts forecasting performance.

Deterministic Trends

A trend can be either deterministic or stochastic.

Deterministic trends can be modeled with a well-defined mathematical function. This means that the long-term behavior of the time series is predictable. Any deviation from the trend line is only temporary.

Usually, deterministic trends are linear and can be written as follows:

y_t = a + b*t

The equation for a linear trend. The coefficient b is the expected change in the trend between consecutive periods, and the coefficient a is the intercept.
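A minimal sketch of estimating a and b by least squares, on synthetic data with hypothetical true values a = 2 and b = 0.5 (numpy only):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100)
# linear trend y_t = a + b*t plus noise, with a = 2.0 and b = 0.5
y = 2.0 + 0.5 * t + rng.normal(scale=0.3, size=100)

# polyfit with deg=1 returns (slope, intercept)
b_hat, a_hat = np.polyfit(t, y, deg=1)
```

The fitted coefficients should land close to the true values used in the simulation.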

But trends can also follow an exponential or polynomial form.

y_t = a * exp(b*t)

Exponential trend equation. This trend can be made linear by taking the log on both sides: log(y_t) = log(a) + b*t.
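A quick numerical check of this log-linearization, with hypothetical values a = 3 and b = 0.05: after taking logs, a straight-line fit recovers the parameters.

```python
import numpy as np

t = np.arange(50)
y = 3.0 * np.exp(0.05 * t)   # exponential trend: y_t = a * exp(b*t)

log_y = np.log(y)            # log(y_t) = log(a) + b*t -> linear in t
b_hat, log_a_hat = np.polyfit(t, log_y, deg=1)
a_hat = np.exp(log_a_hat)
```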

In economics, there are several examples of time series that grow exponentially, such as GDP:

USA GDP time series. The original trend is exponential, but it becomes linear after the logarithm transformation. Data source in reference [1]. Image by author.

A time series with a deterministic trend is called trend-stationary. This means the series becomes stationary after removing the trend component.

Linear trends can also be modeled by including time as an explanatory variable. Here's an example of how you can do this:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# https://github.com/vcerqueira/blog/blob/main/data/gdp-countries.csv
series = pd.read_csv('data/gdp-countries.csv')['United States']
series.index = pd.date_range(start='12/31/1959', periods=len(series), freq='Y')

log_gdp = np.log(series)

linear_trend = np.arange(1, len(log_gdp) + 1)

model = ARIMA(endog=log_gdp, order=(1, 0, 0), exog=linear_trend)
result = model.fit()

Stochastic Trends

A stochastic trend can change randomly, which makes its behavior difficult to predict.

A random walk is an example of a time series with a stochastic trend:

rw = np.cumsum(np.random.choice([-1, 1], size=1000))

A random walk time series whose trend changes suddenly and unpredictably. Image by author.

Stochastic trends are related to unit roots, integration, and differencing.

Time series with stochastic trends are called difference-stationary. This means that the time series can be made stationary through differencing operations. Differencing means taking the difference between consecutive values.
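For instance (a sketch on a simulated random walk), a single differencing step recovers the i.i.d. increments, which are stationary by construction:

```python
import numpy as np

rng = np.random.default_rng(42)
steps = rng.choice([-1, 1], size=1000)  # i.i.d. (stationary) increments
rw = np.cumsum(steps)                   # random walk: difference-stationary

# differencing the cumulative sum gives back the original steps
diff_rw = np.diff(rw)
```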

Difference-stationary time series are also called integrated. For instance, ARIMA (Auto-Regressive Integrated Moving Average) models contain a specific term (I) for integrated time series. This term involves applying differencing steps until the series becomes stationary.

Finally, difference-stationary or integrated time series are characterised by unit roots. Without going into mathematical details, a unit root is a characteristic of non-stationary time series.

Forecasting Implications

Deterministic and stochastic trends have different implications for forecasting.

Deterministic trends have a constant variance throughout time. In the case of a linear trend, this implies that the slope won't change. But real-world time series show complex dynamics, with the trend changing over long periods. So, long-term forecasting with deterministic trend models can lead to poor performance. The assumption of constant variance results in narrow forecasting intervals that underestimate uncertainty.

Many realizations of a random walk. Image by author.

Stochastic trends are assumed to change over time. Consequently, the variance of the time series increases over time. This makes stochastic trends better suited for long-term forecasting because they provide more reasonable uncertainty estimates.
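This growth in variance is easy to verify by simulation (a sketch with Gaussian steps; for unit-variance steps, the variance of a random walk at time t is roughly t):

```python
import numpy as np

rng = np.random.default_rng(0)
# 2000 independent random walks of length 500
walks = np.cumsum(rng.normal(size=(2000, 500)), axis=1)

# cross-sectional variance across the realizations grows with time
var_early = walks[:, 49].var()   # after 50 steps, roughly 50
var_late = walks[:, 499].var()   # after 500 steps, roughly 500
```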

Stochastic trends can be detected using unit root tests, such as the augmented Dickey-Fuller test or the KPSS test.

Augmented Dickey-Fuller (ADF) test

The ADF test checks whether an auto-regressive model contains a unit root. The hypotheses of the test are:

  • Null hypothesis: There is a unit root (the time series is not stationary);
  • Alternative hypothesis: There is no unit root.

This test is available in statsmodels:

from statsmodels.tsa.stattools import adfuller

pvalue_adf = adfuller(x=log_gdp, regression='ct')[1]

# 1.0

The parameter regression='ct' is used to include a constant term and a deterministic trend in the model. As you can check in the documentation, there are four possible values for this parameter:

  • c: include a constant term (default value);
  • ct: a constant term plus a linear trend;
  • ctt: a constant term plus a linear and a quadratic trend;
  • n: no constant or trend.

Selecting which terms should be included is important. A wrong inclusion or exclusion of a term can substantially reduce the power of the test. In our case, we used the ct option because the log GDP series shows a linear deterministic trend behavior.

KPSS test

The KPSS test can also be used to detect stochastic trends. The test hypotheses are the opposite of the ADF's:

  • Null hypothesis: The time series is trend-stationary;
  • Alternative hypothesis: There is a unit root.

from statsmodels.tsa.stattools import kpss

pvalue_kpss = kpss(x=log_gdp, regression='ct')[1]

# 0.01

The KPSS test rejects the null hypothesis, while the ADF test does not. So, both tests signal the presence of a unit root. Note that a time series can have a trend with both deterministic and stochastic components.

So, how can you deal with unit roots?

We've explored how to use time as an explanatory variable to account for a linear trend.

Another way to deal with trends is by differencing. Instead of working with the absolute values, you model how the time series changes between consecutive periods.

A single differencing operation is often enough to achieve stationarity. Yet, sometimes you need to repeat this process several times. You can use the ADF or KPSS test to estimate the required number of differencing steps. The pmdarima library wraps this process in the function ndiffs:

from pmdarima.arima import ndiffs

# how many differencing steps are needed for stationarity?
ndiffs(log_gdp, test='adf')
# 2

In this case, the log GDP series needs 2 differencing steps for stationarity:

diff_log_gdp = log_gdp.diff().diff()

Second differences of the log GDP time series. Image by author.


