## Deterministic trends vs stochastic trends, and how to deal with them

Detecting and dealing with the trend is a key step in time series modeling.

In this article, we'll:

- Describe what the trend of a time series is, and its different characteristics;
- Explore how to detect it;
- Discuss ways of dealing with trend.

## Trend as a building block of time series

At any given time, a time series can be decomposed into three parts: trend, seasonality, and the remainder.

The trend represents the long-term change in the level of a time series. This change can be either upward (an increase in level) or downward (a decrease in level). If the change is systematic in one direction, the trend is monotonic.

## Trend as a cause of non-stationarity

A time series is stationary if its statistical properties do not change. This includes the level of the time series, which is constant under stationary conditions.

So, when a time series exhibits a trend, the stationarity assumption is not met. Modeling non-stationary time series is difficult. If left untreated, statistical tests and forecasts can be misleading. That is why it's important to detect and deal with the trend before modeling time series.

A proper characterization of the trend affects modeling decisions. This, further down the line, impacts forecasting performance.

## Deterministic Trends

A trend can be either deterministic or stochastic.

Deterministic trends can be modeled with a well-defined mathematical function. This means the long-term behavior of the time series is predictable. Any deviation from the trend line is only temporary.

Usually, deterministic trends are linear and can be written as a function of time:

*Y_t = a + b·t + ε_t*

where *a* is the intercept, *b* is the slope, and *ε_t* is a stationary error term. But trends can also follow an exponential or polynomial form.

In economics, there are several examples of time series that increase exponentially, such as GDP.

A time series with a deterministic trend is called trend-stationary. This means the series becomes stationary after removing the trend component.

Linear trends can also be modeled by including time as an explanatory variable. Here's an example of how you can do that:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# https://github.com/vcerqueira/blog/blob/main/data/gdp-countries.csv
series = pd.read_csv('data/gdp-countries.csv')['United States']
series.index = pd.date_range(start='12/31/1959', periods=len(series), freq='Y')

log_gdp = np.log(series)

# time as an explanatory (exogenous) variable
linear_trend = np.arange(1, len(log_gdp) + 1)

model = ARIMA(endog=log_gdp, order=(1, 0, 0), exog=linear_trend)
result = model.fit()
```

## Stochastic Trends

A stochastic trend can change randomly, which makes its behavior difficult to predict.

A random walk is an example of a time series with a stochastic trend:

```python
import numpy as np

rw = np.cumsum(np.random.choice([-1, 1], size=1000))
```

Stochastic trends are related to unit roots, integration, and differencing.

Time series with stochastic trends are called difference-stationary. This means the time series can be made stationary through differencing. Differencing means taking the difference between consecutive values.
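For instance, with the `diff` method of pandas (the toy values below are arbitrary):

```python
import pandas as pd

s = pd.Series([100, 102, 105, 103, 108])

# difference between consecutive values; the first element is undefined (NaN)
first_diff = s.diff()
print(first_diff.tolist())
# [nan, 2.0, 3.0, -2.0, 5.0]
```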

Difference-stationary time series are also called integrated. For example, ARIMA (Auto-Regressive Integrated Moving Average) models contain a specific term (I) for integrated time series. This term involves applying differencing steps until the series becomes stationary.

Finally, difference-stationary or integrated time series are characterized by unit roots. Without going into mathematical details, a unit root is a characteristic of non-stationary time series.

## Forecasting Implications

Deterministic and stochastic trends have different implications for forecasting.

Deterministic trends have a constant variance throughout time. In the case of a linear trend, this implies that the slope won't change. But real-world time series show complex dynamics, with the trend changing over long periods. So, long-term forecasting with deterministic trend models can lead to poor performance. The assumption of constant variance leads to narrow forecasting intervals that underestimate uncertainty.

Stochastic trends are assumed to change over time. Consequently, the variance of the time series increases over time. This makes stochastic trend models better suited for long-term forecasting because they provide more reasonable uncertainty estimates.

Stochastic trends can be detected using unit root tests, for example the augmented Dickey-Fuller test or the KPSS test.

## Augmented Dickey-Fuller (ADF) test

The ADF test checks whether an auto-regressive model contains a unit root. The hypotheses of the test are:

- Null hypothesis: there is a unit root (the time series is not stationary);
- Alternative hypothesis: there is no unit root.

This test is available in *statsmodels*:

```python
from statsmodels.tsa.stattools import adfuller

pvalue_adf = adfuller(x=log_gdp, regression='ct')[1]
print(pvalue_adf)
# 1.0
```

The parameter *regression='ct'* is used to include a constant term and a deterministic trend in the model. As you can check in the documentation, there are four possible values for this parameter:

- *c*: include a constant term (default value);
- *ct*: a constant term plus a linear trend;
- *ctt*: a constant term plus a linear and quadratic trend;
- *n*: no constant or trend.

Selecting which terms should be included is important. A wrong inclusion or exclusion of a term can substantially reduce the power of the test. In our case, we used the *ct* option because the log GDP series shows a linear deterministic trend behavior.

## KPSS test

The KPSS test can also be used to detect stochastic trends. The test hypotheses are the opposite of those of ADF:

- Null hypothesis: the time series is trend-stationary;
- Alternative hypothesis: there is a unit root.

```python
from statsmodels.tsa.stattools import kpss

pvalue_kpss = kpss(x=log_gdp, regression='ct')[1]
print(pvalue_kpss)
# 0.01
```

The KPSS test rejects the null hypothesis, while the ADF test does not. So, both tests signal the presence of a unit root. Note that a time series can have a trend with both deterministic and stochastic components.

So, how can you deal with unit roots?

We've explored how to use time as an explanatory variable to account for a linear trend.

Another way to deal with trends is differencing. Instead of working with the absolute values, you model how the time series changes between consecutive periods.

A single differencing operation is usually enough to achieve stationarity. Yet, sometimes you need to apply this process more than once. You can use the ADF or KPSS test to estimate the required number of differencing steps. The *pmdarima* library wraps this process in the function *ndiffs*:

```python
from pmdarima.arima import ndiffs

# how many differencing steps are needed for stationarity?
ndiffs(log_gdp, test='adf')
# 2
```

In this case, the log GDP series needs two differencing steps for stationarity:

```python
diff_log_gdp = log_gdp.diff().diff()
```
