## Hacking Granger Causality Test with ML Approaches

In time series forecasting, it is usually helpful to examine the available data graphically. This helps us understand the dynamics of the phenomenon we're analyzing and take decisions accordingly. However, while a colourful plot of our time series may be fascinating, it can also lead to incorrect conclusions.

As rational individuals, we can easily dismiss any form of relationship between the *number of people who died by becoming tangled in their bedsheets* and *per capita cheese consumption*, even though we may not be experts in either field.

Those who work with data know that such patterns occur often, including in contexts where we have difficulty interpreting the domain and discriminating between true and spurious correlations. For this reason, methodologies that help discriminate between these situations are crucial.

One of the best-known of these methodologies is the Granger causality test.

Granger causality is built on the intuition that Y1 "Granger-causes" Y2 if the forecast of Y2 improves when we use the information contained in past observations of Y1, in addition to the information contained in past observations of Y2.

Testing for Granger causality doesn't mean that Y1 must be a cause of Y2. It simply means that past values of Y1 are good enough to improve the forecast of Y2's future values. From this implication, we may derive a naive definition of causality.

The adoption of the Granger causality test implies strict assumptions on the underlying data (i.e. stationarity and linear dependency), which may be difficult to meet in real-world applications. For this reason, in this post we explore a machine-learning-based alternative to verify the same kind of relationship.

For the scope of this post, we simulate two different time series as the result of autoregressive processes.

Both series are correlated with some of their past timesteps (autocorrelation).

The time series exhibit an overall Pearson correlation of 0.637, with a moderate positive relationship preserved over time.

At first sight, it seems we're in the presence of two events that have a positive connection, as measured by the Pearson correlation coefficient. It's the most commonly used statistic to measure linear relationships between variables. It's so common that people often wrongly interpret it, trying to give it a causal meaning. That is a mistake! Correlation does not imply causation.
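To see why correlation alone can mislead, consider a minimal sketch with two hypothetical series driven by a shared trend: they come out strongly correlated even though neither influences the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical series driven by the same upward trend:
# correlated, yet neither one causes the other.
trend = np.linspace(0, 1, 200)
a = trend + rng.normal(scale=0.1, size=200)
b = trend + rng.normal(scale=0.1, size=200)

r = np.corrcoef(a, b)[0, 1]
print(round(r, 2))  # strong positive correlation, no causal link
```

Here the common driver (`trend`) plays the role of a confounder, producing a spurious relationship between `a` and `b`.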

In our simulated scenario, the positive relationship is merely a mathematical byproduct, since we know the two series are related in only one direction. More precisely, past values of Y1 are linearly related to current values of Y2 (the vice versa is not valid). Our scope is to give a practical demonstration of this statement.
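This one-directional setup can be reproduced with a minimal simulation. The coefficients and noise scales below are illustrative assumptions, not the exact values behind the 0.637 figure above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Y1: an AR(2) process (coefficients are illustrative).
y1 = np.zeros(n)
for t in range(2, n):
    y1[t] = 0.5 * y1[t - 1] + 0.3 * y1[t - 2] + rng.normal(scale=0.5)

# Y2: depends on its own past AND on past values of Y1,
# so Y1 Granger-causes Y2 but not the other way around.
y2 = np.zeros(n)
for t in range(2, n):
    y2[t] = 0.4 * y2[t - 1] + 0.6 * y1[t - 1] + rng.normal(scale=0.5)
```

The two series end up positively correlated even though the dependence runs in one direction only.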

The standard Granger causality test checks whether past values of Y1 help predict future values of Y2, beyond what past values of Y2 alone can achieve. This is done by running a linear model on the lagged series values.

The null hypothesis of the test states that the coefficients corresponding to past values of Y1 are zero. We reject the null hypothesis if the p-values are below a chosen threshold. In that case, we conclude that Y1 Granger-causes Y2.

In other words, we check whether adding lagged values of Y1 as predictors significantly improves the forecasts of Y2.

As the first step, we fit two autoregressive models, one on Y1 and one on Y2, without additional exogenous variables, and store the predictions obtained on the test data.

```python
# Assumed imports: ForecastingCascade comes from the tspiral library;
# the exact import path may differ depending on the installed version.
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor

forecaster = ForecastingCascade(
    RandomForestRegressor(30, random_state=42, n_jobs=-1),
    lags=lags,
    use_exog=False,
)

# Univariate autoregressive models: each series is forecast
# from its own past values only.
model_y1 = clone(forecaster).fit(None, df_train['y1'])
model_y2 = clone(forecaster).fit(None, df_train['y2'])

# Recursive one-step-ahead forecasts over the test period.
y1_pred = np.concatenate([
    model_y1.predict(
        [[0.]],
        last_y=df['y1'].iloc[:i]
    ) for i in range(len(df_train), len(df_train) + len(df_test))
])
y2_pred = np.concatenate([
    model_y2.predict(
        [[0.]],
        last_y=df['y2'].iloc[:i]
    ) for i in range(len(df_train), len(df_train) + len(df_test))
])
```

Secondly, we repeat the same forecasting procedure, but add lagged exogenous variables (i.e. when forecasting Y1 we use past values of Y2 in addition to past values of Y1).

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

forecaster = ForecastingCascade(
    make_pipeline(
        FunctionTransformer(
            lambda x: x[:, 1:]  # remove current values of the exog series
        ),
        RandomForestRegressor(30, random_state=42, n_jobs=-1)
    ),
    lags=lags,
    use_exog=True,
    exog_lags=lags,
)

# Same setup as before, but each model also receives lagged
# values of the other series as exogenous features.
model_y1y2 = clone(forecaster).fit(df_train[['y2']], df_train['y1'])
model_y2y1 = clone(forecaster).fit(df_train[['y1']], df_train['y2'])

y1y2_pred = np.concatenate([
    model_y1y2.predict(
        pd.DataFrame({'y2': [0.]}),
        last_y=df['y1'].iloc[:i],
        last_X=df[['y2']].iloc[:i]
    ) for i in range(len(df_train), len(df_train) + len(df_test))
])
y2y1_pred = np.concatenate([
    model_y2y1.predict(
        pd.DataFrame({'y1': [0.]}),
        last_y=df['y2'].iloc[:i],
        last_X=df[['y1']].iloc[:i]
    ) for i in range(len(df_train), len(df_train) + len(df_test))
])
```

At the end of the forecasting phase, we have stored the predictions of four different models (two forecasting Y1 and the other two forecasting Y2). It's time for a results comparison.

Squared residuals are computed at the sample level for all the prediction types. The distributions of the squared residuals are then compared for the same prediction target. We use the standard Kolmogorov-Smirnov test to check for distribution divergences.
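This comparison step can be sketched with `scipy.stats.ks_2samp`; the arrays below are hypothetical stand-ins for the stored model predictions:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
y_true = rng.normal(size=300)

# Hypothetical predictions: the model with exogenous features
# tracks the target more closely than the plain one.
pred_plain = y_true + rng.normal(scale=1.0, size=300)
pred_exog = y_true + rng.normal(scale=0.3, size=300)

# Sample-level squared residuals for each prediction type.
res_plain = (y_true - pred_plain) ** 2
res_exog = (y_true - pred_exog) ** 2

# Two-sample Kolmogorov-Smirnov test on the residual distributions.
stat, p_value = ks_2samp(res_plain, res_exog)
print(p_value < 0.05)  # True: the distributions diverge
```

A significant divergence between the two residual distributions is what signals that the exogenous features carry predictive information.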

The forecasts for Y1 appear to be the same with and without the addition of Y2's features.

On the contrary, the forecasts of Y2 are significantly different with and without the addition of Y1's features. That means that Y1 has a positive impact in predicting Y2, i.e. Y1 Granger-causes Y2 (the vice versa is not true).

In this post, we proposed an alternative to the standard Granger causality test to verify causation dynamics in the time series domain. We didn't stop at the Pearson correlation coefficient to draw conclusions on the data. We analyzed, in an empirical way, the possible presence of reciprocal influences between the events at our disposal, spotting spurious relationships. The ease of use of the proposed methodology, and its adaptability under weak assumptions, make it suitable for adoption in any time series analysis journey.