
Causal Machine Learning: What Can We Accomplish with a Single Theorem?


Exploring and exploiting the seemingly innocent theorem behind Double Machine Learning

North Carolina. Image by Creator.

Causal inference, and specifically causal machine learning, is an indispensable tool that can help us make decisions by understanding cause and effect. Optimizing prices, reducing customer churn, running targeted ad campaigns, and deciding which patients would benefit most from a medical treatment are all example use cases for causal machine learning.

There are many techniques for causal machine learning problems, but the one that seems to stand out most is called Double Machine Learning (DML), also known as Debiased or Orthogonal Machine Learning. Beyond its empirical success, DML is distinguished by its rich theoretical backing, rooted in a simple theorem from econometrics.

In this article, we’ll unpack the theory that grounds DML through hands-on examples. We’ll discuss the intuition behind DML and empirically confirm its generality on increasingly complex examples. This article is not a tutorial on DML; instead, it serves as motivation for how DML models see past mere correlation to understand and predict cause and effect.

Causal inference is all about measuring the effect of a treatment (T) on an outcome (Y). Examples include measuring the effect of exercise on weight loss, marketing on customer conversion, price on sales, or a medical intervention on a health outcome.

When T is randomly assigned to observations, as in randomized controlled trials (RCTs), we can directly estimate the causal relationship between T and Y, at least in aggregate, by analyzing how Y varies with T. In other words, if T is randomly assigned, we don’t need any other information about our observations to estimate the aggregate effect of T on Y. In practice, we estimate this effect with techniques like linear regression, where the coefficient on T, say θ1, tells us the average change in Y for a one-unit increase in T:
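In symbols, the regression described above can be written as (a reconstruction from the surrounding description, since the original equation did not survive extraction; ε denotes the error term):

```latex
Y = \theta_0 + \theta_1 T + \varepsilon
```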

If we run this regression, and T is randomly assigned, then we call θ1 the average treatment effect (ATE). In general, for randomly assigned…
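To make this concrete, here is a minimal simulation (not from the article) of a randomized assignment, where regressing Y on T recovers the ATE. The true effect of 2.0, the intercept of 5.0, and the sample size are arbitrary illustrative choices:

```python
import numpy as np

# Simulated RCT: T is randomly assigned, so a simple regression of Y on T
# recovers the average treatment effect (ATE) without any other covariates.
rng = np.random.default_rng(0)
n = 100_000
T = rng.binomial(1, 0.5, size=n)            # random treatment assignment
Y = 5.0 + 2.0 * T + rng.normal(0.0, 1.0, n) # true ATE = 2.0 (by construction)

# OLS of Y on T: the slope theta_1 estimates the ATE,
# the intercept theta_0 estimates the baseline outcome.
theta_1, theta_0 = np.polyfit(T, Y, deg=1)
print(f"estimated ATE: {theta_1:.3f}")
```

With T randomized, no confounder can be correlated with treatment, which is why the single-regressor fit is unbiased here; the rest of the article concerns what to do when randomization is not available.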
