
Sensitivity Analysis for Unobserved Confounding


Know the unknowable in observational studies

  1. Introduction
  2. Problem Setup
    2.1. Causal Graph
    2.2. Model With and Without Z
    2.3. Strength of Z as a Confounder
  3. Sensitivity Analysis
    3.1. Goal
    3.2. Robustness Value
  4. PySensemakr
  5. Conclusion
  6. Acknowledgements
  7. References

1. Introduction

The specter of unobserved confounding (aka omitted variable bias) is a notorious problem in observational studies. In most observational studies, unless we can reasonably assume that treatment assignment is as-if random, as in a natural experiment, we can never be truly certain that we have controlled for all possible confounders in our model. As a consequence, our model estimates could be severely biased if we fail to control for an important confounder, and we wouldn't even know it, since the unobserved confounder is, well, unobserved!

Given this problem, it is important to assess how sensitive our estimates are to possible sources of unobserved confounding. In other words, it is a helpful exercise to ask ourselves: how much unobserved confounding would there have to be for our estimates to change drastically (e.g., for the treatment effect to no longer be statistically significant)? Sensitivity analysis for unobserved confounding is an active area of research, and there are several approaches to tackling this problem. In this post, I'll cover a simple linear method [1] based on the concept of partial R² that is applicable to a wide range of settings.

2. Problem Setup

2.1. Causal Graph

Let us assume that we have four variables:

  • Y: outcome
  • D: treatment
  • X: observed confounder(s)
  • Z: unobserved confounder(s)

This is a common setting in many observational studies, where the researcher is interested in whether the treatment of interest has an effect on the outcome after controlling for possible treatment-outcome confounders.

In our hypothetical setting, the relationships among these variables are such that X and Z each affect D and Y, but D has no effect on Y. In other words, we are describing a scenario where the true treatment effect is null. As will become clear in the next section, the goal of sensitivity analysis is to be able to reason about this treatment effect when we have no access to Z, as we normally won't, since it is unobserved. Figure 1 visualizes our setup.

Figure 1: Problem Setup

2.2. Model With and Without Z

To demonstrate the problem that our unobserved Z can cause, I simulated some data in line with the setup described above. You can refer to this notebook for the details of the simulation.

Since Z would be unobserved in real life, the only model we can normally fit to the data is Y~D+X. Let us see what results we get if we run that regression.
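For readers without access to the notebook, here is a minimal sketch of what such a simulation and the reduced regression could look like. The coefficient values, noise scales, and seed are my own illustrative choices, not the values used in the original notebook, so the exact estimates will differ from those quoted below.

    # Minimal, illustrative simulation: X and Z confound both D and Y, and D has no true effect on Y.
    # All numeric choices here are assumptions for illustration only.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(42)
    n = 1000
    X = rng.normal(size=n)
    Z = rng.normal(size=n)                      # the confounder we will later pretend not to observe
    D = 0.5 * X + 0.5 * Z + rng.normal(size=n)  # treatment depends on X and Z, not randomized
    Y = 0.5 * X + 0.5 * Z + rng.normal(size=n)  # outcome depends on X and Z, but NOT on D
    df = pd.DataFrame({"Y": Y, "D": D, "X": X, "Z": Z})

    # The only model we could fit in real life, since Z is unobserved: Y ~ D + X
    res_ydx = smf.ols("Y ~ D + X", data=df).fit()
    print(res_ydx.summary())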

Based on these results, it looks as if a one-unit change in D has a statistically significant effect of 0.2686 (p<0.001) on Y, which we know isn't true given how we generated the data (no D effect).

Now, let's see what happens to our D estimate when we control for Z as well. (In real life, we of course won't be able to run this extra regression since Z is unobserved, but our simulation setting allows us to peek behind the scenes at the true data generating process.)
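Continuing the sketch above, this is the "oracle" regression that only the simulation lets us run:

    # The model we could never fit in real life because Z is unobserved: Y ~ D + X + Z
    res_ydxz = smf.ols("Y ~ D + X + Z", data=df).fit()
    print(res_ydxz.summary())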

As expected, controlling for Z correctly removes the spurious D effect, shrinking the estimate towards zero and giving us a p-value that is no longer statistically significant at the 𝛼=0.05 threshold (p=0.059).

2.3. Strength of Z as a Confounder

At this point, we have established that Z is a strong enough confounder to eliminate the spurious D effect, since the statistically significant D effect disappears once we control for Z. What we haven't yet discussed is exactly how strong Z is as a confounder. For this, we will use a handy statistical concept called partial R², which quantifies the proportion of variation that a given variable of interest can explain that can't already be explained by the existing variables in a model. In other words, partial R² tells us the added explanatory power of that variable of interest, above and beyond the other variables already in the model. Formally, it can be defined as follows:

partial R² = 1 − RSS_full / RSS_reduced = (RSS_reduced − RSS_full) / RSS_reduced

where RSS_reduced is the residual sum of squares from the model that does not include the variable(s) of interest and RSS_full is the residual sum of squares from the model that does include the variable(s) of interest.
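As a quick illustration of how this maps to code, the helper below computes a partial R² from two fitted statsmodels OLS results; it is a sketch I am adding here, not part of the original notebook (ssr is statsmodels' residual sum of squares attribute).

    # Partial R^2 of the added variable(s): 1 - RSS_full / RSS_reduced
    def partial_r2(res_full, res_reduced):
        return 1 - res_full.ssr / res_reduced.ssr  # .ssr = residual sum of squares of a fitted OLS model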

In our case, the variable of interest is Z, and we would like to know what proportion of the variation in Y and in D can be explained by Z that can't already be explained by the existing variables. More precisely, we are interested in the following two partial R² values:

(1) R²_{Y~Z | D,X} and (2) R²_{D~Z | X},

where (1) quantifies the proportion of the variance in Y that can be explained by Z but can't already be explained by D and X (so the reduced model is Y~D+X and the full model is Y~D+X+Z), and (2) quantifies the proportion of the variance in D that can be explained by Z but can't already be explained by X (so the reduced model is D~X and the full model is D~X+Z).

Now, let us see how strongly associated Z is with D and Y in our data in terms of partial R².
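Using the helper and the models from the earlier sketch (note that the numbers quoted in the next paragraph come from the author's notebook; the illustrative simulation above will produce somewhat different values):

    # (1) Partial R^2 of Z with Y, given D and X: reduced model Y ~ D + X, full model Y ~ D + X + Z
    r2_yz_dx = partial_r2(res_ydxz, res_ydx)

    # (2) Partial R^2 of Z with D, given X: reduced model D ~ X, full model D ~ X + Z
    res_dx = smf.ols("D ~ X", data=df).fit()
    res_dxz = smf.ols("D ~ X + Z", data=df).fit()
    r2_dz_x = partial_r2(res_dxz, res_dx)

    print(round(r2_yz_dx, 2), round(r2_dz_x, 2))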

It turns out that Z explains 16% of the variation in Y that can't already be explained by D and X (this is partial R² #1 above), and 20% of the variation in D that can't already be explained by X (this is partial R² #2 above).

3. Sensitivity Analysis

3.1. Goal

As we discussed in the previous section, unobserved confounding poses a problem in real research settings precisely because, unlike in our simulation, Z cannot be observed. In other words, we are stuck with the model Y~D+X, with no way to know what our results would have been had we been able to run the model Y~D+X+Z instead. So, what can we do?

Intuitively, a reasonable sensitivity analysis approach should be able to tell us that if a Z such as the one we have in our data were to exist, it would nullify our results. Remember that our Z explains 16% of the variation in Y and 20% of the variation in D that can't be explained by the observed variables. Therefore, we expect sensitivity analysis to tell us that a hypothetical Z-like confounder of comparable strength would be enough to eliminate the statistically significant D effect.

But how can we determine that the unobserved confounder's strength would need to be in this 16–20% range on the partial R² scale without ever having access to it? Enter the robustness value.

3.2. Robustness Value

The robustness value (RV) formalizes the idea mentioned above: determining the necessary strength of a hypothetical unobserved confounder that could nullify our results. The usefulness of the RV comes from the fact that we only need the observable model Y~D+X, and not the unobservable model Y~D+X+Z, to calculate it.

Formally, the RV quantifies how strong unobserved confounding would need to be to change the observed statistical significance of the treatment effect (the equations are based on [1], pages 49–52; if the notation is too much to follow, just remember the key concept that the RV is a measure of the strength of confounding needed to change our results). The full expression is given after the definitions below, and its ingredients are:

  • 𝛼 is our chosen significance level (generally set to 0.05 or 5%),
  • q determines the percent reduction (q*100%) in significance that we care about (generally set to 1, since we usually care about confounding that could reduce statistical significance by 1*100% = 100%, thus rendering the result no longer statistically significant),
  • t_betahat_treat is the observed t-value of our treatment from the model Y~D+X (8.389 in this case, as can be seen from the regression results above),
  • df is our degrees of freedom (1000 − 3 = 997 in this case, since we simulated 1,000 samples and are estimating 3 parameters including the intercept), and
  • t*_alpha,df-1 is the critical t-value associated with the given 𝛼 and df−1 (approximately 1.96 when 𝛼 is set to 0.05).
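Putting these ingredients together, the RV for statistical significance can be written as follows. This is my transcription of the expression in [1] (it omits some edge cases, e.g., the RV is set to 0 when f_{q,α} below is not positive), so please verify it against the paper:

    RV_{q,\alpha} \;=\; \frac{1}{2}\left(\sqrt{f_{q,\alpha}^{4} + 4 f_{q,\alpha}^{2}} \;-\; f_{q,\alpha}^{2}\right),
    \qquad \text{where} \qquad
    f_{q,\alpha} \;=\; q \cdot \frac{|t_{\hat{\beta}_{treat}}|}{\sqrt{df}} \;-\; \frac{t^{*}_{\alpha,\, df-1}}{\sqrt{df - 1}}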

We are now able to calculate the RV in our own data using only the observed model Y~D+X (res_ydx).
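Here is a sketch of that calculation using the regression output alone (variable names are mine; the 0.184 quoted in the comment is the value reported later in the post for the notebook's data):

    # Robustness value for statistical significance, computed only from the observed model Y ~ D + X
    import numpy as np
    from scipy import stats

    q, alpha = 1.0, 0.05
    t_d = res_ydx.tvalues["D"]                         # observed t-value of the treatment
    df_resid = res_ydx.df_resid                        # residual degrees of freedom (1000 - 3 = 997)
    t_crit = stats.t.ppf(1 - alpha / 2, df_resid - 1)  # two-sided critical t-value for alpha and df - 1

    f_q_alpha = q * abs(t_d) / np.sqrt(df_resid) - t_crit / np.sqrt(df_resid - 1)
    rv = 0.5 * (np.sqrt(f_q_alpha**4 + 4 * f_q_alpha**2) - f_q_alpha**2)
    print(round(rv, 3))                                # about 0.184 for the data in the post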

It is by no stroke of luck that our RV (18%) falls right within the range of the partial R² values we calculated for Y~Z|D,X (16%) and D~Z|X (20%) above. What the RV is telling us here is that, even without any explicit knowledge of Z, we can still reason that an unobserved confounder would need at least 18% strength on the partial R² scale with respect to both the treatment and the outcome to be able to nullify our statistically significant result.

The reason the RV is not 16% or 20% but falls somewhere in between (18%) is that it is designed to be a single number summarizing the necessary strength of the confounder with both the outcome and the treatment, so 18% makes perfect sense given what we know about the data. You can think about it like this: since the method doesn't have access to the actual numbers 16% and 20% when calculating the RV, it does its best to quantify the strength of the confounder by assigning 18% to both partial R² values (Y~Z|D,X and D~Z|X), which is not far off from the truth at all and does a good job of summarizing the strength of the confounder.

Of course, in real life we won't have the Z variable to double-check that our RV is correct, but seeing how the two results align here should at least give you some confidence in the method. Finally, once we calculate the RV, we should think about whether an unobserved confounder of that strength is plausible. In our case, the answer is 'yes' because we have access to the data generating process, but in your specific real-life application, the existence of such a strong confounder might be an unreasonable assumption. That would be good news for you, since no realistic unobserved confounder could then drastically change your results.

4. PySensemakr

The sensitivity analysis technique described above has already been implemented, with all of its bells and whistles, as a Python package under the name PySensemakr (R, Stata, and Shiny App versions exist as well). For instance, to get the very same result that we manually calculated in the previous section, we can simply run the following code chunk.
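Here is a sketch of what that call might look like for our data, reusing the fitted model res_ydx from the earlier sketch. The class and argument names below are assumed to mirror the R sensemakr package, and the exact import path and signature may differ slightly across versions, so check the PySensemakr documentation.

    # Sketch of the PySensemakr call (argument names assumed to mirror the R sensemakr package)
    import sensemakr as smkr

    sens = smkr.Sensemakr(
        model=res_ydx,   # fitted statsmodels OLS model for Y ~ D + X
        treatment="D",   # name of the treatment variable
        q=1,             # we care about a 100% reduction in significance
        alpha=0.05,      # significance level
    )
    sens.summary()       # prints the robustness values and related sensitivity statistics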

Note that "Robustness Value, q = 1 alpha = 0.05" is 0.184, which is exactly what we calculated above. In addition to the RV for statistical significance, the package also reports the RV required for the coefficient estimate itself to shrink to 0. Not surprisingly, unobserved confounding needs to be even stronger for that to happen (0.233 vs. 0.184).

The package also provides contour plots over the two partial R² values, which allows for an intuitive visual display of sensitivity to possible levels of confounding with the treatment and the outcome (in this case, it should not be surprising that the x/y-axis value pairs that meet the red dotted line include 0.18/0.18 as well as 0.20/0.16).

One can also add benchmark values to the contour plot as proxies for plausible amounts of confounding. In our case, since we only have one observed covariate X, we can set our benchmarks to be 0.25x, 0.5x, and 1x as strong as that observed covariate. The resulting plot tells us that a confounder half as strong as X would be enough to nullify our statistically significant result (since the "0.5x X" value falls right on the red dotted line).
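A sketch of how the benchmarked contour plot could be produced; benchmark_covariates, kd, and plot() are assumptions based on the package's R counterpart, so verify them against the PySensemakr docs:

    # Contour plot of sensitivity with benchmark bounds derived from the observed covariate X
    sens_bench = smkr.Sensemakr(
        model=res_ydx,
        treatment="D",
        benchmark_covariates=["X"],   # use X as the reference confounder
        kd=[0.25, 0.5, 1],            # hypothetical confounders 0.25x, 0.5x, and 1x as strong as X
    )
    sens_bench.plot()                 # contour plot over the two partial R^2 axes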

Finally, I would like to note that while the simulated data in this example used a continuous treatment variable, in practice the method works for any type of treatment variable, including binary treatments. On the other hand, the outcome variable technically needs to be continuous, since we are operating within the OLS framework. Nonetheless, the method can still be used with a binary outcome if we model it using OLS (this is known as a linear probability model, or LPM [2]).

5. Conclusion

The possibility that our effect estimate may be biased due to unobserved confounding is an ever-present danger in observational studies. Despite this danger, observational studies remain a vital tool in data science, because randomization simply isn't feasible in many cases. Therefore, it is important to know how to address the issue of unobserved confounding by running sensitivity analyses that show how robust our estimates are to such potential confounding.

The robustness value method by Cinelli and Hazlett discussed in this post is a straightforward and intuitive approach to sensitivity analysis formulated in a familiar linear model framework. If you are interested in learning more about the method, I highly recommend taking a look at the original paper and the package documentation, where you can learn about many more interesting applications of the method, such as 'extreme scenario' analysis.

There are also many other approaches to sensitivity analysis for unobserved confounding, and I would like to briefly mention some of them here for readers who would like to continue learning about this topic. One versatile technique is the E-value developed by VanderWeele and Ding, which formulates the problem in terms of risk ratios [3] (implemented in R here). Another technique is the Austen plot developed by Veitch and Zaveri, based on the concepts of partial R² and the propensity score [4] (implemented in Python here), and yet another recent approach is by Chernozhukov et al. [5] (implemented in Python here).

6. Acknowledgements

I would like to thank Chad Hazlett for answering my question about using the method with binary outcomes and Xinyi Zhang for providing a lot of valuable feedback on the post. Unless otherwise noted, all images are by the author.

7. References

[1] C. Cinelli and C. Hazlett, Making Sense of Sensitivity: Extending Omitted Variable Bias (2019), Journal of the Royal Statistical Society

[2] J. Murray, Linear Probability Model, Murray’s personal website

[3] T. VanderWeele and P. Ding, Sensitivity Analysis in Observational Research: Introducing the E-Value (2017), Annals of Internal Medicine

[4] V. Veitch and A. Zaveri, Sense and Sensitivity Analysis: Simple Post-Hoc Analysis of Bias Due to Unobserved Confounding (2020), NeurIPS

[5] V. Chernozhukov, C. Cinelli, W. Newey, A. Sharma, and V. Syrgkanis, Long Story Short: Omitted Variable Bias in Causal Machine Learning (2022), NBER
