Demystifying Bayesian Models: Unveiling Explanability through SHAP Values

Exploring PyMC’s Insights with SHAP Framework via an Engaging Toy Example

SHAP values (SHapley Additive exPlanations) are a game-theory-based method used to improve the transparency and interpretability of machine learning models. However, this method, like other machine learning explainability frameworks, has rarely been applied to Bayesian models, which provide a posterior distribution capturing uncertainty in parameter estimates instead of the point estimates used by classical machine learning models.

While Bayesian models offer a versatile framework for incorporating prior knowledge, adjusting for data limitations, and making predictions, they are unfortunately difficult to interpret with SHAP. SHAP regards the model as a game and every feature as a player in that game, but a Bayesian model is not a single game: it is rather an ensemble of games whose parameters are drawn from the posterior distributions. How can we interpret a model when it is more than one game?

This article attempts to explain a Bayesian model using the SHAP framework through a toy example. The model is built with PyMC, a probabilistic programming library for Python that allows users to construct Bayesian models with a simple Python API and fit them using Markov chain Monte Carlo (MCMC).

The main idea is to apply SHAP to an ensemble of deterministic models generated from a Bayesian network. For each feature, we obtain one sample of the SHAP value from each generated deterministic model. The explainability is then given by the samples of all obtained SHAP values. We will illustrate this approach with a simple example.
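Since SHAP builds on Shapley values, it helps to see what a single "game" looks like. The sketch below computes exact Shapley values for a hypothetical two-feature model by averaging over the two orders in which a feature can join a coalition; the coefficients and the baseline point z are illustrative assumptions, not the author's model (SHAP itself averages over a whole background dataset rather than a single baseline).

```python
import numpy as np

def shapley_two_features(f, x, z):
    """Exact Shapley values for a two-feature model f, relative to baseline z,
    by averaging each feature's marginal contribution over both join orders."""
    x1, x2 = x
    z1, z2 = z
    phi1 = 0.5 * (f(x1, z2) - f(z1, z2)) + 0.5 * (f(x1, x2) - f(z1, x2))
    phi2 = 0.5 * (f(z1, x2) - f(z1, z2)) + 0.5 * (f(x1, x2) - f(x1, z2))
    return phi1, phi2

def f(x1, x2):
    # One hypothetical deterministic "game": linear terms plus an interaction.
    return 1.0 - 1.0 * x1 + 2.0 * x2 + 0.5 * x1 * x2

phi1, phi2 = shapley_two_features(f, x=(2.0, 3.0), z=(2.5, 2.5))
# Efficiency property: phi1 + phi2 equals f(x) - f(z).
```

Drawing one posterior sample yields one such deterministic f, hence one pair of Shapley values; drawing many samples yields a distribution of them.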

All of the implementations can be found in this notebook.

Dataset

Consider the following dataset created by the author, which contains 250 points: the variable y depends on x1 and x2, each of which varies between 0 and 5. The image below illustrates the dataset:

Image by author: Dataset

Let’s quickly explore the data using a pair plot. From it, we can observe the following:

  1. The variables x1 and x2 are not correlated.
  2. Both variables contribute to the output y to some extent. That is, a single variable is not enough to predict y.
Image by author: pair plot of the data
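The exact generating process of the dataset is not given in the article; a hypothetical numpy reconstruction consistent with the description (250 points, x1 and x2 independent and uniform on [0, 5], with an interaction term matching the model structure fitted below) might look like:

```python
import numpy as np

# Illustrative coefficients and noise level -- assumptions, not the author's.
rng = np.random.default_rng(0)
n = 250
x1 = rng.uniform(0, 5, n)   # first feature, uniform on [0, 5]
x2 = rng.uniform(0, 5, n)   # second feature, independent of x1
# Linear terms plus an x1*x2 interaction, with Gaussian noise.
y = 1.0 - 1.0 * x1 + 2.0 * x2 + 0.5 * x1 * x2 + rng.normal(0.0, 1.0, n)
```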

Modelization with PyMC

Let’s build a Bayesian model with PyMC. Without going into the details, which you can find in any statistics book, we will simply recall that training a Bayesian machine learning model involves updating the model’s parameters based on observed data and prior knowledge using Bayes’ rule.

We define the model’s structure as follows:

https://cdn-images-1.medium.com/max/1600/1*lgIcHZ58UOLWt46g8elHtw.gif
Image by author: model structure

Given the priors and likelihood defined above, we will use PyMC’s standard sampling algorithm, NUTS, which is designed to automatically tune its parameters, such as the step size and the number of leapfrog steps, to achieve efficient exploration of the target distribution. It repeatedly performs a tree exploration to simulate the trajectory of the point in the parameter space and decide whether to accept or reject a sample. The iteration stops either when the maximum number of iterations is reached or when the desired level of convergence is achieved.

In the code below, we set up the priors, define the likelihood, and then run the sampling algorithm using PyMC.


with pm.Model() as model:
    # Set priors.
    intercept = pm.Uniform(name="intercept", lower=-10, upper=10)
    x1_slope = pm.Uniform(name="x1_slope", lower=-5, upper=5)
    x2_slope = pm.Uniform(name="x2_slope", lower=-5, upper=5)
    interaction_slope = pm.Uniform(name="interaction_slope", lower=-5, upper=5)
    sigma = pm.Uniform(name="sigma", lower=1, upper=5)

    # Set likelihood.
    likelihood = pm.Normal(
        name="y",
        mu=intercept + x1_slope * x1 + x2_slope * x2 + interaction_slope * x1 * x2,
        sigma=sigma,
        observed=y,
    )

    # Configure and run the sampler.
    trace = pm.sample(5000, chains=5, tune=1000, target_accept=0.87, random_seed=SEED)

The trace plot below displays the posteriors of the model’s parameters.

Image by author: posterior of the model

We now want to implement SHAP on the model described above. Note that for a given input (x1, x2), the model’s output y is a distribution conditional on the parameters, not a point value. Thus, we can obtain a deterministic model, and the corresponding SHAP values of all features, by drawing one sample from the obtained posteriors. On the other hand, if we draw an ensemble of parameter samples, we get an ensemble of deterministic models and, therefore, samples of SHAP values for each feature.

The posterior samples can be obtained using the following code, where we draw 200 samples per chain:

with model:
    idata = pm.sample_prior_predictive(samples=200, random_seed=SEED)
    idata.extend(pm.sample(200, tune=2000, random_seed=SEED))

Here is the table of variables drawn from the posteriors:

Image by author: samples from posteriors

Next, we compute one pair of SHAP values for each drawn sample of the model parameters. The code below loops over the posterior samples, defines one deterministic model per sample, and computes the SHAP values of the point of interest, x_test = (2, 3).
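The loop below wraps each posterior draw in a SimpleModel class defined in the author's notebook. Its exact implementation is not shown in the article; a minimal numpy sketch of what such a wrapper might look like, returning the mean prediction of the likelihood:

```python
import numpy as np

class SimpleModel:
    """One deterministic model built from a single posterior draw (a sketch;
    the notebook's actual class may differ)."""

    def __init__(self, intercept, x1_slope, x2_slope, interaction_slope, sigma):
        self.intercept = intercept
        self.x1_slope = x1_slope
        self.x2_slope = x2_slope
        self.interaction_slope = interaction_slope
        self.sigma = sigma  # stored for completeness; unused by the mean prediction

    def predict(self, X):
        # X has one row per point and columns (x1, x2); return the mean of y.
        x1, x2 = X[:, 0], X[:, 1]
        return (self.intercept + self.x1_slope * x1 + self.x2_slope * x2
                + self.interaction_slope * x1 * x2)
```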

background = np.hstack((x1.reshape((250, 1)), x2.reshape((250, 1))))
shap_values_list = []
x_test = np.array([2, 3]).reshape((-1, 2))
for i in range(len(pos_intercept)):
    # Build one deterministic model from the i-th posterior draw.
    model = SimpleModel(
        intercept=pos_intercept[i],
        x1_slope=pos_x1_slope[i],
        x2_slope=pos_x2_slope[i],
        interaction_slope=pos_interaction_slope[i],
        sigma=pos_sigma[i],
    )
    explainer = shap.Explainer(model.predict, background)
    shap_values = explainer(x_test)
    shap_values_list.append(shap_values.values)

The resulting ensemble of two-dimensional SHAP values for the input is shown below:

Image by author: SHAP value samples

From the plot above, we can infer the following:

  1. The SHAP values of both dimensions are roughly normally distributed.
  2. The first dimension has a negative contribution (median -1.75) to the model output, while the second has a positive contribution (median 3.45). Moreover, the second dimension’s contribution has the larger absolute value.
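The medians quoted above can be computed directly from the collected SHAP samples. A short sketch, using stand-in numbers in place of the real contents of shap_values_list:

```python
import numpy as np

# shap_values_list holds one (1, 2) array per posterior draw; the two arrays
# below are illustrative stand-ins for the real samples collected in the loop.
shap_values_list = [np.array([[-1.8, 3.4]]), np.array([[-1.7, 3.5]])]
samples = np.vstack(shap_values_list)   # shape (n_draws, n_features)
medians = np.median(samples, axis=0)    # one median per feature
```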

This article explored the use of SHAP values, a game-theory-based method for increasing the transparency and interpretability of machine learning models, in Bayesian models. A toy example demonstrated how SHAP can be applied to a Bayesian network.

Please note that SHAP is model-agnostic. Therefore, with changes to its implementation, it may become possible to apply SHAP directly to the Bayesian model itself in the future.
