Machine Learning, Illustrated: Opening Black Box Models with SHAP
Now that we understand the underlying calculations of SHAP, we can apply it to our predictions and visualize them. To do so, we'll use Python's shap library and pass in our model.

import shap
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

This gives us the SHAP values for all of the features (MedInc, HouseAge, AveRooms, Latitude, and Longitude) for every sample in the test set. Using these SHAP values, let's get plotting.
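For completeness, here is a minimal, self-contained sketch of how `model` and `X_test` might be produced. The article works with California housing data, which is not shown here; the synthetic data with the same column names and the GradientBoostingRegressor below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the setup the article assumes: synthetic data
# with the same five column names, and a gradient boosting model.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "MedInc": rng.uniform(0.5, 15, n),
    "HouseAge": rng.integers(1, 52, n),
    "AveRooms": rng.uniform(2, 10, n),
    "Latitude": rng.uniform(32, 42, n),
    "Longitude": rng.uniform(-124, -114, n),
})
# Synthetic target: price rises with income, falls with latitude (plus noise).
y = 0.4 * X["MedInc"] - 0.1 * X["Latitude"] + rng.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
```

With `model` and `X_test` in hand, the `shap.Explainer` call above works as written.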

1. Waterfall Plot

This plot helps us visualize the SHAP values of each sample in our data individually. Let's visualize the SHAP values of the first sample.

# visualize the SHAP values of the first sample
shap.plots.waterfall(shap_values[0])

Notice that the expected prediction, E[f(X)] = 2.07, is the value calculated in Step 0, and that f(x) = 1.596 is the predicted house price for the first sample.

Observe that the SHAP value of MedInc is +0.22 (which we saw in Step 5), the SHAP value of Longitude is -2.35, and so on. If we add all of these SHAP values, positive and negative, to 2.07, we arrive at the predicted value of 1.596 for the first house.
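This additivity property can be checked with plain arithmetic. Only the MedInc and Longitude values below come from the waterfall plot described in the text; the other three SHAP values are hypothetical fillers, chosen so the sum works out, purely for illustration.

```python
# Additivity check: prediction = base value + sum of SHAP values.
base_value = 2.07   # E[f(X)], the average model output from Step 0
shap_vals = {
    "MedInc": 0.22,      # from the waterfall plot
    "HouseAge": 0.10,    # hypothetical
    "AveRooms": 0.356,   # hypothetical
    "Latitude": 1.20,    # hypothetical
    "Longitude": -2.35,  # from the waterfall plot
}
prediction = base_value + sum(shap_vals.values())
print(round(prediction, 3))  # 1.596, matching f(x) for the first house
```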

Ignoring the signs, the magnitude of the SHAP value for Longitude, 2.35, is larger than that of the other features. This implies that Longitude has the most significant impact on this prediction.

Just as we visualized the SHAP values of the first sample, we can visualize the SHAP values of the second sample too.

# visualize the SHAP values of the second sample
shap.plots.waterfall(shap_values[1])

Comparing the SHAP values for the first and second houses in our test data, we observe significant differences. For the first house, Longitude had the most significant impact on the predicted price, while for the second house, MedInc has the most prominent influence.

NOTE: These differences in SHAP values highlight the unique contributions of each feature to the model's output for each sample. Understanding these contributions is essential for building trust in the model's decision-making process and ensuring that it is not biased or discriminatory.

2. Force Plot

Another way to visualize the above is with a force plot.

shap.plots.force(shap_values[0])

3. Mean SHAP Plot

To determine which features are generally most important for our model's predictions, we can use a bar plot of the mean SHAP values across all observations. Taking the mean of the absolute values ensures that positive and negative values don't cancel each other out.

shap.plots.bar(shap_values)

Each feature has a corresponding bar, with the height representing the mean SHAP value. For instance, in our plot, the feature with the largest mean SHAP value is Latitude, indicating that it has the most substantial impact on our model's predictions overall. This information helps us understand which features are critical to the model's decision-making process.
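The averaging that the bar plot performs can be sketched in a few lines of NumPy. The SHAP matrix below is a small hypothetical stand-in (not real model output), constructed so that Latitude comes out on top, as in the plot described above.

```python
import numpy as np

# shap_values.values has shape (n_samples, n_features); here a tiny
# hypothetical stand-in so the computation is self-contained.
features = ["MedInc", "HouseAge", "AveRooms", "Latitude", "Longitude"]
vals = np.array([
    [ 0.22, 0.10,  0.36,  1.20, -2.35],
    [ 0.90, 0.05, -0.10, -1.40,  0.30],
    [-0.40, 0.02,  0.20,  1.10, -0.80],
])

# Mean of |SHAP| per feature: absolute values keep positive and
# negative contributions from cancelling out.
mean_abs = np.abs(vals).mean(axis=0)
for name, m in sorted(zip(features, mean_abs), key=lambda t: -t[1]):
    print(f"{name}: {m:.3f}")  # Latitude ranks first in this toy example
```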

4. Beeswarm Plot

The beeswarm plot is a useful visualization for examining all of the SHAP values for each feature. The y-axis groups the SHAP values by feature, with the color of the points indicating the corresponding feature value. Typically, redder points represent higher feature values.

The beeswarm plot can help identify important relationships between features and the model's predictions. In this plot, the features are ordered by their mean SHAP values.

shap.plots.beeswarm(shap_values)

By examining the SHAP values in the beeswarm plot, we can begin to understand the nature of the relationships between the features and the predicted house price. For instance, for MedInc, we observe that SHAP values increase as the feature value increases. This suggests that higher values of MedInc contribute to higher predicted house prices.

In contrast, for Latitude and Longitude, we notice the opposite trend, where higher feature values result in lower SHAP values. This observation implies that higher Latitude and Longitude values are associated with lower predicted house prices.
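One way to express what the eye does when reading a beeswarm plot is to correlate each feature's values with its SHAP values. The data below are hypothetical, generated only to mimic the trends described in the text (SHAP rising with MedInc, falling with Latitude), not real model output.

```python
import numpy as np

# Hypothetical feature values and SHAP values mimicking the beeswarm trends.
rng = np.random.default_rng(0)
medinc = rng.uniform(1, 10, 200)   # median income, in units of $10k
lat = rng.uniform(32, 42, 200)     # latitude, in degrees

shap_medinc = 0.3 * medinc + rng.normal(0, 0.2, 200)    # rises with MedInc
shap_lat = -0.5 * (lat - 37) + rng.normal(0, 0.3, 200)  # falls with Latitude

# Pearson correlation between feature value and SHAP value.
r_medinc = np.corrcoef(medinc, shap_medinc)[0, 1]
r_lat = np.corrcoef(lat, shap_lat)[0, 1]
print(f"MedInc:   r = {r_medinc:+.2f}")  # strongly positive
print(f"Latitude: r = {r_lat:+.2f}")     # strongly negative
```

A strong positive correlation corresponds to a feature whose red (high-value) points sit on the right of the beeswarm plot; a strong negative one, to a feature whose red points sit on the left.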

5. Dependence Plots

To gain a deeper understanding of the relationships between individual features and their corresponding SHAP values, we can create dependence plots. A dependence plot is a scatter plot that shows the relationship between the SHAP value and the feature value for a single feature.

shap.plots.scatter(shap_values[:,"MedInc"])
shap.plots.scatter(shap_values[:,"Latitude"])

By analyzing dependence plots, we can confirm the observations made in the beeswarm plot. For instance, when we create a dependence plot for MedInc, we observe a positive relationship between MedInc values and SHAP values. In other words, higher MedInc values lead to higher predicted house prices.

Overall, the dependence plots provide a more detailed understanding of the complex relationships between individual features and predicted house prices.
