
Deep Dive into PFI for Model Interpretability


Another interpretability tool in your toolbox


Knowing how to assess your model is crucial in your work as a data scientist. No one will sign off on your solution if you are not able to fully understand and communicate it to your stakeholders. This is why knowing interpretability methods is so important.

The lack of interpretability can kill even a good model. I have not developed a model whose stakeholders were not interested in understanding how the predictions were made. Therefore, knowing how to interpret a model and communicate it to the business is an essential skill for a data scientist.

In this post, we are going to explore Permutation Feature Importance (PFI), a model-agnostic method that can help us identify the most important features of our model and, therefore, communicate better what the model is considering when making its predictions.

The PFI method estimates how important a feature is for model results based on what happens to the model when we break the relationship between the feature and the target variable.

To do that, for each feature whose importance we want to analyze, we randomly shuffle it while keeping all the other features and the target unchanged.

This makes the feature useless for predicting the target, since we break the relationship between them by changing their joint distribution.

Then, we can use our model to predict on the shuffled dataset. The amount of performance degradation in our model will indicate how important that feature is.

The algorithm then looks something like this (a minimal code sketch follows the list):

  • We train a model on a training dataset and then assess its performance on both the training and the test datasets
  • For each feature, we create a new dataset where that feature is shuffled
  • We then use the trained model to predict the output for the new dataset
  • The ratio of the new error metric to the original one gives us the importance of the feature

Notice that if a feature is not important, the performance of the model should not vary much. If it is important, then the performance should suffer considerably.
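Before moving on, here is a minimal sketch of these steps for a single feature, assuming a fitted scikit-learn classifier and a NumPy feature matrix. The function name permutation_importance_ratio is our own, not a library API; we build a fuller, repeated version of this later in the post:

import numpy as np
from sklearn.metrics import accuracy_score

def permutation_importance_ratio(model, X, y, feature_idx, rng=np.random):
    # Original error; assumed non-zero, otherwise the ratio is undefined
    original_error = 1 - accuracy_score(y, model.predict(X))

    # Shuffle a copy of one column, keeping all other features and the target intact
    X_permuted = X.copy()
    rng.shuffle(X_permuted[:, feature_idx])

    # The higher the new error relative to the original, the more important the feature
    permuted_error = 1 - accuracy_score(y, model.predict(X_permuted))
    return permuted_error / original_error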

Now that we know how to calculate the PFI, how do we interpret it?

That depends on which fold we apply the PFI to. We usually have two options: applying it to the training dataset or to the test dataset.

During training, our model learns the patterns in the data and tries to represent them. Of course, during training, we have no idea how well our model generalizes to unseen data.

Therefore, by applying the PFI to the training dataset, we will see which features were the most relevant for the model's learned representation of the data.

In business terms, this indicates which features were the most important for building the model.

Now, if we apply the method to the test dataset, we will see the features' impact on the model's generalization.

Let's think about it: if we see the model's performance drop on the test set after shuffling a feature, it means that feature was important for the performance on that set. Since the test set is what we use to assess generalization (if you are doing everything right), we can say that the feature is important for generalization.

The PFI analyzes the effect of a feature on your model's performance; therefore, it does not say anything about the raw data. If your model's performance is poor, then any relationship you find with PFI will be meaningless.

This is true for both sets: if your model is underfitting (low predictive power on the training set) or overfitting (low predictive power on the test set), then you cannot draw insights from this method.

Also, when two features are highly correlated, the PFI can mislead your interpretation. If you shuffle one feature but the required information is encoded in another one, the performance may not suffer at all, which could make you think the feature is useless even though it is not.
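To make this caveat concrete, here is a small illustrative sketch (this toy setup is our own, not part of the original analysis): we duplicate one Iris feature so that two columns carry the same information, then shuffle each copy separately:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
# Append a copy of feature 2 as a fifth column
X_dup = np.column_stack([X, X[:, 2]])

rf = RandomForestClassifier(random_state=0).fit(X_dup, y)
base_error = 1 - accuracy_score(y, rf.predict(X_dup))

for col in (2, 4):  # the original column and its duplicate
    X_perm = X_dup.copy()
    np.random.shuffle(X_perm[:, col])
    error = 1 - accuracy_score(y, rf.predict(X_perm))
    print(f'column {col}: error {error:.3f} vs baseline {base_error:.3f}')

# Shuffling either column alone may barely move the error, because the model
# can still read the same signal from the remaining copy, making both columns
# look less important than the information they share really is.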

To implement the PFI in Python, we must first import our required libraries. We will mainly use numpy, pandas, tqdm, and sklearn:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import accuracy_score, r2_score

Now we must load our dataset, which will be the Iris dataset. Then, we fit a Random Forest to the data:

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=12, shuffle=True
)

rf = RandomForestClassifier(
    n_estimators=3, random_state=32
).fit(X_train, y_train)

With our model fitted, let's analyze its performance to see whether we can safely apply the PFI to understand how the features impact our model:

print(accuracy_score(rf.predict(X_train), y_train))
print(accuracy_score(rf.predict(X_test), y_test))

We achieved 99% accuracy on the training set and 95.5% on the test set. Looks good for now. Let's save the original error scores for a later comparison:

original_error_train = 1 - accuracy_score(rf.predict(X_train), y_train)
original_error_test = 1 - accuracy_score(rf.predict(X_test), y_test)

Now let's calculate the permutation scores. It is common to run the shuffle for each feature several times, to build statistics of the feature scores and avoid coincidences. In our case, let's do 10 repetitions for each feature:

n_steps = 10

feature_values = {}
for feature in tqdm(range(X.shape[1])):
    # Save every new error value for this feature
    errors_permuted_train = []
    errors_permuted_test = []

    for step in range(n_steps):
        # Reload the data because np.random.shuffle shuffles in place
        X, y = load_iris(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=12, shuffle=True
        )
        np.random.shuffle(X_train[:, feature])
        np.random.shuffle(X_test[:, feature])

        # Apply the previously fitted model to the shuffled data to get the new errors
        errors_permuted_train.append(1 - accuracy_score(rf.predict(X_train), y_train))
        errors_permuted_test.append(1 - accuracy_score(rf.predict(X_test), y_test))

    feature_values[f'{feature}_train'] = errors_permuted_train
    feature_values[f'{feature}_test'] = errors_permuted_test

We now have a dictionary with the error for every shuffle we did. Next, let's generate a table that contains, for each feature in each fold, the mean and the standard deviation of the error ratio relative to the original performance of our model:

rows = []
for feature in feature_values:
    if 'train' in feature:
        aux = np.array(feature_values[feature]) / original_error_train
        fold = 'train'
    elif 'test' in feature:
        aux = np.array(feature_values[feature]) / original_error_test
        fold = 'test'

    rows.append({
        'feature': feature.replace(f'_{fold}', ''),
        'fold': fold,
        'mean': np.mean(aux),
        'std': np.std(aux),
    })

# DataFrame.append was removed in pandas 2.0, so we build the frame from a list of rows
PFI = pd.DataFrame(rows)
PFI = PFI.pivot(index='feature', columns='fold', values=['mean', 'std']).reset_index().sort_values(('mean', 'test'), ascending=False)

We will end up with a table of the mean and standard deviation of the error ratio for each feature in each fold.

We can see that feature 2 appears to be the most important feature in our dataset for both folds, followed by feature 3. Since we are not fixing the random seed for numpy's shuffle function, these numbers can vary between runs.
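If you want these numbers to be reproducible between runs, one option is to fix numpy's random seed once before the repetition loop, for example:

np.random.seed(42)  # any fixed seed makes every subsequent shuffle reproducible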

We can then plot the importances to visualize them better. A minimal matplotlib sketch of such a chart (the original post shows it as an image) could look like this:
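fig, ax = plt.subplots()
ax.barh(
    PFI[('feature', '')],       # feature names (the pivot leaves an empty second level)
    PFI[('mean', 'test')],      # mean error ratio on the test fold
    xerr=PFI[('std', 'test')],  # standard deviation as the error bar
)
ax.set_xlabel('Error ratio (permuted / original), test set')
ax.set_title('Permutation Feature Importance')
plt.tight_layout()
plt.show()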

PFI is a simple method that can help you quickly identify the most important features. Go ahead and try applying it to a model you are developing to see how it behaves.

But also be aware of the limitations of the method. Not knowing where a method falls short will end up leading you to incorrect interpretations.

Also, notice that the PFI shows the importance of a feature but does not state in which direction it influences the model output.

So, tell me: how are you going to use this in your next models?

Stay tuned for more posts about interpretability methods that may improve your overall understanding of a model.
