Which Features Are Harmful For Your Classification Model?

Feature importance is probably the most common tool for explaining a machine learning model. It’s so popular that many data scientists end up believing that feature importance equals feature goodness.

It just isn’t so.

When a feature is important, it simply means that the model found it useful on the training set. However, this says nothing about the feature’s ability to generalize to new data!

To account for that, we need to make a distinction between two concepts:

  • Prediction Contribution: the weight that a variable has in the predictions made by the model. This is determined by the patterns that the model found on the training set. It is comparable to feature importance.
  • Error Contribution: the weight that a variable has in the errors made by the model on a holdout dataset. This is a better proxy for the feature’s performance on new data.
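To make the distinction concrete, here is a minimal sketch of how the two quantities could be computed from SHAP values. This is an illustration of the idea, not the article’s exact procedure: it assumes the SHAP values are available as a 2-D array (rows = observations, columns = features), and it uses absolute error on the raw predictions for simplicity, whereas a classification model calls for log-loss on log-odds, as covered in the sections listed below.

```python
import numpy as np

def prediction_contribution(shap_values):
    """Mean absolute SHAP value per feature: the weight each feature
    carries in the model's predictions (the usual SHAP-based
    feature-importance measure)."""
    return np.abs(shap_values).mean(axis=0)

def error_contribution(shap_values, base_value, y_true):
    """For each feature, compare the model's error with and without
    that feature's SHAP contribution on a holdout set.

    Positive values mean the feature increases the error (harmful);
    negative values mean it reduces the error (helpful)."""
    # The model's prediction is the base value plus all SHAP values.
    full_pred = base_value + shap_values.sum(axis=1)
    full_error = np.abs(y_true - full_pred)
    contributions = []
    for j in range(shap_values.shape[1]):
        # Prediction we would get without feature j's contribution.
        pred_without_j = full_pred - shap_values[:, j]
        error_without_j = np.abs(y_true - pred_without_j)
        # Error with the feature minus error without it.
        contributions.append((full_error - error_without_j).mean())
    return np.array(contributions)
```

In this sketch, a feature with a large Prediction Contribution can still have a positive Error Contribution: the model leans on it heavily, yet it pushes the holdout predictions away from the truth.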

In this article, I’ll explain the logic behind computing these two quantities for a classification model. I will also show an example in which using Error Contribution for feature selection leads to a much better result than using Prediction Contribution.

If you are more interested in regression than in classification, you can read my previous article “Your Features Are Important? It Doesn’t Mean They Are Good”.

  1. Starting from a toy example
  2. Which “error” should we use for classification models?
  3. How should we manage SHAP values in classification models?
  4. Computing “Prediction Contribution”
  5. Computing “Error Contribution”
  6. An actual dataset example
  7. Proving it really works: Recursive Feature Elimination with “Error Contribution”
  8. Conclusions
