Overfitting vs. Underfitting: Making Sense of the Bias-Variance Trade-Off

Building machine learning models is a bit like cooking: too little seasoning and the dish is bland, too much and it's overpowering. The goal? That perfect balance: just enough complexity to capture the flavour of the data, but not so much that it's overwhelming.

In this post, we'll dive into two of the most common pitfalls in model development: overfitting and underfitting. Whether you're training your first model or tuning your hundredth, keeping these concepts in check is key to building models that truly work in the real world.

Overfitting

What is overfitting?

Overfitting is a common issue in data science models. It happens when the model learns the training data too well, picking up patterns and noise that are specific to the training set. As a result, it is unable to predict well on unseen data.
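To make this concrete, here is a minimal sketch (scikit-learn on synthetic data, so exact numbers will vary): an unconstrained decision tree scores almost perfectly on the data it memorised, yet noticeably worse on held-out data.

```python
# Sketch of overfitting: an unconstrained decision tree memorises
# noisy training data and scores far worse on held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor(max_depth=None)  # no complexity limit
tree.fit(X_train, y_train)

print(f"train R^2: {tree.score(X_train, y_train):.2f}")  # ~1.00 (memorised)
print(f"test  R^2: {tree.score(X_test, y_test):.2f}")    # noticeably lower
```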

Why is overfitting a problem?

  1. Poor performance: The model is unable to generalise well. The patterns it has detected during training do not apply to the rest of the data. Training errors give the impression that the model is working great, when in fact the test or real-world errors are far less optimistic.
  2. Predictions with high variance: The model's performance is unstable and its predictions are unreliable. Small changes in the data cause high variance in the predictions.
  3. Training a complex and expensive model: Training and maintaining a complex model in production is an expensive, resource-intensive job. If a simpler model performs just as well, it is more efficient to use it instead.
  4. Risk of losing business trust: Data scientists who are overly optimistic when experimenting with new models may overpromise results to business stakeholders. If overfitting is discovered only after the model has been presented, it can significantly damage credibility and make it difficult to regain trust in the model's reliability.

How to identify overfitting

  1. Cross-validation: During cross-validation, the input data is split into several folds (sets of training and testing data). Different folds of the input data should give similar testing error results. A large gap in performance across folds may indicate model instability or data leakage, both of which can be symptoms of overfitting (see the sketch after this list).
  2. Keep track of the training, testing and generalisation errors: The error once the model is deployed (the generalisation error) should not deviate significantly from the errors you already know about. If you want to go the extra mile, consider implementing a monitoring alert that fires when the deployed model's performance deviates significantly from the validation-set error.
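As an illustration of the cross-validation check, here is a minimal sketch using scikit-learn's `cross_val_score` on synthetic data; a fold-to-fold spread that is large relative to the mean score is a warning sign.

```python
# Sketch: compare per-fold scores; a wide spread across folds can
# signal instability or leakage, both possible symptoms of overfitting.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

scores = cross_val_score(DecisionTreeRegressor(random_state=0), X, y,
                         cv=5, scoring="r2")
print("per-fold R^2:", np.round(scores, 2))
print(f"mean {scores.mean():.2f} +/- {scores.std():.2f}")
```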

How to mitigate or prevent overfitting

  1. Remove features: Too many features can "guide" the model too much, leading to a model that is unable to generalise well.
  2. Increase training data: With more examples to learn from, the model generalises better and becomes less sensitive to outliers and noise.
  3. Increase regularisation: Regularisation techniques help by penalising large coefficients, which protects the model from fitting too closely to the data (see the sketch after this list).
  4. Adjust hyper-parameters: Hyper-parameter settings that let the model fit too aggressively (e.g. unlimited tree depth) can result in a model that is unable to generalise well.
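To illustrate point 3, here is a minimal sketch with ridge regression (L2 regularisation) on synthetic data: increasing `alpha` shrinks the coefficients, typically trading a little training accuracy for a smaller train/test gap. The exact scores depend on the data.

```python
# Sketch: stronger L2 regularisation (higher alpha) shrinks coefficients,
# trading some training accuracy for better generalisation.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=50, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for alpha in (0.01, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:>6}: train R^2={model.score(X_train, y_train):.2f}, "
          f"test R^2={model.score(X_test, y_test):.2f}")
```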

Underfitting

What is underfitting?

Underfitting happens when the model or its features are too simplistic to capture the underlying data well. It also leads to poor predictions on unseen data.

Why is underfitting problematic?

  1. Poor performance: The model performs poorly on the training data, and therefore also on test and real-world data.
  2. Predictions with high bias: The model is incapable of making reliable predictions.

How to identify underfitting

  1. Training and test errors will both be poor.
  2. The generalisation error will be high, and probably close to the training error.

How to fix underfitting

  1. Enhance features: Introduce new features, or add more sophisticated ones (e.g. interaction effects, polynomial terms or seasonality terms) that can capture more complex patterns in the underlying data (see the sketch after this list).
  2. Increase training data: With more examples to learn from, the model generalises better and becomes less sensitive to outliers and noise.
  3. Reduce regularisation strength: When a regularisation technique is too strong, the coefficients become too uniform and the model does not prioritise any feature, which prevents it from learning important patterns.
  4. Adjust hyper-parameters: An intrinsically complex model with poorly chosen hyper-parameters may not be able to capture all the complexity. Paying more attention to tuning them can help (e.g. add more trees to a random forest).
  5. If none of the above fixes the underlying issue, it may be worth replacing the model with one that is able to capture more complex patterns in the data.
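To illustrate point 1, a minimal sketch on synthetic data: a plain linear model underfits a quadratic relationship (poor score even on the training data), while adding polynomial terms lets it capture the curvature.

```python
# Sketch: a linear model underfits quadratic data (poor score even on
# the data it was trained on); adding polynomial features restores the fit.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=200)  # quadratic + noise

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print(f"linear train R^2:     {linear.score(X, y):.2f}")  # low: underfits
print(f"polynomial train R^2: {poly.score(X, y):.2f}")    # close to 1
```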

Summary

Machine learning isn't magic; it's a balancing act between too much and too little. Overfit your model, and it becomes a perfectionist that can't handle new situations. Underfit it, and it misses the point entirely.

The best models live in the sweet spot: generalising well, learning enough, but not too much. By understanding and managing overfitting and underfitting, you're not just improving metrics; you're building trust, reducing risk, and creating solutions that last beyond the training set.

