Can Synthetic Data Boost Machine Learning Performance?

Background: Imbalanced Datasets
The Dataset
The Model
Generating Synthetic Data
Assessing Performance with Precision Recall Charts
Bootstrapping Holdout Dataset
Conclusion


To get a robust view of performance on the holdout set, I created fifty bootstrapped holdout sets from the original. Running the model for each approach across all fifty sets gives a distribution of each performance metric. We can then use the Kolmogorov-Smirnov test to determine whether each approach's distribution differs from the baseline's to a statistically significant degree.
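The procedure above could be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the author's code: the function names, the toy labels and predictions, and the choice of recall as the metric are all assumptions made for the example. The p-value uses the large-sample Kolmogorov approximation; in practice `scipy.stats.ks_2samp` returns both the statistic and a p-value and would be the usual choice.

```python
import math
import random
from bisect import bisect_right


def bootstrap_indices(n_rows, n_sets=50, seed=0):
    """Return n_sets lists of row indices, each sampled with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n_rows) for _ in range(n_rows)] for _ in range(n_sets)]


def recall(y_true, y_pred):
    """Share of actual positives that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


def metric_distribution(y_true, y_pred, index_sets, metric=recall):
    """Score one model's predictions on every bootstrapped holdout set."""
    return [
        metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        for idx in index_sets
    ]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_pvalue_asymptotic(stat, n_a, n_b, terms=100):
    """Approximate two-sided p-value from the limiting Kolmogorov distribution."""
    if stat <= 0:
        return 1.0
    lam = stat * math.sqrt(n_a * n_b / (n_a + n_b))
    series = sum((-1) ** (j - 1) * math.exp(-2 * (j * lam) ** 2)
                 for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * series))


# Toy holdout labels and predictions from two hypothetical models.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0] * 20
baseline_pred = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0] * 20
candidate_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0] * 20

# Reuse the same bootstrapped sets for both models so they are compared
# on identical data.
index_sets = bootstrap_indices(len(y_true), n_sets=50)
baseline_recall = metric_distribution(y_true, baseline_pred, index_sets)
candidate_recall = metric_distribution(y_true, candidate_pred, index_sets)

stat = ks_statistic(baseline_recall, candidate_recall)
p_value = ks_pvalue_asymptotic(stat, len(baseline_recall), len(candidate_recall))
print(f"KS statistic {stat:.3f}, approximate p-value {p_value:.3f}")
```

With two samples of 50, a KS statistic of 0.26 gives a p-value of about 0.068 under this approximation, in line with the precision figure reported for the weighted approach below. A statistic near 0 means the two metric distributions largely overlap, while a statistic near 1 means they barely overlap at all.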

The weighted approach marginally underperformed the baseline on recall and AUC. In addition, the variance of each performance metric appears quite high relative to the other approaches.

Image by Author: Model performance metrics over 50 bootstrapped holdout samples. Baseline vs Weighted Loss, KS stats: AUC 0.420 (p-value < 0.000), Precision 0.260 (p-value 0.068), Recall 0.520 (p-value < 0.000)

The oversampling approach improves model recall relative to the baseline, but results in a drastic deterioration in precision.

Image by Author: Model performance metrics over 50 bootstrapped holdout samples. Baseline vs Oversampling, KS stats: AUC 0.160 (p-value 0.549), Precision 1.0 (p-value < 0.000), Recall 0.9 (p-value < 0.000)

The approach performs worse than baseline across all metrics.

Image by Author: Model performance metrics over 50 bootstrapped holdout samples. Baseline vs Oversampling, KS stats: AUC 0.880 (p-value < 0.000), Precision 0.6 (p-value < 0.000), Recall 1.0 (p-value < 0.000)

The synthetic method uplifts model recall, albeit at the cost of precision. While the impact on precision remains substantial, the synthetic approach is a more robust way to improve recall than oversampling, because it does less damage to precision. Its robustness is further evidenced by the uplift in AUC-PR.

Image by Author: Model performance metrics over 50 bootstrapped holdout samples. Baseline vs Synthetic, KS stats: AUC 0.620, Precision 0.560, Recall 0.360, all p-values ≤ 0.003
