Can Synthetic Data Boost Machine Learning Performance?
Background — Imbalanced Datasets
The Dataset
The Model
Generating Synthetic Data
Assessing Performance with Precision Recall Charts
Bootstrapping Holdout Dataset
Conclusion

To get a robust view of performance on the holdout set, I created fifty bootstrapped holdout sets from the original. Running the model for each approach across all fifty sets yields a distribution of each performance metric. We can then test whether each approach's distribution differs significantly from the baseline's using the two-sample Kolmogorov-Smirnov test.
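The procedure can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code used for the results in this article: the function names (`bootstrap_indices`, `precision_recall`, `metric_distributions`, `ks_statistic`) and the seed are assumptions made for the example, and it only computes the KS statistic, not its p-value.

```python
import random
from bisect import bisect_right


def bootstrap_indices(n, n_sets=50, seed=0):
    """Return n_sets lists of n row indices, each drawn with replacement."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(n_sets)]


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = minority class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def metric_distributions(y_true, y_pred, index_sets):
    """Score one model's holdout predictions on every bootstrapped set."""
    precisions, recalls = [], []
    for idx in index_sets:
        p, r = precision_recall([y_true[i] for i in idx],
                                [y_pred[i] for i in idx])
        precisions.append(p)
        recalls.append(r)
    return precisions, recalls


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Given holdout labels and each model's predictions, one would call `metric_distributions` once per approach with the same `index_sets`, then compare, say, the baseline recalls against the oversampling recalls with `ks_statistic`. A statistic near 0 means the two distributions largely overlap; a statistic of 1.0 means they do not overlap at all. In practice `scipy.stats.ks_2samp` returns both the statistic and a p-value.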

The weighted approach marginally underperformed the baseline on recall and AUC. In addition, the variance of each performance metric appears quite high relative to the other approaches.

Image by Author: Model performance metrics over 50 bootstrapped holdout samples. Baseline vs Weighted Loss. KS stats: AUC 0.420 (p-value < 0.001), precision 0.260 (p-value 0.068), recall 0.520 (p-value < 0.001)

The oversampling approach improves model recall relative to the baseline, but results in a drastic deterioration of precision.

Image by Author: Model performance metrics over 50 bootstrapped holdout samples. Baseline vs Oversampling. KS stats: AUC 0.160 (p-value 0.549), precision 1.0 (p-value < 0.001), recall 0.9 (p-value < 0.001)

The approach performs worse than the baseline across all metrics.

Image by Author: Model performance metrics over 50 bootstrapped holdout samples. Baseline vs Oversampling. KS stats: AUC 0.880 (p-value < 0.001), precision 0.6 (p-value < 0.001), recall 1.0 (p-value < 0.001)

The synthetic approach lifts model recall, albeit at the cost of precision. While the impact on precision remains substantial, the synthetic approach offers a more robust way to improve recall, with less detriment to precision than the oversampling approach. Its robustness is further evidenced by the uplift in AUC-PR.

Image by Author: Model performance metrics over 50 bootstrapped holdout samples. Baseline vs Synthetic. KS stats: AUC 0.620, precision 0.560, recall 0.360; all p-values ≤ 0.003
