10. PCA + tSNE/UMAP
More data doesn’t necessarily mean better models. Some datasets are simply too large, and you can do well without using them in full. But if you aren’t comfortable setting aside part of the data, I suggest using dimensionality reduction techniques to project it into a lower-dimensional space.
A rise in model performance isn’t guaranteed, but in the long run you get to run many more experiments on the smaller dataset, because RAM usage is lower and computation times can be much shorter.
The issue is that quality dimensionality reduction can take too long when the dataset has many features. You won’t get it right on the first try, so the extra experimentation becomes much more costly time-wise.
That’s why the Sklearn documentation suggests combining dimensionality reduction algorithms with PCA (Principal Component Analysis).
PCA is fast for any number of dimensions, making it ideal for a first-stage reduction. The recommendation is to project the data down to a reasonable number of dimensions, like 30–50, with PCA, and then use another algorithm, like tSNE or UMAP, to reduce it even further.
Below is the combination of PCA and tSNE:
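A minimal sketch of the two-stage pipeline with scikit-learn. The dataset here is a small synthetic stand-in (the shapes are placeholders chosen so the example runs quickly; the article's experiment used ~1M rows and ~300 features):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Small synthetic stand-in for a wide dataset (sizes are illustrative)
X, y = make_classification(
    n_samples=1000, n_features=300, n_informative=50, random_state=0
)

# Stage 1: PCA down to 30 dimensions — fast regardless of input width
X_pca = PCA(n_components=30, random_state=0).fit_transform(X)

# Stage 2: tSNE from 30 dimensions down to 2 for visualization
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X_pca)
print(X_2d.shape)  # (1000, 2)
```

Running tSNE on the 30 PCA components instead of the raw 300 features is what keeps the second stage tractable.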
On a synthetic dataset with 1M rows and ~300 features, projecting the data to the first 30 dimensions and then down to 2 took 4.5 hours. Unfortunately, the results aren’t pretty:
That’s why I recommend using UMAP. It is far faster than tSNE and preserves the local structure of the data better:
UMAP managed to find a clear distinction between the target classes, and it did so 20 times faster than tSNE.