
10 Sklearn Treasure Features Neglected By 99% of Online Courses


10. PCA + tSNE/UMAP

More data doesn't necessarily mean better models. Some datasets are simply too large, and you can do well without using them in full. But if you aren't comfortable setting aside part of the data, I suggest using dimensionality reduction techniques to project the data into a lower-dimensional space.

A rise in model performance isn't guaranteed, but in the long run you get to run many more experiments on the smaller dataset, because RAM usage is lower and computation times are much shorter.

The problem is that quality dimensionality reduction can take too long when the dataset has many features. You won't get it right on the first try, so the extra experimentation becomes much more costly time-wise.

That's why the Sklearn documentation suggests combining dimensionality reduction algorithms with PCA (Principal Component Analysis).

PCA works fast for any number of dimensions, making it ideal for a first-stage reduction. The recommendation is to project the data to a reasonable number of dimensions, like 30-50, with PCA, and then use other algorithms such as tSNE or UMAP to reduce it even further.

Below is the combination of PCA and tSNE, reducing a synthetically generated dataset with 300 features down to just 2.
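Here is a minimal sketch of that two-stage reduction. The sample size, component counts, and random seeds are illustrative choices so the snippet finishes quickly; they are not the exact setup of the timed experiment described below.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic dataset with 300 features (far fewer rows than the 1M-row
# experiment described below, so this sketch runs in minutes)
X, y = make_classification(
    n_samples=10_000, n_features=300, n_informative=50, random_state=42
)

# Stage 1: PCA projects the 300 features down to 30 components - fast
X_pca = PCA(n_components=30, random_state=42).fit_transform(X)

# Stage 2: tSNE reduces the 30 components down to 2 for plotting - slow,
# but much cheaper on 30 dimensions than on the original 300
X_2d = TSNE(n_components=2, random_state=42).fit_transform(X_pca)

print(X_2d.shape)  # (10000, 2)
```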

On a synthetic dataset with 1M rows and ~300 features, projecting the data to the first 30 dimensions and then down to 2 took 4.5 hours. Unfortunately, the results aren't pretty:

[Figure: PCA + tSNE projection of the synthetic dataset. Image by me.]

That's why I recommend using UMAP instead. It is much faster than tSNE and preserves the local structure of the data better:

[Figure: UMAP projection of synthetic data with 300 dimensions]

UMAP managed to find a clear distinction between the target classes, and it did so 20 times faster than tSNE.
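For reference, here is a minimal UMAP sketch under the same assumptions as the snippet above. Note that UMAP lives in the third-party umap-learn package rather than in Sklearn itself, and the n_neighbors and min_dist values below are simply the library defaults spelled out.

```python
# UMAP is not part of Sklearn - install with: pip install umap-learn
import umap
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, y = make_classification(
    n_samples=10_000, n_features=300, n_informative=50, random_state=42
)

# Same first-stage PCA reduction to 30 components as before
X_pca = PCA(n_components=30, random_state=42).fit_transform(X)

# UMAP down to 2 dimensions; n_neighbors and min_dist control the balance
# between preserving local and global structure
X_2d = umap.UMAP(
    n_components=2, n_neighbors=15, min_dist=0.1, random_state=42
).fit_transform(X_pca)

print(X_2d.shape)  # (10000, 2)
```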

