Announcing PyCaret 3.0 — An open-source, low-code machine learning library in Python

Exploring the Latest Enhancements and Features of PyCaret 3.0

Generated by Moez Ali using Midjourney
  1. Introduction
  2. Stable Time Series Forecasting Module
  3. New Object Oriented API
  4. More options for Experiment Logging
  5. Refactored Preprocessing Module
  6. Compatibility with the latest sklearn version
  7. Distributed Parallel Model Training
  8. Speed up Model Training on CPU
  9. RIP: NLP and Arules module
  10. More Information
  11. Contributors

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that exponentially speeds up the experiment cycle and makes you more productive.

Compared with other open-source machine learning libraries, PyCaret is an alternative low-code library that can be used to replace hundreds of lines of code with only a few lines. This makes experiments exponentially fast and efficient. PyCaret is essentially a Python wrapper around several machine learning libraries and frameworks in Python.

The design and simplicity of PyCaret are inspired by the emerging role of citizen data scientists, a term first used by Gartner. Citizen data scientists are power users who can perform both simple and moderately sophisticated analytical tasks that would previously have required more technical expertise.

To learn more about PyCaret, check out our GitHub or Official Docs.

Check out the full Release Notes for PyCaret 3.0.

📈 Stable Time Series Forecasting Module

PyCaret's Time Series module is now stable and available under 3.0. It currently supports forecasting tasks, with time-series anomaly detection and clustering algorithms planned for future releases.

# load dataset
from pycaret.datasets import get_data
data = get_data('airline')

# init setup
from pycaret.time_series import *
s = setup(data, fh = 12, session_id = 123)

# compare models
best = compare_models()

# forecast plot
plot_model(best, plot = 'forecast')
# forecast plot 36 periods out in the future
plot_model(best, plot = 'forecast', data_kwargs = {'fh' : 36})

💻 New Object Oriented API

Although PyCaret is a fantastic tool, it does not adhere to the standard object-oriented programming practices used by Python developers. To address this issue, we had to rethink some of the initial design decisions made for the 1.0 version. It is important to note that this is a significant change that will require considerable effort to implement. Now, let's explore how it will affect you.

# Functional API (Existing)

# load dataset
from pycaret.datasets import get_data
data = get_data('juice')

# init setup
from pycaret.classification import *
s = setup(data, target = 'Purchase', session_id = 123)

# compare models
best = compare_models()

It's great to do experiments in the same notebook, but if you want to run a different experiment with different setup function parameters, this can be a problem. Although it is possible, the previous experiment's settings will be replaced.

However, with our new object-oriented API, you can effortlessly conduct multiple experiments in the same notebook and compare them without any difficulty. This is because the parameters are tied to an object and can be associated with various modeling and preprocessing options.

# load dataset
from pycaret.datasets import get_data
data = get_data('juice')

# init setup 1
from pycaret.classification import ClassificationExperiment

exp1 = ClassificationExperiment()
exp1.setup(data, target = 'Purchase', session_id = 123)

# compare models init 1
best = exp1.compare_models()

# init setup 2
exp2 = ClassificationExperiment()
exp2.setup(data, target = 'Purchase', normalize = True, session_id = 123)

# compare models init 2
best2 = exp2.compare_models()


After conducting experiments, you can use the get_leaderboard function to create leaderboards for each experiment, making it easier to compare them.

import pandas as pd

# generate leaderboard
leaderboard_exp1 = exp1.get_leaderboard()
leaderboard_exp2 = exp2.get_leaderboard()
lb = pd.concat([leaderboard_exp1, leaderboard_exp2])
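Since the combined leaderboard is a plain pandas DataFrame, you can rank models across both experiments with ordinary pandas operations. A minimal sketch with a stand-in frame (the real leaderboard from get_leaderboard has more columns, and its exact column names may differ):

```python
import pandas as pd

# Stand-in for the concatenated leaderboard; in practice this comes from
# pd.concat of exp1.get_leaderboard() and exp2.get_leaderboard().
lb = pd.DataFrame({
    "Model Name": ["Logistic Regression", "Random Forest", "Ridge Classifier"],
    "Accuracy": [0.82, 0.86, 0.80],
})

# Rank all models across experiments by a chosen metric.
ranked = lb.sort_values("Accuracy", ascending=False).reset_index(drop=True)
print(ranked.loc[0, "Model Name"])  # -> Random Forest
```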

# print pipeline steps
print(exp1.pipeline.steps)
print(exp2.pipeline.steps)

📊 More options for Experiment Logging

PyCaret 2 can automatically log experiments using MLflow. While MLflow is still the default, there are more options for experiment logging in PyCaret 3. The newly added options in the latest version are wandb, cometml, and dagshub.

To change the logger from the default MLflow to another available option, simply pass one of the following in the log_experiment parameter: 'mlflow', 'wandb', 'cometml', 'dagshub'.
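As a sketch, switching loggers is a single argument to setup. The experiment_name below is hypothetical, and each non-default logger needs its own client library and credentials (e.g. pip install wandb plus a configured account), so the usage portion is shown commented out:

```python
# Logger options accepted by setup(log_experiment=...); MLflow is the default.
available_loggers = ["mlflow", "wandb", "cometml", "dagshub"]

# Usage sketch (commented out: running it requires pycaret and a
# configured wandb account):
#
# from pycaret.datasets import get_data
# from pycaret.classification import setup
#
# data = get_data("juice")
# s = setup(data, target="Purchase", session_id=123,
#           log_experiment="wandb",         # any value from available_loggers
#           experiment_name="juice-wandb")  # hypothetical experiment name

print(available_loggers)
```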

🧹 Refactored Preprocessing Module

The preprocessing module underwent a complete redesign to improve its efficiency and performance, as well as to ensure compatibility with the latest version of scikit-learn.

PyCaret 3 includes several new preprocessing functionalities, such as new categorical encoding techniques, support for text features in machine learning modeling, new outlier detection methods, and advanced feature selection techniques.

Some of the new features are:

  • New categorical encoding methods
  • Handling text features for machine learning modeling
  • New methods to detect outliers
  • New methods for feature selection
  • Guaranteed avoidance of target leakage, as the entire pipeline is now fitted at the fold level
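These options are all exposed through setup. The sketch below collects a few of them as keyword arguments; the parameter names follow PyCaret 3's setup signature, but the specific values (and the 'Description' text column) are illustrative assumptions, not recommendations:

```python
# Example keyword arguments for setup() exercising the new preprocessing
# options; tune these for your own dataset.
preprocess_kwargs = dict(
    remove_outliers=True,             # drop outliers from the training data
    outliers_method="iforest",        # isolation-forest-based detection
    feature_selection=True,           # enable feature selection
    feature_selection_method="classic",
    text_features=["Description"],    # hypothetical free-text column
)

# Usage sketch (commented out: needs pycaret and a dataset with a
# 'Description' text column):
#
# from pycaret.classification import ClassificationExperiment
# exp = ClassificationExperiment()
# exp.setup(data, target="Purchase", session_id=123, **preprocess_kwargs)

print(sorted(preprocess_kwargs))
```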

✅ Compatibility with the latest sklearn version

PyCaret 2 depends heavily on scikit-learn 0.23.2, which makes it impossible to use the latest scikit-learn version (1.X) alongside PyCaret in the same environment.

PyCaret 3 is now compatible with the latest version of scikit-learn, and we would like to keep it that way.
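A quick way to confirm which scikit-learn line your environment is on (PyCaret 3 supports 1.x, whereas PyCaret 2 was pinned to 0.23.2):

```python
# Check the installed scikit-learn version at runtime.
import sklearn

major = int(sklearn.__version__.split(".")[0])
print(f"scikit-learn {sklearn.__version__}, on the 1.x line: {major >= 1}")
```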

🔗 Distributed Parallel Model Training

To scale on large datasets, you can run the compare_models function on a cluster in distributed mode. To do that, you can use the parallel parameter in the compare_models function.

This was made possible thanks to Fugue, an open-source unified interface for distributed computing that lets users execute Python, Pandas, and SQL code on Spark, Dask, and Ray with minimal rewrites.

# load dataset
from pycaret.datasets import get_data
diabetes = get_data('diabetes')

# init setup
from pycaret.classification import *
clf1 = setup(data = diabetes, target = 'Class variable', n_jobs = 1)

# create pyspark session
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

# import parallel back-end
from pycaret.parallel import FugueBackend

# compare models
best = compare_models(parallel = FugueBackend(spark))

🚀 Speed up Model Training on CPU

You can apply Intel optimizations to machine learning algorithms and speed up your workflow. To train models with Intel optimizations, use the sklearnex engine; installation of the scikit-learn-intelex library is required:

# install sklearnex
pip install scikit-learn-intelex

To use the Intel optimizations, simply pass engine = 'sklearnex' in the create_model function.

# Functional API (Existing)

# load dataset
from pycaret.datasets import get_data
data = get_data('bank')

# init setup
from pycaret.classification import *
s = setup(data, target = 'deposit', session_id = 123)

Model training without intel accelerations:

%%time
lr = create_model('lr')

Model training with intel accelerations:

%%time
lr2 = create_model('lr', engine = 'sklearnex')

There are some differences in model performance (immaterial in most cases), but the improvement in training time is roughly 60% on a 30K-row dataset. The benefit is much greater when dealing with larger datasets.

⚰️ RIP: NLP and Arules module

NLP is changing fast, and there are many dedicated libraries and companies working exclusively to solve end-to-end NLP tasks. Due to a lack of resources, existing expertise in the team, and new contributors willing to maintain and support NLP and Arules, we have decided to drop them from PyCaret. PyCaret 3.0 does not have the nlp and arules modules. They have also been removed from the documentation. You can still use them with older versions of PyCaret.

ℹ️ More Information

📚 Docs Getting started with PyCaret

📝 API Reference Detailed API docs

⭐ Tutorials New to PyCaret? Check out our official notebooks

📋 Notebooks created and maintained by the community

📙 Blog Tutorials and articles by contributors

📺 Videos Video tutorials and events

🎥 YouTube Subscribe to our YouTube channel

🤗 Slack Join our Slack community

💻 LinkedIn Follow our LinkedIn page

📢 Discussions Engage with the community and contributors

🛠️ Release Notes

Contributors

Thanks to all the contributors who have participated in PyCaret 3.

@ngupta23
@Yard1
@tvdboom
@jinensetpal
@goodwanghan
@Alexsandruss
@daikikatsuragawa
@caron14
@sherpan
@haizadtarik
@ethanglaser
@kumar21120
@satya-pattnaik
@ltsaprounis
@sayantan1410
@AJarman
@drmario-gh
@NeptuneN
@Abonia1
@LucasSerra
@desaizeeshan22
@rhoboro
@jonasvdd
@PivovarA
@ykskks
@chrimaho
@AnthonyA1223
@ArtificialZeng
@cspartalis
@vladocodes
@huangzhhui
@keisuke-umezawa
@ryankarlos
@celestinoxp
@qubiit
@beckernick
@napetrov
@erwanlc
@Danpilz
@ryanxjhan
@wkuopt
@TremaMiguel
@IncubatorShokuhou
@moezali1
