PandasAI: Super Easy Data Evaluation with AI (Powered by OpenAI)

Talk to your dataframes using this Python library.

Photo by Yu Wang on Unsplash

Picture this: you’re in the middle of something, and suddenly a coworker asks you a few questions about one of your datasets. The questions aren’t that complex, but you don’t want to break your concentration to explore the dataset.

What to do?

Well, now you can simply ask PandasAI any question you have and get answers quickly, thanks to the artificial intelligence capabilities of this Python library. It’s like talking to your dataframe!

You just need an OpenAI API key and to follow the steps below.

If you don’t feel like reading, you can watch my YouTube video below.

Setting Up PandasAI

To use this Python library, you first have to install it using pip.

pip install pandasai

We’ll also need an OpenAI API key. If you don’t have one yet, you can generate one by visiting this website and following the steps in the image below.

Image by author

Working with PandasAI

Let’s start by importing the libraries we’ll use.

import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

Now let’s read the dataframe we’ll use for this demo. In this case, we’ll work with a supermarket sales dataset that you can download from GitHub or Google Drive.

df = pd.read_csv("supermarket_sales.csv")
df = df[['Gender', 'Product line', 'Total']]

This dataset has many columns, but we’ll only use the three columns shown below.

Image by author

Now it’s time to instantiate a PandasAI object. To do so, we first have to instantiate the LLM, which in this case we’ll do using the OpenAI API key.

OPENAI_API_KEY = "your-api-key"
llm = OpenAI(api_token=OPENAI_API_KEY)

pandas_ai = PandasAI(llm)
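
Hard-coding the key as above is fine for a quick demo, but it is easy to leak it when sharing a notebook. As a minimal sketch of one alternative, the key can be read from an environment variable. The helper name load_api_key and the variable name OPENAI_API_KEY are assumptions made for this illustration, not part of PandasAI.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Hypothetical helper for illustration; the variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty token.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned string can then be passed in place of the literal, for example OpenAI(api_token=load_api_key()).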

Great! Now we can use our pandas_ai object whenever we want to “talk” to our dataframe.

Say we want to know which products are in the “Product line” column. We can simply add the prompt parameter and type our question.

pandas_ai.run(df, prompt="Which products are in Product line")

Here’s the output I got:

‘The product line includes health and beauty, electronic accessories, home and lifestyle, sports and travel, food and beverages, and fashion accessories.’

Although we can easily get the same output by using the unique function, PandasAI shines when it comes to solving more complex questions.
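
For comparison, this is roughly what the plain pandas version of that question looks like. It is a small sketch on a made-up sample with the same three columns, not the real supermarket dataset, so the values are illustrative only.

```python
import pandas as pd

# Made-up sample with the same column names as the demo dataframe.
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Product line": ["Health and beauty", "Sports and travel",
                     "Health and beauty", "Food and beverages"],
    "Total": [100.0, 50.0, 25.5, 75.0],
})

# unique() returns the distinct values in order of first appearance.
product_lines = sample["Product line"].unique().tolist()
print(product_lines)
```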

Say we want to calculate the total amount of money spent by both the female and male gender.

pandas_ai.run(df, prompt="Calculate the total spent by each gender")

Here’s the output I got:

‘The total amount spent by females is $167,882.93 and the total amount spent by males is $155,083.82.’

If you manually apply .groupby and .sum, you’ll see that the answer is correct. Cool, isn’t it?
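
The manual check mentioned above could look something like the sketch below. It runs on a made-up sample rather than the real dataset, so the totals it prints are not the figures quoted in the PandasAI answer.

```python
import pandas as pd

# Made-up sample with the same column names as the demo dataframe.
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Total": [100.0, 50.0, 25.5, 75.0],
})

# Group the rows by gender and add up the "Total" column within each group.
totals_by_gender = sample.groupby("Gender")["Total"].sum()
print(totals_by_gender)
```

Running the same two calls on the real df gives the numbers to compare against the answer PandasAI returned.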

But that’s not all. You can also ask it to generate plots. Say we want to generate a barplot that shows the total spent by each gender.

pandas_ai.run(df, prompt="Plot a barplot that shows the total spent by each gender")

Image by author

Unfortunately, PandasAI didn’t get the barplot right, which is odd because it successfully calculated the amount for each gender. I don’t know if it has to do with the dataset or with the library itself, but PandasAI failed to plot visualizations from datasets that weren’t as simple as the one shown in its documentation.

That said, that’s one of the few weak points I found, and it may be fixed in coming updates.

Now let’s try to make a pivot table with PandasAI, so that I can see how much each gender spends on each product line.
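
When the generated plot looks off, one fallback is to aggregate and plot directly with pandas and matplotlib. The following is only a sketch on a made-up sample; the output file name is an arbitrary choice for this example, and matplotlib is assumed to be installed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import pandas as pd

# Made-up sample with the same column names as the demo dataframe.
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Total": [100.0, 50.0, 25.5, 75.0],
})

# Aggregate first, then draw one bar per gender.
totals_by_gender = sample.groupby("Gender")["Total"].sum()
ax = totals_by_gender.plot.bar(rot=0)
ax.set_ylabel("Total spent")
ax.set_title("Total spent by gender")

# Arbitrary file name chosen for this sketch.
ax.figure.savefig("total_by_gender.png")
```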

pandas_ai.run(df, prompt="Calculate the total spent on each product line by both the female and male gender")

Here’s the output I got.

‘For females, the highest amount was spent on food and beverages with a total of $33,170.92, followed by fashion accessories with $30,437.40 and home and lifestyle with $30,036.88. For males, the highest amount was spent on health and beauty with a total of $30,632.75, followed by electronic accessories with $27,235.51 and sports and travel with $26,548.11.’

The numbers aren’t wrong, but for some reason, it only calculated the expenses for three product lines for each gender.

I tried to make it actually generate the pivot table, so I could make a visualization from it, but it didn’t work.

To show you that PandasAI is capable of making visualizations when the data provided is simple, I created the pivot table on my own with the code below.

report_table = df.pivot_table(index='Gender',
                              columns='Product line',
                              values='Total',
                              aggfunc='sum').round(0)

Image by author

Then I passed this new dataframe, which I named report_table, to the pandas_ai object and typed the prompt to get a barplot from the pivot table.

pandas_ai.run(report_table, prompt="Make a barplot that shows how much money each gender spends on each product line")

Image by author

Now we got the barplot with the right numbers!

That’s it! As you can see, PandasAI isn’t a replacement for Pandas; it was created to be used together with it. For more information about this library, check the official GitHub repository.

All the code written in this article is available here.
