Jupyter is a web-based IDE, so whenever we print a DataFrame, it is rendered using HTML and CSS.
This lets us format the output just like any other web page.
One interesting way to do so is to embed inline plots, also called sparklines, as a column of the DataFrame.
So how do we create one, you may ask? Let’s take a look below.
Let’s take a look at the imports first:
Next, let’s create a dummy dataset:
For each of the 4 rows, we now have a list of randomly generated price histories.
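A minimal sketch of the data-related imports and the dummy dataset might look like this. The asset names, the column names (name, price_history), the history length n, the price scales and the seed are illustrative assumptions, not values taken from the article.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator so the dummy data is reproducible.
rng = np.random.default_rng(seed=0)
n = 50  # assumed number of points in each price history

# Four rows, each holding a list of randomly generated prices.
df = pd.DataFrame(
    {
        "name": ["Asset A", "Asset B", "Asset C", "Asset D"],
        "price_history": [
            (scale * rng.random(n)).round(2).tolist()
            for scale in (40_000, 2_000, 500, 150)
        ],
    }
)
```

Each cell of price_history holds a plain Python list, which is what the plotting function later receives one row at a time.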
Now, our objective is to add a line plot to each row. To do so, we will create a function and use the apply() method.
Now, as mentioned above, Jupyter renders a DataFrame using HTML.
Thus, if we can find a way to provide some HTML that refers to an image as a cell’s value, Jupyter can render it and display the corresponding line plot.
Here’s the code that does this for us:
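One way such a function could be written is sketched below. The name create_line, the figure size and the styling choices are assumptions. The sketch builds the figure with matplotlib's Figure class directly instead of pyplot, so nothing has to be closed afterwards.

```python
import base64
from io import BytesIO

from matplotlib.figure import Figure


def create_line(data):
    """Return an HTML <img> tag holding a small line plot of `data`."""
    data = list(data)

    # Build a short, wide figure so the plot fits inside a table cell.
    fig = Figure(figsize=(3, 0.4))
    ax = fig.subplots()
    ax.plot(data)
    ax.plot(len(data) - 1, data[-1], "r.")  # mark the latest value in red
    ax.axis("off")  # hide ticks, labels and spines

    img = BytesIO()
    fig.savefig(img, format="png")
    encoded = base64.b64encode(img.getvalue()).decode("utf-8")
    return f'<img src="data:image/png;base64,{encoded}"/>'


tag = create_line([3, 1, 4, 1, 5, 9, 2, 6])
```

The returned string is a complete img element whose src is a data URI, so the image travels inside the HTML itself and no file is written to disk.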
While the plotting part is fairly self-explanatory, let’s focus on what the last 4 lines of code (not counting comments) are meant for.
The goal is to convert the plot into an image that can be displayed on a web page.
The first line creates a new BytesIO object, img. BytesIO is a class in the io module that creates an in-memory bytes buffer.
The second line saves the plot generated by matplotlib to the img object as a PNG image using the savefig method of the figure object fig.
The third line encodes the content of the img object as base64 using the b64encode function from the base64 module. The resulting base64 bytes are then decoded into a Unicode string using the decode method with the utf-8 encoding.
Finally, the last line returns an HTML img tag whose src attribute carries the base64-encoded image string. When this string is rendered on a web page, it displays the image generated by matplotlib.
Finally, we create the line plots by calling the function on each row of the DataFrame.
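Applying the function row by row could look like the sketch below. It repeats a compact version of the plotting function so that it runs on its own. The toy data and the column names are assumptions. Note that the table has to be rendered with escaping turned off, and with the column-width limit lifted so that pandas does not truncate the long base64 strings.

```python
import base64
from io import BytesIO

import pandas as pd
from matplotlib.figure import Figure


def create_line(data):
    # Compact stand-in for the plotting function described above.
    fig = Figure(figsize=(3, 0.4))
    ax = fig.subplots()
    ax.plot(list(data))
    ax.axis("off")
    img = BytesIO()
    fig.savefig(img, format="png")
    encoded = base64.b64encode(img.getvalue()).decode("utf-8")
    return f'<img src="data:image/png;base64,{encoded}"/>'


# Assumed toy data; the column names are illustrative.
df = pd.DataFrame(
    {
        "name": ["Asset A", "Asset B"],
        "price_history": [[5, 7, 6, 9, 8], [3, 2, 4, 1, 2]],
    }
)

# One <img> tag per row.
df["price_plot"] = df["price_history"].apply(create_line)

# escape=False keeps the tags as markup; max_colwidth=None avoids truncation.
with pd.option_context("display.max_colwidth", None):
    html_table = df[["name", "price_plot"]].to_html(escape=False)
```

In a notebook, passing html_table to IPython.display.HTML should render the table with the plots in place of the raw strings.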
Overall, sparklines are a great tool for quickly conveying trends and patterns in data, and they are especially useful when you want to display a lot of information in a small space.
Whenever we call an existing method on a Pandas DataFrame, say df.rename(), it’s evident that the rename() method is defined within the DataFrame class.
But what if you want to attach a custom method to the DataFrame object, say, df.my_method()? This is entirely possible.
Fortunately, Pandas is a highly customizable library, and there are many ways to extend its functionality to suit your needs.
One popular approach is to use the pandas-flavor library. It lets you define custom methods and attach them to the DataFrame object.
You can install it with pip (pip install pandas-flavor).
Next, let’s write a custom method in a file my_pandas.py.
Now consider that you have the following DataFrame:
Finally, we import the custom methods file my_pandas.py, which attaches the new method to the DataFrame object:
This is very useful for streamlining your Pandas workflow. With it, you can create functions tailored to your specific use case and make your data analysis tasks more efficient and intuitive.
A Pandas DataFrame is usually created from a Python list or dictionary, by reading files, etc.
However, did you know you can also create a DataFrame from a list of dataclass objects?
Assume you have the following dataclass Point:
Let’s create a bunch of objects from this class.
Now, if we pass this list of dataclass objects to the pd.DataFrame constructor, we get a DataFrame as output:
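A minimal sketch of these steps follows. The fields x and y and the sample values are assumptions, since the text only names the class.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Point:
    # Assumed fields for illustration.
    x: int
    y: int


points = [Point(1, 2), Point(3, 4), Point(5, 6)]

# pandas reads the dataclass fields and uses them as column names.
df = pd.DataFrame(points)
```

The resulting DataFrame has one row per object and one column per field.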
This approach can be very useful when working with dataclasses, because it provides a simple way to create a Pandas DataFrame from a collection of instances.
When applying a function to a DataFrame using apply(), we don’t get to see the progress or an estimated remaining time.
However, this can be important when working with large datasets or complex operations, because it is otherwise hard to know how much longer the operation will take to complete.
Moreover, a progress bar makes it easier to decide whether to wait for the operation to finish or to interrupt it and try a different approach.
To resolve this, instead of using the apply() method, you can use progress_apply() from tqdm.
First, integrate it with Pandas as follows:
Now, if we use df.progress_apply(), we get:
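The integration and the call can be sketched together as follows. The DataFrame, the row_sum function and the bar description are hypothetical stand-ins for real work.

```python
import pandas as pd
from tqdm import tqdm

# Registers progress_apply() on pandas objects.
tqdm.pandas(desc="Processing rows")

# Made-up data for illustration.
df = pd.DataFrame({"a": range(1_000), "b": range(1_000)})


def row_sum(row):
    # Hypothetical row-wise operation standing in for real work.
    return row["a"] + row["b"]


# Same call shape as df.apply(...), with a progress bar shown while it runs.
totals = df.progress_apply(row_sum, axis=1)
```

The result is the same as with df.apply(row_sum, axis=1). Only the progress reporting is added.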
When presenting data in a DataFrame, adding a caption to the table provides additional context and makes the data easier to understand, without resorting to separate markdown cells in a Jupyter notebook.
With Pandas’ styling API, you can add captions to a DataFrame. Let’s look at an example below:
For instance, consider that we have the following DataFrame:
Next, we invoke the set_caption() method on the DataFrame’s style accessor, as shown below:
As shown above, the DataFrame now appears with a caption.
Overall, captions let us briefly describe the DataFrame, its purpose, and any other pertinent information that helps readers understand the data more quickly and easily.
When we print a DataFrame, it appears as a collection of raw numbers (or strings).
For instance, consider the following DataFrame:
In this case, the columns of our data have an intrinsic unit of measurement, which is important for the reader to know. But that is nowhere to be seen in the data.
Once again, with the styling API, you can format the output preview of a DataFrame, as shown below:
Now, it’s much more evident what the individual values mean, which was missing in the default preview.
Additionally, you may also explore the open-source package PrettyPandas, which extends the Styler class with many more useful utilities.