Python has a large number of visualization packages, the three best known of which are Matplotlib (and seaborn), Plotly, and Hvplot. Each of these 3 packages has its strengths, but each comes with an entry cost, sometimes a substantial one, to learn how best to use it.
The idea for this article came to me after I discovered the Mind Map of Pandas Methods published by the Daily Dose of Data Science newsletter (a newsletter that I highly recommend). I was discovering the Hvplot visualization package at the same time. I thought the idea of switching from one visualization backend to another as easily as Hvplot allows was great (for example, switching to Plotly from within Hvplot). Seeing that we could do the same with Pandas, I found the idea too interesting not to share.
Pandas is at the heart of data science in Python, and everyone knows how to use it. But the Matplotlib integration built into Pandas is aging, and is being overtaken in both ease of use and presentation by other packages. The power of the Pandas visualization backend is that it lets you take advantage of the latest visualization packages for data exploration and result rendering without having to invest time in learning these packages, which are nevertheless very powerful!
Pandas was built on 2 packages, NumPy and Matplotlib. This explains why we use Matplotlib syntax to generate graphs from Pandas, and why the generated graphs are Matplotlib graphs.
Since its creation, Pandas has evolved and now offers the user the option to change the visualization backend it uses.
The 7 backends that I found during my research, including the default one, are:
- Plotnine (ggplot2)
- Plotly
- Altair
- Holoviews
- Hvplot
- Pandas_bokeh
- Matplotlib (default backend)
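Pandas looks up third-party backends through Python entry points registered under the group `pandas_plotting_backends`, so which of the packages above are actually usable depends on what is installed. As a rough sketch, the snippet below lists the names registered in the current environment. The helper name `installed_plotting_backends` is made up for illustration, and the result will differ from one machine to another.

```python
from importlib.metadata import entry_points


def installed_plotting_backends():
    """Return the backend names registered for Pandas in this environment."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns a selectable collection
        group = eps.select(group="pandas_plotting_backends")
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name
        group = eps.get("pandas_plotting_backends", [])
    return sorted({ep.name for ep in group})


print(installed_plotting_backends())
```

A package that does not show up in this list may still work if Pandas can import it by name, so treat the output as a hint rather than a complete inventory.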
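There are several ways to change the backend (the empty string stands for the backend name):
# Globally, for every later call to .plot()
pd.set_option("plotting.backend", '')
# OR
pd.options.plotting.backend = ''
# Locally, for a single plot
df.plot(backend='', x='...')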
Note: Changing the backend requires Pandas >= 0.25, and sometimes requires installing or importing specific dependencies, as with Hvplot below.
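Because a backend only works when its package is installed, it can help to fall back to Matplotlib when the requested one is missing. The sketch below assumes that Pandas validates the backend name at the moment the option is set and raises an error if it cannot find it. The helper `set_backend_or_fallback` is a hypothetical name, not part of Pandas.

```python
import pandas as pd


def set_backend_or_fallback(name, fallback="matplotlib"):
    """Try to activate a plotting backend; keep `fallback` if it is unavailable."""
    try:
        pd.set_option("plotting.backend", name)
    except (ValueError, ImportError):
        # Pandas checks the backend when the option is set and raises
        # if the corresponding package cannot be found.
        pd.set_option("plotting.backend", fallback)
    return pd.get_option("plotting.backend")


# Prints "plotly" if Plotly is installed, "matplotlib" otherwise
active = set_backend_or_fallback("plotly")
print(active)
```

For a temporary switch, `pd.option_context("plotting.backend", name)` can be used in a `with` block, which restores the previous value on exit.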
Here are 2 examples:
# Example 1: Plotly backend
import pandas as pd # Basic packages
pd.options.plotting.backend = "plotly" # Backend modification
df = pd.DataFrame(dict(a=[1,3,2], b=[3,2,1]))
fig = df.plot()
fig.show()
# Example 2: Hvplot backend
import numpy as np
import pandas as pd # Basic packages
import hvplot
import hvplot.pandas # ! Specific dependency to install
pd.options.plotting.backend = 'hvplot' # Backend modification
data = np.random.normal(size=[50, 2])
df = pd.DataFrame(data, columns=['x', 'y'])
df.plot(kind='scatter', x='x', y='y') # Plotting
2.1. Matplotlib
Matplotlib is the default visualization backend of Pandas. In other words, if you don’t specify a backend, Matplotlib will be used. It’s an efficient package to quickly visualize your data, whether to explore it or to extract results, but it is aging and is being caught up in both ease of use and rendering power by other packages.
The advantage of Matplotlib is that, since Pandas has been built on it from the start, its integration into Pandas is seamless: the object returned by a Pandas plot is a Matplotlib object, so Matplotlib functions can be used on it.
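To illustrate this integration, here is a small sketch in which the object returned by `df.plot()` is customized with regular Matplotlib calls. The data, labels and title are made up for the example, and the `Agg` renderer is selected only so that the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive renderer, so no window is needed
import pandas as pd

# Toy data, invented for this example
df = pd.DataFrame({"a": [1, 3, 2], "b": [3, 2, 1]})

# With the default backend, Pandas returns a Matplotlib Axes
# (backend="matplotlib" is the default, spelled out here for clarity)
ax = df.plot(kind="line", title="Two toy series", backend="matplotlib")

# From here on, these are plain Matplotlib calls
ax.set_xlabel("row index")
ax.set_ylabel("value")
ax.axhline(2, linestyle="--", color="grey")
```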
As a reminder, here are the 11 plot kinds that Pandas exposes through Matplotlib:
- “area” for area plots,
- “bar” for vertical bar charts,
- “barh” for horizontal bar charts,
- “box” for box plots,
- “hexbin” for hexbin plots,
- “hist” for histograms,
- “kde” for kernel density estimate charts,
- “density” an alias for “kde”,
- “line” for line graphs,
- “pie” for pie charts,
- “scatter” for scatter plots.
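Each of these kinds can be requested in two ways: as a string passed to `kind`, or as a method on the `.plot` accessor. A minimal sketch with made-up data, using the default Matplotlib backend and the `Agg` renderer so that it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive renderer, so no window is needed
import pandas as pd

# Toy data, invented for this example
df = pd.DataFrame({"a": [1, 3, 2, 4], "b": [3, 2, 1, 5]})

# These two calls request the same chart: kind as a string, or as a method
ax1 = df.plot(kind="barh")
ax2 = df.plot.barh()

# Kinds that need explicit columns take them as keyword arguments
ax3 = df.plot(kind="scatter", x="a", y="b")
ax4 = df.plot.scatter(x="a", y="b")
```

The method form tends to be easier to discover through autocompletion, while the string form is handy when the kind is chosen at runtime.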
2.2. Plotly
Plotly is a visualization package developed by the company of the same name. The company developed the Plotly.js framework, which enables interactive data visualization from Python. Plotly also offers the Python dashboarding package Dash.
To use Plotly from Pandas, simply import Plotly Express and change the backend:
import pandas as pd
import plotly.express as px # Import packages
df = pd.read_csv("iris.csv")
# Changing the Pandas backend locally
df.plot.scatter(backend = "plotly", x = "sepal.length", y = "sepal.width")
Pandas returns an object of the same type as the one Plotly returns:
df.plot.scatter(backend = "plotly", x = "sepal.length", y = "sepal.width")
# → px.scatter(x=df["sepal.length"], y = df["sepal.width"])
# →
The advantage is that you can directly integrate a graphic created in Pandas into the Plotly universe, especially Dash!
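As a rough illustration, the figure returned by Pandas can be styled and exported with the usual Plotly figure methods. The sketch below uses a few made-up rows instead of the iris file, and it falls back to `None` when Plotly is not installed, on the assumption that Pandas raises an error for a missing backend.

```python
import pandas as pd

# A few made-up rows that reuse the column names of the iris example
df = pd.DataFrame(
    {"sepal.length": [5.1, 4.9, 6.3], "sepal.width": [3.5, 3.0, 3.3]}
)

try:
    fig = df.plot.scatter(backend="plotly", x="sepal.length", y="sepal.width")
except (ImportError, ValueError):
    fig = None  # Plotly is not available in this environment

if fig is not None:
    # Regular Plotly figure methods work on the object returned by Pandas
    fig.update_layout(title_text="Sepal width vs. length", template="plotly_white")
    fig.update_traces(marker_size=10)
    # HTML string that can be embedded in a web page
    html = fig.to_html(include_plotlyjs="cdn")
```

In a Dash application, the same `fig` object would typically be passed to a `dcc.Graph` component through its `figure` argument.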
One limitation is that Plotly’s integration with Pandas is not yet perfect (details on the Plotly website).
2.3. Hvplot
Hvplot is an interactive visualization package based on Bokeh.
It’s an exciting package, which I discovered a while ago and which continues to fascinate me, as much for Hvplot itself, which has the same notion of backend as Pandas, as for the Holoviz suite and related packages like Panel for creating dynamic client-side websites.
Even without the Pandas backend mechanism, Hvplot requires very little learning to get started: just replace Pandas’ .plot() with .hvplot():
import pandas as pd
import hvplot.pandas # Adds the .hvplot accessor to DataFrames
df = pd.read_csv("iris.csv")
# Plot with Pandas
df.plot.scatter(backend = "hvplot", x = "sepal.length", y = "sepal.width")
# Same plot with hvplot
df.hvplot.scatter(x = "sepal.length", y = "sepal.width")
Using the Hvplot backend works the same way as the Plotly backend; you just need to import a dependency from the Hvplot package:
import numpy as np
import pandas as pd # Basic packages
import hvplot
import hvplot.pandas # Specific dependency to install
pd.options.plotting.backend = 'hvplot' # Backend modification
data = np.random.normal(size=[50, 2])
df = pd.DataFrame(data, columns=['x', 'y'])
df.plot(kind='scatter', x='x', y='y') # Plotting
Like Plotly, charts generated from Pandas with the Hvplot backend are Hvplot objects:
df.plot.scatter(backend = "hvplot", x = "sepal.length", y = "sepal.width")
# → df.hvplot.scatter(x = "sepal.length", y = "sepal.width")
# →
Hvplot is part of the extremely powerful Holoviz suite, which comes with many associated tools to push data analysis very far, such as Panel, GeoViews, Datashader and others. This compatibility lets you create graphs from Pandas and still take advantage of the Holoviz suite.
Pandas backends are an extremely efficient way to discover and take advantage of the latest Python visualization packages without having to invest time: in 18 characters including spaces, it is possible to locally transform a standard Matplotlib graph into an interactive Plotly graph, and therefore to enjoy all the benefits of this kind of visualization.
Nevertheless, this solution has certain limitations: it is not suited to highly advanced visualization objectives that require a great deal of customization, such as advanced visualization in data journalism, because the integration of these packages into Pandas is not yet perfect. In addition, this solution only covers visualization packages built on top of Pandas, and excludes other visualization solutions such as D3.js.
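To make "built on top of Pandas" more concrete: a package can act as a backend when Pandas can import it and it exposes a top-level `plot(data, kind, **kwargs)` function. The sketch below registers a toy in-memory module to show that dispatch. The module name `toy_backend` and the dictionary it returns are hypothetical; a real backend would return a figure object.

```python
import sys
import types

import pandas as pd


def plot(data, kind, **kwargs):
    """Function Pandas calls; a real backend would build and return a figure."""
    return {"kind": kind, "columns": list(data.columns), "options": kwargs}


# Register an in-memory module under the hypothetical name "toy_backend"
toy_backend = types.ModuleType("toy_backend")
toy_backend.plot = plot
sys.modules["toy_backend"] = toy_backend

# Toy data, invented for this example
df = pd.DataFrame({"a": [1, 3, 2], "b": [3, 2, 1]})

# Pandas forwards the DataFrame, the kind and the keyword arguments
result = df.plot.scatter(backend="toy_backend", x="a", y="b")
print(result["kind"])  # scatter
```

Anything that cannot be reached through this single function call, which includes JavaScript libraries such as D3.js, stays outside the mechanism.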
Hvplot is currently my favorite package for visualization: it is extremely easy to get started with, works with all the major data manipulation packages (Polars, Dask, Xarray, …) and is part of a continuum of applications that lets you go from graphs to full dynamic client-side websites.
During my research, I didn’t find as much documentation as I expected. I think the idea is great, so I expected a lot of articles. So feel free to tell me in the comments if you find this solution really useful, or if it’s just a cool thing with no real use.
Thanks for reading!