I discovered a hidden gem in Matplotlib’s library: Packed Bubble Charts in Python


For my chart, I’m using an Olympic historical dataset from Olympedia.org, which Joseph Cheng shared on Kaggle under a public domain license.

Screenshot of dataset

It comprises event- to athlete-level Olympic Games results from Athens 1896 to Beijing 2022. After an EDA (Exploratory Data Analysis), I transformed it into a dataset that details the number of female athletes in each sport/event per year. My bubble chart idea is to show which sports have a 50/50 female-to-male athlete ratio and how that has evolved over time.

My plotting data consists of two different datasets, one for each year: 2020 and 1996. For each dataset I’ve computed the total number of athletes who participated in each event (athlete_sum) and how much that sum represents compared with the total number of athletes (male + female) (difference). See a screenshot of the data below:
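As a rough illustration of that aggregation step, here is a dependency-free sketch. It is not the original notebook code: the rows are made up, the `summarise` helper is hypothetical, and it assumes one reading of the two columns, namely a per-sport athlete count plus the female share of that count.

```python
from collections import defaultdict

# Hypothetical athlete-level rows: (year, sport, sex).
# Made-up values for illustration, not taken from the Olympedia dataset.
rows = [
    (2020, "Hockey", "F"), (2020, "Hockey", "M"),
    (2020, "Baseball", "M"), (2020, "Baseball", "M"),
    (1996, "Hockey", "F"), (1996, "Hockey", "M"), (1996, "Hockey", "M"),
]


def summarise(rows, year):
    """Return {sport: (athlete_sum, female_share)} for one Olympic year."""
    totals = defaultdict(int)
    females = defaultdict(int)
    for row_year, sport, sex in rows:
        if row_year != year:
            continue
        totals[sport] += 1
        if sex == "F":
            females[sport] += 1
    # females[sport] is 0 for sports with no female athletes
    return {sport: (n, females[sport] / n) for sport, n in totals.items()}


# One summary per year, mirroring the two plotting datasets
summary_2020 = summarise(rows, 2020)
summary_1996 = summarise(rows, 1996)
```

With real data the same result would more likely come from a pandas groupby; the plain-Python version just keeps the logic visible.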

Screen shot of plotting dataset

This is my approach to visualising it:

  • Size proportion. Using the radius of the bubbles to compare the number of athletes per sport. Bigger bubbles will represent highly competitive events, such as Athletics.
  • Multi-variable interpretation. Using colours to show female representation. Light green bubbles will represent events with a 50/50 split, such as Hockey.
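The two encodings above could be sketched like this. The colour bins, their thresholds and both helper names are placeholders I made up for illustration, not the palette used in the final chart. The radius helper assumes the athlete count is treated as the bubble's area, which is only one possible choice.

```python
import math

# Hypothetical colour bins: (upper bound of female share, hex colour).
# Thresholds and colours are assumptions for this sketch.
COLOUR_BINS = [
    (0.20, "#fc8d62"),
    (0.40, "#e78ac3"),
    (0.60, "#a6d854"),  # the bin that contains a 50/50 split
    (0.80, "#8da0cb"),
    (1.00, "#66c2a5"),
]


def colour_for_share(share):
    """Pick the colour of the first bin whose upper bound covers the share."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    for upper, colour in COLOUR_BINS:
        if share <= upper:
            return colour


def radius_for_count(count):
    """Radius of a circle whose area equals the athlete count."""
    return math.sqrt(count / math.pi)
```

Scaling by area rather than by raw radius means a sport with four times as many athletes gets a bubble twice as wide, which tends to read more fairly by eye.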

Here is my place to begin (using the code and approach from above):

First result

Some easy fixes: increasing the figure size and changing labels to empty if the size isn’t over 250, to avoid having words outside the bubbles.

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))

# Labels edited directly in the dataset
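If you would rather blank the small labels in code than edit the dataset by hand, a minimal sketch could look like this. The sports, sizes and the `MIN_LABEL_SIZE` name are hypothetical; only the 250 threshold comes from the text above.

```python
# Hypothetical sports and sizes, for illustration only
sports = ["Athletics", "Hockey", "Baseball", "Skateboarding"]
athlete_sum = [2000, 400, 240, 80]

MIN_LABEL_SIZE = 250  # threshold mentioned in the text

# Keep the label only when the bubble is big enough to contain it
labels = [name if size > MIN_LABEL_SIZE else ""
          for name, size in zip(sports, athlete_sum)]
```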

Second result

Well, now at least it’s readable. But why is Athletics pink and Boxing blue? Let’s add a legend to illustrate the relationship between colours and female representation.

Since it’s not your regular bar plot, plt.legend() doesn’t do the trick here.

Using matplotlib’s AnnotationBbox we can create rectangles (or circles) to show the meaning behind each colour. We can do the same thing to show a bubble scale.

import matplotlib.pyplot as plt
from matplotlib.offsetbox import (AnnotationBbox, DrawingArea,
                                  TextArea, HPacker)
from matplotlib.patches import Circle, Rectangle

# This is an example for one section of the legend

# Define where the annotation (legend) will be
xy = [50, 128]

# Create your coloured rectangle or circle
da = DrawingArea(20, 20, 0, 0)
p = Rectangle((10, 10), 10, 10, color="#fc8d62ff")
da.add_artist(p)

# Add text
text = TextArea("20%", textprops=dict(color="#fc8d62ff", size=14, fontweight='bold'))

# Combine rectangle and text side by side
hbox = HPacker(children=[da, text], align="top", pad=0, sep=3)

# Annotate both in a box (change alpha if you want to see the box)
ab = AnnotationBbox(hbox, xy,
                    xybox=(1.005, xy[1]),
                    xycoords='data',
                    boxcoords=("axes fraction", "data"),
                    box_alignment=(0.2, 0.5),
                    bboxprops=dict(alpha=0)
                    )

# Add to your bubble chart
ax.add_artist(ab)
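To repeat that snippet for every colour, one option is to wrap it in a small loop. This is only a sketch: the `add_colour_legend` helper, its parameters, the axis limits and the second legend entry are all hypothetical. Note that with xycoords='data' the annotation is only drawn when the anchor point lies inside the Axes, so `x` should be a value within the plotted data range.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, DrawingArea, HPacker, TextArea
from matplotlib.patches import Rectangle


def add_colour_legend(ax, entries, x, y_start, y_step):
    """Add one swatch-plus-label row per (label, colour) pair, stacked downwards."""
    boxes = []
    for i, (label, colour) in enumerate(entries):
        # Coloured square
        da = DrawingArea(20, 20, 0, 0)
        da.add_artist(Rectangle((10, 10), 10, 10, color=colour))
        # Matching text
        text = TextArea(label, textprops=dict(color=colour, size=14,
                                              fontweight="bold"))
        row = HPacker(children=[da, text], align="top", pad=0, sep=3)
        # Each row sits y_step data units below the previous one
        y = y_start - i * y_step
        ab = AnnotationBbox(row, (x, y),
                            xybox=(1.005, y),
                            xycoords="data",
                            boxcoords=("axes fraction", "data"),
                            box_alignment=(0.2, 0.5),
                            bboxprops=dict(alpha=0))
        ax.add_artist(ab)
        boxes.append(ab)
    return boxes


fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
ax.set_xlim(0, 100)   # placeholder limits standing in for the bubble chart
ax.set_ylim(0, 150)

# The second entry is a made-up example
entries = [("20%", "#fc8d62"), ("50%", "#a6d854")]
legend_boxes = add_colour_legend(ax, entries, x=50, y_start=128, y_step=12)
```

The same pattern should work for a bubble scale by swapping the Rectangle for Circle patches of different radii.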

I’ve also added a subtitle and a text description under the chart, just by using plt.text().
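A minimal sketch of that step, assuming the text is positioned in axes-fraction coordinates via `transform=ax.transAxes`. The wording, positions and font sizes are placeholders, not the ones from my final chart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
ax.axis("off")

# Subtitle just above the axes (y slightly greater than 1 in axes fractions)
subtitle = plt.text(0.0, 1.02, "Subtitle placeholder",
                    transform=ax.transAxes, fontsize=14,
                    ha="left", va="bottom")

# Description just below the axes (negative y in axes fractions)
description = plt.text(0.0, -0.05, "Description placeholder",
                       transform=ax.transAxes, fontsize=10,
                       ha="left", va="top")
```

Working in axes fractions keeps the text in place even if the data limits of the bubble chart change.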

Final visualisation

Straightforward and user-friendly interpretations of the graph:

  • Most bubbles are light green → light green means 50% female → most Olympic competitions have an even 50/50 female-to-male split (yay🙌)
  • Only one sport (Baseball), in dark green, has no female participation.
  • 3 sports have only female participation, but the number of athletes is fairly low.
  • The biggest sports in terms of athlete numbers (Swimming, Athletics and Gymnastics) are very close to having a 50/50 split.