4-Dimensional Data Visualization: Time in Bubble Charts

Bubble charts elegantly compress a large amount of information into a single visualization, with bubble size adding a third dimension. However, comparing “before” and “after” states is often crucial. To handle this, we propose adding a transition between these states, creating an intuitive user experience.

Since we couldn’t find a ready-made solution, we developed our own. The challenge turned out to be fascinating and required refreshing some mathematical concepts.

Without question, the most difficult part of the visualization is the transition between two circles, the before and after states. To simplify, we focus on solving a single case, which can then be extended in a loop to generate the necessary number of transitions.

To construct such a figure, let’s first decompose it into three parts: two circles and a polygon that connects them (in gray).

Base element decomposition, image by Author

Constructing the two circles is quite simple: we know their centers and radii. The remaining task is to construct a quadrilateral polygon, which has the following form:

Polygon, image by Author

Constructing this polygon reduces to finding the coordinates of its vertices. This is the most interesting task, and we will solve it next.

From polygon to tangent lines, image by Author

To calculate the distance from a point $(x_0, y_0)$ to the line $y = ax + b$, the formula is:

$$d = \frac{|a x_0 + b - y_0|}{\sqrt{a^2 + 1}}$$

In our case, the distance ($d$) is equal to the circle radius ($r$). Hence,

$$r = \frac{|a x_0 + b - y_0|}{\sqrt{a^2 + 1}}$$

After multiplying both sides of the equation by $\sqrt{a^2 + 1}$, we get:

$$r \sqrt{a^2 + 1} = |a x_0 + b - y_0|$$

After squaring both sides to remove the absolute value, moving everything to one side, and setting the equation equal to zero, we get:

$$(a x_0 + b - y_0)^2 - r^2 (a^2 + 1) = 0$$

Since we have two circles and need to find a tangent to both, we have the following system of equations:

$$\begin{cases} (a x_1 + b - y_1)^2 - r_1^2 (a^2 + 1) = 0 \\ (a x_2 + b - y_2)^2 - r_2^2 (a^2 + 1) = 0 \end{cases}$$
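As a quick sanity check of the tangency condition, the sketch below evaluates it for a case where the answer is known by inspection: two unit circles centered at (0, 0) and (4, 0), for which the horizontal line y = 1 is a common tangent. The line is written as y = ax + b, the same form the code further down uses. The helper names `tangency_residual` and `point_line_distance` are illustrative and not part of the original code.

```python
import math

def tangency_residual(a, b, x0, y0, r):
    """Left-hand side of (a*x0 + b - y0)^2 - r^2 * (a^2 + 1) = 0."""
    return (a * x0 + b - y0) ** 2 - r ** 2 * (a ** 2 + 1)

def point_line_distance(a, b, x0, y0):
    """Distance from the point (x0, y0) to the line y = a*x + b."""
    return abs(a * x0 + b - y0) / math.sqrt(a ** 2 + 1)

# Two unit circles: (center_x, center_y, radius)
circles = [(0.0, 0.0, 1.0), (4.0, 0.0, 1.0)]

# Candidate line y = 1, i.e. a = 0, b = 1
a, b = 0.0, 1.0
residuals = [tangency_residual(a, b, x0, y0, r) for x0, y0, r in circles]
distances = [point_line_distance(a, b, x0, y0) for x0, y0, r in circles]

# A line that misses both circles (y = 2) leaves a non-zero residual
miss_residuals = [tangency_residual(0.0, 2.0, x0, y0, r) for x0, y0, r in circles]

print(residuals, distances, miss_residuals)
```

A zero residual for both circles means the line satisfies both equations of the system at once, which is exactly what the solver is asked to find.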

This works well, but in reality there are four possible tangent lines:

All possible tangent lines, image by Author

We need to choose just two of them: the external ones.

To do this, we check each tangent against each circle center and determine whether the line passes above or below the point:

Check if line is above or below the point, image by Author

We need the two lines for which both circle centers lie on the same side: the line passes either above both centers or below both.

Now, let’s translate all these steps into code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sympy as sp
from scipy.spatial import ConvexHull
import math
from matplotlib import rcParams
import matplotlib.patches as patches

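def check_position_relative_to_line(a, b, x0, y0):
    # y-coordinate of the line y = a*x + b at the point's x
    y_line = a * x0 + b

    if y0 > y_line:
        return 1  # point is above the line
    elif y0 < y_line:
        return -1  # point is below the line
    return 0  # point lies exactly on the line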

    
def find_tangent_equations(x1, y1, r1, x2, y2, r2):
    a, b = sp.symbols('a b')

    tangent_1 = (a*x1 + b - y1)**2 - r1**2 * (a**2 + 1)  
    tangent_2 = (a*x2 + b - y2)**2 - r2**2 * (a**2 + 1) 

    eqs_1 = [tangent_2, tangent_1]
    solution = sp.solve(eqs_1, (a, b))
    parameters = [(float(e[0]), float(e[1])) for e in solution]

    # filter just external tangents
    parameters_filtered = []
    for tangent in parameters:
        a = tangent[0]
        b = tangent[1]
        if abs(check_position_relative_to_line(a, b, x1, y1) + check_position_relative_to_line(a, b, x2, y2)) == 2:
            parameters_filtered.append(tangent)

    return parameters_filtered
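
To see the filter at work without running the symbolic solver, here is a minimal sketch on a hand-checkable case: two unit circles centered at (0, 0) and (4, 0), whose four common tangents can be written down directly. The helpers `side_of_line` and `is_external` are hypothetical stand-ins that mirror the logic of the functions above.

```python
import math

def side_of_line(a, b, x0, y0):
    """Return 1 if (x0, y0) lies above y = a*x + b, -1 if below, 0 if on it."""
    y_line = a * x0 + b
    return (y0 > y_line) - (y0 < y_line)

def is_external(a, b, c1, c2):
    """True when both centers fall on the same side of the line."""
    return abs(side_of_line(a, b, *c1) + side_of_line(a, b, *c2)) == 2

c1, c2 = (0.0, 0.0), (4.0, 0.0)  # centers of two unit circles
k = 1 / math.sqrt(3)             # slope of the internal tangents for this layout

# (a, b) pairs of the four common tangents y = a*x + b
tangents = {
    "upper external": (0.0, 1.0),     # y = 1
    "lower external": (0.0, -1.0),    # y = -1
    "internal rising": (k, -2 * k),   # y = (x - 2) / sqrt(3)
    "internal falling": (-k, 2 * k),  # y = -(x - 2) / sqrt(3)
}

kept = [name for name, (a, b) in tangents.items() if is_external(a, b, c1, c2)]
print(kept)
```

The internal tangents cross the segment between the centers, so one center ends up above the line and the other below, and the two signs cancel.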

Now we just need to find the intersections of the tangents with the circles. These four points will be the vertices of the desired polygon.

Circle equation, for a circle with center $(x_0, y_0)$ and radius $r$:

$$(x - x_0)^2 + (y - y_0)^2 = r^2$$

Substitute the line equation $y = ax + b$ into the circle equation:

$$(x - x_0)^2 + (a x + b - y_0)^2 = r^2$$

The solution of this equation is the $x$ coordinate of the intersection. Because the line is tangent to the circle, the quadratic has a single repeated root.

Then, calculate $y$ from the line equation:

$$y = a x + b$$

Here is how it translates to code:

def find_circle_line_intersection(circle_x, circle_y, circle_r, line_a, line_b):
    x, y = sp.symbols('x y')
    circle_eq = (x - circle_x)**2 + (y - circle_y)**2 - circle_r**2
    intersection_eq = circle_eq.subs(y, line_a * x + line_b)

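    # a tangent touches the circle once, so the first root is enough
    sol_x_raw = sp.solve(intersection_eq, x)[0]
    try:
        sol_x = float(sol_x_raw)
    except TypeError:
        # rounding errors can leave a tiny imaginary part; keep the real part
        sol_x = float(sol_x_raw.as_real_imag()[0])
    sol_y = line_a * sol_x + line_b
    return sol_x, sol_y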
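
As a cross-check on the symbolic approach, the same four vertices can also be computed in closed form with plain trigonometry. This is an alternative technique, not the method used in this article. It relies on the fact that the unit normal of an external tangent makes an angle φ with the direction from the first center to the second, where cos φ = (r1 - r2) / d and d is the distance between the centers. The function name `external_tangent_points` is an assumption made for illustration.

```python
import math

def external_tangent_points(x1, y1, r1, x2, y2, r2):
    """Return two (p1, p2) pairs: the touch points on circle 1 and circle 2
    for each of the two external tangents."""
    d = math.hypot(x2 - x1, y2 - y1)
    if d <= abs(r1 - r2):
        raise ValueError("one circle lies inside the other: no pair of external tangents")
    theta = math.atan2(y2 - y1, x2 - x1)  # direction from center 1 to center 2
    phi = math.acos((r1 - r2) / d)        # angle between that direction and the tangent normal
    pairs = []
    for sign in (1, -1):
        # unit normal shared by both touch points of this tangent
        nx = math.cos(theta + sign * phi)
        ny = math.sin(theta + sign * phi)
        p1 = (x1 + r1 * nx, y1 + r1 * ny)
        p2 = (x2 + r2 * nx, y2 + r2 * ny)
        pairs.append((p1, p2))
    return pairs

# Two unit circles centered at (0, 0) and (4, 0)
pairs = external_tangent_points(0, 0, 1, 4, 0, 1)
for p1, p2 in pairs:
    print(p1, p2)
```

Because this variant never writes the tangent as y = ax + b, it also covers vertical tangents, which the slope-intercept form cannot represent.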

Now we need to generate sample data to demonstrate the whole chart composition.

Imagine we have four users on our platform. We know how many purchases they made, how much revenue they generated, and how active they were on the platform. All these metrics are calculated for two periods (let’s call them the pre and post periods).

# data generation
df = pd.DataFrame({'user': ['Emily', 'Emily', 'James', 'James', 'Tony', 'Tony', 'Olivia', 'Olivia'],
                   'period': ['pre', 'post', 'pre', 'post', 'pre', 'post', 'pre', 'post'],
                   'num_purchases': [10, 9, 3, 5, 2, 4, 8, 7],
                   'revenue': [70, 60, 80, 90, 20, 15, 80, 76],
                   'activity': [100, 80, 50, 90, 210, 170, 60, 55]})
Data sample, image by Author

Let’s assume that “activity” is the area of the bubble. Now, let’s convert it into the radius of the bubble. We will also scale the y-axis.

def area_to_radius(area):
    radius = math.sqrt(area / math.pi)
    return radius

x_alias, y_alias, a_alias = 'num_purchases', 'revenue', 'activity'

# scaling metrics
radius_scaler = 0.1
df['radius'] = df[a_alias].apply(area_to_radius) * radius_scaler
# revenue is divided by the largest x value, which brings y into a range similar to x
df['y_scaled'] = df[y_alias] / df[x_alias].max()
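
A short sketch of why the conversion goes through the area: with the radius derived from a square root, the drawn area stays proportional to the activity value, so a bubble with twice the activity covers twice the surface. The two values below are Tony's and Emily's pre-period activity from the sample data; the `rows` list is only there for illustration.

```python
import math

def area_to_radius(area):
    """Radius of a circle with the given area."""
    return math.sqrt(area / math.pi)

radius_scaler = 0.1  # same scaling factor as in the article's code

rows = []
for activity in (210, 100):
    r = area_to_radius(activity) * radius_scaler
    drawn_area = math.pi * r ** 2  # area of the circle that ends up on the chart
    rows.append((activity, r, drawn_area))
    print(activity, round(r, 3), round(drawn_area, 3))

# Ratio of drawn areas equals the ratio of the underlying values
area_ratio = rows[0][2] / rows[1][2]
```

Had the activity been mapped directly to the radius, the same pair of values would differ in area by a factor of 2.1 squared, which overstates the gap.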

Now let’s construct the chart — 2 circles and the polygon.

def draw_polygon(plt, points):
    hull = ConvexHull(points)
    convex_points = [points[i] for i in hull.vertices]

    x, y = zip(*convex_points)
    x += (x[0],)
    y += (y[0],)

    plt.fill(x, y, color="#99d8e1", alpha=1, zorder=1)

# bubble pre
for _, row in df[df.period=='pre'].iterrows():
    x = row[x_alias]
    y = row.y_scaled
    r = row.radius
    circle = patches.Circle((x, y), r, facecolor="#99d8e1", edgecolor="none", linewidth=0, zorder=2)
    plt.gca().add_patch(circle)

# transition area
for user in df.user.unique():
    user_pre = df[(df.user==user) & (df.period=='pre')]
    x1, y1, r1 = user_pre[x_alias].values[0], user_pre.y_scaled.values[0], user_pre.radius.values[0]
    user_post = df[(df.user==user) & (df.period=='post')]
    x2, y2, r2 = user_post[x_alias].values[0], user_post.y_scaled.values[0], user_post.radius.values[0]

    tangent_equations = find_tangent_equations(x1, y1, r1, x2, y2, r2)
    circle_1_line_intersections = [find_circle_line_intersection(x1, y1, r1, eq[0], eq[1]) for eq in tangent_equations]
    circle_2_line_intersections = [find_circle_line_intersection(x2, y2, r2, eq[0], eq[1]) for eq in tangent_equations]

    polygon_points = circle_1_line_intersections + circle_2_line_intersections
    draw_polygon(plt, polygon_points)

# bubble post
for _, row in df[df.period=='post'].iterrows():
    x = row[x_alias]
    y = row.y_scaled
    r = row.radius
    label = row.user
    circle = patches.Circle((x, y), r, facecolor="#2d699f", edgecolor="none", linewidth=0, zorder=2)
    plt.gca().add_patch(circle)

    plt.text(x, y - r - 0.3, label, fontsize=12, ha="center")

The output looks as expected:

Output, image by Author
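
The `draw_polygon` function relies on `ConvexHull` to put the four vertices into drawing order. The sketch below illustrates why that step matters, using a simple angle sort around the centroid as a stand-in. That substitute is an assumption for illustration and only works when the points are in convex position, which holds for the four tangent points. The helper names are hypothetical.

```python
import math

def order_around_centroid(points):
    """Sort points counter-clockwise by angle around their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

def shoelace_area(points):
    """Area enclosed by the polygon whose vertices are taken in the given order."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x_a, y_a = points[i]
        x_b, y_b = points[(i + 1) % n]
        total += x_a * y_b - x_b * y_a
    return abs(total) / 2

# Touch points in the order the loop collects them:
# both points on circle 1 first, then both points on circle 2.
raw = [(0.0, 1.0), (0.0, -1.0), (4.0, 1.0), (4.0, -1.0)]
ordered = order_around_centroid(raw)

print(shoelace_area(raw), shoelace_area(ordered))
```

In the raw order the outline crosses itself and forms a bow tie, so the signed areas of its two halves cancel. After sorting, the outline is the expected 4 by 2 rectangle.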

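Now let’s add some styling. Note that plt.subplots creates a new figure, so this block has to run before the drawing code; the full listing at the end shows the correct order.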

# plot parameters
plt.subplots(figsize=(10, 10))
rcParams['font.family'] = 'DejaVu Sans'
rcParams['font.size'] = 14
plt.grid(color="gray", linestyle=(0, (10, 10)), linewidth=0.5, alpha=0.6, zorder=1)
plt.axvline(x=0, color="white", linewidth=2)
plt.gca().set_facecolor('white')
plt.gcf().set_facecolor('white')

# spines formatting
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["bottom"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.gca().tick_params(axis="both", which="both", length=0)

# plot labels
plt.xlabel("Number purchases") 
plt.ylabel("Revenue, $")
plt.title("Product users performance", fontsize=18, color="black")

# axis limits
axis_lim = df[x_alias].max() * 1.2
plt.xlim(0, axis_lim)
plt.ylim(0, axis_lim)

A pre-post legend in the bottom right corner gives the viewer a hint on how to read the chart:

## pre-post legend 
# circle 1
legend_position, r1 = (11, 2.2), 0.3
x1, y1 = legend_position[0], legend_position[1]
circle = patches.Circle((x1, y1), r1, facecolor="#99d8e1", edgecolor="none", linewidth=0, zorder=2)
plt.gca().add_patch(circle)
plt.text(x1, y1 + r1 + 0.15, 'Pre', fontsize=12, ha="center", va="center")
# circle 2
x2, y2 = legend_position[0], legend_position[1] - r1*3
r2 = r1*0.7
circle = patches.Circle((x2, y2), r2, facecolor="#2d699f", edgecolor="none", linewidth=0, zorder=2)
plt.gca().add_patch(circle)
plt.text(x2, y2 - r2 - 0.15, 'Post', fontsize=12, ha="center", va="center")
# tangents
tangent_equations = find_tangent_equations(x1, y1, r1, x2, y2, r2)
circle_1_line_intersections = [find_circle_line_intersection(x1, y1, r1, eq[0], eq[1]) for eq in tangent_equations]
circle_2_line_intersections = [find_circle_line_intersection(x2, y2, r2, eq[0], eq[1]) for eq in tangent_equations]
polygon_points = circle_1_line_intersections + circle_2_line_intersections
draw_polygon(plt, polygon_points)
# small arrow
plt.annotate('', xytext=(x1, y1), xy=(x2, y1 - r1*2), arrowprops=dict(edgecolor="black", arrowstyle="->", lw=1))
Adding styling and legend, image by Author

And finally, the bubble-size legend:

# bubble size legend
legend_areas_original = [150, 50]
legend_position = (11, 10.2)
for i in legend_areas_original:
    i_r = area_to_radius(i) * radius_scaler
    # outline only: passing edgecolor instead of color avoids a clash with facecolor
    circle = plt.Circle((legend_position[0], legend_position[1] + i_r), i_r, edgecolor="black", fill=False, linewidth=0.6)
    plt.gca().add_patch(circle)
    plt.text(legend_position[0], legend_position[1] + 2*i_r, str(i), fontsize=12, ha="center", va="center",
              bbox=dict(facecolor="white", edgecolor="none", boxstyle="round,pad=0.1"))
legend_label_r = area_to_radius(np.max(legend_areas_original)) * radius_scaler
plt.text(legend_position[0], legend_position[1] + 2*legend_label_r + 0.3, 'Activity, hours', fontsize=12, ha="center", va="center")

Our final chart looks like this:

Adding second legend, image by Author

The visualization looks stylish and packs quite a lot of information into a compact form.

Here is the full code for the graph:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sympy as sp
from scipy.spatial import ConvexHull
import math
from matplotlib import rcParams
import matplotlib.patches as patches

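def check_position_relative_to_line(a, b, x0, y0):
    # y-coordinate of the line y = a*x + b at the point's x
    y_line = a * x0 + b

    if y0 > y_line:
        return 1  # point is above the line
    elif y0 < y_line:
        return -1  # point is below the line
    return 0  # point lies exactly on the line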

    
def find_tangent_equations(x1, y1, r1, x2, y2, r2):
    a, b = sp.symbols('a b')

    tangent_1 = (a*x1 + b - y1)**2 - r1**2 * (a**2 + 1)  
    tangent_2 = (a*x2 + b - y2)**2 - r2**2 * (a**2 + 1) 

    eqs_1 = [tangent_2, tangent_1]
    solution = sp.solve(eqs_1, (a, b))
    parameters = [(float(e[0]), float(e[1])) for e in solution]

    # filter just external tangents
    parameters_filtered = []
    for tangent in parameters:
        a = tangent[0]
        b = tangent[1]
        if abs(check_position_relative_to_line(a, b, x1, y1) + check_position_relative_to_line(a, b, x2, y2)) == 2:
            parameters_filtered.append(tangent)

    return parameters_filtered

def find_circle_line_intersection(circle_x, circle_y, circle_r, line_a, line_b):
    x, y = sp.symbols('x y')
    circle_eq = (x - circle_x)**2 + (y - circle_y)**2 - circle_r**2
    intersection_eq = circle_eq.subs(y, line_a * x + line_b)

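    # a tangent touches the circle once, so the first root is enough
    sol_x_raw = sp.solve(intersection_eq, x)[0]
    try:
        sol_x = float(sol_x_raw)
    except TypeError:
        # rounding errors can leave a tiny imaginary part; keep the real part
        sol_x = float(sol_x_raw.as_real_imag()[0])
    sol_y = line_a * sol_x + line_b
    return sol_x, sol_y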

def draw_polygon(plt, points):
    hull = ConvexHull(points)
    convex_points = [points[i] for i in hull.vertices]

    x, y = zip(*convex_points)
    x += (x[0],)
    y += (y[0],)

    plt.fill(x, y, color="#99d8e1", alpha=1, zorder=1)

def area_to_radius(area):
    radius = math.sqrt(area / math.pi)
    return radius

# data generation
df = pd.DataFrame({'user': ['Emily', 'Emily', 'James', 'James', 'Tony', 'Tony', 'Olivia', 'Olivia', 'Oliver', 'Oliver', 'Benjamin', 'Benjamin'],
                   'period': ['pre', 'post', 'pre', 'post', 'pre', 'post', 'pre', 'post', 'pre', 'post', 'pre', 'post'],
                   'num_purchases': [10, 9, 3, 5, 2, 4, 8, 7, 6, 7, 4, 6],
                   'revenue': [70, 60, 80, 90, 20, 15, 80, 76, 17, 19, 45, 55],
                   'activity': [100, 80, 50, 90, 210, 170, 60, 55, 30, 20, 200, 120]})

x_alias, y_alias, a_alias = 'num_purchases', 'revenue', 'activity'

# scaling metrics
radius_scaler = 0.1
df['radius'] = df[a_alias].apply(area_to_radius) * radius_scaler
df['y_scaled'] = df[y_alias] / df[x_alias].max()

# plot parameters
plt.subplots(figsize=(10, 10))
rcParams['font.family'] = 'DejaVu Sans'
rcParams['font.size'] = 14
plt.grid(color="gray", linestyle=(0, (10, 10)), linewidth=0.5, alpha=0.6, zorder=1)
plt.axvline(x=0, color="white", linewidth=2)
plt.gca().set_facecolor('white')
plt.gcf().set_facecolor('white')

# spines formatting
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.gca().spines["bottom"].set_visible(False)
plt.gca().spines["left"].set_visible(False)
plt.gca().tick_params(axis="both", which="both", length=0)

# plot labels
plt.xlabel("Number purchases") 
plt.ylabel("Revenue, $")
plt.title("Product users performance", fontsize=18, color="black")

# axis limits
axis_lim = df[x_alias].max() * 1.2
plt.xlim(0, axis_lim)
plt.ylim(0, axis_lim)

# bubble pre
for _, row in df[df.period=='pre'].iterrows():
    x = row[x_alias]
    y = row.y_scaled
    r = row.radius
    circle = patches.Circle((x, y), r, facecolor="#99d8e1", edgecolor="none", linewidth=0, zorder=2)
    plt.gca().add_patch(circle)

# transition area
for user in df.user.unique():
    user_pre = df[(df.user==user) & (df.period=='pre')]
    x1, y1, r1 = user_pre[x_alias].values[0], user_pre.y_scaled.values[0], user_pre.radius.values[0]
    user_post = df[(df.user==user) & (df.period=='post')]
    x2, y2, r2 = user_post[x_alias].values[0], user_post.y_scaled.values[0], user_post.radius.values[0]

    tangent_equations = find_tangent_equations(x1, y1, r1, x2, y2, r2)
    circle_1_line_intersections = [find_circle_line_intersection(x1, y1, r1, eq[0], eq[1]) for eq in tangent_equations]
    circle_2_line_intersections = [find_circle_line_intersection(x2, y2, r2, eq[0], eq[1]) for eq in tangent_equations]

    polygon_points = circle_1_line_intersections + circle_2_line_intersections
    draw_polygon(plt, polygon_points)

# bubble post
for _, row in df[df.period=='post'].iterrows():
    x = row[x_alias]
    y = row.y_scaled
    r = row.radius
    label = row.user
    circle = patches.Circle((x, y), r, facecolor="#2d699f", edgecolor="none", linewidth=0, zorder=2)
    plt.gca().add_patch(circle)

    plt.text(x, y - r - 0.3, label, fontsize=12, ha="center")

# bubble size legend
legend_areas_original = [150, 50]
legend_position = (11, 10.2)
for i in legend_areas_original:
    i_r = area_to_radius(i) * radius_scaler
    # outline only: passing edgecolor instead of color avoids a clash with facecolor
    circle = plt.Circle((legend_position[0], legend_position[1] + i_r), i_r, edgecolor="black", fill=False, linewidth=0.6)
    plt.gca().add_patch(circle)
    plt.text(legend_position[0], legend_position[1] + 2*i_r, str(i), fontsize=12, ha="center", va="center",
              bbox=dict(facecolor="white", edgecolor="none", boxstyle="round,pad=0.1"))
legend_label_r = area_to_radius(np.max(legend_areas_original)) * radius_scaler
plt.text(legend_position[0], legend_position[1] + 2*legend_label_r + 0.3, 'Activity, hours', fontsize=12, ha="center", va="center")


## pre-post legend 
# circle 1
legend_position, r1 = (11, 2.2), 0.3
x1, y1 = legend_position[0], legend_position[1]
circle = patches.Circle((x1, y1), r1, facecolor="#99d8e1", edgecolor="none", linewidth=0, zorder=2)
plt.gca().add_patch(circle)
plt.text(x1, y1 + r1 + 0.15, 'Pre', fontsize=12, ha="center", va="center")
# circle 2
x2, y2 = legend_position[0], legend_position[1] - r1*3
r2 = r1*0.7
circle = patches.Circle((x2, y2), r2, facecolor="#2d699f", edgecolor="none", linewidth=0, zorder=2)
plt.gca().add_patch(circle)
plt.text(x2, y2 - r2 - 0.15, 'Post', fontsize=12, ha="center", va="center")
# tangents
tangent_equations = find_tangent_equations(x1, y1, r1, x2, y2, r2)
circle_1_line_intersections = [find_circle_line_intersection(x1, y1, r1, eq[0], eq[1]) for eq in tangent_equations]
circle_2_line_intersections = [find_circle_line_intersection(x2, y2, r2, eq[0], eq[1]) for eq in tangent_equations]
polygon_points = circle_1_line_intersections + circle_2_line_intersections
draw_polygon(plt, polygon_points)
# small arrow
plt.annotate('', xytext=(x1, y1), xy=(x2, y1 - r1*2), arrowprops=dict(edgecolor="black", arrowstyle="->", lw=1))

# y axis formatting
max_y = df[y_alias].max()
nearest_power_of_10 = 10 ** math.ceil(math.log10(max_y))
ticks = [round(nearest_power_of_10/5 * i, 2) for i in range(0, 6)]
yticks_scaled = ticks / df[x_alias].max()
yticklabels = [str(i) for i in ticks]
yticklabels[0] = ''
plt.yticks(yticks_scaled, yticklabels)

plt.savefig("plot_with_white_background.png", bbox_inches="tight", dpi=300)
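
The y-axis formatting at the end of the listing maps revenue ticks back onto the scaled axis. Here is a minimal standalone sketch of that arithmetic, assuming the sample data's maximum revenue of 90 and maximum of 10 purchases. `scaled_ticks` is a hypothetical helper, not part of the listing.

```python
import math

def scaled_ticks(max_value, scale_divisor, n_intervals=5):
    """Return (tick labels in original units, tick positions on the scaled axis)."""
    # round the axis top up to the next power of ten
    top = 10 ** math.ceil(math.log10(max_value))
    ticks = [round(top / n_intervals * i, 2) for i in range(n_intervals + 1)]
    # apply the same division that was used to build y_scaled
    positions = [t / scale_divisor for t in ticks]
    return ticks, positions

ticks, positions = scaled_ticks(max_value=90, scale_divisor=10)
print(ticks)
print(positions)
```

The labels keep showing revenue in its original units, while the positions follow the scaled values the bubbles were drawn with.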

Adding a time dimension to bubble charts enhances their ability to convey dynamic data changes intuitively. With smooth transitions between “before” and “after” states, viewers can better understand trends and comparisons over time.

While no ready-made solutions were available, developing a custom approach proved both challenging and rewarding, requiring mathematical insights and careful animation techniques. The proposed method can easily be extended to various datasets, making it a valuable tool for data visualization in business, science, and analytics.
