Data Visualization Explained (Part 5): Visualizing Time-Series Data in Python (Matplotlib, Plotly, and Altair)


It’s time to start building your own data visualizations. In this article, I’ll walk through the process of visualizing time-series data in Python in detail. If you have not read the previous articles in my data visualization series, I strongly recommend reading at least the previous article for a review of Python.

Over the course of coding visualizations in Python, I’ll focus on three Python packages: Matplotlib, Plotly, and Altair. One approach to learning these might involve writing 1-2 articles per package, each delving into the chosen package in detail. While this is a valid approach, the focus of my series is not on any particular library; it’s on the data visualization process itself. These packages are simply tools: a means to an end.

As a result, I’ll structure this article and those to follow each around a specific type of data visualization, and I’ll discuss how to implement that visualization in each of the listed packages to make sure you have a breadth of approaches available to you.

First up: a definition for time-series data.

What Is Time-Series Data?

Formally, time-series data involves a variable that is a function of time. In simple terms, this just means some data that changes over time.

For instance, a public company’s stock price over the past ten years is time-series data. If you’d prefer a more scientific example, consider the weather. A graph depicting the daily temperature of your favorite city over the course of the year is a graph of time-series data.
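In pandas, which feeds all three libraries used below, a time series is typically stored as a Series (or DataFrame column) whose index holds the timestamps. Here is a minimal sketch of the weather example, with made-up temperature values:

```python
import pandas as pd

# Hypothetical daily temperatures (deg C) for one week; the values are invented
temps = pd.Series(
    [3.1, 4.5, 2.8, 5.0, 6.2, 4.9, 3.7],
    index=pd.date_range("2024-01-01", periods=7, freq="D"),
    name="temperature_c",
)

# The index carries time; the values carry the variable that changes with it
print(temps.index.min(), "to", temps.index.max())
print(temps.loc[pd.Timestamp("2024-01-03")])  # value on a single day
```

Plotting a line chart of this data amounts to putting `temps.index` on the x-axis and `temps.values` on the y-axis.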

Time-series data is an excellent starting point for data visualization for several reasons:

  • It’s an extremely common and useful type of data. Quite a lot of data depends on time, and understanding it provides meaningful insight into the subject of interest going forward.
  • There are tried and true methods to visualize time-series data effectively, as you’ll see below. Master these, and you’ll be in good shape.
  • Compared with some other types of data, time-series visualizations are fairly intuitive to humans and align with our perception of time. This makes it easier to focus on the basic elements of visualization design when starting out, instead of getting bogged down in trying to make sense of very complex data.

Let’s start by taking a look at different visualization methods on a conceptual level.

How Is Time-Series Data Visualized?

The standard for time-series visualization is the famed line chart:

Image by Wikimedia Commons

This chart generally puts time on the x-axis and the variable that changes with time on the y-axis. This provides a view that appears to be “moving forward,” in line with humans’ linear perception of time.

Though the line chart is the standard, there are other, related possibilities.

Multiple Line Chart

This approach is a direct extension of a single line chart and displays several related time series on the same plot, allowing comparison between groups or categories (e.g., sales by region):

Image by Our World in Data

Area Chart

Functionally, an area chart is almost exactly the same as a line chart, but the area under the line is filled in. It emphasizes the magnitude of change:

Image by Wikimedia Commons

Stacked Area Chart

Technically, the stacked area chart is the analogue of the multiple line chart, but it is a bit trickier to read. In particular, the total is cumulative, with the baseline for each stacked line starting at the one below it. For example, at 2023 in the chart below, “Ages 25-64” represents about 4 billion people, since we start counting where “Ages 15-24” ends.

Image by Our World in Data
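The arithmetic behind reading a stacked area chart can be sketched with NumPy: each band’s baseline is the running total of the bands beneath it, and the top band’s upper edge is the overall total. The numbers below are hypothetical and are not taken from the chart above.

```python
import numpy as np

# Hypothetical values for three stacked series (rows) at four time points (columns)
layers = np.array([
    [1.0, 1.2, 1.5, 1.7],   # bottom band
    [2.0, 2.1, 2.3, 2.4],   # middle band
    [0.5, 0.6, 0.6, 0.8],   # top band
])

tops = np.cumsum(layers, axis=0)   # upper edge of each band = running total
baselines = tops - layers          # lower edge = upper edge of the band below

print(baselines[:, 0])  # where each band starts at the first time point
print(tops[-1])         # overall total at each time point
```

To read one band’s own value off such a chart, subtract its lower edge from its upper edge, which is exactly the subtraction described for “Ages 25-64” above.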

Bar Chart (Vertical or Horizontal)

Finally, in some cases, a bar chart can also be appropriate for time-series visualization. This approach is useful if you want to show discrete time intervals, such as a monthly sum or yearly average of some metric, rather than continuous data. That said, I won’t be coding bar charts in this article.

Image by Our World In Data
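Although bar charts are not coded here, the key preparation step for them is aggregating continuous data into discrete intervals. A minimal pandas sketch, assuming a hypothetical daily series of ones (so the monthly sums simply count the days), using `resample`:

```python
import numpy as np
import pandas as pd

# Hypothetical daily metric covering January and February 2024
dates = pd.date_range("2024-01-01", "2024-02-29", freq="D")
daily = pd.Series(np.ones(len(dates)), index=dates, name="units")

# Collapse continuous daily data into discrete monthly buckets;
# "MS" labels each bucket with the first day of its month
monthly_sum = daily.resample("MS").sum()
monthly_mean = daily.resample("MS").mean()

print(monthly_sum)
```

Each row of `monthly_sum` would then become one bar.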

Now, let’s get to actually building these visualizations. In each of the examples below, I’ll walk through the code in a specific visualization library for building line charts and area charts. I have linked the data here and encourage you to follow along. To internalize these techniques, you have to practice using them yourself.
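If you can’t access the linked data, the sketch below writes a stand-in `sales_data.csv` so the examples can still run. It is an assumption, not the original dataset: only the column names are taken from the code that follows, and the monthly dates and random values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the linked dataset: 24 months of sales for three
# products. Only the column names come from the article's code; values are random.
rng = np.random.default_rng(seed=42)
n = 24
dates = pd.date_range("2022-01-01", periods=n, freq="MS")

df = pd.DataFrame({
    "Date": dates,
    # linear trend plus noise, rounded to whole units
    "Product A Sales": (np.linspace(800, 1400, n) + rng.normal(0, 60, n)).round(),
    "Product B Sales": (np.linspace(600, 800, n) + rng.normal(0, 40, n)).round(),
    "Product C Sales": (np.linspace(300, 250, n) + rng.normal(0, 30, n)).round(),
})

# Write the file name that the examples below expect
df.to_csv("sales_data.csv", index=False)
print(df.head())
```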

Coding Time-Series Visualizations in Matplotlib

import pandas as pd
import matplotlib.pyplot as plt

# Load data
df = pd.read_csv('sales_data.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Example 1: Simple Line Chart
fig1, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(df['Date'], df['Product A Sales'], linewidth=2)
ax1.set_xlabel('Date')
ax1.set_ylabel('Sales')
ax1.set_title('Product A Sales Over Time')
ax1.grid(True, alpha=0.3)
plt.tight_layout()
# Display with: fig1

# Example 2: Multiple Line Chart
fig2, ax2 = plt.subplots(figsize=(10, 6))
ax2.plot(df['Date'], df['Product A Sales'], label='Product A', linewidth=2)
ax2.plot(df['Date'], df['Product B Sales'], label='Product B', linewidth=2)
ax2.plot(df['Date'], df['Product C Sales'], label='Product C', linewidth=2)
ax2.set_xlabel('Date')
ax2.set_ylabel('Sales')
ax2.set_title('Sales Comparison - All Products')
ax2.legend()
ax2.grid(True, alpha=0.3)
plt.tight_layout()
# Display with: fig2

# Example 3: Area Chart
fig3, ax3 = plt.subplots(figsize=(10, 6))
ax3.fill_between(df['Date'], df['Product A Sales'], alpha=0.4)
ax3.plot(df['Date'], df['Product A Sales'], linewidth=2)
ax3.set_xlabel('Date')
ax3.set_ylabel('Sales')
ax3.set_title('Product A Sales - Area Chart')
ax3.grid(True, alpha=0.3)
plt.tight_layout()
# Display with: fig3

# Example 4: Stacked Area Chart
fig4, ax4 = plt.subplots(figsize=(10, 6))
ax4.stackplot(df['Date'], df['Product A Sales'], df['Product B Sales'], df['Product C Sales'],
              labels=['Product A', 'Product B', 'Product C'], alpha=0.7)
ax4.set_xlabel('Date')
ax4.set_ylabel('Sales')
ax4.set_title('Total Sales - Stacked Area Chart')
ax4.legend(loc='upper left')
ax4.grid(True, alpha=0.3)
plt.tight_layout()
# Display with: fig4

Running this code produces the following four visualizations:

Let’s break the code down step by step to make sure you understand what is happening:

  1. First, we load the data into pandas from a CSV file and make sure the date is correctly represented as a datetime object.
  2. Matplotlib structures charts within the Figure object, which represents the entire canvas. A Figure can be created directly with plt.figure, but plt.subplots, which returns both the Figure and its Axes at once, is more convenient when building multiple visualizations. Every call to plt.subplots creates a new, separate Figure (canvas).
  3. The line fig1, ax1 = plt.subplots(figsize=(10, 6)) defines the first subplot; fig1 represents the canvas, while ax1 represents the Axes (the actual plotting area) inside it and is the variable where you’ll make most changes.
  4. Matplotlib has different functions for different charts. The plot function plots 2-D points and then connects them to build a line chart. That is what we specify in the line ax1.plot(df['Date'], df['Product A Sales'], linewidth=2).
  5. The remaining lines are mainly aesthetic functions that do exactly what their names suggest: labeling axes, adding gridlines, and adjusting the layout.
  6. For the multiple line chart, the code is exactly the same, except we call plot three times: once for each set of x-y points we want to graph, so that all of the products appear.
  7. The area chart is almost identical to the line chart, except for the addition of ax3.fill_between(df['Date'], df['Product A Sales'], alpha=0.4), which tells Matplotlib to shade the area below the line.
  8. The stacked area chart, in contrast, requires the stackplot function, which takes in all three data arrays we want to plot at once. The remaining aesthetic code, however, is the same.
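Step 2 notes that each plt.subplots call creates a separate Figure. If you would rather compare all four charts side by side, plt.subplots can also return a grid of Axes inside a single Figure. This is a sketch under assumptions: the data are made up (with the same column names as above), and the output file name all_charts.png is chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs outside notebooks
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with the same column names used above
dates = pd.date_range("2023-01-01", periods=12, freq="MS")
df = pd.DataFrame({
    "Date": dates,
    "Product A Sales": np.linspace(100, 210, 12),
    "Product B Sales": np.linspace(80, 140, 12),
    "Product C Sales": np.linspace(50, 60, 12),
})
products = ["Product A Sales", "Product B Sales", "Product C Sales"]
names = [col.replace(" Sales", "") for col in products]

# One Figure holding a 2x2 grid of Axes instead of four separate Figures
fig, axes = plt.subplots(2, 2, figsize=(12, 8), sharex=True)
(ax_line, ax_multi), (ax_area, ax_stack) = axes

ax_line.plot(df["Date"], df["Product A Sales"], linewidth=2)
ax_line.set_title("Line")

for col, name in zip(products, names):
    ax_multi.plot(df["Date"], df[col], label=name)
ax_multi.legend()
ax_multi.set_title("Multiple lines")

ax_area.fill_between(df["Date"], df["Product A Sales"], alpha=0.4)
ax_area.plot(df["Date"], df["Product A Sales"], linewidth=2)
ax_area.set_title("Area")

ax_stack.stackplot(df["Date"], *(df[col] for col in products), labels=names, alpha=0.7)
ax_stack.legend(loc="upper left")
ax_stack.set_title("Stacked area")

fig.autofmt_xdate()  # rotate date labels so they don't overlap
fig.tight_layout()
fig.savefig("all_charts.png", dpi=100)
```

Sharing the x-axis (sharex=True) keeps the dates aligned across rows, which makes the four views easier to compare.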

Try programming these yourself in your favorite IDE or in a Jupyter notebook. What patterns do you see? Which chart do you prefer the most?

Also, remember that you don’t need to memorize this syntax, especially if you are new to programming data visualizations or new to Python in general. Focus on trying to understand what is happening on a conceptual level; you can always look up the exact syntax and plug your data in as needed.

This will hold true for the remaining two examples as well.

Coding Time-Series Visualizations in Plotly

Here is the code to generate the same visualizations as above, this time in Plotly’s style:

import pandas as pd
import plotly.graph_objects as go

# Load data
df = pd.read_csv('sales_data.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Example 1: Simple Line Chart
fig1 = go.Figure()
fig1.add_trace(go.Scatter(x=df['Date'], y=df['Product A Sales'], mode='lines', name='Product A'))
fig1.update_layout(
    title='Product A Sales Over Time',
    xaxis_title='Date',
    yaxis_title='Sales',
    template='plotly_white'
)
# Display with: fig1

# Example 2: Multiple Line Chart
fig2 = go.Figure()
fig2.add_trace(go.Scatter(x=df['Date'], y=df['Product A Sales'], mode='lines', name='Product A'))
fig2.add_trace(go.Scatter(x=df['Date'], y=df['Product B Sales'], mode='lines', name='Product B'))
fig2.add_trace(go.Scatter(x=df['Date'], y=df['Product C Sales'], mode='lines', name='Product C'))
fig2.update_layout(
    title='Sales Comparison - All Products',
    xaxis_title='Date',
    yaxis_title='Sales',
    template='plotly_white'
)
# Display with: fig2

# Example 3: Area Chart
fig3 = go.Figure()
fig3.add_trace(go.Scatter(
    x=df['Date'], y=df['Product A Sales'],
    mode='lines',
    name='Product A',
    fill='tozeroy'
))
fig3.update_layout(
    title='Product A Sales - Area Chart',
    xaxis_title='Date',
    yaxis_title='Sales',
    template='plotly_white'
)
# Display with: fig3

# Example 4: Stacked Area Chart
fig4 = go.Figure()
fig4.add_trace(go.Scatter(
    x=df['Date'], y=df['Product A Sales'],
    mode='lines',
    name='Product A',
    stackgroup='one'
))
fig4.add_trace(go.Scatter(
    x=df['Date'], y=df['Product B Sales'],
    mode='lines',
    name='Product B',
    stackgroup='one'
))
fig4.add_trace(go.Scatter(
    x=df['Date'], y=df['Product C Sales'],
    mode='lines',
    name='Product C',
    stackgroup='one'
))
fig4.update_layout(
    title='Total Sales - Stacked Area Chart',
    xaxis_title='Date',
    yaxis_title='Sales',
    template='plotly_white'
)
# Display with: fig4

We obtain the following four visualizations:

Here’s a breakdown of the code:

  • Plotly is fully independent of Matplotlib. It uses a similarly named Figure object, but doesn’t have any ax objects.
  • The Scatter function with mode “lines” is used to build a line chart with the specified x- and y-axis data. You can think of the add_trace function as adding a new component to an existing Figure. Thus, for the multiple line chart, we simply call add_trace with the appropriate Scatter inputs three times.
  • For labeling and aesthetics in Plotly, use the update_layout function.
  • The area chart is built identically to the line chart, with the addition of the optional argument fill='tozeroy'.
    • At first glance, this may look like some obscure color, but it is actually saying “TO ZERO Y,” telling Plotly to fill the area between the line and y = 0.
    • If you’re having trouble visualizing this, try changing the argument to “tozerox” and see what happens.
  • For the stacked area chart, we need a different optional parameter: stackgroup='one'. Adding this to each of the Scatter calls tells Plotly that they are all to be built as part of the same stack.

A bonus of Plotly is that, by default, all Plotly charts are interactive and come with the ability to zoom, hover for tooltips, and toggle the legend. (Note that the images above are saved as PNGs, so you will need to generate the plots yourself in order to see this.)

Coding Time-Series Visualizations in Altair

Let’s finish off by generating these four visualizations in Altair. Here is the code:

import pandas as pd
import altair as alt

# Load data
df = pd.read_csv('sales_data.csv')
df['Date'] = pd.to_datetime(df['Date'])

# Example 1: Simple Line Chart
chart1 = alt.Chart(df).mark_line().encode(
    x='Date:T',
    y='Product A Sales:Q'
).properties(
    title='Product A Sales Over Time',
    width=700,
    height=400
)
# Display with: chart1

# Example 2: Multiple Line Chart
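# Reshape data for Altair: one row per (Date, product) pair.
# Listing value_vars keeps any extra columns in the CSV from being melted too.
df_melted = df.melt(id_vars='Date',
                    value_vars=['Product A Sales', 'Product B Sales', 'Product C Sales'],
                    var_name='product', value_name='sales')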

chart2 = alt.Chart(df_melted).mark_line().encode(
    x='Date:T',
    y='sales:Q',
    color='product:N'
).properties(
    title='Sales Comparison - All Products',
    width=700,
    height=400
)
# Display with: chart2

# Example 3: Area Chart
chart3 = alt.Chart(df).mark_area(opacity=0.7).encode(
    x='Date:T',
    y='Product A Sales:Q'
).properties(
    title='Product A Sales - Area Chart',
    width=700,
    height=400
)
# Display with: chart3

# Example 4: Stacked Area Chart
chart4 = alt.Chart(df_melted).mark_area(opacity=0.7).encode(
    x='Date:T',
    y='sales:Q',
    color='product:N'
).properties(
    title='Total Sales - Stacked Area Chart',
    width=700,
    height=400
)
# Display with: chart4

We obtain the following charts:

Let’s break down the code:

  • Altair has a rather different structure from Matplotlib and Plotly. It takes some practice to grasp, but once you understand it, its intuitiveness makes building new visualizations straightforward.
  • Everything in Altair revolves around the Chart object, into which you pass your data. Then, you use a mark_ function to specify what kind of chart you want to build, and the encode function to specify which variables correspond to which visual elements on the chart (e.g., x-axis, y-axis, color, size, etc.).
  • For the line chart, we use the mark_line function and then specify that we want the date on the x-axis and the sales on the y-axis. The suffixes :T and :Q tell Altair that these fields are temporal and quantitative, respectively; :N, used below, marks a nominal (categorical) field.
  • The melt function doesn’t change the data itself, just its shape. It puts all of the products into a single column, a “long format” that is more amenable to Altair’s visualization model. For more details, check out this helpful article.
  • Once we transform the data as above, we can build our multiple line chart simply by adding a “color” encoding, as shown in the code. This is possible because all of the product types are now in a single column, and we can tell Altair to distinguish them by color.
  • The code for generating area charts showcases the beauty of Altair’s structure. Everything stays the same; all you need to do is change the function being used to mark_area! When a color encoding is present, as in the stacked version, Altair stacks the areas by default.

As you explore other types of visualizations on your own (and in future articles!), Altair’s model for building visualizations will become easier to implement (and, hopefully, to appreciate).

What’s Next?

In future articles, I’ll cover how to use these libraries to build additional types of visualizations. As you continue learning, remember that the goal of these articles is not to master any one tool. This is about learning data visualization holistically, and my hope is that you have walked away from this article with a better understanding of how time-series data is visualized.

As for the code, that comfort comes with time and practice. For now, feel free to take the examples above and adapt them to your own data as needed.

Until next time.
