Building a Modern Dashboard with Python and Gradio

This article is the second in a short series on developing data dashboards using the latest Python-based GUI development tools: Streamlit, Gradio, and Taipy.

The source dataset for each dashboard will be the same, but stored in different formats. As much as possible, I’ll also attempt to make the dashboard layouts for each tool resemble one another and have identical functionality.

In the first part of this series, I created a Streamlit version of the dashboard that retrieves its data from a local PostgreSQL database. You can view that article here.

This time, we’re exploring the use of the Gradio library.

The data for this dashboard will be in a local CSV file, and Pandas will be our primary data processing engine.

If you would like to see a quick demo of the app, I have deployed it to Hugging Face Spaces. You can run it using the link below, but note that the two input date picker pop-ups don’t work due to a known bug in the Hugging Face environment. This only affects apps deployed on HF; you can still change the dates manually. Running the app locally works fine and doesn’t have this issue.

Dashboard demo on HuggingFace

What’s Gradio?

Gradio is an open-source Python package that simplifies the process of building demos or web applications for machine learning models, APIs, or any Python function. With it, you can create demos or web applications without needing JavaScript, CSS, or web hosting experience. By writing just a few lines of Python code, you can unlock the power of Gradio and seamlessly showcase your machine learning models to a broader audience.

Gradio simplifies the development process by providing an intuitive framework that eliminates the complexities associated with building user interfaces from scratch. Whether you’re a machine learning developer, researcher, or enthusiast, Gradio allows you to create beautiful and interactive demos that enhance the understanding and accessibility of your machine learning models.

This open-source Python package helps you bridge the gap between your machine learning expertise and a broader audience, making your models accessible and actionable.
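
To give you a sense of how little code is needed, here is a minimal, self-contained Gradio app; the greet function is just an illustrative stand-in, not part of the dashboard we’ll build:

import gradio as gr

# A trivial function to expose as a web UI
def greet(name):
    return f"Hello, {name}!"

# gr.Interface wires the function to a text input and a text output
demo = gr.Interface(fn=greet, inputs="text", outputs="text")
demo.launch()

Running this script starts a local server and prints a URL you can open in your browser.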

What we’ll develop

We’re developing a data dashboard. Our source data will be a single CSV file containing 100,000 synthetic sales records.

The actual source of the data isn’t important. It could just as easily be a text file, an Excel file, SQLite, or any database you can connect to.

This is what our final dashboard will look like.

Image by Author

There are four main sections.

  • The top row enables the user to select specific start and end dates and/or product categories using date pickers and a drop-down list, respectively.
  • The second row, Key Metrics, shows a top-level summary of the selected data.
  • The Visualisation section allows the user to select one of three graphs to display the input dataset.
  • The Raw Data section is exactly what it claims to be: a tabular representation of the selected data, effectively a snapshot of the underlying CSV data file.

Using the dashboard is simple. Initially, stats for the whole dataset are displayed. The user can then narrow the data focus using the three filter fields at the top of the display. The graphs, key metrics, and raw data sections dynamically update to reflect the user’s selections in the filter fields.

The underlying data

As mentioned, the dashboard’s source data is contained in a single comma-separated values (CSV) file. The data consists of 100,000 synthetic sales-related records. Here are the first ten records of the file to give you an idea of what it looks like.

+----------+------------+------------+----------------+------------+---------------+------------+----------+-------+--------------------+
| order_id | order_date | customer_id| customer_name  | product_id | product_names | categories | quantity | price | total              |
+----------+------------+------------+----------------+------------+---------------+------------+----------+-------+--------------------+
| 0        | 01/08/2022 | 245        | Customer_884   | 201        | Smartphone    | Electronics| 3        | 90.02 | 270.06             |
| 1        | 19/02/2022 | 701        | Customer_1672  | 205        | Printer       | Electronics| 6        | 12.74 | 76.44              |
| 2        | 01/01/2017 | 184        | Customer_21720 | 208        | Notebook      | Stationery | 8        | 48.35 | 386.8              |
| 3        | 09/03/2013 | 275        | Customer_23770 | 200        | Laptop        | Electronics| 3        | 74.85 | 224.55             |
| 4        | 23/04/2022 | 960        | Customer_23790 | 210        | Cabinet       | Office     | 6        | 53.77 | 322.62             |
| 5        | 10/07/2019 | 197        | Customer_25587 | 202        | Desk          | Office     | 3        | 47.17 | 141.51             |
| 6        | 12/11/2014 | 510        | Customer_6912  | 204        | Monitor       | Electronics| 5        | 22.5  | 112.5              |
| 7        | 12/07/2016 | 150        | Customer_17761 | 200        | Laptop        | Electronics| 9        | 49.33 | 443.97             |
| 8        | 12/11/2016 | 997        | Customer_23801 | 209        | Coffee Maker  | Electronics| 7        | 47.22 | 330.54             |
| 9        | 23/01/2017 | 151        | Customer_30325 | 207        | Pen           | Stationery | 6        | 3.5   | 21                 |
+----------+------------+------------+----------------+------------+---------------+------------+----------+-------+--------------------+

And here is some Python code you can use to generate a similar dataset. Make sure both the NumPy and Polars libraries are installed first.

# generate the 100K record CSV file
#
import polars as pl
import numpy as np
from datetime import datetime, timedelta

def generate(nrows: int, filename: str):
    names = np.asarray(
        [
            "Laptop",
            "Smartphone",
            "Desk",
            "Chair",
            "Monitor",
            "Printer",
            "Paper",
            "Pen",
            "Notebook",
            "Coffee Maker",
            "Cabinet",
            "Plastic Cups",
        ]
    )
    categories = np.asarray(
        [
            "Electronics",
            "Electronics",
            "Office",
            "Office",
            "Electronics",
            "Electronics",
            "Stationery",
            "Stationery",
            "Stationery",
            "Electronics",
            "Office",
            "Sundry",
        ]
    )
    product_id = np.random.randint(len(names), size=nrows)
    quantity = np.random.randint(1, 11, size=nrows)
    price = np.random.randint(199, 10000, size=nrows) / 100
    # Generate random dates between 2010-01-01 and 2023-12-31
    start_date = datetime(2010, 1, 1)
    end_date = datetime(2023, 12, 31)
    date_range = (end_date - start_date).days
    # Create random dates as np.array and convert to string format
    order_dates = np.array([(start_date + timedelta(days=np.random.randint(0, date_range))).strftime('%Y-%m-%d') for _ in range(nrows)])
    # Define columns
    columns = {
        "order_id": np.arange(nrows),
        "order_date": order_dates,
        "customer_id": np.random.randint(100, 1000, size=nrows),
        "customer_name": [f"Customer_{i}" for i in np.random.randint(2**15, size=nrows)],
        "product_id": product_id + 200,
        "product_names": names[product_id],
        "categories": categories[product_id],
        "quantity": quantity,
        "price": price,
        "total": price * quantity,
    }
    # Create Polars DataFrame and write to CSV with explicit delimiter
    df = pl.DataFrame(columns)
    df.write_csv(filename, separator=',', include_header=True)  # ensure a comma is used as the delimiter

# Generate 100,000 rows of knowledge with random order_date and save to CSV
generate(100_000, "/mnt/d/sales_data/sales_data.csv")

Installing and using Gradio

Installing Gradio is simple using pip, but for coding, the best practice is to set up a separate Python environment for all your work. I use Miniconda for that purpose, but feel free to use whatever method suits your work practice.

Gradio needs at least Python 3.8 installed to work correctly.

Once the environment is created, switch to it using the ‘activate’ command, and then run ‘pip install’ to install our required Python libraries.

#create our test environment
(base) C:\Users\thoma>conda create -n gradio_dashboard python=3.12 -y

# Now activate it
(base) C:\Users\thoma>conda activate gradio_dashboard

# Install python libraries, etc ...
(gradio_dashboard) C:\Users\thoma>pip install gradio pandas matplotlib cachetools

Key differences between Streamlit and Gradio

As I’ll demonstrate in this article, it’s possible to produce very similar data dashboards using Streamlit and Gradio. However, their ethos differs in several key ways.

Focus

  • Gradio specialises in creating interfaces for machine learning models, while Streamlit is designed more for general-purpose data applications and visualisations.

Ease of use

  • Gradio is known for its simplicity and rapid prototyping capabilities, making it easier for beginners to use. Streamlit offers more advanced features and customisation options, which may require a steeper learning curve.

Interactivity

  • Streamlit uses a reactive programming model where any input change triggers a complete script rerun, updating all components immediately. Gradio, by default, updates only when a user clicks a submit button, though it can be configured for live updates, as sketched below.
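
For example, here is a minimal sketch of a live-updating interface; passing live=True to gr.Interface makes the output refresh as the user types (the double function is just an illustrative placeholder):

import gradio as gr

# A placeholder function whose output re-computes as the user types
def double(x):
    return 2 * x

# live=True re-runs the function on every input change,
# rather than waiting for a Submit click
gr.Interface(fn=double, inputs="number", outputs="number", live=True).launch()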

Customization

  • Gradio focuses on pre-built components for quickly demonstrating AI models. Streamlit provides more extensive customisation options and flexibility for complex projects.

Deployment

  • Having deployed both a Streamlit and a Gradio app, I would say it’s easier to deploy a Streamlit app than a Gradio app. In Streamlit, deployment can be done with a single click via the Streamlit Community Cloud; this functionality is built into any Streamlit app you create. Gradio offers deployment via Hugging Face Spaces, but it involves more work (see the note below). Neither method is particularly complex, though.
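
As an aside, recent Gradio releases include a CLI helper that can push a project to a Hugging Face Space. Assuming you already have a Hugging Face account and an access token configured locally, the flow is roughly:

# Run from your project directory; the command prompts for a Space name and settings
(gradio_dashboard) $ gradio deploy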

Use cases

Streamlit excels at creating data-centric applications and interactive dashboards for complex projects. Gradio is ideal for quickly showcasing machine learning models and building simpler applications.

The Gradio Dashboard Code

I’ll break down the code into sections and explain each as we proceed.

We start by importing the required external libraries and loading the complete dataset from the CSV file into a Pandas DataFrame.

import gradio as gr
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import warnings
import os
import tempfile
from cachetools import cached, TTLCache

warnings.filterwarnings("ignore", category=FutureWarning, module="seaborn")

# ------------------------------------------------------------------
# 1) Load CSV data once
# ------------------------------------------------------------------
csv_data = None

def load_csv_data():
    global csv_data
    
    # Optional: specify column dtypes if known; adjust as necessary
    dtype_dict = {
        "order_id": "Int64",
        "customer_id": "Int64",
        "product_id": "Int64",
        "quantity": "Int64",
        "price": "float",
        "total": "float",
        "customer_name": "string",
        "product_names": "string",
        "categories": "string"
    }
    
    csv_data = pd.read_csv(
        "d:/sales_data/sales_data.csv",
        parse_dates=["order_date"],
        dayfirst=True,      # if your dates are in DD/MM/YYYY format
        low_memory=False,
        dtype=dtype_dict
    )

load_csv_data()

Next, we configure a time-to-live cache with a maximum of 128 items and an expiration of 300 seconds. This is used to store the results of expensive function calls and speed up repeated lookups.

The get_unique_categories function returns a list of unique, cleaned (capitalised) categories from the csv_data DataFrame, caching the result for quicker access.

The get_date_range function returns the minimum and maximum order dates from the dataset, or None if the data is unavailable.

The filter_data function filters the csv_data DataFrame based on a specified date range and optional category, returning the filtered DataFrame.

The get_dashboard_stats function retrieves summary metrics — total revenue, total orders, average order value, and top category — for the given filters. Internally it uses filter_data() to scope the dataset and then calculates these key statistics.

The get_data_for_table function returns a detailed DataFrame of filtered sales data, sorted by order_id and order_date, including an additional revenue column for each sale.

The get_plot_data function formats data for generating a plot by summing revenue over time, grouped by date.

The get_revenue_by_category function aggregates and returns revenue by category, sorted by revenue, within the specified date range and category.

The get_top_products function returns the top 10 products by revenue, filtered by date range and category.

The create_matplotlib_figure function generates a bar plot from the data, either vertical or horizontal depending on the orientation argument, and saves it as an image file.

cache = TTLCache(maxsize=128, ttl=300)

@cached(cache)
def get_unique_categories():
    global csv_data
    if csv_data is None:
        return []
    cats = sorted(csv_data['categories'].dropna().unique().tolist())
    cats = [cat.capitalize() for cat in cats]
    return cats

def get_date_range():
    global csv_data
    if csv_data is None or csv_data.empty:
        return None, None
    return csv_data['order_date'].min(), csv_data['order_date'].max()

def filter_data(start_date, end_date, category):
    global csv_data

    if isinstance(start_date, str):
        start_date = datetime.datetime.strptime(start_date, '%Y-%m-%d').date()
    if isinstance(end_date, str):
        end_date = datetime.datetime.strptime(end_date, '%Y-%m-%d').date()

    df = csv_data.loc[
        (csv_data['order_date'] >= pd.to_datetime(start_date)) &
        (csv_data['order_date'] <= pd.to_datetime(end_date))
    ].copy()

    if category != "All Categories":
        df = df.loc[df['categories'].str.capitalize() == category].copy()

    return df

def get_dashboard_stats(start_date, end_date, category):
    df = filter_data(start_date, end_date, category)
    if df.empty:
        return (0, 0, 0, "N/A")

    df['revenue'] = df['price'] * df['quantity']
    total_revenue = df['revenue'].sum()
    total_orders = df['order_id'].nunique()
    avg_order_value = total_revenue / total_orders if total_orders else 0

    cat_revenues = df.groupby('categories')['revenue'].sum().sort_values(ascending=False)
    top_category = cat_revenues.index[0] if not cat_revenues.empty else "N/A"

    return (total_revenue, total_orders, avg_order_value, top_category.capitalize())

def get_data_for_table(start_date, end_date, category):
    df = filter_data(start_date, end_date, category)
    if df.empty:
        return pd.DataFrame()

    df = df.sort_values(by=["order_id", "order_date"], ascending=[True, False]).copy()

    columns_order = [
        "order_id", "order_date", "customer_id", "customer_name",
        "product_id", "product_names", "categories", "quantity",
        "price", "total"
    ]
    columns_order = [col for col in columns_order if col in df.columns]
    df = df[columns_order].copy()

    df['revenue'] = df['price'] * df['quantity']
    return df

def get_plot_data(start_date, end_date, category):
    df = filter_data(start_date, end_date, category)
    if df.empty:
        return pd.DataFrame()
    df['revenue'] = df['price'] * df['quantity']
    plot_data = df.groupby(df['order_date'].dt.date)['revenue'].sum().reset_index()
    plot_data.rename(columns={'order_date': 'date'}, inplace=True)
    return plot_data

def get_revenue_by_category(start_date, end_date, category):
    df = filter_data(start_date, end_date, category)
    if df.empty:
        return pd.DataFrame()
    df['revenue'] = df['price'] * df['quantity']
    cat_data = df.groupby('categories')['revenue'].sum().reset_index()
    cat_data = cat_data.sort_values(by='revenue', ascending=False)
    return cat_data

def get_top_products(start_date, end_date, category):
    df = filter_data(start_date, end_date, category)
    if df.empty:
        return pd.DataFrame()
    df['revenue'] = df['price'] * df['quantity']
    prod_data = df.groupby('product_names')['revenue'].sum().reset_index()
    prod_data = prod_data.sort_values(by='revenue', ascending=False).head(10)
    return prod_data

def create_matplotlib_figure(data, x_col, y_col, title, xlabel, ylabel, orientation='v'):
    plt.figure(figsize=(10, 6))
    if data.empty:
        plt.text(0.5, 0.5, 'No data available', ha='center', va='center')
    else:
        if orientation == 'v':
            plt.bar(data[x_col], data[y_col])
            plt.xticks(rotation=45, ha='right')
        else:
            plt.barh(data[x_col], data[y_col])
            plt.gca().invert_yaxis() 

    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.tight_layout()

    with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmpfile:
        plt.savefig(tmpfile.name)
    plt.close()
    return tmpfile.name

The update_dashboard function retrieves key sales statistics (total revenue, total orders, average order value, and top category) by calling the get_dashboard_stats function. It gathers data for three distinct visualisations (revenue over time, revenue by category, and top products), then uses create_matplotlib_figure to generate the plots. It prepares and returns a data table (via the get_data_for_table() function) together with all the generated plots and stats so that they can be displayed in the dashboard.

The create_dashboard function sets the date boundaries (minimum and maximum dates) and establishes the initial default filter values. It uses Gradio to construct a user interface (UI) featuring date pickers, a category drop-down, key metric displays, plot tabs, and a data table. It then wires up the filters so that changing any of them triggers a call to the update_dashboard function, ensuring the dashboard visuals and metrics are always in sync with the selected filters. Finally, it returns the assembled Gradio interface, which is launched as a web application.

def update_dashboard(start_date, end_date, category):
    total_revenue, total_orders, avg_order_value, top_category = get_dashboard_stats(start_date, end_date, category)

    # Generate plots
    revenue_data = get_plot_data(start_date, end_date, category)
    category_data = get_revenue_by_category(start_date, end_date, category)
    top_products_data = get_top_products(start_date, end_date, category)

    revenue_over_time_path = create_matplotlib_figure(
        revenue_data, 'date', 'revenue',
        "Revenue Over Time", "Date", "Revenue"
    )
    revenue_by_category_path = create_matplotlib_figure(
        category_data, 'categories', 'revenue',
        "Revenue by Category", "Category", "Revenue"
    )
    top_products_path = create_matplotlib_figure(
        top_products_data, 'product_names', 'revenue',
        "Top Products", "Revenue", "Product Name", orientation='h'
    )

    # Data table
    table_data = get_data_for_table(start_date, end_date, category)

    return (
        revenue_over_time_path,
        revenue_by_category_path,
        top_products_path,
        table_data,
        total_revenue,
        total_orders,
        avg_order_value,
        top_category
    )

def create_dashboard():
    min_date, max_date = get_date_range()
    if min_date is None or max_date is None:
        min_date = datetime.datetime.now()
        max_date = datetime.datetime.now()

    default_start_date = min_date
    default_end_date = max_date

    with gr.Blocks(css="""
        footer {display: none !important;}
        .tabs {border: none !important;}
        .gr-plot {border: none !important; box-shadow: none !important;}
    """) as dashboard:
        
        gr.Markdown("# Sales Performance Dashboard")

        # Filters row
        with gr.Row():
            start_date = gr.DateTime(
                label="Start Date",
                value=default_start_date.strftime('%Y-%m-%d'),
                include_time=False,
                type="datetime"
            )
            end_date = gr.DateTime(
                label="End Date",
                value=default_end_date.strftime('%Y-%m-%d'),
                include_time=False,
                type="datetime"
            )
            category_filter = gr.Dropdown(
                choices=["All Categories"] + get_unique_categories(),
                label="Category",
                value="All Categories"
            )

        gr.Markdown("# Key Metrics")

        # Stats row
        with gr.Row():
            total_revenue = gr.Number(label="Total Revenue", value=0)
            total_orders = gr.Number(label="Total Orders", value=0)
            avg_order_value = gr.Number(label="Average Order Value", value=0)
            top_category = gr.Textbox(label="Top Category", value="N/A")

        gr.Markdown("# Visualisations")
        # Tabs for Plots
        with gr.Tabs():
            with gr.Tab("Revenue Over Time"):
                revenue_over_time_image = gr.Image(label="Revenue Over Time", container=False)
            with gr.Tab("Revenue by Category"):
                revenue_by_category_image = gr.Image(label="Revenue by Category", container=False)
            with gr.Tab("Top Products"):
                top_products_image = gr.Image(label="Top Products", container=False)

        gr.Markdown("# Raw Data")
        # Data Table (below the plots)
        data_table = gr.DataFrame(
            label="Sales Data",
            type="pandas",
            interactive=False
        )

        # When filters change, update everything
        for f in [start_date, end_date, category_filter]:
            f.change(
                fn=lambda s, e, c: update_dashboard(s, e, c),
                inputs=[start_date, end_date, category_filter],
                outputs=[
                    revenue_over_time_image, 
                    revenue_by_category_image, 
                    top_products_image,
                    data_table,
                    total_revenue, 
                    total_orders,
                    avg_order_value, 
                    top_category
                ]
            )

        # Initial load
        dashboard.load(
            fn=lambda: update_dashboard(default_start_date, default_end_date, "All Categories"),
            outputs=[
                revenue_over_time_image, 
                revenue_by_category_image, 
                top_products_image,
                data_table,
                total_revenue, 
                total_orders,
                avg_order_value, 
                top_category
            ]
        )

    return dashboard

if __name__ == "__main__":
    dashboard = create_dashboard()
    dashboard.launch(share=False)

Running the program

Create a Python file, e.g. gradio_test.py, and insert all the above code snippets. Save it, and then run it like this:

(gradio_dashboard) $ python gradio_test.py

* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Click on the local URL shown, and the dashboard will open full screen in your browser.

Summary

This article provides a comprehensive guide to building an interactive sales performance dashboard using Gradio, with a CSV file as its source data.

Gradio is a modern, Python-based open-source framework that simplifies the creation of data-driven dashboards and GUI applications. The dashboard I developed allows users to filter data by date range and product category, view key metrics such as total revenue and top-performing categories, explore visualisations like revenue trends and top products, and navigate through raw data with pagination.

I also discussed some key differences between developing visualisation tools using Gradio and Streamlit, another popular front-end Python library.

The guide covers the entire process, from creating sample data to developing the Python functions that query data, generate plots, and handle user input. This step-by-step approach demonstrates how to leverage Gradio’s capabilities to create user-friendly, dynamic dashboards, making it a good fit for data engineers and scientists who want to build interactive data applications.

Although I used a CSV file for my data, modifying the code to use another data source, such as a relational database management system (RDBMS) like SQLite, should be straightforward; a sketch follows below. For example, in my other article in this series, on creating a similar dashboard using Streamlit, the data source is a PostgreSQL database.
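
As a rough sketch, assuming a SQLite file containing a table named sales with the same columns as the CSV (both the file path and table name here are hypothetical), load_csv_data could be replaced with something like this:

import sqlite3

import pandas as pd

def load_sqlite_data(db_path="sales_data.db"):
    # Hypothetical drop-in replacement for load_csv_data():
    # reads the same columns from a SQLite table called "sales"
    global csv_data
    with sqlite3.connect(db_path) as conn:
        csv_data = pd.read_sql(
            "SELECT * FROM sales",
            conn,
            parse_dates=["order_date"],
        )

The rest of the dashboard code would work unchanged.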
