Why Is My Code So Slow? A Guide to Py-Spy Python Profiling


The most frustrating issues to debug in data science code aren’t syntax errors or logical mistakes. Rather, they arise from code that does exactly what it’s supposed to do, but takes its sweet time doing it.

Functional but inefficient code can be a massive bottleneck in a data science workflow. In this article, I’ll provide a brief introduction to and walk-through of py-spy, a powerful tool designed to profile your Python code. It can pinpoint exactly where your program is spending the most time so inefficiencies can be identified and corrected.

Example Problem

Let’s set up a simple research question to write some code for:

“For all flights going between US states and territories, which departing airport has the longest flights on average?”

Below is a simple Python script to answer this research question, using data retrieved from the Bureau of Transportation Statistics (BTS). The dataset contains every flight within US states and territories between January and June of 2025, with information on the origin and destination airports. It’s roughly 3.5 million rows.

It calculates the Haversine distance, the shortest distance between two points on a sphere, for every flight. Then, it groups the results by departing airport to find the average distance and reports the top five.
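
For reference, the haversine() function below implements the standard haversine formula, with the latitudes and longitudes converted to radians and R as the Earth’s radius in kilometers:

$$
d = 2R \arcsin\left( \sqrt{ \sin^2\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \, \cos\varphi_2 \, \sin^2\left(\frac{\Delta\lambda}{2}\right) } \right)
$$

where $\varphi$ is latitude, $\lambda$ is longitude, and $\Delta\varphi$, $\Delta\lambda$ are the origin-to-destination differences.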

import pandas as pd  
import math  
import time  
  
  
def haversine(lat_1, lon_1, lat_2, lon_2):  
    """Calculate the Haversine Distance between two latitude and longitude points"""  
    lat_1_rad = math.radians(lat_1)  
    lon_1_rad = math.radians(lon_1)  
    lat_2_rad = math.radians(lat_2)  
    lon_2_rad = math.radians(lon_2)  
  
    delta_lat = lat_2_rad - lat_1_rad  
    delta_lon = lon_2_rad - lon_1_rad  
  
    R = 6371  # Radius of the earth in km  
  
    return 2*R*math.asin(math.sqrt(math.sin(delta_lat/2)**2 + math.cos(lat_1_rad)*math.cos(lat_2_rad)*(math.sin(delta_lon/2))**2))  
  
  
if __name__ == '__main__':  
    # Load in flight data to a dataframe  
    flight_data_file = r"./data/2025_flight_data.csv"  
    flights_df = pd.read_csv(flight_data_file)  
  
    # Start timer to see how long the analysis takes
    start = time.time()  
  
    # Calculate the haversine distance between each flight's start and end airport  
    haversine_dists = []  
    for i, row in flights_df.iterrows():  
        haversine_dists.append(haversine(lat_1=row["LATITUDE_ORIGIN"],  
                                         lon_1=row["LONGITUDE_ORIGIN"],  
                                         lat_2=row["LATITUDE_DEST"],  
                                         lon_2=row["LONGITUDE_DEST"]))  
  
    flights_df["Distance"] = haversine_dists  
  
    # Get result by grouping by origin airport, taking the average flight distance and printing the top 5
    result = (  
        flights_df  
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))  
        .sort_values('avg_dist', ascending=False)  
    )  
  
    print(result.head(5))  
  
    # End timer and print the analysis time
    end = time.time()  
    print(f"Took {end - start} s")

Running this code gives the following output:

                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN                     
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 169.8935534954071 s

These results make sense, because the airports listed are in American Samoa, Guam, Puerto Rico, Alaska, and Hawaii, respectively. These are all locations outside of the contiguous United States where one would expect long average flight distances.

The issue here isn’t the results, which are valid, but the execution time: almost three minutes! While three minutes may be tolerable for a one-off run, it becomes a productivity killer during development. Imagine this as part of a longer data pipeline. Each time a parameter is tweaked, a bug is fixed, or a cell is re-run, you’re forced to sit idle while the program runs. That friction breaks your flow and turns a quick analysis into an all-afternoon affair.

Now let’s see how py-spy can help us diagnose exactly which lines are taking so long.

What Is Py-Spy?

To understand what py-spy is doing and the advantages of using it, it helps to compare py-spy to the built-in Python profiler cProfile.

  • cProfile: This is a tracing profiler, which works like a stopwatch on each function call. The time between each function call and return is measured and reported. While highly accurate, this adds significant overhead, as the profiler has to continually pause and record data, which can slow the script down considerably (a minimal cProfile invocation is sketched just after this list).
  • py-spy: This is a sampling profiler, which works like a high-speed camera pointed at the whole program from the outside. py-spy sits completely outside the running Python script and takes high-frequency snapshots of the program’s state. It looks at the full “call stack” to see exactly what line of code is being run and what function called it, all the way up to the top level.
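
For comparison, a cProfile run of the same script might look roughly like the following. Because it traces every single function call in-process, expect the already-slow script to get noticeably slower while being profiled. The output filename profile.out is just a placeholder, and the script is assumed to be saved as main.py, matching the py-spy command in the next section:

# Run the script under the built-in tracing profiler and save the stats to a file
python -m cProfile -o profile.out main.py

# Print the ten functions with the highest cumulative time
python -c "import pstats; pstats.Stats('profile.out').sort_stats('cumulative').print_stats(10)"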

Running Py-spy

In order to run py-spy on a Python script, the py-spy library must be installed in the Python environment.

pip install py-spy

Once the py-spy library is installed, our script can be profiled by running the following command in the terminal:

py-spy record -o profile.svg -r 100 -- python main.py

Here’s what each part of this command is actually doing:

  • py-spy: Calls the tool.
  • record: This tells py-spy to use its “record” mode, which continuously monitors the program while it runs and saves the data.
  • -o profile.svg: This specifies the output filename and format, telling it to output the results as an SVG file called profile.svg.
  • -r 100: This specifies the sampling rate, setting it to 100 samples per second. This means that py-spy will check what the program is doing 100 times every second.
  • --: This separates the py-spy command from the Python script command. It tells py-spy that everything following this flag is the command to run, not arguments for py-spy itself.
  • python main.py: This is the command that runs the Python script to be profiled with py-spy, in this case main.py.

Note: If running on Linux, sudo privileges are often required to run py-spy, for security reasons.

After this command finishes running, an output file profile.svg will appear, which will let us dig deeper into which parts of the code are taking the longest.
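
Launching the script through py-spy isn’t the only option, either. py-spy can also attach to a Python process that is already running, which is handy when a long pipeline is already underway and you want to see what it is doing right now. A minimal sketch, where the PID 12345 is a placeholder for the target process ID (check py-spy’s --help for the exact flags in your version):

# Show a live, top-like view of where a running process is spending its time
py-spy top --pid 12345

# Record an interactive SVG profile from the running process instead of launching a new one
py-spy record -o profile.svg --pid 12345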

Py-spy Output

Icicle Graph output from py-spy

Opening up the output profile.svg reveals the visualization that py-spy has created of how much time our program spent in different parts of the code. This is referred to as an Icicle Graph (or sometimes a Flame Graph, if the y-axis is inverted) and is interpreted as follows:

  • Bars: Each colored bar represents a particular function that was called during the execution of the program.
  • X-axis (Population): The horizontal axis represents the collection of all samples taken during profiling. They are grouped so that the width of a particular bar represents the proportion of the total samples in which the program was inside the function represented by that bar. Note: This is not a timeline; the ordering doesn’t represent when the function was called, only the total amount of time spent in it.
  • Y-axis (Stack Depth): The vertical axis represents the call stack. The top bar labeled “all” represents the entire program, and the bars below it represent functions called from “all”. This continues down recursively, with each bar broken down into the functions that were called during its execution. The very bottom bar shows the function that was actually running on the CPU when the sample was taken.

Interacting with the Graph

While the image above is static, the actual .svg file generated by py-spy is fully interactive. When you open it in a web browser, you can:

  • Search (Ctrl+F): Highlight specific functions to see where they appear in the stack.
  • Zoom: Click on any bar to zoom in on that specific function and its children, allowing you to isolate complex parts of the call stack.
  • Hover: Hovering over any bar displays the specific function name, file path, line number, and the exact percentage of time it consumed.

The most critical rule for reading the icicle graph is simply: the wider the bar, the more time was spent in that function. If a function bar spans 50% of the graph’s width, it means that the program was working on executing that function for 50% of the total runtime.

Diagnosis

From the icicle graph above, we can see that the bar representing the Pandas iterrows() function is noticeably wide. Hovering over that bar when viewing the profile.svg file reveals that the true proportion for this function was 68.36%, so over two-thirds of the runtime was spent in the iterrows() function. Intuitively this bottleneck makes sense, as iterrows() creates a Pandas Series object for every single row in the loop, causing massive overhead. This gives us a clear target for optimizing the runtime of the script.
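
To see that per-row overhead in isolation, here is a small, hypothetical micro-benchmark (unrelated to the flight data) comparing an iterrows() loop against the equivalent vectorized column operation:

import timeit

import numpy as np
import pandas as pd

# A throwaway frame purely to illustrate the per-row cost of iterrows()
sample = pd.DataFrame(np.random.rand(100_000, 2), columns=["a", "b"])

# Sum two columns row by row via iterrows(), building a Python list
loop_time = timeit.timeit(
    lambda: [row["a"] + row["b"] for _, row in sample.iterrows()], number=1
)

# The same sum as a single vectorized operation on whole columns
vectorized_time = timeit.timeit(lambda: sample["a"] + sample["b"], number=1)

print(f"iterrows loop: {loop_time:.3f} s")
print(f"vectorized:    {vectorized_time:.5f} s")

On most machines the vectorized version is orders of magnitude faster, which is exactly the effect the optimization below takes advantage of.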

Optimizing The Script

The clearest path to optimizing this script, based on what was learned from py-spy, is to stop using iterrows() to loop over every row when calculating the haversine distance. Instead, it should be replaced with a vectorized calculation using NumPy that does the calculation for every row in a single function call. So the changes to be made are:

  • Rewrite the haversine() function to use vectorized, efficient C-level NumPy operations that allow whole arrays to be passed in rather than one set of coordinates at a time.
  • Replace the iterrows() loop with a single call to this newly vectorized haversine() function.

import pandas as pd  
import numpy as np  
import time  
  
  
def haversine(lat_1, lon_1, lat_2, lon_2):  
    """Calculate the Haversine Distance between two latitude and longitude points"""  
    lat_1_rad = np.radians(lat_1)  
    lon_1_rad = np.radians(lon_1)  
    lat_2_rad = np.radians(lat_2)  
    lon_2_rad = np.radians(lon_2)  
  
    delta_lat = lat_2_rad - lat_1_rad  
    delta_lon = lon_2_rad - lon_1_rad  
  
    R = 6371  # Radius of the earth in km  
  
    return 2*R*np.arcsin(np.sqrt(np.sin(delta_lat/2)**2 + np.cos(lat_1_rad)*np.cos(lat_2_rad)*(np.sin(delta_lon/2))**2))
  
  
if __name__ == '__main__':  
    # Load in flight data to a dataframe  
    flight_data_file = r"./data/2025_flight_data.csv"  
    flights_df = pd.read_csv(flight_data_file)  
  
    # Start timer to see how long the analysis takes
    start = time.time()  
  
    # Calculate the haversine distance between each flight's start and end airport  
    flights_df["Distance"] = haversine(lat_1=flights_df["LATITUDE_ORIGIN"],  
                                       lon_1=flights_df["LONGITUDE_ORIGIN"],  
                                       lat_2=flights_df["LATITUDE_DEST"],  
                                       lon_2=flights_df["LONGITUDE_DEST"])  
  
    # Get result by grouping by origin airport, taking the average flight distance and printing the top 5
    result = (  
        flights_df  
        .groupby('DISPLAY_AIRPORT_NAME_ORIGIN').agg(avg_dist=('Distance', 'mean'))  
        .sort_values('avg_dist', ascending=False)  
    )  
  
    print(result.head(5))  
  
    # End timer and print the analysis time
    end = time.time()  
    print(f"Took {end - start} s")

Running this code gives the following output:

                                        avg_dist
DISPLAY_AIRPORT_NAME_ORIGIN                     
Pago Pago International              4202.493567
Guam International                   3142.363005
Luis Munoz Marin International       2386.141780
Ted Stevens Anchorage International  2246.530036
Daniel K Inouye International        2211.857407
Took 0.5649983882904053 s

These results are identical to the results from before the code was optimized, but instead of taking nearly three minutes to process, it took just over half a second!
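
To confirm the fix, we could simply re-run the same py-spy command from earlier against the optimized script and check that iterrows() no longer dominates the graph (the output name profile_optimized.svg is arbitrary):

py-spy record -o profile_optimized.svg -r 100 -- python main.py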

Looking Ahead

If you are reading this from the future (late 2026 or beyond), check whether you are running Python 3.15 or newer. Python 3.15 is expected to introduce a native sampling profiler in the standard library, offering similar functionality to py-spy without requiring an external installation. For anyone on Python 3.14 or older, py-spy remains the gold standard.

This article explored a tool for tackling a common frustration in data science: a script that functions as intended, but is inefficiently written and takes a long time to run. An example script was provided to learn which US departure airports have the longest average flight distance according to the Haversine distance. This script worked as expected, but took almost three minutes to run.

Using the py-spy Python profiler, we were able to learn that the cause of the inefficiency was the use of the iterrows() function. By replacing iterrows() with a more efficient vectorized calculation of the Haversine distance, the runtime was optimized from three minutes down to just over half a second.

See my GitHub repository for the code from this article, including the preprocessing of the raw data from BTS.

Thanks for reading!

Data Sources

Data from the Bureau of Transportation Statistics (BTS) is a work of the U.S. Federal Government and is in the public domain under 17 U.S.C. § 105. It is free to use, share, and adapt without copyright restriction.
