Visualising Strava Race Evaluation


We’re now ready to work with the data and create the visualisations.

Challenges:

To obtain the data needed for the visuals, my first intuition was: look at the cumulative distance column for each runner, identify when each of them completed a lap distance (1000, 2000, 3000, etc.) and compute the differences between the timestamps.

That algorithm looks simple and might work, but it had some limitations that I needed to address:

  1. Exact lap distances are often completed in between two registered data points. To be more accurate I had to interpolate both position and time.
  2. Because devices differ in precision, there may be misalignments across runners. The most typical is when one runner’s lap notification beeps before another’s even though they have been together the whole track. To minimise this I decided to use the reference runner to set the position marks for each lap on the track. The time difference is calculated when the other runners cross those marks (even if their cumulative distance is ahead of or behind the lap). This is closer to the reality of the race: if someone crosses a point first, they’re ahead (regardless of the cumulative distance on their device).
  3. With the previous point comes another problem: the latitude and longitude of a reference mark might never be exactly registered in the other runners’ data. I used Nearest Neighbours to find the closest datapoint in terms of position.
  4. Finally, Nearest Neighbours might return the wrong datapoint if the track passes through the same position at different moments in time. So the population in which Nearest Neighbours searches for the best match must be reduced to a smaller group of candidates. I defined a window of 20 datapoints around the target distance (distance_cum).
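
To illustrate the last limitation, here is a minimal sketch on a made-up out-and-back track. The coordinates are arbitrary planar units rather than real latitude and longitude, and the helper nearest_index() is hypothetical: it uses a plain minimum over Euclidean distances instead of scikit-learn's NearestNeighbors, only to keep the example dependency-free.

```python
import math

# Synthetic out-and-back track, 41 points spaced 100 m apart.
# Coordinates are arbitrary planar units, not real latitude/longitude.
track = []
for i in range(41):
    x = float(i) if i <= 20 else float(40 - i)   # run out, then come back
    y = 0.0 if i <= 20 else 0.2                  # return leg slightly offset
    track.append({"distance_cum": i * 100.0, "pos": (x, y)})

def nearest_index(track, target, lo, hi):
    """Index in track[lo:hi] whose position is closest to target."""
    return min(range(lo, hi), key=lambda i: math.dist(track[i]["pos"], target))

# Reference mark for the 3500 m lap; GPS noise puts it nearer the outward leg.
target_pos = (5.0, 0.05)
target_distance = 3500.0

# Searching the whole track picks the point visited on the way out.
global_match = nearest_index(track, target_pos, 0, len(track))

# Restricting the search to a window around the target distance fixes it.
window_size = 20
centre = next(i for i, p in enumerate(track) if p["distance_cum"] > target_distance)
lo = max(centre - window_size // 2, 0)
hi = min(centre + window_size // 2 + 1, len(track))
windowed_match = nearest_index(track, target_pos, lo, hi)

print(global_match, track[global_match]["distance_cum"])      # 5 500.0
print(windowed_match, track[windowed_match]["distance_cum"])  # 35 3500.0
```

The unrestricted search returns a point at 500 m, which would produce a nonsensical lap time; the windowed search returns the point at 3500 m.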

Algorithm

With all the previous limitations in mind, the algorithm is as follows:

1. Select the reference and a lap distance (default = 1 km)

2. Using the reference data, identify the position and the moment every lap was completed: the reference marks.

3. Go to the other runners’ data and identify the moments they crossed those position marks. Then calculate the difference between the time at which each runner crosses a mark and the time at which the reference crossed it. Finally, take the delta of this time difference between consecutive marks to represent the evolution of the gap.
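
The arithmetic of step 3 can be sketched with made-up crossing times. The numbers and variable names below are assumptions for illustration only; they are not taken from the race data.

```python
# Seconds since the start at which each runner crosses the reference marks
# (made-up numbers for three laps; mark 0 is the start line).
ref_times = [0.0, 300.0, 605.0, 915.0]
other_times = [0.0, 310.0, 612.0, 930.0]

# Gap to the reference at each mark: positive means the other runner is behind.
gaps = [o - r for o, r in zip(other_times, ref_times)]

# Change of the gap from one mark to the next (0 at the start).
gap_deltas = [0.0] + [b - a for a, b in zip(gaps, gaps[1:])]

print(gaps)        # [0.0, 10.0, 7.0, 15.0]
print(gap_deltas)  # [0.0, 10.0, -3.0, 8.0]
```

In this example the other runner loses 10 seconds in the first lap, recovers 3 in the second and loses 8 in the third, ending 15 seconds behind the reference.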

Code Example

1. Select the reference and a lap distance (default = 1 km)

  • Juan will be the reference (juan_df) in the examples.
  • The other runners will be Pedro (pedro_df) and Jimena (jimena_df).
  • Lap distance will be 1000 metres.

2. Create interpolate_laps(): a function that finds or interpolates the exact point at which every lap was completed and returns it in a new dataframe. The interpolation is done with the function interpolate_value(), which was also created.

## Function: interpolate_value()

Input:
- start: The starting value.
- end: The ending value.
- fraction: A value between 0 and 1 that represents the position between
the start and end values where the interpolation should occur.
Return:
- The interpolated value that lies between the start and end values
at the specified fraction.

def interpolate_value(start, end, fraction):
    return start + (end - start) * fraction
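
As a quick check of how the function behaves, here is a small sketch with two hypothetical datapoints on either side of the 1000 m mark. The distances, coordinates and timestamps are invented, and standard-library datetime objects stand in for the pandas timestamps used in the article.

```python
from datetime import datetime

def interpolate_value(start, end, fraction):
    return start + (end - start) * fraction

# Two consecutive (hypothetical) datapoints around the 1000 m mark.
d0, d1 = 996.0, 1004.0
t0 = datetime(2024, 1, 1, 9, 5, 0)
t1 = datetime(2024, 1, 1, 9, 5, 2)

# Position of the lap mark between the two datapoints (0.5 here).
fraction = (1000.0 - d0) / (d1 - d0)

lat = interpolate_value(40.4160, 40.4170, fraction)   # approximately 40.4165
when = interpolate_value(t0, t1, fraction)            # 09:05:01

print(fraction, round(lat, 4), when)
```

The same function works for numbers and for timestamps because subtracting two timestamps gives a time difference that can be scaled by the fraction and added back to the start.
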
## Function: interpolate_laps()

Input:
- track_df: dataframe with track data.
- lap_distance: metres per lap (default 1000)
Return:
- track_laps: dataframe with lap metrics. As many rows as laps identified.

import pandas as pd

def interpolate_laps(track_df , lap_distance = 1000):
    #### 1. Initialise track_laps with the first row of track_df
    track_laps = track_df.loc[0][['latitude','longitude','elevation','date_time','distance_cum']].copy()

    # Set distance_cum = 0
    track_laps['distance_cum'] = 0

    # Convert the Series into a one-row dataframe
    track_laps = pd.DataFrame(track_laps)
    track_laps = track_laps.transpose()

    #### 2. Calculate number_of_laps = Total Distance / lap_distance
    number_of_laps = track_df['distance_cum'].max()//lap_distance

    #### 3. For every lap i from 1 to number_of_laps:
    for i in range(1,int(number_of_laps+1),1):

        # a. Calculate target_distance = i * lap_distance
        target_distance = i*lap_distance

        # b. Find first_crossing_index where track_df['distance_cum'] >= target_distance
        #    (>= rather than >, so that the exact match in step c can be detected)
        first_crossing_index = (track_df['distance_cum'] >= target_distance).idxmax()

        # c. If the match is exactly the lap distance, copy that row
        if (track_df.loc[first_crossing_index]['distance_cum'] == target_distance):
            new_row = track_df.loc[first_crossing_index][['latitude','longitude','elevation','date_time','distance_cum']].rename(f'lap_{i}')

        # Else: create new_row with values interpolated between the two surrounding datapoints
        else:

            fraction = (target_distance - track_df.loc[first_crossing_index-1, 'distance_cum']) / (track_df.loc[first_crossing_index, 'distance_cum'] - track_df.loc[first_crossing_index-1, 'distance_cum'])

            # Create the new row
            new_row = pd.Series({
                'latitude': interpolate_value(track_df.loc[first_crossing_index-1, 'latitude'], track_df.loc[first_crossing_index, 'latitude'], fraction),
                'longitude': interpolate_value(track_df.loc[first_crossing_index-1, 'longitude'], track_df.loc[first_crossing_index, 'longitude'], fraction),
                'elevation': interpolate_value(track_df.loc[first_crossing_index-1, 'elevation'], track_df.loc[first_crossing_index, 'elevation'], fraction),
                'date_time': track_df.loc[first_crossing_index-1, 'date_time'] + (track_df.loc[first_crossing_index, 'date_time'] - track_df.loc[first_crossing_index-1, 'date_time']) * fraction,
                'distance_cum': target_distance
            }, name=f'lap_{i}')

        # d. Add the new row to the dataframe that stores the laps
        new_row_df = pd.DataFrame(new_row)
        new_row_df = new_row_df.transpose()

        track_laps = pd.concat([track_laps,new_row_df])

    #### 4. Convert date_time to datetime format and remove the timezone
    track_laps['date_time'] = pd.to_datetime(track_laps['date_time'], format='%Y-%m-%d %H:%M:%S.%f%z')
    track_laps['date_time'] = track_laps['date_time'].dt.tz_localize(None)

    #### 5. Calculate seconds_diff (a timedelta) between consecutive rows in track_laps
    track_laps['seconds_diff'] = track_laps['date_time'].diff()

    return track_laps

Applying the interpolation function to the reference dataframe generates the following dataframe:

juan_laps = interpolate_laps(juan_df , lap_distance=1000)
Dataframe with the lap metrics as a result of the interpolation. Image by the author.

Note that, as it was a 10k race, 10 laps of 1000 m have been identified (see column distance_cum). The column seconds_diff has the time per lap. The rest of the columns (latitude, longitude, elevation and date_time) mark the position and time for every lap of the reference as the result of the interpolation.
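
For comparison, the time at each lap mark can also be obtained with NumPy's np.interp, which performs the same linear interpolation in one call. This is only an alternative sketch on invented data, not the approach used in the article: the arrays below are hypothetical and time is expressed as elapsed seconds rather than timestamps.

```python
import numpy as np

# Hypothetical track: cumulative distance (m) and elapsed time (s) per datapoint.
distance_cum = np.array([0.0, 400.0, 900.0, 1100.0, 1900.0, 2100.0, 2500.0])
elapsed_s = np.array([0.0, 120.0, 270.0, 330.0, 570.0, 630.0, 750.0])

lap_distance = 1000
number_of_laps = int(distance_cum.max() // lap_distance)

# Lap marks at 0, 1000 and 2000 metres.
lap_marks = np.arange(0, number_of_laps + 1) * lap_distance

# np.interp expects increasing x values; cumulative distance satisfies that here.
lap_times = np.interp(lap_marks, distance_cum, elapsed_s)

# Time per lap, the equivalent of the seconds_diff column.
seconds_per_lap = np.diff(lap_times)

print(lap_times)        # [  0. 300. 600.]
print(seconds_per_lap)  # [300. 300.]
```

The same call could be repeated for latitude, longitude and elevation to recover the position of each mark.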

3. To calculate the time gaps between the reference and the other runners I created the function gap_to_reference().

## Helper Functions:
- get_seconds(): Convert timedelta to total seconds
- format_timedelta(): Format timedelta as a string (e.g., "+01:23" or "-00:45")

# Convert timedelta to total seconds
def get_seconds(td):
    # Convert to total seconds
    total_seconds = td.total_seconds()

    return total_seconds

# Format timedelta as a string (e.g., "+01:23" or "-00:45")
def format_timedelta(td):
    # Convert to total seconds
    total_seconds = td.total_seconds()

    # Determine sign
    sign = '+' if total_seconds >= 0 else '-'

    # Take absolute value for calculation
    total_seconds = abs(total_seconds)

    # Calculate minutes and remaining seconds
    minutes = int(total_seconds // 60)
    seconds = int(total_seconds % 60)

    # Format the string
    return f"{sign}{minutes:02d}:{seconds:02d}"
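
A short usage sketch of format_timedelta(), repeated here so the snippet runs on its own. It uses the standard-library timedelta as a stand-in for pandas Timedelta; both expose total_seconds(). The input values are arbitrary.

```python
from datetime import timedelta

def format_timedelta(td):
    total_seconds = td.total_seconds()
    sign = '+' if total_seconds >= 0 else '-'
    total_seconds = abs(total_seconds)
    minutes = int(total_seconds // 60)
    seconds = int(total_seconds % 60)
    return f"{sign}{minutes:02d}:{seconds:02d}"

print(format_timedelta(timedelta(seconds=83)))    # +01:23
print(format_timedelta(timedelta(seconds=-45)))   # -00:45
print(format_timedelta(timedelta(minutes=61)))    # +61:00
```

Note that fractions of a second are truncated and that gaps longer than an hour are still shown in minutes.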

## Function: gap_to_reference()

Input:
- laps_dict: dictionary containing the df_laps for all the runners' names
- df_dict: dictionary containing the track_df for all the runners' names
- reference_name: name of the reference
Return:
- matches: processed data with time differences.


import pandas as pd
from sklearn.neighbors import NearestNeighbors

def gap_to_reference(laps_dict, df_dict, reference_name):
    #### 1. Get the reference's lap data from laps_dict
    #       (.copy() so that new columns are not written to a slice of laps_dict)
    matches = laps_dict[reference_name][['latitude','longitude','date_time','distance_cum']].copy()

    #### 2. For every racer (name) and their data (df) in df_dict:
    for name, df in df_dict.items():

        # If racer is the reference:
        if name == reference_name:

            # Set time difference to zero for all laps
            matches[f'seconds_to_reference_{reference_name}'] = 0

        # If racer is not the reference:
        else:

            # a. For every lap find the closest point in racer's data based on lat, lon.
            for lap, row in matches.iterrows():

                # Step 1: set the position and lap distance from the reference
                target_coordinates = matches.loc[lap][['latitude', 'longitude']].values.astype(float)
                target_distance = matches.loc[lap]['distance_cum']

                # Step 2: find the datapoint that will be in the centre of the window
                first_crossing_index = (df['distance_cum'] > target_distance).idxmax()

                # Step 3: select the candidate datapoints (about 20) to search for the match
                window_size = 20
                window_sample = df.loc[first_crossing_index-(window_size//2):first_crossing_index+(window_size//2)]
                candidates = window_sample[['latitude', 'longitude']].values

                # Step 4: get the closest match using the coordinates
                nn = NearestNeighbors(n_neighbors=1, metric='euclidean')
                nn.fit(candidates)
                distance, indice = nn.kneighbors([target_coordinates])

                nearest_timestamp = window_sample.iloc[indice.flatten()]['date_time'].values
                nearest_distance_cum = window_sample.iloc[indice.flatten()]['distance_cum'].values
                # kneighbors returns a 2D array; keep the single scalar distance
                euclidean_distance = distance[0][0]

                matches.loc[lap,f'nearest_timestamp_{name}'] = nearest_timestamp[0]
                matches.loc[lap,f'nearest_distance_cum_{name}'] = nearest_distance_cum[0]
                matches.loc[lap,f'euclidean_distance_{name}'] = euclidean_distance

            # b. Calculate time difference between racer and reference at each mark
            matches[f'time_to_ref_{name}'] = matches[f'nearest_timestamp_{name}'] - matches['date_time']

            # c. Store the change of the time difference from one lap to the next
            matches[f'time_to_ref_diff_{name}'] = matches[f'time_to_ref_{name}'].diff()
            matches[f'time_to_ref_diff_{name}'] = matches[f'time_to_ref_diff_{name}'].fillna(pd.Timedelta(seconds=0))

            # d. Format data using helper functions
            matches[f'lap_difference_seconds_{name}'] = matches[f'time_to_ref_diff_{name}'].apply(get_seconds)
            matches[f'lap_difference_formatted_{name}'] = matches[f'time_to_ref_diff_{name}'].apply(format_timedelta)

            matches[f'seconds_to_reference_{name}'] = matches[f'time_to_ref_{name}'].apply(get_seconds)
            matches[f'time_to_reference_formatted_{name}'] = matches[f'time_to_ref_{name}'].apply(format_timedelta)

    #### 3. Return processed data with time differences
    return matches

Below is the code that implements the logic and stores the results in the dataframe matches_gap_to_reference:

# Lap distance
lap_distance = 1000

# Store the DataFrames in a dictionary
df_dict = {
    'jimena': jimena_df,
    'juan': juan_df,
    'pedro': pedro_df,
}

# Store the Lap DataFrames in a dictionary
laps_dict = {
    'jimena': interpolate_laps(jimena_df , lap_distance),
    'juan': interpolate_laps(juan_df , lap_distance),
    'pedro': interpolate_laps(pedro_df , lap_distance)
}

# Calculate gaps to reference
reference_name = 'juan'
matches_gap_to_reference = gap_to_reference(laps_dict, df_dict, reference_name)

The columns of the resulting dataframe contain the information that will be displayed on the graphs.
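
Before plotting, it can help to reshape the per-runner columns into a long table with one row per runner and lap. The sketch below is an assumption about how that might be done, not part of the original workflow: it uses a small invented dataframe that only mimics the seconds_to_reference_<name> columns created by gap_to_reference(), and the names long_df and prefix are hypothetical.

```python
import pandas as pd

# Invented stand-in for matches_gap_to_reference (three marks only).
matches = pd.DataFrame({
    "distance_cum": [0, 1000, 2000],
    "seconds_to_reference_juan": [0, 0, 0],
    "seconds_to_reference_pedro": [0.0, 4.0, -2.0],
    "seconds_to_reference_jimena": [0.0, 12.0, 15.0],
})

# Pick the gap columns and turn them into rows: one per (lap mark, runner).
prefix = "seconds_to_reference_"
gap_cols = [c for c in matches.columns if c.startswith(prefix)]
long_df = matches.melt(
    id_vars="distance_cum",
    value_vars=gap_cols,
    var_name="runner",
    value_name="seconds_to_reference",
)

# Keep only the runner's name in the 'runner' column.
long_df["runner"] = long_df["runner"].str.replace(prefix, "", regex=False)

print(long_df)
```

A long table like this can be passed directly to most plotting libraries, using distance_cum on the x axis and one line per runner.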
