Clustering Eating Behaviors in Time: A Machine Learning Approach to Preventive Health


It’s well known that what we eat matters — but what if when and how much we eat matters just as much?

In the midst of the ongoing scientific debate around the benefits of intermittent fasting, this question becomes even more intriguing. As someone enthusiastic about machine learning and healthy living, I was inspired by a 2017 research paper [1] exploring this intersection. The authors introduced a novel distance metric called Modified Dynamic Time Warping (MDTW) — a method designed to account not just for the nutritional content of meals but also for their timing throughout the day.

Motivated by their work[1], I built a full implementation of MDTW from scratch using Python. I applied it to cluster simulated individuals into temporal dietary patterns, uncovering distinct behaviors like skippers, snackers, and night eaters.

While MDTW may sound like a niche metric, it fills a critical gap in time-series comparison. Traditional distance measures — such as Euclidean distance or even classical Dynamic Time Warping (DTW) — struggle when applied to dietary data. People don’t eat at fixed times or with consistent frequency. They skip meals, snack irregularly, or eat late at night.

MDTW is designed for exactly this kind of temporal misalignment and behavioral variability. By allowing flexible alignment while penalizing mismatches in both nutrient content and meal timing, MDTW reveals subtle but meaningful differences in how people eat.

What this article covers:

  1. Mathematical foundation of MDTW — explained intuitively.
  2. From formula to code — implementing MDTW in Python with dynamic programming.
  3. Generating synthetic dietary data to simulate real-world eating behavior.
  4. Constructing a distance matrix between individual eating records.
  5. Clustering individuals with K-Medoids and evaluating with silhouette and elbow methods.
  6. Visualizing clusters as scatter plots and joint distributions.
  7. Interpreting temporal patterns from clusters: who eats when, and how much?

Quick Note on Classical Dynamic Time Warping (DTW)

Dynamic Time Warping (DTW) is a classic algorithm used to measure similarity between two sequences that may vary in length or timing. It’s widely used in speech recognition, gesture analysis, and time series alignment. Let’s look at a very simple example in which Sequence A is aligned to Sequence B (a shifted version of A) using the traditional dynamic time warping algorithm from the fastdtw library. As input, we pass a distance metric (Euclidean) along with the two time series; the algorithm returns the distance between them and the optimal alignment path.

import numpy as np
import matplotlib.pyplot as plt
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean
# Sample sequences (scalar values)
x = np.linspace(0, 3 * np.pi, 30)
y1 = np.sin(x)
y2 = np.sin(x+0.5)  # Shifted version
# Convert scalars to vectors (1D)
y1_vectors = [[v] for v in y1]
y2_vectors = [[v] for v in y2]
# Option 1: Euclidean distance on 1-D vectors
distance, path = fastdtw(y1_vectors, y2_vectors, dist=euclidean)

# Option 2: absolute difference directly on the scalar series (used for the plot below)
distance, path = fastdtw(y1, y2, dist=lambda a, b: np.abs(a - b))
# Plot the alignment
plt.figure(figsize=(10, 4))
plt.plot(y1, label='Sequence A')
plt.plot(y2, label='Sequence B (shifted)')

# Draw alignment lines
for (i, j) in path:
    plt.plot([i, j], [y1[i], y2[j]], color='gray', linewidth=0.5)

plt.title(f'Dynamic Time Warping Alignment (Distance = {distance:.2f})')
plt.xlabel('Time Index')
plt.legend()
plt.tight_layout()
plt.savefig('dtw_alignment.png')
plt.show()
Illustration of applying dynamic time warping to two time series (Image by author)

The path returned by fastdtw (or any DTW algorithm) is a sequence of index pairs (i, j) that represent the optimal alignment between two time series. Each pair indicates that element A[i] is matched with B[j]. By summing the distances between all these matched pairs, the algorithm computes the optimized cumulative cost — the minimum total distance required to warp one sequence onto the other.
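As a quick sanity check, the cumulative cost can be recomputed by summing the pointwise distances along the returned path. This is a minimal sketch reusing the y1, y2, distance, and path variables from the scalar fastdtw call above:

# Recompute the cumulative DTW cost by summing |A[i] - B[j]| over the alignment path
recomputed = sum(np.abs(y1[i] - y2[j]) for i, j in path)
print(f"Reported distance: {distance:.2f}, recomputed from path: {recomputed:.2f}")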

Modified Dynamic Time Warping (MDTW)

The key challenge when applying dynamic time warping (DTW) to dietary data (versus simple examples like sine waves or fixed-length sequences) lies in the complexity and variability of real-world eating behaviors. The main challenges, and the solutions proposed in the paper [1] in response to each of them, are as follows:

  1. Irregular Time Steps: MDTW accounts for this by explicitly incorporating the time difference into the distance function.
  2. Multidimensional Nutrients: MDTW supports multidimensional vectors to represent nutrients such as calories, fat, etc., and uses a weight matrix to handle differing units and the relative importance of nutrients.
  3. Unequal Number of Meals: MDTW allows matching with empty eating events, penalizing skipped or unmatched meals appropriately.
  4. Time Sensitivity: MDTW includes a time difference penalty, penalizing eating events that occur far apart in time even when their nutrient content is similar.

Eating Occasion Data Representation

According to the modified dynamic time warping proposed in the paper [1], each person’s diet can be represented as a sequence of eating events, where each event has a time of day and a nutrient vector (in this simplified example, just calories).

To illustrate how eating records appear in real data, I created three synthetic dietary profiles considering only calorie consumption — Skipper, Night Eater, and Snacker. Let’s assume we ingest the raw data from an API in this format:

skipper={
    'person_id': 'skipper_1',
    'records': [
        {'time': 12, 'nutrients': [300]},  # Skipped breakfast, large lunch
        {'time': 19, 'nutrients': [600]},  # Large dinner
    ]
}
night_eater={
    'person_id': 'night_eater_1',
    'records': [
        {'time': 9, 'nutrients': [150]},   # Light breakfast
        {'time': 14, 'nutrients': [250]},  # Small lunch
        {'time': 22, 'nutrients': [700]},  # Large late dinner
    ]
}
snacker=  {
    'person_id': 'snacker_1',
    'records': [
        {'time': 8, 'nutrients': [100]},   # Light morning snack
        {'time': 11, 'nutrients': [150]},  # Late morning snack
        {'time': 14, 'nutrients': [200]},  # Afternoon snack
        {'time': 17, 'nutrients': [100]},  # Early evening snack
        {'time': 21, 'nutrients': [200]},  # Night snack
    ]
}
raw_data = [skipper, night_eater, snacker]

As suggested in the paper, the dietary values should be normalized by the total calorie consumption.

import numpy as np
import matplotlib.pyplot as plt
def create_time_series_plot(data,save_path=None):
    plt.figure(figsize=(10, 5))
    for person, record in data.items():
        # In case the nutrient vector has multiple dimensions, average it to a scalar
        points = [[time, float(np.mean(np.array(value)))] for time, value in record.items()]

        times = [item[0] for item in points]
        nutrient_values = [item[1] for item in points]
        # Plot the time series
        plt.plot(times, nutrient_values, label=person, marker='o')

    plt.title('Time Series Plot for Nutrient Data')
    plt.xlabel('Time')
    plt.ylabel('Normalized Nutrient Value')
    plt.legend()
    plt.grid(True)
    if save_path:
        plt.savefig(save_path)

def prepare_person(person):
    
    # Check if all nutrients have same length
    nutrients_lengths = [len(record['nutrients']) for record in person["records"]]
    
    if len(set(nutrients_lengths)) != 1:
        raise ValueError(f"Inconsistent nutrient vector lengths for person {person['person_id']}.")

    sorted_records = sorted(person["records"], key=lambda x: x['time'])

    nutrients = np.stack([np.array(record['nutrients']) for record in sorted_records])
    total_nutrients = np.sum(nutrients, axis=0)

    # Check to avoid division by zero
    if np.any(total_nutrients == 0):
        raise ValueError(f"Zero total nutrients for person {person['person_id']}.")

    normalized_nutrients = nutrients / total_nutrients

    # Return a dictionary {time: [normalized nutrients]}
    person_dict = {
        record['time']: normalized_nutrients[i].tolist()
        for i, record in enumerate(sorted_records)
    }

    return person_dict
prepared_data = {person['person_id']: prepare_person(person) for person in raw_data}
create_time_series_plot(prepared_data)
Plot of eating occasions for three different profiles (Image by author)

Calculating the Distance Between Pairs

The distance between a pair of individuals is defined by the formula below. The first term represents a Euclidean distance between the nutrient vectors, whereas the second takes the time penalty into account.
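Reconstructed from the local_distance implementation below (so read it as my rendering rather than a verbatim copy of the paper’s equation), the local distance between two eating occasions can be written as:

$$
d_{eo}(i, j) = (v_i - v_j)^{T} W (v_i - v_j) + 2\beta \left(v_i^{T} W v_j\right) \left(\frac{|t_i - t_j|}{\delta}\right)^{\alpha}
$$

Here $v_i, v_j$ are the normalized nutrient vectors, $t_i, t_j$ the eating times, $W$ is a weight matrix (identity in the code), and $\delta$, $\beta$, $\alpha$ control the strength of the time penalty.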

This formula is implemented in the local_distance function with the suggested parameter values:

import numpy as np

def local_distance(eo_i, eo_j,delta=23, beta=1, alpha=2):
    """
    Calculate the local distance between two events.
    Args:
        eo_i (tuple): Event i (time, nutrients).
        eo_j (tuple): Event j (time, nutrients).
        delta (float): Time scaling factor.
        beta (float): Weighting factor for time difference.
        alpha (float): Exponent for time difference scaling.
    Returns:
        float: Local distance.
    """
    ti, vi = eo_i
    tj, vj = eo_j
   
    vi = np.array(vi)
    vj = np.array(vj)

    if vi.shape != vj.shape:
        raise ValueError("Mismatch in feature dimensions.")
    if np.any(vi < 0) or np.any(vj < 0):
        raise ValueError("Nutrient values must be non-negative.")
    if np.any(vi>1 ) or np.any(vj>1):
        raise ValueError("Nutrient values should be within the range [0, 1].")   
    W = np.eye(len(vi))  # Assume W = identity for now
    value_diff = (vi - vj).T @ W @ (vi - vj) 
    time_diff = (np.abs(ti - tj) / delta) ** alpha
    scale = 2 * beta * (vi.T @ W @ vj)
    distance = value_diff + scale * time_diff
  
    return distance

We construct a local distance matrix deo(i, j) for each pair of individuals being compared. The number of rows and columns in this matrix corresponds to the number of eating occasions of each individual, plus one extra row and column for matching against an empty event.

Once the local distance matrix deo(i, j) is constructed — capturing the pairwise distances between all eating occasions of two individuals — the next step is to compute the global cost matrix dER(i, j). This matrix accumulates the minimal alignment cost by considering three possible transitions at each step: matching two eating occasions, skipping an occasion in the first record (aligning it to an empty event), or skipping an occasion in the second record.

To compute the overall distance between two sequences of eating occasions, we construct:

  • A local distance matrix deo filled using local_distance.
  • A global cost matrix dER built with dynamic programming, minimizing at each step over:
  • Matching two eating occasions
  • Skipping an occasion in the first sequence (aligning to empty)
  • Skipping an occasion in the second sequence

These directly implement the recurrence below:
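The recurrence, reconstructed here from the dynamic-programming code that follows, can be written as:

$$
d_{ER}(i, j) = \min \begin{cases} d_{ER}(i-1, j-1) + d_{eo}(i, j) & \text{match } i \text{ and } j \\ d_{ER}(i-1, j) + d_{eo}(i, 0) & \text{match } i \text{ to empty} \\ d_{ER}(i, j-1) + d_{eo}(0, j) & \text{match } j \text{ to empty} \end{cases}
$$

with $d_{ER}(0, 0) = 0$ and boundary costs $d_{eo}(i, 0) = v_i^{T} W v_i$ and $d_{eo}(0, j) = v_j^{T} W v_j$ for matching an eating occasion to an empty event.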

import numpy as np

def mdtw_distance(ER1, ER2, delta=23, beta=1, alpha=2):
    """
    Calculate the modified DTW distance between two sequences of events.
    Args:
        ER1 (list): First sequence of events (time, nutrients).
        ER2 (list): Second sequence of events (time, nutrients).
        delta (float): Time scaling factor.
        beta (float): Weighting factor for time difference.
        alpha (float): Exponent for time difference scaling.
    
    Returns:
        float: Modified DTW distance.
    """
    m1 = len(ER1)
    m2 = len(ER2)
   
    # Local distance matrix including matching with empty
    deo = np.zeros((m1 + 1, m2 + 1))

    for i in range(m1 + 1):
        for j in range(m2 + 1):
            if i == 0 and j == 0:
                deo[i, j] = 0
            elif i == 0:
                tj, vj = ER2[j-1]
                deo[i, j] = np.dot(vj, vj)  
            elif j == 0:
                ti, vi = ER1[i-1]
                deo[i, j] = np.dot(vi, vi)
            else:
                deo[i, j]=local_distance(ER1[i-1], ER2[j-1], delta, beta, alpha)

    # Global cost matrix
    dER = np.zeros((m1 + 1, m2 + 1))
    dER[0, 0] = 0

    for i in range(1, m1 + 1):
        dER[i, 0] = dER[i-1, 0] + deo[i, 0]
    for j in range(1, m2 + 1):
        dER[0, j] = dER[0, j-1] + deo[0, j]

    for i in range(1, m1 + 1):
        for j in range(1, m2 + 1):
            dER[i, j] = min(
                dER[i-1, j-1] + deo[i, j],   # Match i and j
                dER[i-1, j] + deo[i, 0],     # Match i to empty
                dER[i, j-1] + deo[0, j]      # Match j to empty
            )
   
    
    return dER[m1, m2]  # Return the final cost

ERA = list(prepared_data['skipper_1'].items())
ERB = list(prepared_data['night_eater_1'].items())
distance = mdtw_distance(ERA, ERB)
print(f"Distance between skipper_1 and night_eater_1: {distance}")

From Pairwise Comparisons to a Distance Matrix

Once we have defined how to calculate the distance between two individuals’ eating patterns using MDTW, the natural next step is to compute distances across the entire dataset. To do this, we construct a distance matrix where each entry (i, j) represents the MDTW distance between person i and person j.

This is implemented in the function below:

import numpy as np

def calculate_distance_matrix(prepared_data):
    """
    Calculate the distance matrix for the prepared data.
    
    Args:
        prepared_data (dict): Dictionary containing prepared data for each person.
        
    Returns:
        np.ndarray: Distance matrix.
    """
    n = len(prepared_data)
    distance_matrix = np.zeros((n, n))
    
    # Compute pairwise distances
    for i, (id1, records1) in enumerate(prepared_data.items()):
        for j, (id2, records2) in enumerate(prepared_data.items()):
            if i < j:  # Only upper triangle
                print(f"Calculating distance between {id1} and {id2}")
                ER1 = list(records1.items())
                ER2 = list(records2.items())
                
                distance_matrix[i, j] = mdtw_distance(ER1, ER2)
                distance_matrix[j, i] = distance_matrix[i, j]  # Symmetric matrix
                
    return distance_matrix
def plot_heatmap(matrix,people_ids,save_path=None):
    """
    Plot a heatmap of the distance matrix.
    Args:
        matrix (np.ndarray): The distance matrix.
        people_ids (list): Person identifiers used as axis labels.
        save_path (str): Path to save the plot. If None, the plot is not saved.
    """
    plt.figure(figsize=(8, 6))
    plt.imshow(matrix, cmap='hot', interpolation='nearest')
    plt.colorbar()
  
    plt.xticks(ticks=range(len(matrix)), labels=people_ids)
    plt.yticks(ticks=range(len(matrix)), labels=people_ids)
    plt.xticks(rotation=45)
    plt.yticks(rotation=45)
    plt.title('Distance Matrix Heatmap')
    if save_path:
        plt.savefig(save_path)

distance_matrix = calculate_distance_matrix(prepared_data)
plot_heatmap(distance_matrix, list(prepared_data.keys()), save_path='distance_matrix.png')

After computing the pairwise Modified Dynamic Time Warping (MDTW) distances, we can visualize the similarities and differences between individuals’ dietary patterns using a heatmap. Each cell (i, j) in the matrix represents the MDTW distance between person i and person j — lower values indicate more similar temporal eating profiles.

This heatmap offers a compact and interpretable view of dietary dissimilarities, making it easier to identify clusters of similar eating behaviors.

The heatmap shows that skipper_1 shares more similarity with night_eater_1 than with snacker_1. The reason is that both the skipper and the night eater have fewer, larger meals concentrated later in the day, while the snacker distributes smaller meals more evenly across the entire timeline.

Distance Matrix Heatmap (Image by author)
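To check this observation numerically, a small sketch can print the pairwise MDTW distances straight from the matrix computed above (reusing distance_matrix and prepared_data from the earlier snippets):

ids = list(prepared_data.keys())
for a in range(len(ids)):
    for b in range(a + 1, len(ids)):
        print(f"{ids[a]} vs {ids[b]}: {distance_matrix[a, b]:.3f}")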

Clustering Temporal Dietary Patterns

After calculating the pairwise distances using Modified Dynamic Time Warping (MDTW), we are left with a distance matrix that reflects how dissimilar each individual’s eating pattern is from the others. But this matrix alone doesn’t tell us much at a glance — to reveal structure in the data, we need to go one step further.

Before applying any clustering algorithm, we first need a dataset that reflects realistic dietary behaviors. Since access to large-scale dietary intake datasets can be limited or subject to usage restrictions, I generated synthetic eating event records that simulate diverse daily patterns. Each record represents an individual’s calorie intake at specific hours throughout a 24-hour period.

import numpy as np

def generate_synthetic_data(num_people=5, min_meals=1, max_meals=5,min_calories=200,max_calories=800):
    """
    Generate synthetic data for a given number of individuals.
    Args:
        num_people (int): Number of individuals to generate data for.
        min_meals (int): Minimum number of meals per person.
        max_meals (int): Maximum number of meals per person.
        min_calories (int): Minimum calories per meal.
        max_calories (int): Maximum calories per meal.
    Returns:
        list: List of dictionaries containing synthetic data for each person.
    """
    data = []
    np.random.seed(42)  # For reproducibility
    for person_id in range(1, num_people + 1):
        num_meals = np.random.randint(min_meals, max_meals + 1)  # random number of meals between min and max
        meal_times = np.sort(np.random.choice(range(24), num_meals, replace=False))  # random meal times, sorted

        raw_calories = np.random.randint(min_calories, max_calories, size=num_meals)  # random calories between min and max

        person_record = {
            'person_id': f'person_{person_id}',
            'records': [
                {'time': float(time), 'nutrients': [float(cal)]} for time, cal in zip(meal_times, raw_calories)
            ]
        }

        data.append(person_record)
    return data

raw_data=generate_synthetic_data(num_people=1000, min_meals=1, max_meals=5,min_calories=200,max_calories=800)
prepared_data = {person['person_id']: prepare_person(person) for person in raw_data}
distance_matrix = calculate_distance_matrix(prepared_data)

Selecting the Optimal Number of Clusters

To determine the appropriate number of clusters for grouping dietary patterns, I evaluated two popular methods: the Elbow Method and the Silhouette Score.

  • The Elbow Method analyzes the clustering cost (inertia) as the number of clusters increases. As shown in the plot, the cost decreases sharply up to 4 clusters, after which the rate of improvement slows significantly. This “elbow” suggests diminishing returns beyond 4 clusters.
  • The Silhouette Score, which measures how well each object lies within its cluster, showed a relatively high score at 4 clusters (≈0.50), even if it wasn’t the absolute peak.
Optimal number of clusters (Image by author)

The following code computes the clustering cost and silhouette scores for different values of k (the number of clusters), using the K-Medoids algorithm and a precomputed distance matrix derived from the MDTW metric:

from sklearn.metrics import silhouette_score
from sklearn_extra.cluster import KMedoids
import matplotlib.pyplot as plt

costs = []
silhouette_scores = []
for k in range(2, 10):
    model = KMedoids(n_clusters=k, metric='precomputed', random_state=42)
    labels = model.fit_predict(distance_matrix)
    costs.append(model.inertia_)
    score = silhouette_score(distance_matrix, model.labels_, metric='precomputed')
    silhouette_scores.append(score)

# Plot
ks = list(range(2, 10))
fig, ax1 = plt.subplots(figsize=(8, 5))

color1 = 'tab:blue'
ax1.set_xlabel('Number of Clusters (k)')
ax1.set_ylabel('Cost (Inertia)', color=color1)
ax1.plot(ks, costs, marker='o', color=color1, label='Cost')
ax1.tick_params(axis='y', labelcolor=color1)

# Create a second y-axis that shares the same x-axis
ax2 = ax1.twinx()
color2 = 'tab:red'
ax2.set_ylabel('Silhouette Score', color=color2)
ax2.plot(ks, silhouette_scores, marker='s', color=color2, label='Silhouette Score')
ax2.tick_params(axis='y', labelcolor=color2)

# Optional: combine legends
lines1, labels1 = ax1.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax1.legend(lines1 + lines2, labels1 + labels2, loc='upper right')
ax1.vlines(x=4, ymin=min(costs), ymax=max(costs), color='gray', linestyle='--', linewidth=0.5)

plt.title('Cost and Silhouette Score vs Number of Clusters')
plt.tight_layout()
plt.savefig('clustering_metrics_comparison.png')
plt.show()

Interpreting the Clustered Dietary Patterns

Once the optimal number of clusters (k=4) was determined, each individual in the dataset was assigned to one of these clusters using the K-Medoids model. Now, we need to understand what characterizes each cluster.

To do so, I followed the approach suggested in the original MDTW paper [1]: analyzing the largest eating occasion of each individual, defined by both the time of day it occurred and the fraction of total daily intake it represented. This provides insight into when people consume the most calories and how much they eat during that peak occasion.

# K-Medoids clustering with the optimal number of clusters
from sklearn_extra.cluster import KMedoids
import seaborn as sns
import pandas as pd

k=4
model = KMedoids(n_clusters=k, metric='precomputed', random_state=42)
labels = model.fit_predict(distance_matrix)

# Find the time and fraction of their largest eating occasion
def get_largest_event(record):
    total = sum(v[0] for v in record.values())
    largest_time, largest_value = max(record.items(), key=lambda x: x[1][0])
    fractional_value = largest_value[0] / total if total > 0 else 0
    return largest_time, fractional_value

# Collect largest-meal (time, fraction) data per cluster
data_per_cluster = {i: [] for i in range(k)}
for i, person_id in enumerate(prepared_data.keys()):
    cluster_id = labels[i]
    t, v = get_largest_event(prepared_data[person_id])
    data_per_cluster[cluster_id].append((t, v))

import matplotlib.pyplot as plt

# Convert to pandas DataFrame
rows = []
for cluster_id, values in data_per_cluster.items():
    for hour, fraction in values:
        rows.append({"Hour": hour, "Fraction": fraction, "Cluster": f"Cluster {cluster_id}"})
df = pd.DataFrame(rows)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x="Hour", y="Fraction", hue="Cluster", palette="tab10")
plt.title("Eating Events Across Clusters")
plt.xlabel("Hour of Day")
plt.ylabel("Fraction of Day by day Intake (largest meal)")
plt.grid(True)
plt.tight_layout()
plt.show()
Each point represents a person’s largest eating event (Image by author)

While the scatter plot offers a broad overview, a more detailed understanding of each cluster’s eating behavior can be gained by examining their joint distributions.
By plotting the joint histogram of the hour and the fraction of daily intake for the largest meal, we can identify characteristic patterns, using the code below:

# Plot each cluster using seaborn.jointplot
for cluster_label in df['Cluster'].unique():
    cluster_data = df[df['Cluster'] == cluster_label]
    g = sns.jointplot(
        data=cluster_data,
        x="Hour",
        y="Fraction",
        kind="scatter",
        height=6,
        color=sns.color_palette("deep")[int(cluster_label.split()[-1])]
    )
    g.fig.suptitle(cluster_label, fontsize=14)
    g.set_axis_labels("Hour of Day", "Fraction of Day by day Intake (largest meal)", fontsize=12)
    g.fig.tight_layout()
    g.fig.subplots_adjust(top=0.95)  # adjust title spacing
    plt.show()
Each subplot represents the joint distribution of time (x-axis) and fractional calorie intake (y-axis) for people within a cluster. Higher densities indicate common timings and portion sizes of the largest meals. (Image by author)

To understand how individuals were distributed across clusters, I visualized the number of individuals assigned to each cluster. The bar plot below shows the frequency of people grouped by their temporal dietary pattern. This helps assess whether certain eating behaviors — such as skipping meals, late-night eating, or frequent snacking — are more prevalent in the population.

Histogram showing the number of people assigned to each dietary pattern cluster (Image by author)
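The snippet producing this bar plot was not included above; a minimal sketch using the labels array from the K-Medoids fit could look like this:

import numpy as np
import matplotlib.pyplot as plt

# Count how many individuals fall into each cluster
cluster_ids, counts = np.unique(labels, return_counts=True)

plt.figure(figsize=(8, 5))
plt.bar([f"Cluster {c}" for c in cluster_ids], counts, color='tab:blue')
plt.xlabel("Cluster")
plt.ylabel("Number of Individuals")
plt.title("Individuals per Dietary Pattern Cluster")
plt.tight_layout()
plt.show()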

Based on the joint distribution plots, distinct temporal dietary behaviors emerge across clusters:

Cluster 0 (Flexible or Irregular Eaters) shows a broad dispersion of the largest eating occasions across both the 24-hour day and the fraction of daily caloric intake.

Cluster 1 (Frequent Light Eaters) displays a more evenly distributed eating pattern, where no single eating occasion exceeds 30% of the total daily intake, reflecting frequent but smaller meals throughout the day. This is the cluster that most likely represents “normal eaters” — those who eat three relatively balanced meals spread throughout the day. This is due to the low variance in timing and fraction per eating event.

Cluster 2 (Early Heavy Eaters) is defined by a very distinct and consistent pattern: individuals in this group consume almost their entire daily caloric intake (close to 100%) in a single meal, predominantly during the early hours of the day (midnight to noon).

Cluster 3 (Late-Night Heavy Eaters) is characterized by individuals who consume the majority of their daily calories in a single meal during the late evening or night hours (between 6 PM and midnight). Like Cluster 2, this group exhibits a unimodal eating pattern with a very high fractional intake (~1.0), indicating that most members eat once per day, but unlike Cluster 2, their eating window is significantly delayed.
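One simple way to back these qualitative descriptions with numbers is to summarize the largest-meal hour and fraction per cluster from the df DataFrame built earlier (a small sketch, not part of the original analysis):

# Mean and standard deviation of largest-meal timing and fraction per cluster
summary = df.groupby("Cluster")[["Hour", "Fraction"]].agg(["mean", "std"])
print(summary)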

CONCLUSION

In this project, I explored how Modified Dynamic Time Warping (MDTW) can help uncover temporal dietary patterns — focusing not only on what we eat, but when and how much. Using synthetic data to simulate realistic eating behaviors, I demonstrated how MDTW can cluster individuals into distinct profiles like irregular or flexible eaters, frequent light eaters, early heavy eaters, and late-night heavy eaters based on the timing and magnitude of their meals.

While the results show that MDTW combined with K-Medoids can reveal meaningful patterns in eating behaviors, this approach is not without its challenges. Because the dataset was synthetically generated and the clustering was based on a single initialization, there are several caveats worth noting:

  • The clusters appear messy, possibly because the synthetic data lacks strong, naturally separable patterns — especially if meal times and calorie distributions are too uniform.
  • Some clusters overlap significantly, particularly Cluster 0 and Cluster 1, making it harder to differentiate between truly different behaviors.
  • Without labeled data or an expected ground truth, evaluating cluster quality is difficult. A possible improvement would be to inject known patterns into the dataset to test whether the clustering algorithm can reliably recover them; a rough sketch of such a check follows below.
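A hedged sketch of that idea: generate individuals from a few known archetypes (here reusing the skipper / night eater / snacker templates with noise), cluster them with the same MDTW + K-Medoids pipeline defined above, and score recovery of the planted labels with the Adjusted Rand Index. The archetype values and noise levels are illustrative assumptions.

import numpy as np
from sklearn.metrics import adjusted_rand_score
from sklearn_extra.cluster import KMedoids

# Planted archetypes: (hour, calories) pairs
archetypes = {
    'skipper':     [(12, 300), (19, 600)],
    'night_eater': [(9, 150), (14, 250), (22, 700)],
    'snacker':     [(8, 100), (11, 150), (14, 200), (17, 100), (21, 200)],
}
rng = np.random.default_rng(0)
raw, true_labels = [], []
for idx in range(300):
    name = list(archetypes)[idx % len(archetypes)]
    true_labels.append(name)
    records = [{'time': float(np.clip(t + rng.normal(0, 1), 0, 23)),
                'nutrients': [float(max(50, c + rng.normal(0, 50)))]}
               for t, c in archetypes[name]]
    raw.append({'person_id': f'person_{idx}', 'records': records})

# Reuse the pipeline defined earlier in the article
prepared = {p['person_id']: prepare_person(p) for p in raw}
dist = calculate_distance_matrix(prepared)
pred = KMedoids(n_clusters=3, metric='precomputed', random_state=42).fit_predict(dist)
print("Adjusted Rand Index:", adjusted_rand_score(true_labels, pred))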

Despite these limitations, this work shows how a nuanced distance metric — designed for irregular, real-life patterns — can surface insights traditional tools may overlook. The methodology can be extended to personalized health monitoring, or any domain where when things happen matters just as much as what happens.

I’d love to hear your thoughts on this project — whether it’s feedback, questions, or ideas for where MDTW could be applied next. This is very much a work in progress, and I’m always excited to learn from others.

If you found this useful, have ideas for improvements, or would like to collaborate, feel free to open an issue or send a Pull Request on GitHub. Contributions are more than welcome!

Thank you so much for reading all the way to the end — it really means a lot.

Code on GitHub : https://github.com/YagmurGULEC/mdtw-time-series-clustering

REFERENCES

[1] Khanna, Nitin, et al. “Modified dynamic time warping (MDTW) for estimating temporal dietary patterns.” IEEE, 2017.
