You gathered the data, cleaned it, made a number of transformations, modeled it, and then deployed your model to be used by the client.
That’s a whole lot of work for a data scientist. But the job isn’t done once the model hits production.
Everything looks perfect in your dashboard. But under the hood, something’s wrong. Most models don’t fail loudly. They don’t “crash” like a buggy app. Instead, they simply… drift.
Remember, you still need to monitor the model to make sure the results are accurate.
One of the best ways to do that is by checking if the data is drifting.
In other words, you’ll measure whether the distribution of the data hitting your model is similar to the distribution of the data used to train it.
Why Models Don’t Scream
When you deploy a model, you’re betting that the future looks like the past. You expect that the new data will have similar patterns compared to the data used to train it.
Let’s think about that for a minute: if I trained my model to recognize apples and oranges, what would happen if suddenly all my model receives are pineapples?
Yes, real-world data is messy. User behavior changes. Economic shifts happen. Even a small change in your data pipeline can mess things up.
If you wait for metrics like accuracy or RMSE to drop, you’re already behind. Why? Because labels often take weeks or months to arrive. You need a way to catch trouble before the damage is done.
PSI: The Data Smoke Detector
The Population Stability Index (PSI) is a classic tool. It was born in the credit risk world to monitor loan models.
Population Stability Index (PSI) is a statistical measure, with a basis in information theory, that quantifies the difference of one probability distribution from a reference probability distribution.
[1]
It doesn’t care about your model’s accuracy. It only cares about one thing: is the data coming in today different from the data used during training?
This metric is a way to quantify how much “mass” moved between buckets. If your training data had 10% of users in a certain age group, but production has 30%, PSI will flag it.
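Concretely, for B buckets the PSI adds up each bucket’s contribution (this is exactly the calculation the code below performs):

$$\mathrm{PSI} = \sum_{i=1}^{B} \left(p_i^{\mathrm{ref}} - p_i^{\mathrm{new}}\right)\,\ln\!\left(\frac{p_i^{\mathrm{ref}}}{p_i^{\mathrm{new}}}\right)$$

where $p_i^{\mathrm{ref}}$ and $p_i^{\mathrm{new}}$ are the fractions of the reference data and the new data that fall into bucket $i$.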
Interpreting It: What the Numbers Are Telling You
We usually follow these rule-of-thumb thresholds (a small helper that applies them is sketched after the list):
- PSI < 0.10: Everything is fine. Your data is stable.
- 0.10 ≤ PSI < 0.25: Something’s changing. You should probably investigate.
- PSI ≥ 0.25: Major shift. Your model might be making bad guesses.
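As a minimal sketch (the helper name and labels are mine, not from any library), these thresholds map directly to a tiny triage function:

def psi_alert(psi_value):
    # Map a PSI value to the rule-of-thumb levels above
    if psi_value < 0.10:
        return "stable"         # everything is fine
    elif psi_value < 0.25:
        return "investigate"    # something is changing
    return "major shift"        # the model might be making bad guesses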
Code
The Python script in this exercise performs the following steps:
- Break the data into “buckets” (quantiles).
- Calculate the percentage of data in each bucket for both your training set and your production set.
- Compare these percentages with the PSI formula. If they’re nearly identical, the PSI stays near zero. The more they diverge, the higher the score climbs.
Here is the code for the PSI calculation function.
import numpy as np

def psi(ref, new, bins=10):
    # Convert inputs to arrays
    ref, new = np.array(ref), np.array(new)
    # Build bucket edges from the reference data's quantiles
    # (bins + 1 breakpoints define `bins` equal-sized buckets)
    quantiles = np.linspace(0, 1, bins + 1)
    breakpoints = np.quantile(ref, quantiles)
    # Count the number of samples in each bucket
    ref_counts = np.histogram(ref, breakpoints)[0]
    new_counts = np.histogram(new, breakpoints)[0]
    # Convert counts to percentages
    ref_pct = ref_counts / len(ref)
    new_pct = new_counts / len(new)
    # If any bucket is empty, replace it with a very small number
    # to avoid division by zero (and log of zero)
    ref_pct = np.where(ref_pct == 0, 1e-6, ref_pct)
    new_pct = np.where(new_pct == 0, 1e-6, new_pct)
    # Calculate PSI and return
    return np.sum((ref_pct - new_pct) * np.log(ref_pct / new_pct))
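As a quick sanity check of the function above (made-up normal samples, not the article’s example data), the score stays near zero when the distributions match and jumps once the incoming data shifts:

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 10_000)

print(psi(baseline, rng.normal(0, 1, 10_000)))  # same distribution -> close to 0
print(psi(baseline, rng.normal(1, 1, 10_000)))  # mean shifted by 1 -> far above 0.25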
It’s fast, cheap, and doesn’t require “true” labels to work, meaning that you don’t have to wait a few weeks to have enough predictions to calculate metrics such as RMSE. That’s why it’s a production favorite.
PSI checks whether your model’s current data has changed too much compared to the data used to build it. By comparing today’s data to a baseline, it helps ensure your model stays stable and reliable.
Where PSI Shines
- PSI is great because it’s easy to automate.
- You can run it daily on every feature (a minimal monitoring loop is sketched below).
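A minimal sketch of that kind of daily check (the snapshots, feature names, and the 0.25 alert threshold are illustrative assumptions, reusing the psi function from above):

import numpy as np
import pandas as pd

# Hypothetical snapshots: a sample saved at training time and today's production batch
rng = np.random.default_rng(1)
train_df = pd.DataFrame({'age': rng.normal(40, 10, 5_000),
                         'income': rng.normal(60_000, 15_000, 5_000)})
todays_df = pd.DataFrame({'age': rng.normal(48, 10, 1_000),      # drifted
                          'income': rng.normal(60_000, 15_000, 1_000)})

for feature in train_df.columns:
    score = psi(train_df[feature], todays_df[feature])
    status = "ALERT: possible drift" if score >= 0.25 else "stable"
    print(f"{feature}: PSI = {score:.3f} ({status})")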
Where It Doesn’t
- It can be sensitive to how you choose your buckets.
- It doesn’t tell you why the data changed, only that it did.
- It looks at features one at a time.
- It might miss subtle interactions between multiple variables.
How Pro Teams Use It
Mature teams don’t just look at a single PSI value. They track the trend over time.
A single spike might be a glitch. A steady upward crawl is a sign that it’s time to retrain your model. Pair PSI with other metrics like summary stats (mean, variance) for a fuller picture.
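A small sketch of that pairing (the helper and the chosen statistics are my own, building on the psi function above):

import numpy as np

def drift_report(ref, new):
    # Pair PSI with basic summary statistics for a single feature
    ref, new = np.array(ref), np.array(new)
    return {
        'psi': psi(ref, new),
        'mean_shift': new.mean() - ref.mean(),
        'std_ratio': new.std() / ref.std(),
    }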
Let’s quickly look at this toy example of data that drifted. First, we generate some random data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
# 1. Generate Reference Data
# np.random.seed(42)
X, y = make_regression(n_samples=1000, n_features=3, noise=5, random_state=42)
df = pd.DataFrame(X, columns=['var1', 'var2', 'var3'])
df['y'] = y
# Separate X and y
X_ref, y_ref = df.drop('y', axis=1), df.y
# View data head
df.head()
Then, we train the model.
# 2. Train Regression Model
model = LinearRegression().fit(X_ref, y_ref)
Now, let’s generate some drifted data.
# 3. Generate the Drifted Data
X, y = make_regression(n_samples=500, n_features=3, noise=5, random_state=42)
df2 = pd.DataFrame(X, columns=['var1', 'var2', 'var3'])
df2['y'] = y
# Add drift to var1 (note: pandas aligns on index, so only the
# first 500 values of this length-1000 series land in df2)
df2['var1'] = 5 + 1.5 * X_ref.var1 + np.random.normal(0, 5, 1000)
# Separate X and y
X_new, y_new = df2.drop('y', axis=1), df2.y
# View
df2.head()
Next, we can use our function to calculate the PSI. You will notice the huge PSI value for variable 1.
# 4. Calculate PSI for each feature
for v in df.columns[:-1]:
    psi_value = psi(X_ref[v], X_new[v])
    print(f"PSI Score for Feature {v}: {psi_value:.4f}")
PSI Score for Feature var1: 2.3016
PSI Score for Feature var2: 0.0546
PSI Score for Feature var3: 0.1078
And finally, let’s check the impact it has on the estimated y.
# 5. Generate Estimates to see the impact
preds_ref = model.predict(X_ref[:5])
preds_drift = model.predict(X_new[:5])
print("nSample Predictions (Reference vs Drifted):")
print(f"Ref Preds: {preds_ref.round(2)}")
print(f"Drift Preds: {preds_drift.round(2)}")
Sample Predictions (Reference vs Drifted):
Ref Preds: [-104.22 -57.58 -32.69 -18.24 24.13]
Drift Preds: [ 508.33 621.61 -241.88 13.19 433.27]
We can also visualize the differences by variable. We create a simple function to plot the histograms overlaid.
import matplotlib.pyplot as plt

def drift_plot(ref, new):
    # Overlay the reference (blue) and new (red) histograms
    plt.hist(ref)
    plt.hist(new, color='r', alpha=.5)
    plt.show()
# Calculate PSI and plot each feature
for v in df.columns[:-1]:
    psi_value = psi(X_ref[v], X_new[v])
    print(f"PSI Score for Feature {v}: {psi_value:.4f}")
    drift_plot(X_ref[v], X_new[v])
Here are the results.

The difference is large for variable 1!
Before You Go
We saw how easy it is to calculate PSI, and how it can show us where the drift is occurring. We quickly identified var1 as our problematic variable. Monitoring your model without monitoring your data is a huge blind spot.
We have to make sure that the data distribution identified when the model was trained is still valid, so the model can keep using the patterns learned from the reference data to estimate over new data.
Production ML is less about building the “perfect” model and more about maintaining alignment with reality.
The best models don’t just predict well. They know when the world has changed.
If you liked this content, find me on my website.
https://gustavorsantos.me
GitHub Repository
The code for this exercise.
https://github.com/gurezende/Studying/blob/master/Python/statistics/data_drift/Data_Drift.ipynb
References
[1. PSI Definition] https://arize.com/blog-course/population-stability-index-psi/
[2. Numpy Histogram] https://numpy.org/doc/2.2/reference/generated/numpy.histogram.html
[3. Numpy Linspace] https://numpy.org/devdocs/reference/generated/numpy.linspace.html
[4. Numpy Where] https://numpy.org/devdocs/reference/generated/numpy.where.html
[5. Make Regression data] https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
