Keeping Probabilities Honest: The Jacobian Adjustment


Introduction

Imagine you’re modeling customer annoyance from wait times. Calls arrive randomly, so the wait time X follows an Exponential distribution: most waits are short, a few are painfully long.

Now I’d argue that annoyance isn’t linear: a 10-minute wait feels more than twice as bad as a 5-minute one. So you decide to model “annoyance units” as \( Y = X^2 \).

Easy, right? Just take the pdf of X, replace x with \( \sqrt{y} \), and you’re done.

You plot it. It looks reasonable—peaked near zero, long tail.

But what if you actually integrated your new pdf from 0 to infinity? You’d expect 1, right?

The result? 2.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import expon

# CDF of Exponential(1): F(x) = 1 - exp(-x) for x >= 0
def cdf_exp(x):
    return 1 - np.exp(-x)

# Improper (naive) pdf for Y = X²: just substitute x = sqrt(y)
def wrong_pdf(y):
    return np.exp(-np.sqrt(y))  # This integrates to 2!

# Quick numerical check of integral
from scipy.integrate import quad
integral, err = quad(wrong_pdf, 0, np.inf)
print(f"Numerical integral ≈ {integral:.3f} (ought to be 1, but it surely's 2)")

# prints 2

Your new distribution claims every possible outcome is twice as likely as it should be.

That’s impossible… but it happened because you missed one small adjustment.

This “adjustment” is the Jacobian: a scaling factor that compensates for how the transformation stretches or compresses the axis at different points. Skip it, and your probabilities lie. Include it, and everything adds up perfectly again.

In this post, we’ll build the intuition, derive the math step by step, visualize the stretching and shrinking empirically, prove it with simulations, and see it appear naturally in histogram equalization.


The Intuition

To understand why the Jacobian adjustment is necessary, let’s use a tangible analogy: think of a probability distribution as a fixed amount of sand (exactly 1 pound) spread along a number line, where the height of the sand pile at each point represents the probability density. The total sand always adds up to 1, representing 100% probability.

Now, when you transform the random variable (say, from X to Y = X²), it’s like grabbing that number line, a flexible rubber sheet, and warping it according to the transformation. You’re not adding or removing sand; you’re just stretching or compressing different parts of the sheet.

In regions where the transformation compresses the sheet (a long stretch of the original line gets squished into a shorter segment on the new Y-axis), the same amount of sand now occupies less horizontal space. To keep the total sand conserved, the pile must get taller: the density increases. For instance, near Y = 0 in the squaring transformation, the X interval (0, 0.1) gets squeezed into the much smaller Y interval (0, 0.01), so the density shoots up dramatically.

Conversely, in regions where the transformation stretches the sheet (a short segment of the original line gets pulled into a longer one on the Y-axis), the sand spreads out over more space, making the pile shorter and flatter: the density decreases. For large X (say, from 10 to 11), Y stretches from 100 to 121, a much wider interval, so the density thins out there.

The key point: the total sand stays exactly 1 lb, no matter how you warp the sheet. Without accounting for this local stretching and shrinking, your new density would be inconsistent, like claiming you have 2 lb of sand after the warp. The Jacobian is the mathematical factor that automatically adjusts the height everywhere to preserve the total amount.
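
To make the warping concrete, here is a tiny sketch (plain Python, using the Y = X² map from the intro) that prints how equal-width X intervals land on the Y-axis:

# How Y = X^2 warps intervals: equal X widths map to very different
# Y widths (compression near 0, stretching far out)
intervals = [(0.0, 0.1), (0.9, 1.0), (10.0, 11.0)]

for lo, hi in intervals:
    y_lo, y_hi = lo**2, hi**2
    print(f"X: [{lo}, {hi}] width {hi - lo:g} -> "
          f"Y: [{y_lo:g}, {y_hi:g}] width {y_hi - y_lo:g}")

Near zero, a width-0.1 interval shrinks tenfold; out at X = 10, a width-1 interval balloons to width 21. That is exactly the compression and stretching described above.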


The Math

Let’s formalize the intuition with the example of \( Y = g(X) = X^2 \), where \( X \) has pdf \( f_X(x) = e^{-x} \) for \( x \geq 0 \) (Exponential with rate 1).

Consider a small interval around \( x \) with width \( \Delta x \).
The probability in that interval is approximately \( f_X(x) \, \Delta x \).

After the transformation, this maps to an interval around \( y = x^2 \) with width
\( \Delta y \approx \left| g'(x) \right| \Delta x = |2x| \, \Delta x \).

To conserve probability:
$$ f_Y(y) \, \Delta y \approx f_X(x) \, \Delta x, $$
so
$$ f_Y(y) \approx \frac{f_X(x)}{\left| g'(x) \right|}. $$

In the limit as \( \Delta x \to 0 \), this becomes exact:
$$ f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right|, $$
where \( x = \sqrt{y} \) (the inverse) and
\( \frac{dx}{dy} = \frac{1}{2\sqrt{y}} \).

Plugging in:
$$ f_Y(y) = e^{-\sqrt{y}} \cdot \frac{1}{2\sqrt{y}} \quad \text{for } y > 0. $$
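
As a quick sanity check (a sketch, not part of the derivation), we can differentiate the exact CDF of Y numerically and compare it against the formula we just derived, since \( F_Y(y) = P(X^2 \le y) = P(X \le \sqrt{y}) = 1 - e^{-\sqrt{y}} \):

import numpy as np

# Exact CDF of Y = X^2 when X ~ Exp(1)
F_y = lambda y: 1 - np.exp(-np.sqrt(y))
# Derived pdf, with the Jacobian factor
f_y = lambda y: np.exp(-np.sqrt(y)) / (2 * np.sqrt(y))

# A central finite difference of the CDF should reproduce the pdf
y = np.array([0.1, 1.0, 4.0, 25.0])
h = 1e-6
numeric_pdf = (F_y(y + h) - F_y(y - h)) / (2 * h)
print(np.allclose(numeric_pdf, f_y(y), rtol=1e-4))  # True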

Without the Jacobian term \( \frac{1}{2\sqrt{y}} \), the naive
\( f_Y(y) = e^{-\sqrt{y}} \)
integrates to 2:

Let \( u = \sqrt{y} \), so \( y = u^2 \) and \( dy = 2u \, du \):

$$ \int_0^\infty e^{-\sqrt{y}} \, dy = \int_0^\infty e^{-u} \cdot 2u \, du = 2 \int_0^\infty u e^{-u} \, du = 2\,\Gamma(2) = 2 \cdot 1 = 2. $$

The Jacobian adjustment ensures $$ \int_0^\infty f_Y(y) \, dy = 1. $$
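
A quick numerical confirmation of both integrals (the integration is split at y = 1 to help quad with the integrable singularity at zero):

import numpy as np
from scipy.integrate import quad

adjusted = lambda y: np.exp(-np.sqrt(y)) / (2 * np.sqrt(y))
naive = lambda y: np.exp(-np.sqrt(y))

# Split at y = 1: the adjusted pdf blows up (integrably) at y = 0
total_adjusted = quad(adjusted, 0, 1)[0] + quad(adjusted, 1, np.inf)[0]
total_naive = quad(naive, 0, np.inf)[0]
print(f"with Jacobian: {total_adjusted:.3f}, without: {total_naive:.3f}")
# with Jacobian: 1.000, without: 2.000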

A note on \( \Gamma \)

\( \Gamma \) extends the factorial to real numbers: $$ \Gamma(n) = (n-1)! \quad \text{for positive integers } n. $$
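
A one-line sanity check using Python’s standard library:

from math import gamma, factorial

# gamma(n) == (n - 1)! for positive integers n
print(all(gamma(n) == factorial(n - 1) for n in range(1, 10)))  # True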

This scaling factor \( \left| \frac{dx}{dy} \right| \) is precisely what compensates for the local stretching and shrinking of the axis.

The General Form

Let Y = g(X), where g is a strictly monotonic (increasing or decreasing) differentiable function, and X has pdf \( f_X(x) \).

We want the pdf \( f_Y(y) \) of Y.

Consider a small interval around x with width \( \Delta x \).
The probability in that interval is approximately \( f_X(x) \, \Delta x \).

After the transformation y = g(x), this interval maps to an interval around y with width
\( \Delta y \approx \left| g'(x) \right| \Delta x \).

Going back to the equation we developed previously:
$$ f_Y(y) = f_X(x) \left| \frac{dx}{dy} \right|, $$
where we use the inverse relation \( x = h(y) = g^{-1}(y) \), and
\( \frac{dx}{dy} = h'(y) = \frac{1}{g'(x)} \).

Thus the general formula is
$$ f_Y(y) = f_X(h(y)) \left| h'(y) \right|. $$
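
The formula translates directly into code. Below is a small helper (the names transform_pdf, g_inv, and g_inv_prime are just illustrative) that builds \( f_Y \) from \( f_X \) and the inverse map, then checks our running example:

import numpy as np
from scipy.integrate import quad

def transform_pdf(f_x, g_inv, g_inv_prime):
    """pdf of Y = g(X) via f_Y(y) = f_X(h(y)) * |h'(y)|, where h = g^{-1}."""
    return lambda y: f_x(g_inv(y)) * np.abs(g_inv_prime(y))

# Running example: X ~ Exp(1), Y = X^2, so h(y) = sqrt(y) and h'(y) = 1/(2 sqrt(y))
f_y = transform_pdf(lambda x: np.exp(-x),
                    np.sqrt,
                    lambda y: 1 / (2 * np.sqrt(y)))

# Total probability (split at y = 1 because of the singularity at zero)
total = quad(f_y, 0, 1)[0] + quad(f_y, 1, np.inf)[0]
print(f"Total probability ≈ {total:.4f}")  # ≈ 1.0000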


Empirical Proof

Simulating the stretching and shrinking

The best way to “feel” the stretching and shrinking is to zoom in on two regions separately: near zero (where compression happens) and farther out (where stretching dominates).

We’ll generate 4 plots:

1. Original X histogram, zoomed in on small values (X < 1), with equal small intervals of width 0.1, to show the source of compression.

2. Corresponding Y = X² histogram, zoomed near zero, showing how those small X intervals get even tinier on Y (shrink).

3. Original X histogram for larger values (X > 1), with equal intervals of width 1, to show the source of stretching.

4. Corresponding Y histogram for large values, showing how those X intervals explode into huge Y intervals (stretch).

import numpy as np
import matplotlib.pyplot as plt

# Generate large sample for clear visuals
n = 50000
x = np.random.exponential(scale=1, size=n)
y = x**2

fig = plt.figure(figsize=(16, 10))

def plot_histogram(ax, data, bins, density, color, alpha, title, xlabel, ylabel):
    ax.hist(data, bins=bins, density=density, color=color, alpha=alpha)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)

# Plot 1: X small values (compression source)
ax1 = fig.add_subplot(2, 2, 1)
plot_histogram(ax1, x[x < 1], bins=50, density=True, color='skyblue', alpha=0.7,
               title='Original X (zoomed X < 1)', xlabel='X', ylabel='Density')

# Equal-width intervals of 0.1 on small X
small_x_lines = np.arange(0, 1.01, 0.1)
for line in small_x_lines:
    ax1.axvline(line, color='red', linestyle='--', alpha=0.8)

# Plot 2: Y near zero (showing shrink/compression)
ax2 = fig.add_subplot(2, 2, 2)
plot_histogram(ax2, y[y < 1], bins=100, density=True, color='lightcoral', alpha=0.7,
               title='Y = X² near zero (compression visible)', xlabel='Y', ylabel='Density')

# Mapped small intervals on Y (very narrow!)
small_y_lines = small_x_lines**2
for line in small_y_lines:
    ax2.axvline(line, color='red', linestyle='--', alpha=0.8)

# Plot 3: X larger values (stretching source)
ax3 = fig.add_subplot(2, 2, 3)
plot_histogram(ax3, x[(x > 1) & (x < 12)], bins=50, density=True, color='skyblue', alpha=0.7,
               title='Original X (X > 1)', xlabel='X', ylabel='Density')

# Equal-width intervals of 1 on larger X
large_x_starts = [1, 3, 5, 7, 9, 11]
large_x_lines = large_x_starts + [s + 1 for s in large_x_starts]
for line in large_x_lines:
    if line < 12:
        ax3.axvline(line, color='red', linestyle='--', alpha=0.8)

# Plot 4: Y large values (showing stretch)
ax4 = fig.add_subplot(2, 2, 4)
plot_histogram(ax4, y[(y > 1) & (y < 150)], bins=80, density=True, color='lightgreen', alpha=0.7,
               title='Y = X² large values (stretching visible)', xlabel='Y', ylabel='Density')

# Mapped large intervals on Y (huge gaps!)
large_y_lines = np.array(large_x_lines)**2
for line in large_y_lines:
    if line < 150:
        ax4.axvline(line, color='red', linestyle='--', alpha=0.8)

# Annotate the two rows of plots
fig.text(0.5, 0.75, "X-axis shrinking → Density increases\nY-axis higher near zero", 
         ha='center', va='center', fontsize=12, color='black', 
         fontstyle='italic', bbox=dict(facecolor='#f7f7f7', edgecolor='none', alpha=0.9))

fig.text(0.5, 0.25, "X-axis stretching → Density decreases\nY-axis lower for large values", 
         ha='center', va='center', fontsize=12, color='black', 
         fontstyle='italic', bbox=dict(facecolor='#f7f7f7', edgecolor='none', alpha=0.9))

fig.set_dpi(250)
plt.tight_layout()
plt.show()

Depending on the distribution, different regions are stretched and shrunk by the transformation.

Simulating the Jacobian adjustment

To see the Jacobian adjustment in action, let’s simulate data from the Exponential(1) distribution for X, compute Y = X², and plot the empirical histogram of Y against the theoretical pdfs for increasing sample sizes n. As n grows, the histogram should converge to the correctly adjusted pdf, not the naive one.

import numpy as np
import matplotlib.pyplot as plt

def correct_pdf(y):
    return np.exp(-np.sqrt(y)) / (2 * np.sqrt(y))

def naive_pdf(y):
    return np.exp(-np.sqrt(y))

# Sample sizes to check
sample_sizes = [100, 1_000, 10_000]

fig, axs = plt.subplots(1, len(sample_sizes), figsize=(15, 5))

y_vals = np.linspace(0.01, 50, 1000)  # Range for plotting theoretical pdfs

for i, n in enumerate(sample_sizes):
    # Sample X ~ Exp(1)
    x = np.random.exponential(scale=1, size=n)
    y = x**2
    
    # Plot histogram (normalized to density)
    axs[i].hist(y, bins=50, range=(0, 50), density=True, alpha=0.6, color='skyblue', label='Empirical Histogram')
    
    # Plot theoretical pdfs
    axs[i].plot(y_vals, correct_pdf(y_vals), 'g-', label='Correct PDF (with Jacobian)')
    axs[i].plot(y_vals, naive_pdf(y_vals), 'r--', label='Naive PDF (no Jacobian)')
    
    axs[i].set_title(f'n = {n}')
    axs[i].set_xlabel('Y = X²')
    axs[i].set_ylabel('Density')
    axs[i].legend()
    axs[i].set_ylim(0, 0.5)  # For consistent viewing
    axs[i].grid(True)  # Add grid to every subplot

# Set the figure DPI to 250 for higher resolution
fig.set_dpi(250)

plt.tight_layout()
plt.show()

And the result is what we expect.

A proof of the Jacobian adjustment: the green curve (with the Jacobian) correctly matches the sampled data across the range of y.

Histogram Equalization: a real-world application

A summary plot of histogram equalization from a post I wrote about it.

A classic example where the Jacobian adjustment appears naturally is histogram equalization in image processing.

We treat pixel intensities X (typically in \( [0, 255] \)) as samples from some distribution with an empirical pdf based on the image histogram.

The goal is to transform them to new intensities Y so that Y is roughly uniform on \( [0, 255] \); this spreads out the values and improves contrast.

The transformation used is precisely the scaled cumulative distribution function (CDF) of X:

$$ Y = 255 \cdot F_X(X) $$

where \( F_X(x) = \int_{-\infty}^x f_X(t) \, dt \) (the empirical CDF in practice).

Why does this work? It's a direct application of the Probability Integral Transform (PIT):

If \( Y = F_X(X) \) and X is continuous, then \( Y \sim \text{Uniform}(0, 1) \).

Scaling by 255 gives \( \text{Uniform}(0, 255) \).
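
Here is a quick simulation of the PIT (a sketch using our Exponential(1) example): push samples of X through their own CDF and test the result for uniformity.

import numpy as np
from scipy import stats

# If X ~ Exp(1), then U = F_X(X) = 1 - exp(-X) should be Uniform(0, 1)
rng = np.random.default_rng(0)
x = rng.exponential(scale=1, size=100_000)
u = 1 - np.exp(-x)

# Kolmogorov–Smirnov test against Uniform(0, 1); a large p-value means
# we cannot distinguish u from uniform samples
result = stats.kstest(u, 'uniform')
print(f"KS statistic = {result.statistic:.4f}, p-value = {result.pvalue:.3f}")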

Now see the Jacobian at work:

Let \( g(x) = L \cdot F_X(x) \), with \( L = 255 \).

The derivative is \( g'(x) = L \cdot f_X(x) \) (since the derivative of the CDF is the pdf).

Apply the change-of-variables formula:

$$ f_Y(y) = \frac{f_X(x)}{|g'(x)|} = \frac{f_X(x)}{L \, f_X(x)} = \frac{1}{L}. $$

The \( f_X(x) \) cancels perfectly, leaving a constant (uniform) density.

The Jacobian factor \( 1 / |g'(x)| \) automatically flattens the distribution by compensating for regions where the original density was high or low.

In discrete images, rounding makes it approximate, but the principle is the same.
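
Here is a minimal sketch of the idea on a synthetic 8-bit image (not a production implementation; real libraries also normalize by the minimum CDF value and handle empty bins):

import numpy as np

def equalize(img):
    """Histogram-equalize an 8-bit grayscale image: Y = 255 * F_X(X)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist) / img.size            # empirical CDF of intensities
    return np.round(255 * cdf[img]).astype(np.uint8)

# Synthetic low-contrast image: intensities bunched around 120
rng = np.random.default_rng(0)
img = rng.normal(120, 10, size=(64, 64)).clip(0, 255).astype(np.uint8)

eq = equalize(img)
print("before:", img.min(), "-", img.max())  # narrow intensity range
print("after: ", eq.min(), "-", eq.max())    # spread toward the full 0-255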


In Conclusion

The Jacobian adjustment is one of those quiet pieces of mathematics that feels unnecessary, until you skip it and suddenly your probabilities don’t add up to 1 anymore. Whether you’re squaring waiting times, modeling energy from speed, or flattening image histograms, the transformation changes not only the values but how probability is distributed across them. The factor \( \left| \frac{dx}{dy} \right| \) (or its multivariate cousin, the Jacobian determinant) is the precise compensation that keeps the total probability conserved while accounting for local stretching and compression.

Next time you transform a random variable, remember the sand on the rubber sheet: warp the axis all you want, but the total sand must stay the same. The Jacobian adjustment is the rule that makes it happen.


Code

Link to Colab Notebook


