Survival analysis is a statistical approach used to answer the question: "How long will something last?" That "something" could range from a patient's lifespan to the durability of a machine component or the duration of a user's subscription.
One of the most widely used tools in this area is the Kaplan-Meier estimator.
Born in the world of biology, Kaplan-Meier made its debut tracking life and death. But like every true celebrity algorithm, it didn't stay in its lane. Today, it's showing up in business dashboards, marketing teams, and churn analyses all over the place.
But here's the catch: business isn't biology. It's messy, unpredictable, and full of plot twists. For this reason, a few issues make our lives harder when we try to apply survival analysis to the business world.
First of all, we're typically interested not just in whether a customer has "survived" (whatever survival may mean in this context), but rather in how much of that individual's economic value has survived.
Secondly, contrary to biology, it's entirely possible for customers to "die" and "resuscitate" multiple times (think of when you unsubscribe from and later resubscribe to an online service).
In this article, we will see how to extend the classical Kaplan-Meier approach so that it better suits our needs: modeling a continuous (economic) value instead of a binary one (life/death), and allowing "resurrections".
A refresher on the Kaplan-Meier estimator
Let's pause and rewind for a second. Before we start customizing Kaplan-Meier to fit our business needs, we need a quick refresher on how the classic version works.
Suppose you had 3 subjects (let's say lab mice) and you gave them a medicine you need to test. The medicine was given at different moments in time: subject 1 received it in January, subject 2 in April, and subject 3 in May.
Then, you measure how long they survive. Subject 1 died after 6 months, subject 3 after 4 months, and subject 2 is still alive at the time of the analysis (November).
Graphically, we can represent the three subjects as follows:
[Figure: timeline of the three subjects, with treatment dates and outcomes]
Now, even if we wanted to measure a simple metric, like average survival time, we would face a problem. In fact, we don't know how long subject 2 will survive, since it is still alive today.
This is a classical problem in statistics, and it's called "right censoring".
Right censoring is stats-speak for "we don't know what happened after a certain point", and it's a huge deal in survival analysis. So big that it led to the development of one of the most iconic estimators in statistical history: the Kaplan-Meier estimator, named after the duo who introduced it back in the 1950s.
So, how does Kaplan-Meier handle our problem?
First, we align the clocks. Even if our mice were treated at different times, what matters is the time elapsed since treatment. So we reset the x-axis to zero for everybody: day zero is the day they got the drug.
[Figure: the three subjects re-aligned so that time zero is the day of treatment]
Now that we're all on the same timeline, we want to build something useful: an aggregate survival curve. This curve tells us the probability that a mouse in our group will survive at least t months post-treatment.
Let’s follow the logic together.
- Up to time 3? Everyone's still alive. So survival = 100%. Easy.
- At time 4, mouse 3 dies. That means that, out of the 3 mice, only 2 survived past time 4. This gives us a survival rate of 67% at time 4.
- Then at time 6, mouse 1 checks out. Of the 2 mice that had made it to time 6, only one survived, so the survival rate from time 5 to 6 is 50%. Multiply that by the previous 67%, and we get 33% survival up to time 6.
- After time 7 we have no other subjects that are observed alive, so the curve has to stop there.
Let’s plot these results:
[Figure: Kaplan-Meier survival curve for the three mice]
Since code is often easier to understand than words, let's translate this into Python. We have the following variables:
- kaplan_meier, an array containing the Kaplan-Meier estimates for each point in time, i.e. the probability of survival up to time t.
- obs_t, an array that tells us whether an individual is observed (i.e., not right-censored) at time t.
- surv_t, a boolean array that tells us whether each individual is alive at time t.
- surv_t_minus_1, a boolean array that tells us whether each individual is alive at time t−1.
All we have to do is take all the individuals observed at time t, compute their survival rate from t−1 to t (survival_rate_t), and multiply it by the survival rate up to time t−1 (kaplan_meier[t-1]) to obtain the survival rate up to time t (kaplan_meier[t]). In other words,
survival_rate_t = surv_t[obs_t].sum() / surv_t_minus_1[obs_t].sum()
kaplan_meier[t] = kaplan_meier[t-1] * survival_rate_t
where, of course, the starting point is kaplan_meier[0] = 1.
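To make this concrete, here is a minimal, self-contained sketch of the full loop applied to the three-mice example. The way obs_t, surv_t, and surv_t_minus_1 are derived from durations and events below is one reasonable encoding; other conventions are possible:
import numpy as np
# Three-mice example: mouse 1 died after 6 months, mouse 2 is still
# alive (right-censored at 7 months), mouse 3 died after 4 months.
durations = np.array([6, 7, 4])
events = np.array([1, 0, 1])  # 1 = death observed, 0 = right-censored
max_t = durations.max()
kaplan_meier = np.ones(max_t + 1)  # starting point: kaplan_meier[0] = 1
def alive_at(t):
    # Dead subjects are alive strictly before their death time; censored
    # subjects are known to be alive up to their last observation.
    return (durations > t) | ((events == 0) & (durations >= t))
for t in range(1, max_t + 1):
    # Dead subjects have a known status forever; censored subjects are
    # observed only up to their censoring time.
    obs_t = (events == 1) | (durations >= t)
    surv_t, surv_t_minus_1 = alive_at(t), alive_at(t - 1)
    survival_rate_t = surv_t[obs_t].sum() / surv_t_minus_1[obs_t].sum()
    kaplan_meier[t] = kaplan_meier[t - 1] * survival_rate_t
print(kaplan_meier.round(2))  # [1. 1. 1. 1. 0.67 0.67 0.33 0.33]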
If you don't want to code this from scratch, the Kaplan-Meier algorithm is available in the Python library lifelines, and it can be used as follows:
from lifelines import KaplanMeierFitter
KaplanMeierFitter().fit(
    durations=[6, 7, 4],
    event_observed=[1, 0, 1],
).survival_function_["KM_estimate"]
If you run this code, you'll obtain the same result we obtained manually with the previous snippet.
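For reference, the fitted survival function should look roughly like this (the exact formatting may differ across lifelines versions, but the values match our manual calculation):
timeline
0.0    1.000000
4.0    0.666667
6.0    0.333333
7.0    0.333333
Name: KM_estimate, dtype: float64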
So far, we've been hanging out in the land of mice, medicine, and mortality. Not exactly your average quarterly KPI review, right? So, how is any of this useful in business?
Moving to a business setting
So far, we've treated "death" as if it were obvious. In Kaplan-Meier land, someone either lives or dies, and we can easily log the time of death. But now let's stir in some real-world business messiness.
When is a customer "dead"?
It turns out that this question is difficult to answer, for at least a couple of reasons:
- "Death" is not easy to define. Let's say you're working at an e-commerce company. You want to know when a user has "died". Should you count them as dead when they delete their account? That's easy to track... but too rare to be useful. What if they simply start shopping less? But how much less counts as dead? A week of silence? A month? Two? You see the problem. The definition of "death" is arbitrary, and depending on where you draw the line, your analysis might tell wildly different stories.
- "Death" is not permanent. Kaplan-Meier was conceived for biological applications in which, once an individual is dead, there is no coming back. But in business applications, resurrection is not only possible but pretty frequent. Imagine a streaming service that people pay a monthly subscription for. It's easy to define "death" in this case: it's when users cancel their subscription. However, it's pretty frequent that, some time after cancelling, they re-subscribe.
So how does all this play out in data?
Let's walk through a toy example. Say we have a user on our e-commerce platform. Over the past 10 months, here's how much they've spent:
[Figure: the user's monthly spending over the past 10 months]
To squeeze this into the Kaplan-Meier framework, we need to translate that spending behavior into a life-or-death decision.
So we make a rule: if a user stops spending for 2 consecutive months, we declare them "inactive".
Graphically, this rule looks like the following:
[Figure: the two-consecutive-zero-months rule applied to the user's spending]
Since the user spent $0 for two months in a row (months 4 and 5), we will consider them inactive starting from month 4 on. And we will do this even though the user started spending again in month 7. This is because, in Kaplan-Meier, resurrections are assumed to be impossible. In code, the rule could look like the sketch below.
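Here is a minimal sketch of that rule in Python. The spending figures are made up for illustration; only the zeros in months 4 through 6 and the comeback in month 7 mirror the story above:
import numpy as np
# Hypothetical monthly spend over 10 months (months are 1-indexed).
spend = np.array([40, 55, 30, 0, 0, 0, 25, 35, 20, 30])
# Rule: two consecutive zero-spend months => "dead" from the first of them.
death_month = None  # None means still "alive", i.e. right-censored
for month in range(1, len(spend)):
    # spend[month - 1] is the spend of month `month` (1-indexed)
    if spend[month - 1] == 0 and spend[month] == 0:
        death_month = month
        break
duration = death_month if death_month else len(spend)
event = int(death_month is not None)
print(duration, event)  # 4 1 -> inactive from month 4, comeback ignored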
Now let's add two more users to our example. Since we have decided on a rule to turn their value curves into survival curves, we can also compute the Kaplan-Meier survival curve:
[Figure: the three users' spending turned into survival curves, and the resulting Kaplan-Meier estimate]
By now, you've probably noticed how much nuance (and data) we've thrown away just to make this work. One user came back from the dead, but we ignored that. Another user's spending dropped significantly, but Kaplan-Meier doesn't care, because all it sees is 1s and 0s. We forced a continuous value (spending) into a binary box (alive/dead), and along the way, we lost a whole lot of information.
So the question is: can we extend Kaplan-Meier in a way that:
- keeps the original, continuous data intact,
- avoids arbitrary binary cutoffs,
- allows for resurrections?
Yes, we can. In the next section, I'll show you how.
Introducing “Value Kaplan-Meier”
Let's start with the simple Kaplan-Meier formula we have seen before.
# kaplan_meier: array containing the Kaplan-Meier estimates,
# i.e. the probability of survival up to time t
# obs_t: array, whether a subject has been observed at time t
# surv_t: array, whether a subject was alive at time t
# surv_t_minus_1: array, whether a subject was alive at time t−1
survival_rate_t = surv_t[obs_t].sum() / surv_t_minus_1[obs_t].sum()
kaplan_meier[t] = kaplan_meier[t-1] * survival_rate_t
The first change we need to make is to replace surv_t and surv_t_minus_1, which are boolean arrays that tell us whether a subject is alive (1) or dead (0), with arrays that tell us the (economic) value of each subject at a given time. For this purpose, we can use two arrays named val_t and val_t_minus_1.
But this is not enough. Since we're dealing with a continuous value, every user is on a different scale, so, assuming that we want to weigh them equally, we need to rescale each user by some individual reference value. But what value should we use? The most reasonable choice is their initial value at time 0, before they were influenced by whatever treatment we're applying to them.
So we also need another vector, named val_t_0, that represents the value of each individual at time 0.
# value_kaplan_meier: array containing the Value Kaplan-Meier estimates
# obs_t: array, whether a subject has been observed at time t
# val_t_0: array, user value at time 0
# val_t: array, user value at time t
# val_t_minus_1: array, user value at time t−1
value_rate_t = (
(val_t[obs_t] / val_t_0[obs_t]).sum()
/ (val_t_minus_1[obs_t] / val_t_0[obs_t]).sum()
)
value_kaplan_meier[t] = value_kaplan_meier[t-1] * value_rate_t
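Pulling this together, here is a self-contained sketch of the whole procedure as a function. The input format is an assumption of mine: one row per user, one column per time step, with NaN marking the periods where a user is not observed (right-censored):
import numpy as np
def value_kaplan_meier(values):
    # values: 2-D array (users x time steps); NaN = not observed.
    # Returns the Value Kaplan-Meier curve, starting at 1 for t = 0.
    n_times = values.shape[1]
    val_t_0 = values[:, 0]
    vkm = np.ones(n_times)
    for t in range(1, n_times):
        # A user contributes at time t only if observed at both t-1 and t.
        obs_t = ~np.isnan(values[:, t]) & ~np.isnan(values[:, t - 1])
        value_rate_t = (
            (values[obs_t, t] / val_t_0[obs_t]).sum()
            / (values[obs_t, t - 1] / val_t_0[obs_t]).sum()
        )
        vkm[t] = vkm[t - 1] * value_rate_t
    return vkm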
What we've built is a direct generalization of Kaplan-Meier. In fact, if you set val_t = surv_t, val_t_minus_1 = surv_t_minus_1, and val_t_0 as an array of 1s, this formula collapses neatly back to our original survival estimator. So yes, it's legit.
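We can check this equivalence numerically with the value_kaplan_meier sketch above, feeding it 0/1 "values" that encode our three mice (a dead mouse keeps value 0 forever, and the censored one is simply observed all the way through):
alive = np.array([
    [1, 1, 1, 1, 1, 1, 0, 0],  # mouse 1: dies at time 6
    [1, 1, 1, 1, 1, 1, 1, 1],  # mouse 2: still alive at time 7
    [1, 1, 1, 1, 0, 0, 0, 0],  # mouse 3: dies at time 4
], dtype=float)
print(value_kaplan_meier(alive).round(2))
# [1. 1. 1. 1. 0.67 0.67 0.33 0.33] -> same as the classical estimator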
And here is the curve that we would obtain when applying it to our 3 users.
[Figure: Value Kaplan-Meier curve for the three users]
Let's call this new version the Value Kaplan-Meier estimator. In fact, it answers the question:
How much of the users' initial economic value survives up to time t?
We've got the theory. But does it work in the wild?
Using Value Kaplan-Meier in practice
If you take the Value Kaplan-Meier estimator for a spin on real-world data and compare it to the good old Kaplan-Meier curve, you'll likely notice something comforting: they often have the same shape. That's a good sign. It means we haven't broken anything fundamental while upgrading from binary to continuous.
But here's where things get interesting: Value Kaplan-Meier usually sits a bit above its traditional cousin. Why? Because in this new world, users are allowed to "resurrect", whereas Kaplan-Meier, being the more rigid of the two, would've written them off the moment they went quiet.
So how do we put this to use?
Imagine you're running an experiment. At time zero, you start a new treatment on a group of users. Whatever it is, you can track how much value "survives" in both the treatment and control groups over time.
And this is what your output will probably look like:
[Figure: Value Kaplan-Meier curves for the treatment and control groups]
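Here is a rough sketch of how such a comparison could be produced, reusing the value_kaplan_meier function sketched earlier. The data is simulated just to have something to plot; in practice you would plug in the observed value matrices of your two groups:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(42)
def simulate_group(n_users, n_months, monthly_drift):
    # Hypothetical spending paths: start at 100, then drift with noise,
    # floored at 0 (stretches of zeros act like "deaths", possibly
    # followed by resurrections).
    steps = rng.normal(monthly_drift, 10.0, size=(n_users, n_months - 1))
    paths = np.clip(100.0 + np.cumsum(steps, axis=1), 0.0, None)
    return np.column_stack([np.full(n_users, 100.0), paths])
treatment = simulate_group(500, 12, monthly_drift=-2.0)
control = simulate_group(500, 12, monthly_drift=-5.0)
for name, values in [("treatment", treatment), ("control", control)]:
    plt.plot(value_kaplan_meier(values), label=name)
plt.xlabel("months since start of treatment")
plt.ylabel("share of initial value retained")
plt.legend()
plt.show()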
Conclusion
Kaplan-Meier is a widely used and intuitive method for estimating survival functions, especially when the outcome is a binary event like death or failure. However, many real-world business scenarios involve more complexity: resurrections are possible, and outcomes are better represented by continuous values rather than a binary state.
In such cases, Value Kaplan-Meier offers a natural extension. By incorporating the economic value of individuals over time, it enables a more nuanced understanding of value retention and decay. This method preserves the simplicity and interpretability of the original Kaplan-Meier estimator while adapting it to better reflect the dynamics of customer behavior.
Value Kaplan-Meier tends to provide a higher estimate of retained value than Kaplan-Meier, due to its ability to account for recoveries. This makes it particularly useful for evaluating experiments or tracking customer value over time.