From Transactions to Trends: Predict When a Customer Is About to Stop Buying

It is remarkable how math can solve so many problems in the real world. When I was in grade school, I definitely didn’t see it that way. I never hated math, by the way, nor did I have trouble learning most of the basic concepts.

Nevertheless, I confess that in most of the classes beyond basic arithmetic, I usually thought,

Those were other times, though. There was no Web, no data science, and computers were barely a thing. But time passes. Life happens, and we get to see the day when we can solve important business problems with !

In this post, we will use the well-known linear regression for a different kind of problem: predicting customer churn.

Linear Regression vs Churn

Customer churn rarely happens overnight. In many cases, customers gradually reduce their purchasing frequency before stopping completely. Some call that silent churn [1].

Predicting churn can be done with traditional churn models, which (1) require labeled churn data; (2) are sometimes complex to explain; (3) detect churn only after it has already happened.

On the other hand, this project shows a different solution, answering a simpler question:

Is this customer slowing down their shopping?

This question is answered with the following logic.

We use monthly purchase trends and linear regression to measure customer momentum over time. If the customer keeps increasing their spending, the monthly totals will grow over time, resulting in an upward trend (or a positive slope in a linear regression, if you will). The opposite is also true. Lower transaction amounts will add up to a downtrend.

Let’s break down the logic in small steps, and understand what we are going to do with the data:

Aggregate customer transactions by month
Create a continuous time index (e.g. 1, 2, 3…n)
Fill missing months with zero purchases
Fit a linear regression line
Use the slope (converted to degrees) to quantify buying behavior
Assessment: A negative slope indicates declining engagement. A positive slope indicates increasing engagement.
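As a compact preview, the steps above could be sketched as follows. This is only an illustration under assumptions: the five transactions are made up, the period covers six months instead of twelve, and pandas reindex() is used to fill the gaps, which is not necessarily how the full implementation does it.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical toy transactions for a single customer (not the article's dataset)
tx = pd.DataFrame({
    "transaction_date": pd.to_datetime(
        ["2024-01-10", "2024-01-25", "2024-02-05", "2024-04-18", "2024-06-02"]
    ),
    "total_amt": [500.0, 300.0, 600.0, 250.0, 100.0],
})

# 1) Aggregate the transactions by month
monthly = tx.groupby(tx["transaction_date"].dt.month)["total_amt"].sum()

# 2) and 3) Continuous time index 1..6, with missing months filled with 0
monthly = monthly.reindex(range(1, 7), fill_value=0.0)

# Scale both axes to [0, 1] so the angle does not depend on the units
x = (monthly.index.to_numpy() - 1) / 5
y = (monthly.to_numpy() - monthly.min()) / (monthly.max() - monthly.min())

# 4) Fit the regression line and 5) convert the slope to degrees
slope = stats.linregress(x, y).slope
angle = np.degrees(np.arctan(slope))
print(f"slope={slope:.2f}, angle={angle:.1f} degrees")
```

The angle comes out negative here because this made-up customer spends less in the later months.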

Well, let’s move on to the implementation next.

Code

The first step is importing some modules into a Python session.

# Imports
import scipy.stats as stats
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Then, we will generate some data that simulates customer transactions. You can look at the whole code in this GitHub repository. The generated dataset has the columns customer_id, transaction_date, and total_amt, and looks like the following picture.

Dataset generated for this exercise. Image by the author.
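If you don’t have the repository at hand, a dataset with the same three columns could be simulated roughly like this. This is a hypothetical stand-in, not the author’s generator: the number of customers, the year, the amount range, and the random seed are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the sketch is reproducible

# Hypothetical setup: 20 customers, 300 transactions spread over 2024
n_rows = 300
customer_ids = [f"C_{i:03d}" for i in range(1, 21)]
days = pd.date_range("2024-01-01", "2024-12-31", freq="D")

df = pd.DataFrame({
    "customer_id": rng.choice(customer_ids, size=n_rows),
    # Pick random calendar days by position
    "transaction_date": days[rng.integers(0, len(days), size=n_rows)],
    "total_amt": rng.uniform(50, 5000, size=n_rows).round(2),
})
df = df.sort_values("transaction_date").reset_index(drop=True)
print(df.head())
```

Purely random amounts like these will not show strong trends, so the slopes obtained from this stand-in will differ from the ones discussed in the post.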

Now we will create a new column that extracts the month from the date, so it becomes easier for us to group the data later.

# Create new column with the month number (this assumes the data covers a single year)
df['mth'] = df['transaction_date'].dt.month

# Group customers by month
df_group = (
    df
    .groupby(['mth','customer_id'])
    ['total_amt']
    .sum()
    .reset_index()
)

Here is the result.

If we quickly check whether there are customers who haven’t made a transaction every month, we will find a few cases.
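One possible way to run that check is to count the distinct months per customer. The snippet below is a sketch on a tiny, made-up stand-in for df_group, using three months instead of twelve to keep it short.

```python
import pandas as pd

# Hypothetical stand-in for df_group (columns: mth, customer_id, total_amt)
df_group = pd.DataFrame({
    "mth": [1, 2, 3, 1, 3],
    "customer_id": ["C_001", "C_001", "C_001", "C_002", "C_002"],
    "total_amt": [120.0, 80.0, 200.0, 50.0, 75.0],
})

n_months = 3  # the article uses 12; 3 keeps the toy data short

# Count the distinct months in which each customer bought something
months_active = df_group.groupby("customer_id")["mth"].nunique()

# Customers with fewer active months than the full period have gaps
customers_with_gaps = months_active[months_active < n_months]
print(customers_with_gaps)
```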

That leads us to the next point. We have to make sure that, if the customer doesn’t have at least one purchase in a given month, we add that month with a $0 expense.
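As an alternative to filling the gaps one customer at a time, the zero months could be added for everyone at once by reindexing against the full customer and month grid. This is a sketch under assumptions (toy data, three months), not the approach used in the function discussed in this post.

```python
import pandas as pd

# Hypothetical stand-in for df_group (columns: mth, customer_id, total_amt)
df_group = pd.DataFrame({
    "mth": [1, 2, 3, 1, 3],
    "customer_id": ["C_001", "C_001", "C_001", "C_002", "C_002"],
    "total_amt": [120.0, 80.0, 200.0, 50.0, 75.0],
})

months = range(1, 4)  # the article uses months 1 to 12

# Every (customer, month) combination that should exist
full_index = pd.MultiIndex.from_product(
    [df_group["customer_id"].unique(), months],
    names=["customer_id", "mth"],
)

# Reindex against the full grid; missing combinations get a 0 expense
df_full = (
    df_group
    .set_index(["customer_id", "mth"])["total_amt"]
    .reindex(full_index, fill_value=0.0)
    .reset_index()
)
print(df_full)
```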

Let’s build a function that does this and also calculates the slope of the customer’s shopping trend.

This function looks enormous, but we will go over it in smaller chunks. Let’s do that.

Filter the data for a given customer using the Pandas query() method.
Do a quick group-by and check whether the customer has at least one purchase in every month.
If not, we add the missing months with a $0 expense. I implemented this by merging a temporary dataframe, holding the 12 months with $0, with the original data. After the merge on months, the missing periods become rows with NaN in the original data column, which are then filled with $0.
Then, we normalize the axes. Remember that the X-axis is an index from 1 to 12, while the Y-axis is the expense amount, in thousands of dollars. So, to avoid distortion in our slope, we normalize everything to the same scale, between 0 and 1. For that, we use the custom function min_max_standardize.
Next, we can plot the regression using another custom function.
Then we calculate the slope, which is the first result returned by the function scipy.stats.linregress().
Finally, to calculate the angle of the slope in degrees, we turn to pure mathematics, using the arc tangent to calculate the angle between the X-axis and the regression line. In Python, just use the functions np.arctan() and np.degrees() from numpy.
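To see why the normalization step matters for the angle, here is a small sketch with made-up numbers: a customer whose spending grows by exactly $100 every month. The helper names to_degrees and min_max are hypothetical and exist only for this illustration.

```python
import numpy as np
from scipy import stats

months = np.arange(1, 13)
# Hypothetical spending in dollars, growing by 100 per month
amounts = 1000.0 + 100.0 * (months - 1)

def to_degrees(x, y):
    """Angle, in degrees, between the fitted regression line and the X-axis."""
    return np.degrees(np.arctan(stats.linregress(x, y).slope))

def min_max(v):
    """Scale a non-constant array to the [0, 1] range."""
    return (v - v.min()) / (v.max() - v.min())

# Raw units: a slope of 100 dollars per month is almost vertical
raw_angle = to_degrees(months, amounts)

# Same data with both axes scaled to [0, 1]
scaled_angle = to_degrees(min_max(months), min_max(amounts))

print(f"raw: {raw_angle:.2f} degrees, scaled: {scaled_angle:.2f} degrees")
```

On the raw scale almost any dollar trend lands close to plus or minus 90 degrees, which makes customers hard to compare. After scaling, this perfectly linear increase sits at 45 degrees.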

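# Min-max scale the values to the [0, 1] range
def min_max_standardize(vals):
    value_range = np.max(vals) - np.min(vals)
    # A constant series (e.g. the same amount every month) has no range: return zeros
    if value_range == 0:
        return vals * 0.0
    return (vals - np.min(vals)) / value_range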

#------------

# Quick function to plot the regression
def plot_regression(x, y, cust):
  # Fit the regression once and reuse the result
  fit = stats.linregress(x, y)
  angle = np.degrees(np.arctan(fit.slope))
  plt.scatter(x, y, color='gray')
  plt.plot(x, fit.slope * np.array(x) + fit.intercept, color='red', linestyle='--')
  plt.suptitle("Slope of the Linear Regression [Expenses x Time]")
  plt.title(f"Customer {cust} | Slope: {angle:.0f} degrees. Positive = Buying more | Negative = Buying less", size=9, color='gray')
  plt.show()

#-----

def get_trend_degrees(customer, plot=False):

  # Filter the data for the given customer and sum the amounts by month
  one_customer = df.query('customer_id == @customer')
  one_customer = one_customer.groupby('mth').total_amt.sum().reset_index().rename(columns={'mth':'period_idx'})

  # Check if all months are in the data
  cnt = one_customer['period_idx'].nunique()

  # If not, add 0 to the months without transactions
  if cnt < 12:
      # Create a DataFrame with all 12 months
      all_months = pd.DataFrame({'period_idx': range(1, 13), 'total_amt': 0})

      # Merge with the existing one_customer data.
      # Use a 'left' merge to keep all 12 months from 'all_months'; months without purchases get NaN in total_amt.
      one_customer = pd.merge(all_months, one_customer, on='period_idx', how='left', suffixes=('_all', ''))

      # Combine the total_amt columns, preferring the actual data over the 0 from all_months
      one_customer['total_amt'] = one_customer['total_amt'].fillna(one_customer['total_amt_all'])

      # Drop the temporary _all column if it exists
      one_customer = one_customer.drop(columns=['total_amt_all'])

      # Sort by period_idx to make sure correct order
      one_customer = one_customer.sort_values(by='period_idx').reset_index(drop=True)

  # Min Max Standardization
  X = min_max_standardize(one_customer['period_idx'])
  y = min_max_standardize(one_customer['total_amt'])

  # Plot
  if plot:
    plot_regression(X,y, customer)

  # Calculate slope
  slope = stats.linregress(X, y).slope

  # Calculate angle degrees
  angle = np.arctan(slope)
  angle = np.degrees(angle)

  return angle

Great. It’s time to put this function to the test. Let’s get two customers:

C_014.
This is an uptrend customer who’s buying more over time.

# Example of strong customer
get_trend_degrees('C_014', plot=True)

The plot it yields shows the trend. We notice that, even though there are some weaker months in between, overall the amounts tend to increase as time passes.

Uptrending customer. Image by the author.

The trend is 32 degrees, pointing well up and indicating a strong relationship with this customer.

C_003.
This is a downtrend customer who’s buying less over time.

# Example of a customer who is buying less
get_trend_degrees('C_003', plot=True)

Downtrending customer. Image by the author.

Here, the expenses over the months are clearly decreasing, making the slope of this line point down. The line sits at -29 degrees, indicating that this customer is moving away from the brand and needs to be encouraged to come back.

Before You Go

Well, that is a wrap. This project demonstrates a simple, interpretable approach to detecting declining customer purchase behavior using linear regression.

Instead of relying on complex churn models, we analyze purchase trends over time to identify when customers are slowly disengaging.

This simple model can give us a good idea of where the customer is heading, whether it is a stronger relationship with the brand or a move away from it.

Certainly, with other data from the business, it is possible to improve this logic, apply a tuned threshold, and quickly identify potential churners every month, based on past data.
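A thresholding step like that could look roughly like the sketch below. Everything here is an assumption made for illustration: the three customers and their monthly totals are made up, the helper names are hypothetical, and the -20 degree cut-off is an arbitrary placeholder that would need tuning on real past data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical monthly totals (rows: customers, columns: months 1..12)
monthly = pd.DataFrame(
    {
        "C_A": [900, 850, 800, 700, 650, 500, 450, 300, 250, 100, 50, 0],
        "C_B": [100, 150, 120, 200, 260, 240, 300, 350, 330, 400, 450, 500],
        "C_C": [300, 280, 310, 290, 305, 295, 300, 310, 290, 300, 285, 305],
    },
    index=range(1, 13),
).T.astype(float)

def min_max(v):
    """Scale an array to [0, 1]; a constant array becomes all zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def trend_degrees(amounts):
    """Angle of the regression line over the scaled time and amount axes."""
    x = min_max(np.arange(1, len(amounts) + 1, dtype=float))
    y = min_max(np.asarray(amounts, dtype=float))
    return np.degrees(np.arctan(stats.linregress(x, y).slope))

# One angle per customer
angles = monthly.apply(trend_degrees, axis=1)

THRESHOLD = -20.0  # hypothetical cut-off, to be tuned on past data
at_risk = angles[angles < THRESHOLD].sort_values()
print(at_risk)
```

With these made-up numbers only the steadily declining customer C_A falls below the cut-off.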

Before wrapping up, I would like to give proper credit to the original post that inspired me to learn more about this implementation. It’s a post from Matheus da Rocha, which you can find here, on this link.

Finally, find more about me on my website.

https://gustavorsantos.me