For multi-product companies, one critical metric is what is commonly known as “cross-product adoption” (i.e., understanding how users engage with multiple offerings in a given product portfolio).
One measure suggested for calculating cross-product or cross-feature usage in the popular book Hacking Growth [1] is the Jaccard Index. Traditionally used to measure the similarity between two sets, the Jaccard Index can also serve as a powerful tool for assessing product adoption patterns. By quantifying the overlap in users between products, you can identify cross-product synergies and growth opportunities.
The dbt package dbt_set_similarity is designed to simplify the calculation of set similarity metrics directly within an analytics workflow. It provides a way to calculate Jaccard indices within SQL transformation workloads.
To import this package into your dbt project, add the following to the packages.yml file. We will also need dbt_utils for the purposes of this article's example. Run a dbt deps command within your project to install the package.
packages:
  - package: Matts52/dbt_set_similarity
    version: 0.1.1
  - package: dbt-labs/dbt_utils
    version: 1.3.0
The Jaccard Index, also known as the Jaccard Similarity Coefficient, is a metric used to measure the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of their union.
Mathematically, it can be expressed as:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

Where:
- A and B are two sets (e.g., users of product A and users of product B)
- The numerator represents the number of elements in both sets
- The denominator represents the total number of distinct elements across both sets
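The definition above translates almost directly into set operations. The following is a minimal Python sketch, not the dbt package's implementation; the function name `jaccard_index`, the sample user ids, and the choice to return 0.0 when both sets are empty are all assumptions made for illustration.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # Both sets are empty, so the ratio is undefined.
        # Returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical user ids for two products.
doc_users = {1, 2, 3, 4}
photo_users = {3, 4, 5}

# 2 shared users out of 5 distinct users.
print(jaccard_index(doc_users, photo_users))  # 0.4
```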
The Jaccard Index is particularly useful in the context of cross-product adoption because:
- It focuses on the overlap between two sets, making it ideal for understanding shared user bases
- It accounts for differences in the total size of the sets, ensuring that results are proportional and not skewed by outliers
For example:
- If 100 users adopt Product A and 50 adopt Product B, with 25 users adopting both, the Jaccard Index is 25 / (100 + 50 - 25) = 0.2, indicating a 20% overlap between the two user bases.
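When only the counts are known rather than the user lists, the size of the union follows from inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|. A small sketch of that arithmetic, with `jaccard_from_counts` being a hypothetical helper name introduced here:

```python
def jaccard_from_counts(size_a: int, size_b: int, size_both: int) -> float:
    """Jaccard index from set sizes, using |A ∪ B| = |A| + |B| - |A ∩ B|."""
    if size_both < 0 or size_both > min(size_a, size_b):
        raise ValueError("overlap must be between 0 and the smaller set size")
    union_size = size_a + size_b - size_both
    if union_size == 0:
        raise ValueError("both sets are empty; the index is undefined")
    return size_both / union_size


# The example above: 100 users of Product A, 50 of Product B, 25 of both.
print(jaccard_from_counts(100, 50, 25))  # 25 / 125 = 0.2
```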
The example dataset we will be using comes from a fictional SaaS company which offers storage space as a product for consumers. This company provides two distinct storage products: document storage (doc_storage) and photo storage (photo_storage). Each of these columns is either true, indicating the product has been adopted, or false, indicating the product has not been adopted.
Additionally, the demographics (user_category) that this company serves are either tech enthusiasts or homeowners.
For the sake of this example, we will read this csv file in as a “seed” model named seed_example within the dbt project.
Now, let’s say we want to calculate the Jaccard index (cross-adoption) between our document storage and photo storage products. First, in a CTE, we create an array (list) of the users who have the document storage product, alongside an array of the users who have the photo storage product. Then, in the final select, we apply the jaccard_coef function from the dbt_set_similarity package to compute the Jaccard coefficient between the two arrays of user ids.
with product_users as (
    select
        array_agg(user_id) filter (where doc_storage = true)
            as doc_storage_users,
        array_agg(user_id) filter (where photo_storage = true)
            as photo_storage_users
    from {{ ref('seed_example') }}
)

select
    doc_storage_users,
    photo_storage_users,
    {{
        dbt_set_similarity.jaccard_coef(
            'doc_storage_users',
            'photo_storage_users'
        )
    }} as cross_product_jaccard_coef
from product_users
As we can see, just over half (60%) of the users who have adopted either product have adopted both. We can graphically confirm our result by placing the user id sets into a Venn diagram, where we see that three users have adopted both products, among five total users: 3/5 = 0.6.
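The same 3/5 result can be cross-checked outside the warehouse. The sketch below mirrors the two filtered aggregations with Python sets; the rows are hypothetical stand-ins chosen to match the counts described in the text (three users with both products, five with at least one), not the article's actual seed file.

```python
# Hypothetical rows (user_id, doc_storage, photo_storage); illustrative only,
# not the contents of the article's seed_example file.
rows = [
    (1, True, True),
    (2, True, False),
    (3, False, True),
    (4, True, True),
    (5, True, True),
    (6, False, False),
]

# Mirror the two filtered array_agg calls: collect user ids per product.
doc_storage_users = {user_id for user_id, has_doc, _ in rows if has_doc}
photo_storage_users = {user_id for user_id, _, has_photo in rows if has_photo}

both = doc_storage_users & photo_storage_users    # centre of the Venn diagram
either = doc_storage_users | photo_storage_users  # everything inside the circles
cross_product_jaccard_coef = len(both) / len(either)

print(sorted(both), sorted(either), cross_product_jaccard_coef)
# [1, 4, 5] [1, 2, 3, 4, 5] 0.6
```

User 6, who has adopted neither product, falls outside both sets and therefore does not affect the index.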
Using the dbt_set_similarity package, creating segmented Jaccard indices for our different user categories is fairly natural. We follow the same pattern as before; however, we simply group our aggregations by the user category that each user belongs to.
with product_users as (
    select
        user_category,
        array_agg(user_id) filter (where doc_storage = true)
            as doc_storage_users,
        array_agg(user_id) filter (where photo_storage = true)
            as photo_storage_users
    from {{ ref('seed_example') }}
    group by user_category
)

select
    user_category,
    doc_storage_users,
    photo_storage_users,
    {{
        dbt_set_similarity.jaccard_coef(
            'doc_storage_users',
            'photo_storage_users'
        )
    }} as cross_product_jaccard_coef
from product_users
We can see from the data that, judged by the Jaccard index, cross-product adoption is higher among homeowners. As shown in the output, all homeowners who have adopted one of the products have adopted both. Meanwhile, only one-third of the tech enthusiasts who have adopted one product have adopted both of the products. Thus, in our very small dataset, cross-product adoption is higher among homeowners than among tech enthusiasts.
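The grouped calculation can be sketched in Python as well, with one pair of sets per category taking the place of the `group by`. The rows and category labels below are hypothetical, chosen only so that the per-segment results match those described above (1.0 for homeowners, one-third for tech enthusiasts).

```python
from collections import defaultdict

# Hypothetical rows (user_id, user_category, doc_storage, photo_storage);
# illustrative only, not the article's actual seed file.
rows = [
    (1, "tech_enthusiast", True, True),
    (2, "tech_enthusiast", True, False),
    (3, "tech_enthusiast", False, True),
    (4, "homeowner", True, True),
    (5, "homeowner", True, True),
    (6, "homeowner", False, False),
]

# One set of user ids per category and product (the "group by" step).
doc_by_category = defaultdict(set)
photo_by_category = defaultdict(set)
for user_id, category, has_doc, has_photo in rows:
    if has_doc:
        doc_by_category[category].add(user_id)
    if has_photo:
        photo_by_category[category].add(user_id)

# Jaccard index per category; None when a segment has no adopters at all.
jaccard_by_category = {}
for category in sorted({category for _, category, _, _ in rows}):
    doc_users = doc_by_category[category]
    photo_users = photo_by_category[category]
    union = doc_users | photo_users
    jaccard_by_category[category] = (
        len(doc_users & photo_users) / len(union) if union else None
    )

for category, coef in jaccard_by_category.items():
    print(category, coef)
```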
We can graphically confirm the output by again creating a Venn diagram:
dbt_set_similarity provides a straightforward and efficient way to calculate cross-product adoption metrics such as the Jaccard Index directly within a dbt workflow. By applying this method, multi-product companies can gain valuable insights into user behavior and adoption patterns across their product portfolio. In our example, we demonstrated the calculation of overall cross-product adoption as well as segmented adoption for distinct user categories.
Using the package for cross-product adoption is just one straightforward application. In reality, there are countless other potential applications of this technique, for example:
- Feature usage analysis
- Marketing campaign impact analysis
- Support analysis
Additionally, this style of analysis is certainly not limited to just SaaS, but can apply to virtually any industry. Happy Jaccard-ing!
References
[1] Sean Ellis and Morgan Brown, Hacking Growth (2017)