Data scientists are in the business of decision-making. Our work centers on how to make informed decisions under uncertainty.
And yet, when it comes to quantifying that uncertainty, we frequently lean on the concept of "statistical significance" — a tool that, at best, provides a shallow understanding.
In this article, we'll explore why "statistical significance" is flawed: arbitrary thresholds, a false sense of certainty, and a failure to handle real-world trade-offs.
Most importantly, we'll learn how to move beyond the binary mindset of significant vs. non-significant, and adopt a decision-making framework grounded in economic impact and risk management.
Imagine we just ran an A/B test to evaluate a new feature designed to boost the time users spend on our website — and, as a result, their spending.
The control group consisted of 5,000 users, and the treatment group included another 5,000 users. This gives us two arrays, named `treatment` and `control`, each containing 5,000 values representing the spending of the individual users in their respective groups.
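To make the setup concrete, here is a minimal sketch of what those two arrays might look like. The data is synthetic (the source does not provide real numbers): we assume right-skewed spending, simulated with an exponential distribution, and a small uplift in the treatment group. The classic approach the article critiques would then run a two-sample t-test and read off the p-value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical spending data — in practice these arrays would come from
# the experiment logs. Spending is typically right-skewed, so we simulate
# it with an exponential distribution (scale = mean spending per user).
control = rng.exponential(scale=50, size=5_000)
treatment = rng.exponential(scale=52, size=5_000)  # assumed small uplift

# The conventional "statistical significance" step: a Welch two-sample
# t-test comparing mean spending between the groups.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

print(f"mean control:   {control.mean():.2f}")
print(f"mean treatment: {treatment.mean():.2f}")
print(f"p-value:        {p_value:.4f}")
```

The binary reading would be: if `p_value < 0.05`, ship the feature; otherwise, don't. The rest of the article examines why that rule is a poor substitute for an actual decision framework.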