often use Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) to evaluate the quality of their rankings. In this post, we discuss why MAP and MRR are poorly aligned with modern user behavior in...
Introduction
customer annoyance from wait times. Calls arrive randomly, so wait time X follows an Exponential distribution: most waits are short, but a few are painfully long.
Now I’d argue that annoyance isn’t linear: a 10-minute...
be a sensitive topic, perhaps best avoided on a first encounter with a statistician. This sensitivity has led to a tacit agreement that α = 0.05 is the gold standard; in fact,...
“What I cannot create, I don't understand” — attributed to R. Feynman
After Vibe Coding, we appear to have entered the (very niche, but much cooler) era of Vibe Proving: DeepMind wins gold...
to Constructing an Overengineered Retrieval System. That one was about constructing the whole system. This one is about doing the evals for it.
In the previous article, I went through the different parts of a RAG...
in some interesting conversations recently about designing LLM-based tools for end users, and one of the important product design questions this brings up is “what do people know about...
a decade old now.
Back then, OpenAI felt like one (well-baked) startup amongst others. DeepMind was already around, but not yet fully integrated into Google. And, back then, the “triad of deep learning” —...
, and it’s officially leaf-raking season. As I worked on this tedious task, I realized it is essentially one big optimization problem.
When raking my leaves, I made 4 piles: one on either side of...