Prediction vs. Search Models: What Data Scientists Are Missing

As data scientists, we’ve become extremely focused on building algorithms, causal/predictive models, and recommendation systems (and now genAI). We optimize for accuracy, fine-tune hyperparameters, and look for the next big fancy model to deploy in prod. But in our focus on delivering a state-of-the-art implementation, we’ve neglected a class of models that can reshape how we think about the business problem itself.

Consider the rise of platform companies like Amazon, Spotify, Netflix, Uber, and Upstart. While their industries appear vastly different, they fundamentally operate as intermediaries in search-and-matching markets between demand and supply agents. These companies’ value proposition lies in reducing search costs for customers by providing a platform and a matching algorithm that connects agents under uncertainty and heterogeneous preferences.

The Core Challenge

In these markets, the fundamental questions aren’t just standard, isolated machine learning problems such as “how do we predict demand?” or “how do ads impact churn rate?” Instead, the critical challenges are:

How many suppliers should we onboard given expected demand patterns?
How do we design matching mechanisms that generate the optimal allocation?
What pricing strategies maximize platform revenue while balancing platform growth and customer satisfaction?
How do we handle the downstream impact when a change in a single model primitive has ripple effects?

Traditional data science approaches treat these as independent optimization problems and dedicate separate workstreams to them. However, economists have been working on these problems since the 1980s and have developed a unified theoretical framework, called search-theoretic models, to capture the interdependent nature of these platform dynamics. I studied these models deeply in graduate school but haven’t seen them applied in industry work, so I’d like to bring attention to them.

Why This Matters for Data Scientists

Data science as a field is great at measurement and algorithms, but falls behind in problem formulation (which we’ve left to PMs and execs). Understanding these theoretical foundations informs how we think about which metrics to measure and which algorithms to build. Instead of building isolated prediction models, we can design systems that work together and account for equilibrium effects, strategic behavior, and feedback loops. This theoretical lens helps us identify the right experiment to run, understand when our models break down (cohort drift) due to changes in agent preferences, and design interventions that have a first-order impact on equilibrium outcomes.

In this article, I’ll introduce the theory behind search models and demonstrate their practical application using a lending platform (Upstart/LendingClub/Prosper) that matches borrowers and banks as a concrete example. We’ll explore how this framework can inform partner acquisition strategies, pricing and fee mechanisms, and which levers should be used to drive growth. Interested readers can continue to the next section for a brief background on how these models came to be, or skip straight to the practical example to see how to design these models.

The Economic Literature

This modeling framework comes from economics in the 1980s, when Dale Mortensen, Christopher Pissarides, and Peter Diamond were trying to understand why unemployment exists even when there are job openings. This line of questioning led them to win the Nobel Prize in 2010 for their work. Their Diamond-Mortensen-Pissarides (DMP) model changed how we think about markets. The core insight is that finding a job (or hiring someone) takes time (and costs money), which creates frictions in an otherwise competitive market. Diamond showed in 1982 that when searching is costly, wages aren’t determined by aggregate supply and demand. Instead, they’re negotiated between a specific worker and firm in a bilateral bargaining process after the two have met. This negotiation uses Nash bargaining, where the wage depends on each party’s bargaining power and outside options. Whichever side has better outside options gets a larger share of the value created by the match.

Mortensen expanded on this by showing that search costs create a pool of unemployed workers even in a healthy economy. Workers develop a “reservation wage”: the minimum they’ll accept, based on what they expect to find if they keep searching. Firms similarly balance the cost of keeping a position open against the expected value a worker would bring. Pissarides then tied these individual negotiations to economy-wide patterns, showing how unemployment and job creation relate to business cycles.

In 2005, Duffie, Gârleanu, and Pedersen applied this same thinking to financial markets. In over-the-counter markets, buyers and sellers have to find each other, just like workers and firms. This search process creates bid-ask spreads and explains why the same asset can trade at different prices at the same time. A seller who needs cash immediately (high liquidity demand) might accept a lower price, while someone with enough time can wait for a better offer. Lagos and Rocheteau later relaxed the restriction to binary asset holdings, introduced a variable asset portfolio for each agent, and showed how monetary policy affects these decentralized markets.

The third piece of the puzzle comes from platform economics. Platforms create a marketplace that requires both sellers and buyers. Ride-sharing platforms need both drivers and riders. Lending platforms need both borrowers and banks. The literature on two-sided markets shows how platforms can maximize their revenue by setting prices and jointly controlling the size of the demand and supply sides. These platforms have to set prices that ensure participants remain in the market (Incentive Compatibility constraint) and that accepting the transaction is beneficial for these agents (Individual Rationality constraint). Platforms may also span multiple markets (Amazon books/electronics), where demand or supply in one segment can have spillover effects on another.

These three related streams of research can be combined to give us the tools to understand modern digital platform companies. Below, I’ll work through a practical example of how these concepts tie together in a theoretical model of the optimal behavior of a lending platform.

A Practical Example: Lending Platforms

Let’s apply this framework to lending platforms like Upstart, LendingClub, and Prosper. These companies use AI to underwrite loans, connecting banks that have available capital with consumers who need loans. They act as marketplaces where partner banks offer various loan types (personal, auto, mortgage) and consumers apply for credit. The platforms make money through origination fees, service fees, and late fees while reducing search costs for both sides: banks don’t need to find and evaluate borrowers themselves, and consumers don’t have to shop around multiple banks. From a platform perspective, these companies face key economic challenges:

Demand forecasting: How much loan demand will we see next quarter?
Supply management: How many partner banks do we need to handle that demand?
Competition design: How do we keep banks competing for borrowers without driving them away?
Matching mechanism: Should we use auctions, posted prices, or algorithmic matching to match borrowers and lenders?
Risk assessment: How do we model both bank risk appetite and borrower default probability?
Market segmentation: Are there any spillover effects between lending in different market segments?

None of these questions is easy to answer, and each has many moving parts. You might forecast loan demand using time series models, but that aggregate number needs to be broken down by loan type, amount, and duration, since banks have different preferences along these dimensions. Smaller banks with limited capital may only want to originate short-term loans to high-credit borrowers, while large banks might provide longer-term loans to riskier borrowers if they have excess capital. The matching algorithm must account for these preferences while ensuring both sides get enough value (trade surplus) to accept the offer.

In this framework, each loan represents a three-way negotiation between the borrower, bank, and platform. The borrower has the power to reject any offer, the bank has the power to set a reservation interest rate, and the platform has the power to decide the allocation of the total trade surplus. The platform controls key parameters like interest rates and fees, and changing these affects participation on both sides. Rates that are too high cause borrowers to leave, which lowers adoption and increases churn. Rates that are too low reduce partner satisfaction and the number of partners. Every decision shifts the equilibrium, and understanding these dynamics is crucial for platform growth.

The Model Environment

Let’s build the simplest model to understand these dynamics. We’ll start with assumptions that make the math tractable, which will make up our environment. This environment will have only one loan type lasting just one period, identical borrowers, and identical banks.

The environment exists in discrete time $t \in \mathcal{T}$, with no inter-period discounting. There exists a loan of size $S$ with an interest rate of $r$, where $r$ is an endogenous variable (its value is determined within the system and is not a model primitive).

Borrowers arrive on the platform at an unconditional Poisson rate $\Lambda$. Borrowers come to the platform demanding a loan of size $S$, which they value at $V(S)$. They have a linear utility function $U_L = V(S) - (1+r)S$, the valuation they receive from the loan net of the payment they must make in the next period. The stock of unmatched borrowers in each time period is denoted $L_t$. Each borrower has a repayment probability $p$. When they receive a loan offer, they can choose to either accept or reject it. If they reject the offer, they leave the market and exit the platform. Borrowers always believe that they will repay the loan.

On the banking side, there exists a set of banks $i \in \mathcal{J}$, each with a maximum capital capacity $K$ and a cost of origination $c$. Each loan of size $S$ has a maturity of $T=1$ (a loan that is successfully originated reduces that bank’s available capital by $S$ for $1$ period). Each bank’s goal is to maximize profit by setting a minimum acceptable interest rate on the platform, and it will leave the platform if it cannot generate a profit.

In this environment, there exists a platform that has a matching technology $M(B,L)$ to match banks and borrowers. The platform can observe all parameters of every agent and determines the interest rate $r$ charged to the borrower and the origination fee $f$ charged to the bank so as to maximize platform revenue. The platform can also onboard any number of banks it desires by setting $B$. When a match occurs, the platform selects one bank at random from the stock of willing banks and makes an offer $\{S, r, f\}$ that must be incentive-compatible for both the bank and the borrower.

For this application we’ll use a standard matching technology, the Cobb-Douglas function (which is also used in the literature as a production function), to give the aggregate matching rate for this market. This matching function takes as input the number of banks and borrowers and maps them into the number of matches per period:

$$ M(B,L) = \alpha B^\beta L^{1-\beta}$$

In each time period, the expected matching rate per bank is defined as the aggregate number of matches over the stock of banks: $\phi \equiv \frac{M(B,L)}{B} = \alpha B^{\beta-1} L^{1-\beta}$. If banks and borrowers are matched at random, the number of matches per bank per unit time is identical across banks and equal to $\phi$.
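As a rough illustration, the matching function and the per-bank matching rate can be written in a few lines of Python. The parameter values below ($\alpha = 0.5$, $\beta = 0.5$, 4 banks, 100 unmatched borrowers) are made-up numbers chosen only to keep the arithmetic easy to follow; they are not calibrated to any real platform.

```python
def matches(B, L, alpha=0.5, beta=0.5):
    """Cobb-Douglas matching function M(B, L) = alpha * B**beta * L**(1 - beta)."""
    return alpha * B**beta * L ** (1 - beta)


def matches_per_bank(B, L, alpha=0.5, beta=0.5):
    """Expected matches per bank per period, phi = M(B, L) / B."""
    return matches(B, L, alpha, beta) / B


# Hypothetical example values (assumptions, not taken from any real platform).
total = matches(B=4, L=100)
phi = matches_per_bank(B=4, L=100)
print(f"matches per period: {total:.2f}, matches per bank: {phi:.2f}")
```

Doubling both $B$ and $L$ doubles the number of matches, which is the constant-returns-to-scale property of this functional form.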

This concludes our work in establishing the environment that this model lives in. The environment should contain enough information to find the equilibrium values (outcomes) of all the variables of interest in the model.

Finding the Equilibrium

This section’s goal is to find solutions for all the model outcomes we’re interested in. To solve for the equilibrium, we must solve for all the endogenous (free) variables that have not been pre-defined by the environment. For this example, that means we need to solve for the interest rate $r$, the origination fee $f$, and the number of banks $B$. There is no set order in which we must solve for these, but a useful sequence is to first understand the participation decision of the agents, then solve for the matching rate, and finally solve the bargaining problem.

Under this full-information framework, the optimal decision for all borrowers and banks is to accept. For each loan origination, the expected profit of the bank is given by:

$$\pi = p(1+r)S - (1+c)S - f$$

The first term is the probability of repayment multiplied by the payment the bank receives if the borrower repays the loan. The second term is the cost of origination (since a bank must fund the loan from its own balance sheet/depositors and pay them a rate $c$). The third term is what the bank pays the platform for originating the loan. In reality, the expected profit calculation would also account for longer-maturity loans ($T>1$), the cost of collection conditional on default, and other factors.
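To make the three terms concrete, here is a small sketch of the per-loan profit formula. The numbers (a loan of 1,000 at a 20% rate, a 95% repayment probability, a 3% funding cost, and a fee of 50) are illustrative assumptions only.

```python
def bank_profit_per_loan(r, f, S, p, c):
    """Expected bank profit per loan: pi = p(1+r)S - (1+c)S - f."""
    expected_repayment = p * (1 + r) * S  # repayment weighted by its probability
    funding_cost = (1 + c) * S            # principal plus the bank's cost of funds
    return expected_repayment - funding_cost - f


# Hypothetical example values (assumptions for illustration).
pi = bank_profit_per_loan(r=0.20, f=50, S=1000, p=0.95, c=0.03)
print(f"expected profit per loan: {pi:.2f}")
```

Holding everything else fixed, lowering the repayment probability to 85% in this example turns the expected profit negative, which is the case where the bank would refuse the loan.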

Once we have the expected per-loan profit, we must determine how many loans get originated per period. To have a steady-state stock of unmatched borrowers, the arrival rate of borrowers must equal the number of matches in the long run (since all borrowers accept the loan conditional on a match). This means that the flow rate of borrowers into the system, $\Lambda$, must equal the flow rate of borrowers leaving the system, $M(B,L)$:

$$ \Lambda = M(B,L) = \alpha B^\beta L^{1-\beta}$$

Solving for $L$, we get $L = \Big[ \frac{\Lambda}{\alpha B^\beta} \Big]^{\frac{1}{1-\beta}}$. If needed, we can also find the expected arrival rate of a loan for a borrower by dividing the matching function by the mass of borrowers. Since $M = \Lambda$ in steady state, the arrival rate of loans for a bank is given by $\phi = \frac{\Lambda}{B}$.

Since each loan that a bank funds takes up part of its reserve capacity $K$, we can also solve for the maximum number of loans $l$ the bank can fund at once. The budget constraint for the bank is given by $S \cdot \phi \leq K$. Since we have already solved for the flow rate of loans, a bank’s number of loans per period is therefore given by $l^* = \min\{ \frac{\Lambda}{B}, \frac{K}{S}\}$. If the capacity constraint $\frac{K}{S}$ binds, the platform should increase the number of banks it partners with, since lending supply is constrained. Given that there is no free-entry condition on the lender side, the platform can directly control the number of banks $B$ so that we stay in the unconstrained equilibrium, where $l^* = \frac{\Lambda}{B}$.
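The steady-state stock of borrowers and the capacity check can be sketched as follows. The helper `min_banks_unconstrained` is not from the model text; it simply rearranges $\frac{\Lambda}{B} \leq \frac{K}{S}$ into $B \geq \frac{\Lambda S}{K}$. All parameter values are hypothetical.

```python
import math


def steady_state_borrowers(Lam, B, alpha, beta):
    """Solve Lam = alpha * B**beta * L**(1 - beta) for the stock L."""
    return (Lam / (alpha * B**beta)) ** (1 / (1 - beta))


def loans_per_bank(Lam, B, K, S):
    """l* = min(Lam / B, K / S): the loan flow per bank, capped by capital."""
    return min(Lam / B, K / S)


def min_banks_unconstrained(Lam, K, S):
    """Smallest integer B with Lam / B <= K / S (derived helper, an assumption)."""
    return math.ceil(Lam * S / K)


# Hypothetical example values (assumptions for illustration).
Lam, alpha, beta, K, S = 50, 0.5, 0.5, 10_000, 1_000
L = steady_state_borrowers(Lam, B=4, alpha=alpha, beta=beta)
print(f"unmatched borrowers with 4 banks: {L:.0f}")
print(f"loans per bank with 4 banks: {loans_per_bank(Lam, 4, K, S)}")
print(f"banks needed to avoid the capacity constraint: {min_banks_unconstrained(Lam, K, S)}")
```

With four banks in this example, each bank would receive more loan flow than its capital allows, so the capacity term is the binding one until a fifth bank is added.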

Now that we know the number of loans, we can determine the bank’s profit per unit time:

$$ \Pi_B = \frac{\pi \Lambda}{B} = \frac{\Lambda\,(p(1+r)S - (1+c)S - f)}{B}$$

As we can see, increasing the number of banks partnered with the platform decreases the expected profit per bank by decreasing the number of loans that each bank can originate. Since the platform can set both the fee $f$ and the number of banks $B$, it is up to the platform to decide whether it wants a small number of banks with high per-bank profit (at the risk of hitting capacity constraints) or whether it wants to maximize the borrower’s surplus by increasing the number of banks or decreasing the interest rate $r$. This also lets us place a bound on the maximum fee that the platform can charge, since banks would not be willing to take on a loan if the profit is negative. The upper bound on the fee is therefore given by $\bar{f} = p(1+r)S - (1+c)S$.
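A quick sketch shows how per-bank profit dilutes as the platform onboards more banks. It assumes the unconstrained case ($\frac{\Lambda}{B} \leq \frac{K}{S}$) and uses made-up parameter values.

```python
def bank_profit_per_period(B, r, f, S, p, c, Lam):
    """Pi_B = pi * Lam / B, assuming the capacity constraint does not bind."""
    pi = p * (1 + r) * S - (1 + c) * S - f  # expected profit per loan
    return pi * Lam / B


# Hypothetical example values (assumptions for illustration).
for B in (5, 10, 20):
    profit = bank_profit_per_period(B, r=0.20, f=50, S=1000, p=0.95, c=0.03, Lam=50)
    print(f"B = {B:2d}: per-bank profit per period = {profit:.2f}")
```

Because the loan flow $\Lambda$ is fixed in this model, doubling $B$ halves each bank’s profit per period.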

If the platform shifts the allocation of trade surplus toward the bank by increasing $r$, it can charge a higher fee and generate more revenue. However, in reality this would also decrease the growth rate of borrowers moving onto the platform. In this example, we set the arrival rate of borrowers as exogenous, so it is not affected by the fee and rate, but we can envision an environment where $\Lambda = \Lambda(f, r, B)$, which would change this into a problem with a conditional entry rate. Since we allow banks to post a reservation rate $\underline{r}$ that sets their minimum required rate for any loan origination, we can model the lower bound on the interest rate $\underline{r}$ as:

$$ \underline{r} = \frac{f + (1+c)S}{p S} - 1$$

If the platform decreases the fee charged, banks can set a lower reservation rate, which increases borrower surplus. The same is possible if the probability of repayment increases, or if the cost of origination (risk-free rate) decreases.
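The two bank-side bounds are mirror images of each other: both come from setting the per-loan profit to zero and solving for either $f$ or $r$. A minimal sketch, with hypothetical parameter values, makes the comparative statics described above easy to check.

```python
def max_fee(r, S, p, c):
    """Upper bound on the fee: f_bar = p(1+r)S - (1+c)S (bank breaks even)."""
    return p * (1 + r) * S - (1 + c) * S


def reservation_rate(f, S, p, c):
    """Bank's minimum acceptable rate: r_low = (f + (1+c)S) / (pS) - 1."""
    return (f + (1 + c) * S) / (p * S) - 1


# Hypothetical example values (assumptions for illustration).
S, p, c = 1000, 0.95, 0.03
print(f"max fee at r = 20%: {max_fee(0.20, S, p, c):.2f}")
for f in (0, 50, 110):
    print(f"fee = {f:3d}: reservation rate = {reservation_rate(f, S, p, c):.4f}")
```

In this example, charging the maximum fee at a 20% rate pushes the bank’s reservation rate back to 20%, so the bank is left with zero expected profit.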

The Negotiation

Now that we have fully described the aggregate matching and profit statistics, we need to pin down the behavior of each party during the negotiation, along with the profit-maximizing parameters for the platform.

When a borrower and a bank get matched, the platform makes a take-it-or-leave-it offer and the borrower can choose to accept or reject. If the borrower rejects, they exit the market (no outside option). Therefore, the platform has to choose a set of parameters $\{r, f\}$ that satisfies the participation constraints of both the borrower and the bank, subject to $\{\underline{r}, \bar{f}\}$. From the linear utility specification, the borrower only accepts the loan if they get non-negative utility from it (since they can just reject and get $U_L = 0$). This allows us to define a maximum on the interest rate parameter:

$$\bar{r} = \frac{V(S)}{S} - 1$$
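Combining the borrower’s ceiling with the bank’s reservation rate gives the interval of interest rates at which a trade can happen at all. The sketch below treats the borrower’s valuation $V(S)$ as a known number `V`, in line with the full-information assumption; the values are hypothetical.

```python
def max_borrower_rate(V, S):
    """Borrower's maximum acceptable rate: r_bar = V(S) / S - 1."""
    return V / S - 1


def feasible_rate_interval(f, V, S, p, c):
    """Return (r_low, r_bar) if a mutually acceptable rate exists, else None."""
    r_low = (f + (1 + c) * S) / (p * S) - 1  # bank's reservation rate
    r_bar = max_borrower_rate(V, S)          # borrower's ceiling
    return (r_low, r_bar) if r_low <= r_bar else None


# Hypothetical example values (assumptions for illustration).
print(feasible_rate_interval(f=50, V=1200, S=1000, p=0.95, c=0.03))
print(feasible_rate_interval(f=200, V=1200, S=1000, p=0.95, c=0.03))
```

When the fee is set too high, the bank’s reservation rate rises above what the borrower will accept and the function returns `None`: there is no rate at which both sides participate.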

Now that we know the bounds for the free parameters $r$ and $f$, we can construct the maximization problem of the platform. The platform chooses a rate and fee that satisfy the incentives of each participating agent while maximizing its own net proceeds. Under this assumption, the platform maximizes:

$$ \Pi_p = \max_{r, f, B} \; f \, M(B,L) \quad \text{s.t.} \quad \Pi_B \geq 0, \quad U_L \geq 0 $$

The platform chooses the interest rate $r$, the fee $f$, and the number of partner banks $B$ to maximize its fee per match times the number of matches. This problem has an analytical solution and can be solved in closed form to find the optimal parameters, or it can be solved numerically by grid search or constrained optimization to find the set of parameters that maximizes $\Pi_p$. I leave the closed-form solution as an exercise for the reader.
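For readers who prefer the numerical route, here is a minimal grid-search sketch of the platform’s problem. It makes two assumptions that go beyond the text: in steady state the number of matches equals $\Lambda$ unless total bank capacity $B \cdot \frac{K}{S}$ is smaller, in which case originations are capped at that capacity; and feasibility is checked with a small numerical tolerance. All parameter values and grids are hypothetical.

```python
import itertools


def platform_revenue(r, f, B, *, S, V, p, c, K, Lam, tol=1e-9):
    """Platform revenue per period at (r, f, B), or None if a constraint fails."""
    borrower_utility = V - (1 + r) * S               # U_L >= 0
    bank_profit = p * (1 + r) * S - (1 + c) * S - f  # pi >= 0 implies Pi_B >= 0
    if borrower_utility < -tol or bank_profit < -tol:
        return None
    # Assumption: originations cannot exceed total bank capacity B * K / S.
    matches = min(Lam, B * K / S)
    return f * matches


def grid_search(r_grid, f_grid, B_grid, **params):
    """Return (revenue, r, f, B) for the best feasible grid point, or None."""
    best = None
    for r, f, B in itertools.product(r_grid, f_grid, B_grid):
        revenue = platform_revenue(r, f, B, **params)
        # Strict improvement only, so ties keep the earliest (smallest) B.
        if revenue is not None and (best is None or revenue > best[0]):
            best = (revenue, r, f, B)
    return best


# Hypothetical parameter values and grids, chosen only for illustration.
params = dict(S=1000, V=1200, p=0.95, c=0.03, K=10_000, Lam=50)
r_grid = [i / 100 for i in range(31)]  # rates from 0% to 30% in 1-point steps
f_grid = [5 * i for i in range(41)]    # fees from 0 to 200 in steps of 5
B_grid = range(1, 21)                  # 1 to 20 partner banks

best = grid_search(r_grid, f_grid, B_grid, **params)
print("revenue, r, f, B =", best)
```

With these made-up numbers, the search should settle on the highest rate the borrower accepts, the fee that leaves the bank at break-even, and the smallest number of banks that avoids the capacity constraint. For finer grids or richer models, a constrained optimizer would likely be a better fit than brute-force enumeration.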

To close out this section, we define our equilibrium objects as the steady-state solution to the platform’s problem: the interest rate $r$, the fee $f$, and the number of banks $B$.

What This Means for Business

This model reveals several key insights for platform strategy:

1. The choice of B: Increasing the number of partner lenders increases the surplus for the borrower. One channel is a faster matching speed, which decreases the steady-state number of unmatched borrowers. Since we modeled the borrower as leaving the market after rejecting a loan, this doesn’t put any downward pressure on the loan rate. However, if we assumed that borrowers could re-enter the market after rejecting a loan, they would have a better outside option. This gives banks less bargaining power and lowers the maximum rate $\bar{r}$ that borrowers are willing to accept. However, increasing the number of partner banks also decreases each bank’s profit per period (since per-bank profit falls with the number of banks). This lowers the maximum amount $\bar{f}$ the platform can charge for each transaction, decreasing platform profit.

2. The choice of r: Choosing the right $r$ involves deciding whether the platform wants the banks or the borrowers to benefit. In this simple model, the platform would choose $r = \bar{r}$, since it only needs to satisfy the borrower’s participation constraint and does not have to worry about entry conditions. Any increase in $r$ allows the platform to extract more surplus from the trade through higher fees. In a more complex model where the entry rate of borrowers is positively correlated with their surplus, the optimal decision would be to shift some of the surplus allocation to borrowers to increase the per-period matching speed, which could increase total revenue for the platform. Finally, in a model with limited information (where the platform does not know the true payoff of the borrower), the optimal interest rate depends on an expectation of the valuation $\mathbb{E}[V(S)]$ over the estimated distribution of borrowers. If there are differences across borrowers represented by $\theta$, the expectation becomes a conditional expectation given the borrower profile, $\mathbb{E}[V(S) \mid \theta]$. If the borrower profile is unknown (common in cold-start cases), we can replace $\theta$ with an ML-estimated version $\hat{\theta}$.

3. The choice of f: In this model, $f$ decides the allocation of trade surplus between the bank and the platform. A higher fee increases the revenue for the platform and proportionally decreases the revenue for the banks. In reality, banks can choose among competing platforms, and their participation depends on the revenue they expect to receive. This suggests that it is likely optimal for the platform to allocate some of the trade surplus to banks to increase the chances of signing new partners in later periods.

Final Remarks and Extensions

What We Haven’t Considered Yet

This basic model only scratches the surface of platform dynamics. Real platforms deal with complexities we’ve intentionally ignored to keep the math tractable. For instance, we assumed borrowers exit after rejection (to make the outside option 0), but in reality they can either stay in the market or visit a competitor platform. We also assumed that both banks and borrowers are identical, but banks can differ in their risk appetite, capital funding, and maturity preferences. Borrowers can also differ in their observed and latent features, which affect their probability of repayment, loan valuation, and loan size. This heterogeneity changes the matching problem from random assignment to sorted matching, where the platform needs to decide which types should match with whom, which ties back to the value proposition of the platform itself.

We’ve also ignored information asymmetry. Banks don’t perfectly observe default risk, borrowers don’t know their true creditworthiness, and platforms have limited insight into the outside options of both parties. This creates opportunities for signaling (borrowers trying to appear creditworthy), screening (banks setting different reservation interest rates for separate loan types), and mechanism design choices for the platform. Should a lending platform show borrowers all available rates or just the best match? Should it reveal a borrower’s credit score to banks or just its proprietary risk assessment? Can revealing too much information have a negative impact on match quality?

Extensions That Would Deepen Understanding

To make this framework operational, several natural extensions come to mind:

Dynamic Entry and Exit: Model how market conditions affect participation. When interest rates rise, some borrowers drop out while others become desperate. Banks adjust their risk appetite and capital ratio based on regulatory changes and balance sheet constraints. Machine learning plays a big role here, because the platform must forecast these flows and adjust fees/rates accordingly.
Competition Between Platforms: What happens when borrowers can simultaneously search on Upstart, LendingClub, and Prosper? Multi-platform dynamics change bargaining power and force platforms to think deeply about how their decisions affect the arrival flow rate and growth prospects. This might explain why some platforms focus on speed (easy approval) while others emphasize better rates. Understanding which niche each platform captures and which niche has unmet demand is critical to capturing a bigger piece of the pie.
Reputation and Learning: Both sides build reputations over time, but only if they stay on the platform to build history. Banks that consistently offer competitive rates could attract more borrowers and receive a higher matching ratio. Borrowers who repay build a history on the platform, improving the accuracy of their profile. As time goes on and more data is captured, the platform’s sorted matching becomes more efficient due to the greater availability of signals. Modeling these dynamics would help in understanding customer lifetime value and deciding whether the platform should focus mainly on acquisition or retention.
Mechanism Design: Instead of take-it-or-leave-it offers and randomly assigning borrowers to matched banks, platforms could run auctions where banks bid on borrowers. Alternatively, the platform could require posted prices where banks commit to rate schedules. Each mechanism has different implications for efficiency, revenue, and market thickness. The right choice depends on both regulatory constraints and the distribution of borrowers and banks.

From building models to modeling problems

This framework provides a strategic advantage because it forces you to think about both first- and second-order effects. Most data scientists optimize metrics in isolation, such as reducing default rates, increasing conversion, and lowering churn. But in these kinds of markets, every model optimization affects all equilibrium objects. Lower default rates might mean a lower reservation rate for the bank, allowing the platform to capture more of the trade surplus through fees. If there is borrower heterogeneity, higher matching probabilities might attract worse borrowers, leading to a reduction in average match quality.

The framework also helps identify which metrics actually matter. A lending platform might accept negative margins on certain loans (loss leaders) if doing so keeps a high-value bank participating or has positive spillovers to other segments. Platforms might restrict borrower entry (or reduce matches) when partner banks are already at high capital utilization. This kind of thinking should help industry data scientists move away from measurement for measurement’s sake and take a step back to look at the bigger picture for whichever company they work for.

The platforms that win aren’t necessarily those that can predict repayment probability with 98% accuracy rather than 93%, but the ones that understand the market dynamics their algorithms operate within. This framework aims to move your mindset away from building better models and toward modeling the right problems. If you have the chance to apply this theory in your own work, I’d love to hear about it. Please don’t hesitate to reach out with questions, insights, or stories through my email or LinkedIn. If you have any feedback on this article, please also feel free to reach out. Thanks for reading!
