Decisioning on the Edge: Policy Matching at Scale



This text was written in collaboration with César Ortega, whose insights and discussions helped shape the ideas presented here.


The best data products start with sitting down with business partners to understand day-to-day workflows, handoffs, and bottlenecks. In this text, we discuss a challenge that doesn't require an advanced solution, just a straightforward optimization problem. It's a great example of how basic tools can still solve high-value problems. Specifically, we focus on optimizing the assignment of online insurance policies to trusted partners (independent insurance agencies, IIAs) at a worldwide insurance company.

Independent insurance agencies are privately owned intermediaries that sell insurance policies from multiple insurers. Unlike large insurance firms, they don't design products, set prices, underwrite risk, or pay claims; instead, they compare options across carriers and place coverage that most closely fits the client's needs, typically earning commissions for doing so. Here, the idea is to work together to deliver the best value for both the agency and the client.

Reducing complexity

Optimization in the real world is a spectrum. At one end are exact methods that can prove optimality, but they are often expensive at scale and may struggle as the problem grows in size and operational detail. At the other end are heuristics, ranging from simple rule-based baselines that are easy to explain but hard to maintain as complexity grows (often living in large Excel sheets), to more advanced metaheuristics that scale well computationally but can be harder to justify, audit, or debug.

In practice, the most effective approach often sits in the middle: pragmatic "good-enough" formulations, built with carefully chosen constraints that reflect both business rules and real operational limits such as human workload and service quality.

The goal is not theoretical perfection, but a solution that is deliverable, comparable against baselines, and easy to iterate on. With a modular structure and a staged modeling strategy, we can start simple, measure impact with KPIs, both tangible (time to assignment, optimal agency selection, etc.) and intangible (avoiding unfair concentrations of policies in a few agencies, etc.), and evolve the system through small, safe improvements rather than waiting months for a textbook-optimal model.

Figure 1: Image by Author.

That's why we selected a lightweight optimization formulation. It captures the constraints that matter (capacity, geographic eligibility, fairness, and bucket mix) and delivers a deterministic, auditable answer fast enough for real-time latency requirements. If needed, we can later extend the approach with decomposition techniques, stronger solvers, or heuristics without changing the system's core contract.

The baseline

Historically, these digital policy-to-agency assignments were done manually, guided by non-standard criteria and individual judgment. While this approach sometimes worked, it often resembled a round-robin process: policies were distributed sequentially among available agencies (IIAs), with little consideration for differences in capacity, expertise, or expected performance.

Figure 2: Round-robin is a simple heuristic that assigns each new policy to the next agency in a fixed rotation. Image by Author.

While simple and seemingly fair, it often led to delays, missed opportunities, and uncertainty about which agency (IIA) was the best fit. The process also didn't scale well, creating further assignment delays, and the outcomes didn't consistently align with strategic goals such as profitability, quality, reproducibility, and transparency.
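The round-robin baseline can be sketched in a few lines of Python. The agency names and policy identifiers below are illustrative, not taken from the production system:

```python
from itertools import cycle

def round_robin_assign(policies, agencies):
    """Assign each incoming policy to the next agency in a fixed rotation,
    ignoring capacity, geography, and performance differences."""
    rotation = cycle(agencies)
    return {policy: next(rotation) for policy in policies}

# Illustrative data: four agencies, six incoming policies.
agencies = ["A1", "A2", "A3", "A4"]
policies = [f"P{i}" for i in range(1, 7)]

assignments = round_robin_assign(policies, agencies)
print(assignments)
# P1→A1, P2→A2, P3→A3, P4→A4, then the rotation wraps: P5→A1, P6→A2
```

The rotation is blind by construction: A1 receives the fifth policy even if it is at capacity or a poor fit, which is exactly the weakness the optimization below addresses.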

For that reason, we present how we solved an important problem using a lightweight integer programming approach that matches incoming online insurance policies to agencies in real time. The method maximizes a productivity score (reflecting how well an agency has performed in the past) while balancing agency capacity, fairness, and geographic admissibility constraints based on ZIP codes. We outline the mathematical formulation, the live-update logic, and the PuLP implementation.

Figure 3: PuLP overview: an open-source Python library for formulating and solving linear and integer optimization problems. AI-generated illustration created by the author with OpenAI.

What problem are we solving?

When a new online policy is purchased by a client, someone still has to decide which agency should handle it. We rely on agencies because they add value beyond the usual, such as advocating at claim time, servicing changes and renewals, cross-selling, and more. Importantly, agencies also originate demand: they bring new clients (and consequently new policies) into the funnel through their relationships and local presence, which compounds growth for the insurance company.

From a customer perspective, this matters because the agency is often the first point of contact: the quality and speed of agency (IIA) service can shape the overall experience, especially during high-stress moments like claims or urgent coverage changes.

Since agencies differ in licensing, geography, product strengths, sales reach, and day-to-day capacity, the "best" agency can vary from moment to moment. A real-time assignment optimization system routes each new policy to eligible, available agencies that are most likely to deliver value to both the business and the client, are treated fairly under clear rules, and are best positioned to drive future growth.


Good Old-Fashioned Optimization

To create a transparent assignment process, it is essential to consider broader business goals, such as ensuring the right agency handles the right kind of policy to maximize key performance indicators (KPIs) like policy volume and quality. It is also important that agencies understand how these decisions are made.

So, the implemented optimization algorithm should intelligently allocate policies to agencies based on KPIs, including the number and quality of policies they handle. Instead of relying on subjective or inconsistent human judgment, the algorithm makes real-time, data-driven decisions to optimize the policy assignment process efficiently and fairly.

The optimization model allocates policies to agencies based on measurable performance signals rather than subjective judgment. To make decisions reproducible, we translate agency performance into a numeric value the optimizer can use. This is done through productivity weights, where the key input is the swap ratio: a metric that captures how much value an agency brings per unit of policy it receives (for instance loss ratio, tenure, premium, cross-selling, etc.).

In practice, the swap ratio allows the model to distinguish agencies that consistently deliver strong outcomes from those that underperform. Higher-value policies can then be directed toward agencies that have demonstrated the ability to handle them effectively, while still respecting capacity limits, geographic eligibility, fairness requirements, and bucket-mix constraints.
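The exact production formula for the swap ratio is not reproduced here, but the mechanics can be sketched as a weighted blend of normalized performance signals into a single per-agency number. The signal names and coefficients below are purely illustrative assumptions:

```python
def productivity_weight(signals, coefficients):
    """Blend normalized performance signals (e.g. inverse loss ratio, tenure,
    cross-sell rate) into one per-agency weight. Illustrative, not the
    production swap-ratio formula."""
    return sum(coefficients[k] * signals[k] for k in coefficients)

# Illustrative normalized signals in [0, 1] for two agencies.
agency_signals = {
    "A1": {"inv_loss_ratio": 0.9, "tenure": 0.7, "cross_sell": 0.6},
    "A2": {"inv_loss_ratio": 0.5, "tenure": 0.9, "cross_sell": 0.3},
}
coefficients = {"inv_loss_ratio": 0.5, "tenure": 0.3, "cross_sell": 0.2}

weights = {a: productivity_weight(s, coefficients)
           for a, s in agency_signals.items()}
print(weights)  # A1 scores higher: stronger loss ratio and cross-selling
```

Whatever the exact blend, the output is one number per agency that the optimizer can treat as the marginal benefit of routing one more policy there.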

Rather than relying on static rules, the system recalculates decisions as constraints evolve, ensuring that assignments remain aligned with current operational capacity and business priorities.

The system operates in two modes:

  • Batch mode: Optimizes based on historical assignments, providing a comprehensive review of past data to improve future allocations.
  • Online mode: Re-optimizes with each new incoming policy, including the new policy in the optimization process, then updates the inventory and refines the batch allocation accordingly.

In essence, the batch mode handles historical data to establish baseline rules and patterns, while the online mode ensures real-time adaptability by dynamically adjusting to new policies and conditions. This approach helps maintain optimal performance in a constantly changing environment.


The Solution: Optimization Algorithm

Given a set of agencies and an incoming flow of policies, we want to decide how many policies to assign to each agency and each policy category (Gold, Silver, Bronze) so that we maximize total productivity while adhering to certain constraints (agency capacity, ZIP code eligibility, bucket mix, total count, penalties, etc.).

Objective function:

maximize Σ_{a∈A} Σ_{c∈C} w_a · x_{a,c}

  • x_{a,c} is the decision variable in the optimization problem and represents the number of policies assigned to agency a in category c; we only allow non-negative integer values.
  • A: set of agencies (size |A|); a ∈ A.
  • C: set of categories {Gold, Silver, Bronze} (|C| = 3); c ∈ C.
  • The productivity weight w_a is a single number per agency that estimates the benefit of sending one more policy to that agency. It is derived from the agency's track record via the swap ratio.

Rules we must respect (constraints):

Logical constraints:

Logical constraints are those required for the model to be mathematically well-defined regardless of business context (e.g., variables are integers and totals balance).

  1. Integrality & Non-negativity: you can't send negative or fractional policies.

2. Global conservation: the total number of policies assigned across all agencies and buckets must equal the total inventory available for assignment in this run (the sum of all agency capacities).

Business constraints:

Business constraints encode domain policy decisions or operational rules (e.g., per-agency capacity, ZIP admissibility, bucket mix, online floors) that could change if the business rules change.

  1. Per-agency capacity: an agency cannot receive more policies than it can currently handle, which corresponds to the sum of its row in the policy assignment matrix.

2. ZIP admissibility: agencies are only licensed or authorized to service policies in specific geographic areas.

If a ZIP is inadmissible for agency a, we lock that agency's row total so that it receives no policies from that ZIP.

By enforcing ZIP eligibility within the optimization, we ensure every assignment is operationally feasible, protecting service quality, because agencies are strongest in the regions where they have local presence and expertise.

3. Bucket bounds: business controls that keep the monthly allocation balanced across policy tiers.

Without them, the optimizer might push almost everything into the most profitable tier, which would create risk concentration and operational strain. By setting minimums and maximums per bucket, you enforce a healthy mix that reflects risk appetite, service capacity, and strategic targets.
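Collecting the objective and the constraints above, the batch model can be summarized compactly. The symbols cap_a (capacity of agency a), T (total inventory), and L_c, U_c (per-bucket lower and upper bounds) are introduced here for brevity:

```latex
\begin{aligned}
\max_{x}\quad & \sum_{a \in A}\sum_{c \in C} w_a\, x_{a,c} \\
\text{s.t.}\quad
& \sum_{a \in A}\sum_{c \in C} x_{a,c} = T
  && \text{(global conservation)} \\
& \sum_{c \in C} x_{a,c} \le \mathrm{cap}_a \quad \forall a \in A
  && \text{(per-agency capacity)} \\
& \sum_{c \in C} x_{a,c} = 0 \quad \forall a \text{ inadmissible for the ZIP}
  && \text{(ZIP admissibility)} \\
& L_c \le \sum_{a \in A} x_{a,c} \le U_c \quad \forall c \in C
  && \text{(bucket bounds)} \\
& x_{a,c} \in \mathbb{Z}_{\ge 0} \quad \forall a \in A,\; c \in C
  && \text{(integrality)}
\end{aligned}
```

This is a small integer linear program: the objective and every constraint are linear in x, which is what lets an off-the-shelf solver handle it quickly.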

What's Not in the Batch

Batch mode is a full re-optimization on a fixed inventory. It finds the best baseline allocation without reacting to a single new policy event. For that reason, we exclude the following "live" constraints that are only needed when a new policy arrives:

  • Per-agency floors from the previous allocation. Floors are an online safeguard that prevents any agency from losing policies when a new one arrives. In batch we are computing the baseline itself, so there is no "previous" baseline to protect.
  • ZIP lock. This is an online-only rule: when a single new policy arrives, if that policy's ZIP is not allowed for agency a, we freeze agency a at cell level (Gold/Silver/Bronze) at its previous values, so the new policy can't be assigned there and no existing policies are moved away.
  • No headroom ("+1") trick. Headroom is used in online mode to keep the model feasible when adding exactly one new policy. Batch mode doesn't add a single policy; it allocates the entire inventory directly.
  • Bucket bounds still apply online: each new policy must keep Gold/Silver/Bronze totals within their min/max. These restrictions are updated on a monthly basis or as business requirements change.
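To make the online rules concrete, here is a minimal PuLP sketch of a single-arrival re-optimization. The function name and inputs (prev, weights, cap, admissible, bucket_max) are illustrative assumptions, not the production API; the body encodes the floors, the ZIP lock, and the headroom trick described above:

```python
import pulp  # pip install pulp (ships with the CBC solver)

def assign_one_policy(prev, weights, cap, admissible, bucket_max):
    """Re-optimize after exactly one new policy arrives.
    prev[(a, c)]: baseline cell values; admissible[a]: is the new policy's
    ZIP allowed for agency a; bucket_max[c]: per-bucket ceiling."""
    agencies = sorted({a for a, _ in prev})
    cats = sorted({c for _, c in prev})
    prob = pulp.LpProblem("online_assignment", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", (agencies, cats), lowBound=0, cat="Integer")

    # Objective: total productivity of the updated allocation.
    prob += pulp.lpSum(weights[a] * x[a][c] for a in agencies for c in cats)
    # Conservation: previous total plus exactly one new policy.
    prob += pulp.lpSum(x[a][c] for a in agencies for c in cats) == sum(prev.values()) + 1
    for a in agencies:
        # Headroom ("+1"): capacity relaxed by one so the arrival stays feasible.
        prob += pulp.lpSum(x[a][c] for c in cats) <= cap[a] + 1
        for c in cats:
            prob += x[a][c] >= prev[(a, c)]      # floor: nobody loses policies
            if not admissible[a]:
                prob += x[a][c] <= prev[(a, c)]  # ZIP lock: freeze at baseline
    for c in cats:
        # Bucket bounds still apply online.
        prob += pulp.lpSum(x[a][c] for a in agencies) <= bucket_max[c]

    prob.solve(pulp.PULP_CBC_CMD(msg=0))
    return {(a, c): int(x[a][c].value()) for a in agencies for c in cats}
```

Because every cell is floored at its previous value and the total grows by exactly one, the solver can only decide where the single new policy lands, which keeps the online step fast and non-disruptive.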

Why this works

By separating the process into batch (global balance) and online (local adjustment), the system achieves both stability and responsiveness. Batch optimization provides a consistent, auditable reference point, while live decisioning handles real-time arrivals without disrupting the overall structure. This combination enables fast operational decisions while preserving fairness, capacity control, and alignment with strategic targets.

E2E Implementation

The end-to-end process involves more than encoding rules in an optimization model. In our AWS setup, Airflow orchestrates scheduled data pipelines that refresh intermediate tables on daily, weekly, and monthly cadences. These jobs pull upstream data, build curated datasets and live inventory tables, and store them in S3. The optimization service reads the latest inputs from S3 and, when needed, calls a SageMaker endpoint to score candidates and select the best agency under the capacity, fairness, and ZIP-code constraints described earlier. External applications send requests through an HTTPS endpoint on API Gateway, which routes them via middleware responsible for authentication, validation, and request transformation before invoking the optimization service (and SageMaker, if required). The response (containing the chosen agency and decision metadata) is returned to the Contact Center and ultimately the end user. Finally, outcomes and logs are written back to S3, feeding Airflow-driven monitoring and retraining, and Jenkins redeploys updated components to close the loop.


Toy example

To illustrate the mechanics of the original production implementation in a simplified, self-contained manner, we created a synthetic, runnable toy example demonstrating the core logic behind policy-to-agency assignment using integer linear programming with the PuLP library in Python.

The example sets up a small scenario with four agencies and three policy categories ("Gold," "Silver," and "Bronze"). Productivity scores and capacity limits are assigned for each agency, together with constraints such as ZIP code eligibility and a minimum/maximum policy mix per category. The goal is to maximize the total productivity score while respecting these constraints.

While the example is synthetic and uses randomly generated weights and capacities, it effectively illustrates the fundamental optimization logic and workflow, including variable construction, constraint enforcement, and solution interpretation. This approach can be directly scaled and adapted to real-world data and business constraints, as demonstrated in the full implementation.

Table 1: Baseline assignment (batch) and online assignment after one new policy.

In Table 1, we illustrate a simple iteration. Batch mode first computes a baseline monthly plan that allocates the initial inventory. Online mode then simulates incoming policies one at a time toward a target monthly total; each arrival triggers a re-optimization that preserves existing allocations and assigns only the incremental policy to an eligible agency (e.g., respecting ZIP admissibility). In this example, the new policy is a high-value (Gold) policy and its ZIP is admissible for A1, so the increment goes to A1. If the ZIP were inadmissible for A1, the policy would be routed to the best admissible agency instead. This process repeats until the monthly bucket target is reached.
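A condensed sketch of the batch step of that toy example is below. The weights, capacities, and bucket bounds are made up for illustration, and the ZIP lock is omitted because, as discussed earlier, it only applies in online mode:

```python
import pulp  # pip install pulp (ships with the CBC solver)

agencies = ["A1", "A2", "A3", "A4"]
cats = ["Gold", "Silver", "Bronze"]

# Illustrative inputs (not production values): per-agency productivity
# weights and capacities, plus per-bucket min/max mix controls.
w = {"A1": 3.0, "A2": 2.5, "A3": 1.8, "A4": 1.2}
cap = {"A1": 10, "A2": 8, "A3": 6, "A4": 6}
bucket_min = {"Gold": 4, "Silver": 6, "Bronze": 6}
bucket_max = {"Gold": 12, "Silver": 12, "Bronze": 12}
total = sum(cap.values())  # batch mode allocates the full inventory

prob = pulp.LpProblem("batch_assignment", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (agencies, cats), lowBound=0, cat="Integer")

# Objective: total productivity of the allocation.
prob += pulp.lpSum(w[a] * x[a][c] for a in agencies for c in cats)

# Global conservation: the entire inventory gets assigned.
prob += pulp.lpSum(x[a][c] for a in agencies for c in cats) == total

# Per-agency capacity: row sums cannot exceed what an agency can handle.
for a in agencies:
    prob += pulp.lpSum(x[a][c] for c in cats) <= cap[a]

# Bucket bounds: keep the Gold/Silver/Bronze mix within business limits.
for c in cats:
    col = pulp.lpSum(x[a][c] for a in agencies)
    prob += col >= bucket_min[c]
    prob += col <= bucket_max[c]

prob.solve(pulp.PULP_CBC_CMD(msg=0))
alloc = {(a, c): int(x[a][c].value()) for a in agencies for c in cats}
print(pulp.LpStatus[prob.status], pulp.value(prob.objective))
```

In this particular instance the capacities sum exactly to the inventory, so conservation forces every agency to its capacity and the solver only chooses the bucket mix within its bounds; with slack capacity, the weights would also decide which agencies receive more.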

Code

The code is available in this repository: Link to the repository

To run the experiments, set up a Python ≥3.11 environment with the required libraries (e.g., pulp). It is strongly recommended to use a virtual environment (via venv or conda) to keep dependencies isolated.


Conclusion

Compared to a round-robin baseline that assigns policies with no intelligence, our approach uses a productivity matrix derived from a swap ratio to route policies where they are expected to create the most value. The optimization balances tangible metrics (the measurable value and capacity each agency can deliver) with intangible considerations (fairness, stability, and the trust agencies place in a predictable allocation process). In short, it replaces a blind rotation with a transparent, auditable decision rule that reflects both performance and operational constraints.

By making policy assignments more transparent and predictable, we have built trust and collaboration. Agencies (IIAs) now understand how decisions are being made, which has increased their confidence in the process.

This example shows how even a relatively small optimization problem can generate meaningful improvements. By starting with a simple, well-defined formulation, we create a solid foundation that delivers immediate value while enabling future evolution. The same framework can be extended through incremental iterations, incorporating richer signals and more advanced decision logic. In practice, the greatest impact often comes not from building a complex system upfront, but from starting simple and improving continuously as the business learns and the data matures.

References

[1] PuLP documentation, "PuLP 3.3.0 documentation," COIN-OR. https://coin-or.github.io/pulp/main/includeme.html
