From Monolith to Contract-Driven Data Mesh

For many data teams, the move from a traditional data warehouse to Data Mesh feels less like an evolution and more like an identity crisis.

One day, everything works (maybe "works" is a stretch, but everybody knows the lay of the land). The next day, a brand new CDO arrives with exciting news: "We're moving to Data Mesh." And suddenly, years of carefully designed pipelines, models, and conventions are questioned.

In this article, I want to step away from theory and buzzwords and walk through a practical transition from a centralised data "monolith" to a contract-driven Data Mesh, using a concrete example: website analytics.

The standardised data contract becomes the critical enabler for this transition. By adhering to an open, structured contract specification, schema definitions, business semantics, and quality rules are expressed in a consistent format that ETL and data quality tools can interpret directly. Because the contract follows a standard, machine-readable structure, these external platforms can programmatically generate tests, enforce validations, orchestrate transformations, and monitor data health without custom integrations.

The contract shifts from static documentation to an executable control layer that integrates governance, transformation, and observability. The data contract is essentially the glue that holds the Data Mesh together.

Why traditional data warehousing becomes a monolith

When people hear "monolith", they often think of bad architecture. But most monolithic data platforms didn't start that way; they evolved into one.

A standard enterprise data warehouse typically has:

  • One central team responsible for ingestion, modelling, quality, and publishing
  • One central architecture with shared pipelines and shared patterns
  • Tightly coupled components, where a change in a single model can ripple across the whole platform
  • Slow change cycles, because demand always exceeds capacity
  • Limited domain context, as modellers are often far removed from the business
  • Scaling pain, as more data sources and use cases arrive

This isn't incompetence; it's a natural outcome of centralisation and years of unintended consequences. Eventually, the warehouse becomes the bottleneck.

What Data Mesh actually changes (and what it doesn’t)

Data Mesh is often misunderstood as "no more warehouse" or "everyone does their own thing."

In reality, it's an organisational shift, not necessarily a technology shift.

At its core, Data Mesh is built on 4 pillars:

  1. Domain ownership
  2. Data as a Product
  3. Self-serve data platform
  4. Federated governance

The key difference is that instead of one big system owned by one team, you get many small, connected data products, owned by domains and linked together through clear contracts.

And this is where data contracts become the quiet hero of the story.

Data contracts: the missing stabiliser

Data contracts borrow a familiar idea from software engineering: API contracts, applied to data.

They were popularised in the Data Mesh community between 2021 and 2023, with contributions from people and projects such as:

  • Andrew Jones, who introduced the term data contract widely through blogs, talks, and his book, published in 2023 [1]
  • Chad Sanderson (gable.ai)
  • The Open Data Contract Standard, which was introduced by the Bitol project

A data contract explicitly defines the agreement between a data producer and a data consumer.

The example: website analytics

Let’s ground this with a concrete scenario. 

Imagine an online retailer, an online toy store called PlayNest. The business wants to analyse user behaviour on our website.

PlayNest home page (AI generated)

There are two main departments relevant to this exercise. Customer Experience, which is responsible for the user journey on our website: how the customer feels when they are browsing our products.

Then there is the Marketing domain, which runs campaigns that bring users to our website and, ideally, make them interested in buying our products.

There is a natural overlap between these two departments; the boundaries between domains are often fuzzy.

At the operational level, when we talk about websites, you capture things like:

  • Visitors 
  • Sessions
  • Events
  • Devices
  • Browsers
  • Products

A conceptual model for this example could look like this:

From a marketing perspective, however, nobody wants raw events. They want:

  • Marketing leads
  • Funnel performance
  • Campaign effectiveness
  • Abandoned carts
  • Which kinds of products people clicked on, for retargeting, etc.

And from a customer experience perspective, they want to know:

  • Frustration scores
  • Conversion metrics (for instance, how many users created wishlists, which signals they are interested in certain products, a form of conversion from visitor to interested user)

The centralised (pre-Mesh) approach

I'll use a Medallion framework to illustrate how this might be built in a centralised lakehouse architecture.

  • Bronze: raw, immutable data from tools like Google Analytics
  • Silver: cleaned, standardized, source-agnostic models
  • Gold: curated, business-aligned datasets (facts, dimensions, marts)

In the Bronze layer, the raw CSV or JSON objects are stored in, for example, an object store like S3 or Azure Blob Storage. The central team is responsible for ingesting the data, ensuring the API specifications are followed, and monitoring the ingestion pipelines.

In the Silver layer, the central team begins to clean and transform the data. Perhaps the data modelling approach chosen was Data Vault, so the data is standardised into specific data types, business objects are identified, and similar datasets are conformed or loosely coupled.

In the Gold layer, the actual end-user requirements are documented in storyboards and the centralised IT teams implement the dimensions and facts required for the different domains' analytical purposes.
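To make the three layers concrete, here is a minimal sketch of that centralised flow in pandas. The file path, column names, and the toy aggregation are all hypothetical; a real lakehouse would more likely use Spark, dbt, or similar on top of the object store.

```python
import pandas as pd

# Bronze: land the raw export exactly as received (immutable).
bronze = pd.read_json("bronze/ga_events_2024-06-01.json", lines=True)

# Silver: clean, standardise types, deduplicate, and conform to business objects.
silver = (
    bronze.rename(columns={"event_timestamp": "event_ts", "ga_session_id": "session_id"})
    .assign(event_ts=lambda df: pd.to_datetime(df["event_ts"], utc=True))
    .drop_duplicates(subset=["event_id"])
)

# Gold: curated, business-aligned aggregate for a specific analytical need.
gold_daily_sessions = (
    silver.assign(event_date=lambda df: df["event_ts"].dt.date)
    .groupby(["event_date", "device_category"], as_index=False)
    .agg(sessions=("session_id", "nunique"), events=("event_id", "count"))
)
```

The point is not the code itself but the ownership: every one of these steps lives with the one central team.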

Let's now reframe this example, moving from a centralised operating model to a decentralised, domain-owned approach.

Website analytics in a Data Mesh

A typical Data Mesh data model could be depicted like this:

A Data Product is owned by a Domain and has a specific type; data comes in via input ports and goes out via output ports. Each port is governed by a data contract.

As an organisation, once you have chosen to go with Data Mesh, you will constantly have to decide between the following two approaches:

Do you organise your landscape around reusable building blocks where logic is consolidated, OR:

Do you let each consumer of the data products decide for themselves how to implement it, with the risk of duplicated logic?

People look at this and tell me it is obvious: of course you should pick the first option, because it is the better practice, and I agree. Except that in reality the first two questions that will be asked are:

  • Who will own the foundational Data Product?
  • Who will pay for it?

These are fundamental questions that often stall the momentum of Data Mesh, because you can either over-engineer it (having lots of reusable parts, but in doing so hampering autonomy and escalating costs) or create a network of many little data products that don't talk to each other. We want to avoid both of these extremes.

For the sake of our example, let's assume that instead of each team ingesting Google Analytics independently, we create a few shared foundational products, for example Website User Behaviour and Products.

These products are owned by a specific domain (in our example, Customer Experience), and that domain is responsible for exposing the data through standard output ports, which must be governed by data contracts. The whole idea is that these products should be reusable within the organisation, just like external data sets are reusable through a standardised API pattern. Downstream domains, like Marketing, then build Consumer Data Products on top.

Website User Behaviour Foundational Data Product

  • Designed for reuse
  • Stable, well-governed
  • Often built using Data Vault, 3NF, or similar resilient models
  • Optimised for change, not for dashboards
Website user behaviour in our Data Product model
Website user behaviour technical implementation

The two sources are treated as input ports to the foundational data product.

The modelling technique used to build the data product is again up to the domain to decide, but the motivation is reusability. I have therefore often seen a more flexible modelling technique like Data Vault being used in this context.

The output ports are also designed for reusability. For example, here you could combine the Data Vault objects into an easier-to-consume format, or, for more technical consumers, simply expose the raw Data Vault tables. These would simply be logically split into different output ports. You could also decide to publish a separate output port to be exposed to LLMs or autonomous agents.

Marketing Lead Conversion Metrics Consumer Data Product

  • Designed for specific use cases
  • Shaped by the needs of the consuming domain
  • Often dimensional or highly aggregated
  • Allowed (and expected) to duplicate logic if needed
Marketing Lead conversion metrics in our Data Product model
Marketing Leads Conversion metrics technical implementation

Here I illustrate how we opt to use other foundational data products as input ports. In the case of Website User Behaviour, we use the normalised Snowflake tables (since we want to keep building in Snowflake) and create a Data Product that is ready for our specific consumption needs.

Our main consumers will be analytics and dashboard builders, so choosing a dimensional model makes sense: it is optimised for this type of analytical querying within a dashboard.
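To illustrate the shape of that consumer product, here is a small, made-up sketch of a campaign dimension and a lead-conversion fact derived from the foundational product's events; all table names, column names, and values are hypothetical.

```python
import pandas as pd

# Events as they might arrive from the foundational product's output port.
events = pd.DataFrame({
    "session_id": ["s1", "s1", "s2", "s3"],
    "campaign_id": ["c1", "c1", "c1", "c2"],
    "event_name": ["page_view", "sign_up", "page_view", "add_to_cart"],
})

# A small campaign dimension owned by Marketing.
dim_campaign = pd.DataFrame({
    "campaign_id": ["c1", "c2"],
    "campaign_name": ["Summer Toys", "Back to School"],
})

# Fact grain: one row per campaign, with session counts and lead conversions.
fact_lead_conversion = (
    events.groupby("campaign_id", as_index=False)
    .agg(
        sessions=("session_id", "nunique"),
        leads=("event_name", lambda s: int((s == "sign_up").sum())),
    )
    .merge(dim_campaign, on="campaign_id")
)
print(fact_lead_conversion)
```

Duplicating this aggregation logic inside the Marketing domain is acceptable here; that is exactly the trade-off a consumer data product is allowed to make.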

Zooming into Data Contracts

The Data Contract is essentially the glue that holds the Data Mesh together. The contract shouldn't just specify the technical expectations, but also the legal and quality requirements and anything else the consumer might be interested in.

The Bitol Open Data Contract Standard [2] set out to address some of the gaps left by the vendor-specific contracts available on the market, by providing a shared, open standard for describing data contracts in a way that is human-readable, machine-readable, and tool-agnostic.

Why so much focus on a shared standard?

  1. Shared language across domains

When every team defines contracts differently, federation becomes impossible.

A standard creates a common vocabulary for producers, consumers, and platform teams.

  2. Tool interoperability

An open standard allows data quality tools, orchestration frameworks, metadata platforms, and CI/CD pipelines to all consume the same contract definition, instead of each requiring its own configuration format.

  3. Contracts as living artifacts

Contracts shouldn't be static documents. With a standard, they can be versioned, validated automatically, tested in pipelines, and compared over time. This moves contracts from "documentation" to enforceable agreements (a minimal sketch of such a check follows after this list).

  4. Avoiding vendor lock-in

Many vendors now support data contracts, which is great, but without an open standard, switching tools becomes expensive.
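As a rough illustration of the "tested in pipelines" point above, here is a hand-rolled sketch of a CI check that fails when a proposed contract removes or retypes fields compared with the published version. The schema layout loosely follows ODCS, the file paths are hypothetical, and real contract tooling would do this for you.

```python
import sys
import yaml  # pip install pyyaml

def schema_fields(contract: dict) -> dict:
    """Flatten an ODCS-style contract into {object.property: logicalType}."""
    fields = {}
    for obj in contract.get("schema", []):
        for prop in obj.get("properties", []):
            fields[f'{obj["name"]}.{prop["name"]}'] = prop.get("logicalType")
    return fields

def breaking_changes(old: dict, new: dict) -> list[str]:
    """List removed fields and type changes between two contract versions."""
    old_fields, new_fields = schema_fields(old), schema_fields(new)
    issues = [f"removed field: {f}" for f in old_fields if f not in new_fields]
    issues += [
        f"type change on {f}: {old_fields[f]} -> {new_fields[f]}"
        for f in old_fields
        if f in new_fields and old_fields[f] != new_fields[f]
    ]
    return issues

if __name__ == "__main__":
    # Hypothetical repository layout: published vs. proposed contract versions.
    with open("contracts/published/website-user-behaviour.yaml") as f:
        published = yaml.safe_load(f)
    with open("contracts/proposed/website-user-behaviour.yaml") as f:
        proposed = yaml.safe_load(f)
    problems = breaking_changes(published, proposed)
    if problems:
        sys.exit("Breaking contract changes:\n" + "\n".join(problems))
```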

The ODCS is a YAML template that includes the following key components:

  1. Fundamentals – Purpose, ownership, domain, and intended consumers
  2. Schema – Fields, types, constraints, and evolution rules
  3. Data quality expectations – Freshness, completeness, validity, thresholds
  4. Service-level agreements (SLAs)  – Update frequency, availability, latency
  5. Support and communication channels – Who to contact when things break
  6. Teams and roles – Producer, owner, steward responsibilities
  7. Access and infrastructure – How and where the data is exposed (tables, APIs, files)
  8. Custom domain rules – Business logic or semantics that consumers must understand
Sample ODCS Data Contract for Website User behaviour

Not every contract needs every section, but the structure matters, because it makes expectations explicit and repeatable.
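To make this tangible, below is a trimmed-down, illustrative ODCS-style contract for the Website User Behaviour output port, embedded as YAML and parsed with PyYAML to underline that it is machine-readable. The field names loosely follow the ODCS v3 layout, but every value here is hypothetical; consult the Bitol specification for the exact structure.

```python
import yaml  # pip install pyyaml

CONTRACT_YAML = """
apiVersion: v3.0.2
kind: DataContract
id: website-user-behaviour
name: Website User Behaviour
version: 1.0.0
status: active
domain: customer-experience
description:
  purpose: Reusable website behaviour data for downstream analytical products.
schema:
  - name: web_event
    physicalName: web_event
    properties:
      - name: event_id
        logicalType: string
        required: true
        unique: true
      - name: session_id
        logicalType: string
        required: true
      - name: event_ts
        logicalType: date
        required: true
    quality:
      - description: Events are no more than 24 hours behind the source.
slaProperties:
  - property: frequency
    value: daily
"""

contract = yaml.safe_load(CONTRACT_YAML)
print(contract["id"], "owned by domain:", contract["domain"])
```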

Data Contracts enabling interoperability

Our consumer data product in the context of data contracts and third-party tools

In our example we have a data contract on the input port (foundational data product) as well as on the output port (consumer data product). You want to enforce these expectations as seamlessly as possible, just as you would with any contract between two parties. Because the contract follows a standardised, machine-readable format, you can now integrate with third-party ETL and data quality tools to enforce these expectations.

Platforms such as dbt, SQLMesh, Coalesce, Great Expectations, Soda, and Monte Carlo can programmatically generate tests, enforce validations, orchestrate transformations, and monitor data health without custom integrations. Some of these tools have already announced support for the Open Data Contract Standard.
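To show the mechanics behind that claim, here is a deliberately hand-rolled sketch of contract-driven validation: basic checks are derived from an ODCS-style schema (mirroring the sample contract above) and applied to a toy dataset. This is roughly the work that tools like Soda or Great Expectations automate; the data, the violations, and the check logic are all illustrative.

```python
import pandas as pd

# Schema expectations lifted from the (hypothetical) contract above.
contract = {
    "schema": [
        {
            "name": "web_event",
            "properties": [
                {"name": "event_id", "required": True, "unique": True},
                {"name": "session_id", "required": True},
                {"name": "event_ts", "required": True},
            ],
        }
    ]
}

def validate_against_contract(df: pd.DataFrame, contract: dict) -> list[str]:
    """Return human-readable violations of the contract's schema expectations."""
    violations = []
    for obj in contract.get("schema", []):
        for prop in obj.get("properties", []):
            col = prop["name"]
            if col not in df.columns:
                violations.append(f"missing column: {col}")
                continue
            if prop.get("required") and df[col].isna().any():
                violations.append(f"nulls in required column: {col}")
            if prop.get("unique") and df[col].duplicated().any():
                violations.append(f"duplicates in unique column: {col}")
    return violations

# Toy output-port data with two deliberate contract breaches.
web_events = pd.DataFrame({
    "event_id": ["e1", "e2", "e2"],      # duplicate -> uniqueness breach
    "session_id": ["s1", None, "s2"],    # null -> required-field breach
    "event_ts": ["2024-06-01", "2024-06-01", "2024-06-02"],
})
print(validate_against_contract(web_events, contract))
```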

LLMs, MCP servers and Data Contracts

By using standardised metadata, including the data contracts, organisations can safely let LLMs and other agentic AI applications interact with their crown jewels: the data.

Using an MCP server as a translation layer between users, LLMs, and our data assets

So in our example, let's assume Peter from Marketing wants to check what the top most-visited products are:

Sample Claude interaction using a remote MCP server

This is enough context for the LLM to use the metadata to determine which data products are relevant, but also to see that the user doesn't have access to the data. It can now work out who to ask and how to request access.

Once access is granted:

Query executed to retrieve results

The LLM can interpret the metadata and create the query that matches the user request.

Ensuring autonomous agents and LLMs have strict guardrails under which to operate will allow the business to scale their AI use cases.

Multiple vendors are rolling out MCP servers to offer a well-structured approach to exposing your data to autonomous agents. Forcing the interfacing to work through metadata standards and protocols (such as these data contracts) will allow safer and more scalable roll-outs of these use cases.

The MCP server provides the toolset and the guardrails to operate within. The metadata, including the data contracts, provides the policies and enforceable rules under which any agent may operate.
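As a sketch of how such a server could expose contract metadata as tools, the snippet below uses the Python MCP SDK's FastMCP helper. The contract storage layout and the access check are hypothetical placeholders, and the SDK API shown here is an assumption to verify against the current documentation.

```python
import yaml  # pip install pyyaml
from mcp.server.fastmcp import FastMCP  # pip install mcp

mcp = FastMCP("data-product-catalog")

@mcp.tool()
def describe_data_product(product_id: str) -> dict:
    """Return the data contract (schema, quality, SLAs) for a data product."""
    # Hypothetical layout: one ODCS YAML file per data product.
    with open(f"contracts/{product_id}.yaml") as f:
        return yaml.safe_load(f)

@mcp.tool()
def check_access(product_id: str, user: str) -> str:
    """Tell the agent whether the user may query the product, and who to ask."""
    # Placeholder: a real implementation would call the governance platform.
    return f"{user} has no grant on {product_id}; request access from the product owner."

if __name__ == "__main__":
    mcp.run()
```

With the contract available as a tool response, the agent has exactly the metadata described earlier: what the product contains, what quality it promises, and who owns it.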

At the moment there is a tsunami of AI use cases being requested by the business, and most of them are currently still not adding value. We now have a prime opportunity to invest in establishing the right guardrails for these projects to operate within. There will come a critical-mass moment when the value arrives, but first we need the building blocks.


I'll go as far as to say this: a Data Mesh without contracts is just decentralised chaos. Without clear, enforceable agreements, autonomy turns into silos, shadow IT multiplies, and inconsistency scales faster than value. At that point, you haven't built a mesh, you've distributed disorder. You might as well revert to centralisation.

Contracts replace assumption with accountability. Build small, connect smartly, govern clearly, and don't mesh around.


[1] Jones, A. (2023). Driving data quality with data contracts: A comprehensive guide to building reliable, trusted, and effective data platforms. O'Reilly Media.
[2] Bitol. (n.d.). Open data contract standard (v3.1.0). Retrieved February 18, 2026, from https://bitol-io.github.io/open-data-contract-standard/v3.1.0/

All images in this article were created by the author.
