From Data Lakes to Data Mesh: A Guide to the Latest Enterprise Data Architecture
1. A Brief History of Data Lakes
2. The Data Lake Monster
3. Introducing…Data Mesh!
4. How to Build a Data Mesh
5. Final Words
My Popular AI & Data Science articles

Problem 3 — Fence-throwing

Dehghani calls the third and final mode of failure siloed and hyper-specialised ownership, which I prefer to think of as leading to unproductive fence-throwing.

Our hyper-specialised big data engineers working in the data lake are organisationally siloed away from where the data originates and where it will be consumed.

Siloed hyper-specialised data platform team. Source: Z. Dehghani at MartinFowler.com (with permission)

This creates a poor incentive structure that doesn’t promote good delivery outcomes. Dehghani articulates this as…

“I personally don’t envy the life of a data platform engineer. They need to consume data from teams who have no incentive in providing meaningful, truthful and correct data. They have very little understanding of the source domains that generate the data and lack the domain expertise in their teams. They need to provide data for a diverse set of needs, operational or analytical, without a clear understanding of the application of the data and access to the consuming domain’s experts.

What we find are disconnected source teams, frustrated consumers fighting for a spot on top of the data platform team backlog and an overstretched data platform team.”

Data producers will package up some of their data and throw it over the fence to the data engineers.

Your problem now! Good luck guys!

Overworked data engineers, who may or may not have done justice to the ingested data given that they’re not data domain experts, will themselves throw some processed data out of the lake to serve downstream consumers.

Good luck, analysts and data scientists! Time for a quick nap, and then I’m off to fix the fifty broken ETL pipelines on my backlog.

As you can see from Problems 2 and 3, the challenges that have arisen from the data lake experiment are as much organisational as technological.


By federating data management to individual business domains, perhaps we could foster a culture of data ownership and collaboration.

And hey, can we give these domains a real stake in the game?

Empower them to take pride in building strategic data assets by incentivising them to share what they produce.

In 2019, Dehghani proposed data mesh as the next-generation data architecture that embraces a decentralised approach to data management.

Her initial articles — here and here — generated significant interest in the enterprise data community and have since prompted many organisations worldwide to begin their own data mesh journey, including mine.

Rather than pump data into a centralised lake, data mesh federates data ownership and processing to the business domains that control and deliver the data, promoting easy access and interconnectivity of data across the whole organisation, enabling faster decision-making and fostering innovation.

Overview of data mesh. Source: Data Mesh Architecture (with permission)

The data mesh dream is to create a foundation for extracting value from analytical data at scale, with scale being applied to:

  • An ever-changing business, data and technology landscape.
  • Growth in the number of data producers and consumers.
  • Varied data processing requirements. A diversity of use cases demands a diversity of tools for transformation and processing. For example, real-time anomaly detection might leverage Apache Kafka; an NLP system for customer service often leads to data science prototyping with Python packages like NLTK; image recognition leverages deep learning frameworks like TensorFlow and PyTorch; and the fraud detection team at my bank would like to process our big data with Apache Spark.
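To make the “one size doesn’t fit all” point concrete, here is a toy sketch in plain Python (standing in for Kafka- and Spark-style tooling, which the scenario above merely names): the same transaction events serve a real-time anomaly check and a batch aggregation — two workloads with very different shapes. All names and numbers are invented for illustration.

```python
from statistics import mean, pstdev

# Hypothetical transaction events shared by two very different consumers.
events = [
    {"account": "A", "amount": 12.0},
    {"account": "A", "amount": 15.0},
    {"account": "B", "amount": 14.0},
    {"account": "A", "amount": 980.0},  # unusually large
]

def streaming_anomaly_flags(stream, threshold=3.0):
    """Streaming-style consumer: flag amounts far from the running mean."""
    seen, flags = [], []
    for e in stream:
        if len(seen) >= 2:
            mu, sigma = mean(seen), pstdev(seen)
            if sigma and abs(e["amount"] - mu) > threshold * sigma:
                flags.append(e)
        seen.append(e["amount"])
    return flags

def batch_totals(stream):
    """Batch-style consumer: aggregate total spend per account."""
    totals = {}
    for e in stream:
        totals[e["account"]] = totals.get(e["account"], 0.0) + e["amount"]
    return totals
```

The streaming check cares about per-event latency; the batch aggregate cares about throughput over the full data set — which is why, at real scale, teams reach for different engines for each.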

All these requirements have created technical debt for warehouses (in the form of a mountain of unmaintainable ETL jobs) and a bottleneck for data lakes (due to the mountain of diverse work that’s squeezed through a small centralised data team).

Organisations eventually reach a threshold of complexity where the technical debt outweighs the value provided.

It’s a terrible situation.

To address these problems, Dehghani proposed that any data mesh implementation must embody four principles in order to realise the promise of scale, quality and usability.

The 4 Principles of Data Mesh. Source: Data Mesh Architecture (with permission)
  1. Domain ownership. By placing data ownership in the hands of domain-specific teams, you decentralise control of data to the people closest to it. This approach enhances agility in responding to changing business requirements and effectiveness in leveraging data-driven insights, which ultimately leads to better and more innovative products and services, faster.
  2. Data as a product. Each business unit or domain is empowered to apply product thinking to craft, own and improve quality, reusable data products — a self-contained and accessible data set treated as a product by the data’s producers. The goal is to publish and share data products across the data mesh to consumers sitting in other domains — thought of as nodes on the mesh — so that these strategic data assets can be leveraged by all.
  3. Self-serve data platform. Empowering users with self-serve capabilities paves the way for accelerated data access and exploration. By providing a user-friendly platform equipped with the essential tools, resources and services, you empower teams to become self-sufficient in their data needs. This democratisation of data promotes faster decision-making and a culture of data-driven excellence.
  4. Federated computational governance. Centralised control stifles innovation and hampers agility. A federated approach ensures that decision-making authority is distributed across teams, enabling them to make autonomous decisions when it counts. By striking the right balance between control and autonomy, you foster accountability, collaboration and innovation.
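As a concrete (and entirely hypothetical) illustration of the “data as a product” idea, here is a minimal sketch of the self-describing metadata a data product might carry, plus a check that the core fields are present before it is published to the mesh. Every field name here is an assumption for illustration, not a standard.

```python
# Hypothetical self-describing metadata for a data product.
customer_profiles_product = {
    "name": "customer-profiles",          # discoverable identifier on the mesh
    "owner": "customer-domain-team",      # accountable domain team
    "description": "Cleaned, deduplicated customer master data.",
    "output_port": "s3://mesh/customer/profiles/",  # made-up location
    "schema": {"customer_id": "string", "joined_on": "date"},
    "sla": {"freshness_hours": 24},
}

# Core metadata a product needs before going on the mesh (illustrative).
REQUIRED_FIELDS = {"name", "owner", "description", "output_port", "schema"}

def is_publishable(product: dict) -> bool:
    """Check that all required metadata keys are present."""
    return REQUIRED_FIELDS.issubset(product)
```

In practice, teams often formalise this kind of metadata as a data contract checked in CI, so that an incomplete product can never quietly appear on the mesh.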

Of the four principles, data products are the most crucial. Consequently, we often see companies execute their data product strategy in tandem with decentralising their data lake across their individual business domains.

Read my Explainer 101 on data products for all of the juicy details.

When it comes to executing a data mesh strategy…

For most companies, the journey won’t be clean and tidy.

Constructing an information mesh won’t be a task relegated to a siloed engineering team toiling away within the basement until it’s able to deploy.

You’ll likely need to cleverly federate your existing data lake piece by piece until you reach a data platform that’s ‘sufficiently mesh’.

Think swapping out two aircraft engines for four smaller ones mid-flight, rather than building a new plane in a nice shady hangar somewhere.

Or attempting to upgrade a road while keeping some lanes open to traffic, instead of paving a new one in parallel nearby and cutting the red ribbon once everything is good and dandy.

Building a data mesh is a considerable undertaking, and you’ll need to bring the business along for the journey, because it’s the business domains that will ultimately be in charge of their own end-to-end data affairs!

Full data mesh maturity may take a long time, because mesh is primarily an organisational construct.

It’s just as much about operating models as the technology itself, meaning cultural uplift and bringing people along for the journey is essential.

You’ll need to teach the organisation the value of mesh and how to use it.

Play your cards right, and over time your centralised data lake will morph into a decentralised data mesh.

Some considerations for the journey — try datamesh-architecture.com for a deeper dive.

  • A data mesh architecture comprises a set of business domains, each with a dedicated data team who can perform cross-domain data analysis on their own. An enabling team — often part of the transformation office of the organisation — spreads the idea of mesh across the organisation and acts as its advocates. They help individual domains on a consultancy basis on their journey to become a ‘full member’ of the data mesh. The enabling team will comprise experts on data architecture, data analytics, data engineering and data governance.
  • Domains will ingest their own operational data — which they sit very close to and understand — and transform it into data products that can be shared on the mesh. Data products are owned by the domain, which is responsible for their operations, quality and uplift during their entire lifecycle. Effective accountability ensures effective data.
The sharing of data products across the mesh. Source: Data Mesh Architecture (with permission)
  • Remember those ‘multicultural food days’ at school, where everyone brought their delicious dishes and shared them at a self-serve table? The teacher’s minimalist role was to oversee operations and ensure everything went smoothly. In a similar vein, mesh’s newly streamlined central data team endeavours to provide and maintain a domain-agnostic ‘buffet table’ of diverse data products from which to self-serve. Business teams can perform their own analysis with little overhead and offer up their own data products to their peers. A delicious data feast where everyone can be the chef.
  • Federated governance. Each domain will self-govern its own data and be empowered to march to the beat of its own drum — like European Union member states. On certain matters where it makes sense to unite and standardise, they can strike agreements with other domains on global policies, such as documentation standards, interoperability and security, in a federated governance group — like the European Parliament — so that individual domains can easily discover, understand, use and integrate data products available on the mesh.
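Those globally agreed policies are often made ‘computational’ — encoded as automated checks that every domain’s data product must pass before publication. Here is a minimal sketch under that assumption; the specific rules (a required description, an allowed-formats list, a PII flag) are invented examples, not any real standard.

```python
# Hypothetical global policy checks agreed by the federated governance group.
ALLOWED_FORMATS = {"parquet", "csv"}  # illustrative interoperability standard

def check_global_policies(product: dict) -> list:
    """Return a list of policy violations (empty list means compliant)."""
    violations = []
    # Documentation standard: every product must describe itself.
    if not product.get("description"):
        violations.append("documentation: missing description")
    # Interoperability standard: only agreed storage formats.
    if product.get("format") not in ALLOWED_FORMATS:
        violations.append("interoperability: unsupported format")
    # Security standard: PII must be classified before sharing.
    if not product.get("pii_classified", False):
        violations.append("security: PII classification missing")
    return violations
```

Running checks like these in a publishing pipeline gives domains autonomy day to day, while the mesh as a whole stays discoverable, interoperable and secure.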

Here’s the exciting bit — when will our mesh hit critical mass?

This serves as a useful benchmark to aim for, attesting that your data mesh journey has reached a threshold level of maturity.

A very good time to pop the champagne!

Data mesh is a relatively new idea, having only been invented around 2018 by architect Zhamak Dehghani.

It has gained significant momentum in the data architecture and analytics communities as an increasing number of organisations grapple with the scalability problems of a centralised data lake.

By moving away from an architecture where data is controlled by a single team and towards a decentralised model where data is owned and managed by the teams that use it the most, different parts of the organisation can work independently — with greater autonomy and agility — while still ensuring that the data is consistent, reliable and well-governed.

Data mesh promotes a culture of accountability, ownership and collaboration, where quality data is proudly shared across the company in a seamless and controlled manner.

The aim is to attain a truly scalable and flexible data architecture that aligns with the needs of modern organisations, where data is central to driving business value and innovation.

Summarising the 4 Principles of Data Mesh. Credit: Z. Dehghani at MartinFowler.com (with permission)

My company’s own journey towards data mesh is expected to take a few years for the main migration, and longer for full maturity.

We’re working on three major parts concurrently:

  • An uplift from our Cloudera stack on Microsoft Azure to native cloud services on Azure. More info here.
  • An initial array of foundational data products is being rolled out, which can be used and re-assembled in different combinations like Lego bricks to form larger, more valuable data products.
  • We’re decentralising our data lake to a target state of at least five nodes.
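The ‘Lego brick’ idea can be sketched with two tiny, entirely hypothetical foundational products joined into a derived one (the domains, fields and values are made up for illustration):

```python
# Two hypothetical foundational data products, each owned by its own domain.
customers = [  # owned by the customer domain
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "business"},
]
accounts = [  # owned by the accounts domain
    {"customer_id": 1, "balance": 500.0},
    {"customer_id": 2, "balance": 1200.0},
]

def compose(customers, accounts):
    """Derived data product: account balances enriched with customer segment."""
    segments = {c["customer_id"]: c["segment"] for c in customers}
    return [{**a, "segment": segments.get(a["customer_id"])} for a in accounts]
```

The consuming team never needs to re-ingest raw operational data: it builds on the published products, which is precisely the re-assembly the bullet above describes.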

What a ride it has been. When I started half a decade ago, we were just getting started building out our data lake using Apache Hadoop on top of on-premises infrastructure.

Countless challenges and invaluable lessons have shaped our journey.

Like any determined team, we fail fast and fail forward. Five short years later, we have completely transformed our enterprise data landscape.

Who knows what things will look like in another five years? I look forward to it.

Find me on LinkedIn, Twitter & YouTube.

Gain access to Medium here and directly support my writing.

  • AI Revolution: Fast-paced Intro to Machine Learning — here
  • ChatGPT & GPT-4: How OpenAI Won the NLU War — here
  • Generative AI Art: Midjourney & Stable Diffusion Explained — here
  • Power of Data Storytelling — Sell Stories, Not Data — here
  • From Data Warehouses & Data Lakes to Data Mesh — here
  • Data Warehouses & Data Modelling — a Quick Crash Course — here
  • From Data Lakes to Data Mesh: A Guide to Latest Architecture — here
  • Data Products: Building a Strong Foundation for Analytics — here
  • Cloud Computing 101: Harness Cloud for Your Business — here
  • Power BI — From Data Modelling to Stunning Reports — here
  • Machine Learning versus Mechanistic Modelling — here
  • Popular Machine Learning Performance Metrics Explained — here
  • Future of Work: Is Your Career Safe in the Age of AI — here
  • Beyond ChatGPT: Search for a Truly Intelligent Machine — here
  • Regression: Predict House Prices using Python — here
  • Classification: Predict Employee Churn using Python — here
  • Python Jupyter Notebooks versus Dataiku DSS — here

