Introduction
Understanding data and AI architecture has never been more critical. Yet something many leaders overlook is the importance of data team structure.
While many of you reading this probably identify as a data team, what most don't realise is how limiting that mindset can be.
Indeed, different team structures and skill requirements significantly impact an organisation's ability to actually use Data and AI to drive meaningful results. To understand this, it helps to consider an analogy.
Imagine a two-person household. John works from home and Jane goes to the office. There's a bunch of house admin Jane relies on John to do, which is much easier for him since he's at home more often than not.
Jane and John have kids, and once they've grown up a bit John has twice as much admin to do! Thankfully, the kids are trained to do the basics; they'll wash up, tidy, and even occasionally do a bit of hoovering with some coercion.
As the kids grow up, John's parents move in. They're getting on a bit, so John takes care of them, but fortunately the kids are mostly self-sufficient at this point. Over time John's role has changed quite a bit! But he has always kept it one happy family unit, thanks to John and Jane.
Back to data: John is a bit like the data team, and everyone else is a domain expert. They depend on John, but in different ways. This has changed a lot over time, and if it hadn't, it would have been a disaster.
In the rest of this article, we'll explore John's journey from a Centralised team, through Hub and Spoke, to a Platform Mesh-style data team.
Centralised teams
A central team is responsible for a range of things that will be familiar to you:
- Core data platform and architecture: the frameworks and tooling used to facilitate Data and AI workloads.
- Data and AI engineering: centralising and cleaning datasets; structuring unstructured data for AI workloads
- BI: building dashboards to visualise insights
- AI and ML: training and deploying models on the aforementioned clean data
- Advocating for the value of data and training people to understand how to use BI tools
That's a lot of work for a few people! In fact, it's practically impossible to nail all of this at once. It's best to keep things small and manageable, focusing on a few key use cases and leveraging powerful tooling to get a head start early.
You might even get a nanny or au pair to help with the work (in this case, consultants).
But this pattern has flaws. It's easy to fall into the silo trap, a scenario where the central team becomes a huge bottleneck for Data and AI requests. Data teams also need to acquire domain knowledge from domain experts to answer requests effectively, which is also time-consuming and hard.
One way out is to expand the team. More people means more output. However, there are better, more modern approaches that can make things go even faster.
But there is only one John. So what can he do?

Partially decentralised or hub and spoke
The partially decentralised setup is an attractive model for medium-sized organisations, or small, tech-first ones where technical skills exist outside the data team.
The simplest form has the data team maintaining BI infrastructure, but not the content itself. That is left to 'power users' who take it into their own hands and build the BI themselves.
This, of course, runs into all kinds of issues, such as the silo trap, data discovery, governance, and confusion. Confusion is especially painful when people who are told to self-serve try and fail because they lack an understanding of the data.
An increasingly popular approach is to open up more layers of the stack. We see the rise of the analytics engineer, and data analysts are increasingly taking on more responsibility. This includes using tools, doing data modelling, building end-to-end pipelines, and advocating to the business.
This has led to enormous problems when implemented incorrectly. You wouldn't let your five-year-old look after your elders and run the house unattended.
Specifically, a lack of grounding in basic data modelling principles and data warehouse engines results in model sprawl and spiralling costs. There are two classic examples.

One is when multiple people try to define the same thing, such as revenue: marketing, finance, and product all have a different version. This leads to inevitable arguments at quarterly business reviews when every department reports a different number, and to analysis paralysis.
The other is rolling counts. Let's say finance wants revenue for the month, but product wants to know it on a rolling seven-day basis. “That's easy,” says the analyst. “I'll just create some materialised views with these metrics in them.”
As any data engineer knows, this rolling count operation is pretty expensive, especially if the granularity needs to be by day or hour, since you then need a calendar to 'fan out' the model. Before you know it there are rolling_30_day_sales, rolling_7_day_sales, rolling_45_day_sales, and so on. These models cost an order of magnitude more than was required.
Simply asking for the lowest granularity required (daily), materialising that, and creating views downstream solves this problem, but it requires some central resource.
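To make that concrete, here is a minimal pandas sketch of the idea; in the warehouse this would be one daily-grain materialised model with cheap views layered on top. The orders data and column names are invented for illustration.

```python
import pandas as pd

# Invented order-level data; in practice this would live in the warehouse.
orders = pd.DataFrame({
    "order_ts": pd.to_datetime([
        "2024-01-01 09:15", "2024-01-01 17:40",
        "2024-01-02 11:05", "2024-01-04 08:30",
    ]),
    "revenue": [120.0, 80.0, 200.0, 150.0],
})

# Materialise ONCE at the lowest grain anyone needs: one row per day.
# The expensive calendar 'fan out' happens here, a single time.
daily_revenue = (
    orders.set_index("order_ts")
    .resample("D")["revenue"]
    .sum()
    .rename("daily_revenue")
)

# Rolling windows become cheap derivations over the daily table,
# instead of separate rolling_7_day_sales / rolling_30_day_sales models.
rolling_7_day = daily_revenue.rolling("7D").sum()
rolling_30_day = daily_revenue.rolling("30D").sum()

print(pd.concat(
    {"daily": daily_revenue, "rolling_7d": rolling_7_day, "rolling_30d": rolling_30_day},
    axis=1,
))
```

Each rolling metric is then just a window over data that has already been aggregated once, rather than a fresh fan-out per model.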
An early Hub and Spoke model must have a clear delineation of responsibility if the knowledge outside the data team is young or immature.

As teams grow, legacy, code-only frameworks like Apache Airflow also give rise to a problem: a lack of visibility. People outside the data team who want to understand what's going on are reliant on additional tools, since legacy UIs don't aggregate metadata from different sources or show what happens end-to-end.
It's imperative to surface this information to domain experts. How many times have you been told the 'data doesn't look right', only to realise, after tracing everything manually, that it was a problem on the data producer side?
By increasing visibility, domain experts are connected directly to the owners of source data or processes, which makes fixes faster. It also removes unnecessary load, context switching, and tickets for the data team.
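As a rough illustration of what that connection looks like, here is a small, self-contained Python sketch: given the downstream asset a domain expert is complaining about, walk a lineage mapping back to the original source and its owner. The asset names, owners, and hard-coded mapping are all invented; in reality this metadata would be aggregated from whatever catalog, BI, and orchestration tooling you run.

```python
# Invented lineage metadata: downstream asset -> (upstream asset, owner).
LINEAGE = {
    "finance_dashboard": ("revenue_model", "analytics-team"),
    "revenue_model": ("erp_raw_orders", "finance-systems-team"),
    "erp_raw_orders": (None, "erp-platform-team"),
}

def trace_to_source(asset: str) -> list[tuple[str, str]]:
    """Walk upstream from a downstream asset to the original data producer."""
    path, current = [], asset
    while current is not None:
        upstream, owner = LINEAGE[current]
        path.append((current, owner))
        current = upstream
    return path

# "The data doesn't look right" on the finance dashboard:
for asset, owner in trace_to_source("finance_dashboard"):
    print(f"{asset:20s} owned by {owner}")
# The final entry tells the domain expert exactly who to talk to.
```

The point is not the code itself, but that "who owns the broken thing upstream?" can be answered without opening a ticket with the data team.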
Hub and spoke (pure)
A pure hub and spoke is a bit like delegating specific responsibilities to your teenage children within clear guardrails. You don't just hand them tasks like taking the bins out and cleaning their room; you ask for the outcome you want, like a “clean and tidy room,” and you trust them to deliver it. Incentives work well here.
In a pure hub and spoke approach, the data team administers the platform and lets others use it. They build the frameworks for creating and deploying AI and Data pipelines, and manage access control.
Domain experts can build things end-to-end if they need to. This means they can move data, model it, orchestrate the pipeline, and activate it with AI or dashboards as they see fit.
Often, the central team will still do a bit of this too. Where data models across domains are complex and overlapping, the central team should almost always take ownership of delivering core data models. The tail shouldn't wag the dog.

This starts to resemble a data product mindset: while a finance team could take ownership of ingesting and cleaning ERP data, the central team would own vital data products like the customers table or the invoices table.
This structure is very powerful because it is highly collaborative. It generally only works if domain teams have a fairly high degree of technical proficiency.
Platforms that allow code and no-code to be used together are really useful here; otherwise a hard technical dependency on the central team will always exist.
Another characteristic of this pattern is training and support. The central team, or hub, will spend some of its time supporting and upskilling the spokes to build AI and Data workflows efficiently within guardrails.
Again, providing visibility here is hard with legacy orchestration frameworks. Central teams will be burdened with keeping metadata stores, like data catalogs, up to date so that business users can understand what is happening.
The alternative, upskilling domain experts to gain deep Python expertise in frameworks with steep learning curves, is even harder to pull off.
Platform mesh/data product
The natural endpoint in our theoretical household journey takes us to the much-criticised Data Mesh or Platform Mesh approach.
In this household, everyone is expected to know what their responsibilities are. The children are all grown up and can be relied upon to keep the house in order and look after its inhabitants. There is close collaboration and everyone works together seamlessly.
Sounds pretty idealistic, don't you think!?
In practice, it's rarely this easy. Allowing satellite teams to use their own infrastructure and build whatever they want is a surefire way to lose control and slow things down.
Even if you were to standardise tooling across teams, best practices would still suffer.
I've spoken to countless teams in massive organisations such as retail chains or airlines, where avoiding a mesh simply isn't an option because multiple business divisions depend on one another.
These teams use different tools. Some rely on Airflow instances and legacy frameworks built by consultants years ago. Others use the latest tech and a full, bloated Modern Data Stack.
All of them struggle with the same problems: collaboration, communication, and orchestrating flows across different teams.
Implementing a single overarching platform for building Data and AI workflows can help here. A unified control plane is a bit like an orchestrator of orchestrators: it aggregates metadata from different places and shows end-to-end lineage across domains.
Naturally, it makes for an effective control plane where anyone can gather to debug failed pipelines, communicate, and recover, all without relying on a central Data Engineering Team that would otherwise be a bottleneck.
There are clear analogies for this in software engineering. Often, code emits logs that are collated by a single tool such as DataDog. These platforms provide a single place to see everything that is happening (or not happening), plus alerting and collaboration for incident resolution.
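To make the "orchestrator of orchestrators" idea a little more concrete, here is a hedged Python sketch: run records from two different (entirely invented) schedulers are normalised into one shape and sorted into a single cross-team timeline, which is roughly what a unified control plane does with real metadata at much larger scale. The tool names, fields, and runs are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PipelineRun:
    """A run record normalised across whichever tool produced it."""
    team: str
    pipeline: str
    status: str
    finished_at: datetime
    source_tool: str

# Invented run metadata, as it might be pulled from two different schedulers.
legacy_runs = [
    PipelineRun("finance", "erp_ingest", "failed",
                datetime(2024, 5, 1, 6, 10), "legacy-airflow"),
]
modern_runs = [
    PipelineRun("product", "events_rollup", "success",
                datetime(2024, 5, 1, 6, 25), "managed-elt"),
]

# The control plane's job in miniature: one timeline across teams and tools.
for run in sorted(legacy_runs + modern_runs, key=lambda r: r.finished_at):
    flag = "FAIL" if run.status == "failed" else "ok"
    print(f"[{flag:4s}] {run.finished_at:%H:%M} {run.team}/{run.pipeline} via {run.source_tool}")
```

Anyone in any spoke can read that one timeline, spot the failed finance ingest, and act on it without routing the question through a central team.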
Summary
Organisations are like families. As much as we like the idea of one big, happy, self-sufficient family, there are often responsibilities someone needs to bear to make things work at the start.
As families mature, some members move closer to independence, like John's kids. Others find their place as dependent but loyal stakeholders, like John's parents.
Organisations are no different. Data teams are maturing from do-ers in Centralised Teams to Enablers in Hub and Spoke architectures. Eventually, most organisations will have dozens, if not hundreds, of people pioneering Data and AI workflows in their own spokes.
Once this happens, the way Data and AI is used in small, agile organisations will likely resemble the complexity of much larger enterprises, where collaboration and orchestration across different teams is inevitable.
Understanding where organisations sit in relation to these patterns is imperative. Trying to force a Data-as-Product mindset on an immature company, or sticking with a large central team in a large, mature organisation, will end in disaster.
Good luck 🍀