I first came across the concept of federated learning (FL) through a comic by Google in 2019. It was a superb piece and did a fantastic job of explaining how products can improve without sending user data to the cloud. Recently, I have been wanting to understand the technical side of this field in more detail. Training data has become such a vital commodity because it is crucial for building good models, yet a lot of it goes unused because it is fragmented, unstructured, or locked inside silos.
As I began exploring this field, I found the Flower framework to be the most straightforward and beginner-friendly way to get started with FL. It's open source, the documentation is clear, and the community around it is very active and helpful. It's one of the reasons for my renewed interest in this field.
This article is the first part of a series where I explore federated learning in more depth, covering what it is, how it is implemented, the open problems it faces, and why it matters in privacy-sensitive settings. In the next instalments, I'll go deeper into practical implementation with the Flower framework, discuss privacy in federated learning, and examine how these ideas extend to more advanced use cases.
When Centralised Machine Learning Isn't Ideal
We all know AI models depend on large amounts of data, yet much of the most useful data is sensitive, distributed, and hard to access. Think of data inside hospitals, phones, cars, sensors, and other edge systems. Privacy concerns, local regulations, limited storage, and network limits make moving this data to a central place very difficult or even impossible. As a result, large amounts of valuable data remain unused. In healthcare, this problem is especially visible. Hospitals generate tens of petabytes of data yearly, yet studies estimate that up to 97% of this data goes unused.
Traditional machine learning assumes that all training data can be collected in a single place, usually on a centralised server or data centre. This works when data can be freely moved, but it breaks down when data is private or protected. In practice, centralised training also depends on stable connectivity, enough bandwidth, and low latency, which are difficult to guarantee in distributed or edge environments.
In such cases, two common choices appear. One option is to not use the data at all, which means valuable information stays locked inside silos.
The other option is to let each local entity train a model on its own data and share only what the model learns, while the raw data never leaves its original location. This second option forms the basis of federated learning, which allows models to learn from distributed data without moving it. A well-known example is Google Gboard on Android, where features like next-word prediction and Smart Compose run across hundreds of millions of devices.
Federated Learning: Moving the Model to the Data
Federated learning can be thought of as a collaborative machine learning setup where training happens without collecting data in a single central place. Before looking at how it works under the hood, let's see a few real-world examples that show why this approach matters in high-stakes settings, spanning domains from healthcare to security-sensitive environments.
Healthcare
In healthcare, federated learning enabled early COVID screening through Curial AI, a system trained across multiple NHS hospitals using routine vital signs and blood tests. Because patient data couldn't be shared across hospitals, training was done locally at each site and only model updates were exchanged. The resulting global model generalized better than models trained at individual hospitals, especially when evaluated on unseen sites.
Medical Imaging

Federated learning is also being explored in medical imaging. Researchers at UCL and Moorfields Eye Hospital are using it to fine-tune large vision foundation models on sensitive eye scans that cannot be centralized.
Defense
Beyond healthcare, federated learning is also being applied in security-sensitive domains such as defense and aviation. Here, models are trained on distributed physiological and operational data that must remain local.
Types of Federated Learning
At a high level, federated learning can be grouped into a few common types based on who the clients are and how the data is split.
• Cross-Device vs Cross-Silo Federated Learning
Cross-device federated learning involves many clients, potentially up to millions, such as personal devices or phones, each with a small amount of local data and unreliable connectivity. At any given time, however, only a small fraction of devices participate in a round. Google Gboard is a typical example of this setup.
Cross-silo federated learning, on the other hand, involves a much smaller number of clients, usually organizations like hospitals or banks. Each client holds a large dataset and has stable compute and connectivity. Most real-world enterprise and healthcare use cases look like cross-silo federated learning.
• Horizontal vs Vertical Federated Learning

The horizontal versus vertical distinction describes how the data itself is split across clients. In horizontal federated learning, all clients share the same feature space, but each holds different samples. For example, multiple hospitals may record the same medical variables, but for different patients. This is the most common form of federated learning.
Vertical federated learning is used when clients share the same set of entities but have different features. For example, a hospital and an insurance provider may each have data about the same individuals, but with different attributes. Training in this case requires secure coordination because the feature spaces differ, and this setup is less common than horizontal federated learning.
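To make the distinction concrete, here is a tiny, hypothetical sketch of the two data layouts. The column names, IDs, and values are all invented for illustration: in the horizontal case, two hospitals hold the same columns for different patients; in the vertical case, a hospital and an insurer hold different columns for the same individuals.
import pandas as pd
# Horizontal FL: same feature space (columns), different samples (rows)
hospital_a = pd.DataFrame(
    {"age": [54, 61], "blood_pressure": [130, 145], "diagnosis": [0, 1]},
    index=["patient_001", "patient_002"],
)
hospital_b = pd.DataFrame(
    {"age": [47, 73], "blood_pressure": [120, 155], "diagnosis": [1, 0]},
    index=["patient_101", "patient_102"],
)
# Vertical FL: same entities (rows), different features (columns)
hospital = pd.DataFrame(
    {"age": [54, 61], "blood_pressure": [130, 145]},
    index=["person_001", "person_002"],
)
insurer = pd.DataFrame(
    {"num_claims": [2, 5], "annual_premium": [900, 1400]},
    index=["person_001", "person_002"],
)
print(hospital_a.columns.equals(hospital_b.columns))  # True: shared feature space
print(hospital.index.equals(insurer.index))           # True: shared entities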
How Federated Learning works
Federated learning follows a simple, repeated process coordinated by a central server and executed by multiple clients that hold data locally, as shown in the diagram below.

Training in federated learning proceeds through repeated rounds. In each round, the server selects a small random subset of clients, sends them the current model weights, and waits for updates. Each client trains the model locally using stochastic gradient descent, usually for several local epochs on its own batches, and returns only the updated weights. At a high level, the process follows these five steps:
1. Initialisation
A global model is created on the server, which acts as the coordinator. The model may be randomly initialized or start from a pretrained state.
2. Model distribution
In each round, the server selects a set of clients (based on random sampling or a predefined strategy) to participate in training and sends them the current global model weights. These clients can be phones, IoT devices, or individual hospitals.
3. Local training
Each selected client then trains the model locally using its own data. The data never leaves the client, and all computation happens on-device or within an organization such as a hospital or a bank.
4. Model update communication
After local training, clients send only the updated model parameters (these could be weights or gradients) back to the server, while no raw data is shared at any point.
5. Aggregation
The server aggregates the client updates to produce a new global model. While Federated Averaging (FedAvg) is a common approach for aggregation, other strategies are also used. The updated model is then sent back to clients, and the process repeats until convergence.
Federated learning is an iterative process, and each pass through this loop is called a round. Training a federated model usually requires many rounds, sometimes hundreds, depending on factors such as model size, data distribution, and the problem being solved. The toy sketch below shows what one version of this loop can look like in code.
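Everything in the following sketch is made up for illustration: local_train just nudges the weights toward each client's data mean rather than running real SGD, and the client datasets are random. The point is only to show the round structure of select, train locally, send back weights, and aggregate.
import numpy as np

def local_train(global_weights, client_data, epochs=1, lr=0.1):
    # Stand-in for real local training: nudge weights toward the client's data mean
    weights = global_weights.copy()
    for _ in range(epochs):
        weights -= lr * (weights - client_data.mean(axis=0))
    return weights

rng = np.random.default_rng(42)
# Five hypothetical clients, each holding 100 local samples with 3 features
clients = [rng.normal(loc=i, scale=1.0, size=(100, 3)) for i in range(5)]

global_weights = np.zeros(3)                                    # step 1: initialisation
for round_num in range(10):                                     # repeated rounds
    selected = rng.choice(len(clients), size=3, replace=False)  # step 2: client selection
    updates, sizes = [], []
    for k in selected:
        w_k = local_train(global_weights, clients[k])           # step 3: local training
        updates.append(w_k)                                     # step 4: only weights travel back
        sizes.append(len(clients[k]))
    m_t = sum(sizes)
    global_weights = sum((n / m_t) * w for w, n in zip(updates, sizes))  # step 5: aggregation
    print(f"round {round_num}: global weights = {global_weights.round(3)}")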
Mathematical Intuition behind Federated Averaging
The workflow described above can also be written more formally. The figure below shows the original Federated Averaging (FedAvg) algorithm from Google's seminal paper, which demonstrated that federated learning can work in practice. This formulation became the reference point for most federated learning systems today.

The original Federated Averaging algorithm, showing the server–client training loop and weighted aggregation of local models.
At the core of Federated Averaging is the aggregation step, where the server updates the global model by taking a weighted average of the locally trained client models. This can be written as:
w_{t+1} = Σ_{k ∈ S_t} (n_k / m_t) · w_{t+1}^k,   with m_t = Σ_{k ∈ S_t} n_k
where S_t is the set of clients selected in round t, w_{t+1}^k are the weights returned by client k, and n_k is the number of samples held by client k.
This equation makes it clear how each client contributes to the global model. Clients with more local data have a larger influence, while those with fewer samples contribute proportionally less. In practice, this simple idea is the reason why FedAvg became the default baseline for federated learning.
A simple NumPy implementation
Let's look at a minimal example where five clients have been selected. For the sake of simplicity, we assume that each client has already finished local training and returned its updated model weights together with the number of samples it used. Using these values, the server computes a weighted sum that produces the new global model for the next round. This mirrors the FedAvg equation directly, without introducing training or client-side details.
import numpy as np
# Client models after local training (w_{t+1}^k)
client_weights = [
    np.array([1.0, 0.8, 0.5]),    # client 1
    np.array([1.2, 0.9, 0.6]),    # client 2
    np.array([0.9, 0.7, 0.4]),    # client 3
    np.array([1.1, 0.85, 0.55]),  # client 4
    np.array([1.3, 1.0, 0.65])    # client 5
]
# Number of samples at each client (n_k)
client_sizes = [50, 150, 100, 300, 4000]
# m_t = total number of samples across selected clients S_t
m_t = sum(client_sizes)  # 50 + 150 + 100 + 300 + 4000 = 4600
# Initialize global model w_{t+1}
w_t_plus_1 = np.zeros_like(client_weights[0])
# FedAvg aggregation:
# w_{t+1} = sum_{k in S_t} (n_k / m_t) * w_{t+1}^k
#         = (50/4600) * w_1 + (150/4600) * w_2 + ...
for w_k, n_k in zip(client_weights, client_sizes):
    w_t_plus_1 += (n_k / m_t) * w_k
print("Aggregated global model w_{t+1}:", w_t_plus_1)
-------------------------------------------------------------
Aggregated global model w_{t+1}: [1.27173913 0.97826087 0.63478261]
How the aggregation is computed
Just to put things into perspective, we can expand the aggregation step for two clients and see how the numbers line up.
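As a hypothetical illustration, take only clients 1 and 2 from the code above, with n_1 = 50 and n_2 = 150, so m_t = 200:
w_{t+1} = (50/200) · [1.0, 0.8, 0.5] + (150/200) · [1.2, 0.9, 0.6]
        = [0.25, 0.2, 0.125] + [0.9, 0.675, 0.45]
        = [1.15, 0.875, 0.575]
Client 2 holds three times as many samples, so it pulls the aggregate three times as hard toward its own weights.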

Challenges in Federated Learning Environments
Federated learning comes with its own set of challenges. One of the main issues when implementing it is that the data across clients is usually non-IID (not independent and identically distributed). This means different clients may see very different data distributions, which in turn can slow training and make the global model less stable. For instance, hospitals in a federation can serve different populations whose data follows different patterns.
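To get a feel for what non-IID data looks like, here is a small sketch of one common way to simulate it: splitting each class across clients according to a Dirichlet distribution, so some clients end up dominated by a few classes. The labels are synthetic and the alpha value is arbitrary; this is only an illustration, not tied to any particular framework.
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=1000)  # synthetic labels: 1,000 samples, 10 classes
num_clients = 5
alpha = 0.3  # smaller alpha -> more skewed (more non-IID) partitions

client_indices = [[] for _ in range(num_clients)]
for c in range(10):
    idx = np.where(labels == c)[0]
    rng.shuffle(idx)
    proportions = rng.dirichlet(alpha * np.ones(num_clients))  # share of class c per client
    cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
    for client, part in enumerate(np.split(idx, cuts)):
        client_indices[client].extend(part.tolist())

for i, idx in enumerate(client_indices):
    counts = np.bincount(labels[idx], minlength=10)
    print(f"client {i}: {len(idx)} samples, class counts {counts}")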
Federated systems can involve anything from a few organizations to millions of devices, and managing participation, dropouts, and aggregation becomes harder as the system scales.
While federated learning keeps raw data local, it does not fully solve privacy on its own. Model updates can still leak private information if not protected, so extra privacy methods are often needed. Finally, communication can be a bottleneck, since networks can be slow or unreliable and sending frequent updates can be costly.
Conclusion and what’s next
In this article, we looked at how federated learning works at a high level and walked through a simple NumPy implementation. However, instead of writing the core logic by hand, there are frameworks like Flower that provide a simple and flexible way to build federated learning systems. In the next part, we'll use Flower to do the heavy lifting for us so that we can focus on the model and the data rather than the mechanics of federated learning. We'll also take a look at federated LLMs, where model size, communication cost, and privacy constraints become even more important.
Note: All images, unless otherwise stated, are created by the author.
