From Centralized to Federated Learning


Federated Learning (FL) is a technique to train Machine Learning (ML) models in a distributed setting [1]. The idea is that clients (for instance, hospitals) want to cooperate without sharing their private and sensitive data. In FL, each client holds its private data and trains an ML model on it. A central server then collects and aggregates the model parameters, constructing a global model based on information from the whole data distribution. Ideally, this serves as privacy protection by design.

A long line of research has investigated FL's efficiency, privacy, and fairness. Here we'll concentrate on the benchmark datasets used to evaluate horizontal FL methods, where the clients share the same task and data type but have their own individual data samples.

If you want to know more about Federated Learning and what I work on, visit our research lab website!

Photo by JJ Ying on Unsplash

There are three kinds of datasets in the literature:

  1. Real FL scenario: an application where FL is a necessary method. It has natural distributions and sensitive data. However, given the nature of FL, if you want to keep the data local, you won't publish the dataset online for benchmarking. Therefore it is hard to find a dataset of this kind. OpenMined, the community behind PySyft, tries to organize an FL network of universities and research labs to host data in a more realistic scenario. Moreover, there are applications where privacy awareness has risen only recently, so publicly available data exists while the demand for FL is real. One example is smart electricity meters [2].
  2. FL benchmark datasets: these datasets are designed to serve as FL benchmarks. The distribution is realistic, but the sensitivity of the data is questionable, as they are built from publicly available sources. One example is creating an FL dataset from Reddit posts, treating each user as a client and assigning one user's posts to one partition. The LEAF project proposed more datasets like this [3].
  3. Distributing standard datasets: there are a few well-known datasets, like CIFAR and ImageNet for images, that are used as benchmarks in many Machine Learning works. Here FL researchers define a distribution according to their research questions. It makes sense to use this method if the topic is well studied in a standard ML scenario and one wants to compare their FL algorithm to the centralized state of the art. However, such an artificial distribution doesn't reveal every problem caused by distribution skew, for instance, when clients collect images with very different cameras or in different lighting conditions.

Because the last category isn't distributed by design, past research works have split these datasets in several ways. In the rest of this story, I'll summarize the distribution techniques used for the CIFAR dataset in a federated scenario.

CIFAR dataset

The CIFAR-10 and CIFAR-100 datasets contain 32×32 colored images labeled with mutually exclusive classes [4]. CIFAR-10 has 10 classes of 6,000 images each, and CIFAR-100 has 100 classes of 600 images each. They are used in many image classification tasks, and one can access dozens of models evaluated on them, even browsing them on a PapersWithCode leaderboard.

Uniform distribution

This is considered identically and independently distributed (IID) data: data points are randomly allocated to clients.
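In code, the uniform split is just a random permutation of the sample indices. Below is a minimal numpy sketch (the helper name and the 20-client setup are my own, not code from the linked notebook):

```python
import numpy as np

def iid_partition(labels, n_clients, seed=0):
    """Shuffle all sample indices and deal them into equal IID shards."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    return np.array_split(idx, n_clients)

# Example: 50,000 CIFAR-10 training labels split across 20 clients.
labels = np.random.randint(0, 10, 50_000)   # stand-in for the real label array
clients = iid_partition(labels, n_clients=20)
print(len(clients), len(clients[0]))        # 20 clients, 2,500 samples each
```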

Single (n-) class clients

Data points allocated to a particular client come from the same class (or classes). It can be recognized as an extreme non-IID setting. Examples of this distribution appear in [1,5–8]. The work that first named the setting Federated Learning [1] uses 200 single-class shards and gives two shards to each client, making them 2-class clients. [5–7] use 2-class clients.
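A minimal sketch of this shard trick, assuming the 50,000 CIFAR-10 training samples (the function name is mine):

```python
import numpy as np

def shard_partition(labels, n_shards=200, shards_per_client=2, seed=0):
    """Sort indices by label, cut them into (mostly) single-class shards,
    then deal `shards_per_client` shards to each client, as in [1]."""
    rng = np.random.default_rng(seed)
    order = np.argsort(labels)              # indices grouped by class
    shards = np.array_split(order, n_shards)
    shard_ids = rng.permutation(n_shards)
    n_clients = n_shards // shards_per_client
    return [np.concatenate([shards[s] for s in
                            shard_ids[c * shards_per_client:(c + 1) * shards_per_client]])
            for c in range(n_clients)]

labels = np.random.randint(0, 10, 50_000)
clients = shard_partition(labels)           # 100 clients, ~2 classes each
```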

[9] builds on the hierarchical classes of CIFAR-100: clients have data points from one subclass of each superclass. This way, in the superclass classification task, every client has samples from each (super)class, yet a distribution skew is simulated because the data points come from different subclasses. For instance, one client has access to lion images while another has tiger images; the superclass task is to categorize both as large carnivores.

Dominant class clients

[5] also uses a combination of uniform and 2-class clients: half of the data points come from the two dominant classes, and the rest are uniformly chosen from the other classes. [10] uses an 80%-20% partition: 80% is chosen from a single dominant class and the rest is uniformly chosen from the other classes.
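A sketch of the dominant-class idea, following the 80%-20% variant (the helper is mine; for simplicity each client is sampled independently, so indices can repeat across clients, which a real partitioner would avoid):

```python
import numpy as np

def dominant_class_partition(labels, n_clients, dominant_frac=0.8,
                             samples_per_client=500, seed=0):
    """80% of each client's data comes from one dominant class (assigned
    round-robin), the remaining 20% uniformly from the other classes."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    n_dom = int(dominant_frac * samples_per_client)
    clients = []
    for i in range(n_clients):
        dom = classes[i % len(classes)]
        dom_pool = np.where(labels == dom)[0]
        rest_pool = np.where(labels != dom)[0]
        idx = np.concatenate([
            rng.choice(dom_pool, n_dom, replace=False),
            rng.choice(rest_pool, samples_per_client - n_dom, replace=False),
        ])
        clients.append(idx)
    return clients
```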

Dirichlet distribution

To understand the Dirichlet distribution, I follow the example of this blog post. Let's say one wants to produce a die with θ=(1/6,1/6,1/6,1/6,1/6,1/6) probabilities for the numbers 1–6. In reality, however, nothing can be perfect, so each die will be a bit skewed: 4 a bit more likely and 3 a bit less likely, for example. The Dirichlet distribution describes this variety with a parameter vector α=(α₁,α₂,…,α₆). A larger αᵢ strengthens the weight of that number, and a larger overall sum of the αᵢ values makes the sampled probabilities (dice) more similar to one another. Turning back to the dice example: to have a fair die, each αᵢ must be equal, and the larger the α values, the better manufactured the dice are. Because the Dirichlet distribution is a multivariate generalization of the beta distribution, let's display some examples of the beta distribution (the Dirichlet distribution with two variables):

Different beta distributions (Dirichlet distribution for two variables) — Figure by the author
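The die analogy is easy to play with in numpy; sampling one die per α value shows how a smaller α gives more skewed probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric Dirichlet over 6 faces: all alpha_i equal.
# Large alpha -> dice close to fair; small alpha -> heavily skewed dice.
for alpha in [100.0, 1.0, 0.1]:
    die = rng.dirichlet(np.full(6, alpha))
    print(f"alpha={alpha:>5}: {np.round(die, 3)}")
```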

I reproduced the visualization in [11], using the same α value for each αᵢ. This is called a symmetric Dirichlet distribution. We can see that as the α value decreases, unbalanced dice become more likely. The figures below show the Dirichlet distribution for various α values. Here each row represents a class, each column is a client, and the area of the circles is proportional to the probabilities.

Distribution over classes: Sampling 20 clients for 10 classes using different Dirichlet distribution α values — Figure by the author

Distribution over classes: the samples for each client are drawn independently, with the class distribution following the Dirichlet distribution. [11, 16] use this version.

Distribution over classes: normalized sum of samples by class (10) and by client (20) — Figure by the author

Each client has a predetermined number of samples, but the classes are chosen randomly, so the final total class representation will be unbalanced. Within each client, α→∞ recovers the prior (uniform) distribution, while α→0 yields single-class clients.
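A minimal sketch of this "distribution over classes" variant (the helper name is mine; clients are sampled independently, so the same index can land at two clients, unlike in a true partition):

```python
import numpy as np

def dirichlet_over_classes(labels, n_clients, alpha, samples_per_client, seed=0):
    """Each client draws its own class proportions from Dir(alpha), then
    samples a fixed number of points; global class balance is NOT preserved."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    pools = {c: np.where(labels == c)[0] for c in classes}
    clients = []
    for _ in range(n_clients):
        props = rng.dirichlet(np.full(len(classes), alpha))
        counts = rng.multinomial(samples_per_client, props)
        # assumes counts never exceed a class pool (5,000 per class in CIFAR-10)
        idx = np.concatenate([rng.choice(pools[c], n, replace=False)
                              for c, n in zip(classes, counts) if n > 0])
        clients.append(idx)
    return clients
```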

Distribution over clients: Sampling 20 clients for 10 classes using different Dirichlet distribution α values — Figure by the author

Distribution over clients: if we know the total number of samples in a class and the number of clients, we can distribute the samples to the clients class by class. This can result in clients having different numbers of samples (which is very typical in FL), while the global class distribution remains balanced. [12] uses this variant of the Dirichlet distribution.

Distribution over clients: normalized sum of samples by class (10) and by client (20) — Figure by the author
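And a sketch of the "distribution over clients" variant (again a minimal version of my own): every sample is assigned exactly once, so clients end up with different sizes while the global class balance is kept.

```python
import numpy as np

def dirichlet_over_clients(labels, n_clients, alpha, seed=0):
    """Split each class among the clients with proportions drawn from
    Dir(alpha): unequal client sizes, balanced global class distribution."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(np.full(n_clients, alpha))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part)
    return [np.array(c) for c in clients]
```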

While works like [11–16] follow and cite one another in using the Dirichlet distribution, they use these two different methods. Moreover, the experiments use different α values, which can result in very different performances: [11,12] use α=0.1, [13–15] use α=0.5, and [16] gives an overview of various α values. These design choices undermine the original principle of using the same benchmark dataset to evaluate algorithms.

Asymmetric Dirichlet distribution: one can use different αᵢ values to simulate more resourceful clients. For example, the figure below is produced using αᵢ=1/i for the i-th client. To my knowledge this isn't represented in the literature; instead, the Zipf distribution is used in [17].

Asymmetric Dirichlet distribution with αᵢ=1/i — Figure by the author
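Reproducing the asymmetric case only requires replacing the constant α vector, for example with αᵢ=1/i as in the figure above:

```python
import numpy as np

n_clients = 20
alpha = 1.0 / np.arange(1, n_clients + 1)          # alpha_i = 1/i
shares = np.random.default_rng(0).dirichlet(alpha)
# The expected share of client i is alpha_i / sum(alpha),
# so earlier clients tend to receive more data.
print(np.round(shares, 3))
```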

Zipf distribution

[17] uses a combination of the Zipf and Dirichlet distributions: the Zipf distribution determines the number of samples at each client, and the class distribution is then selected using the Dirichlet distribution.

Probability for rank k in the Zipf distribution: p(k) = k⁻ᵃ / ζ(a), where ζ is the Riemann zeta function

In the Zipf (zeta) distribution, the frequency of an item is inversely proportional to its rank in a frequency table. Zipf's law can be observed in many real-world datasets, for example in the word frequencies of language corpora [18].

Sampling items using the Zipf distribution — Figure by the author, following the numpy documentation on Zipf
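A sketch combining the two steps of [17] (the names and the truncated Zipf normalization over the n_clients ranks are my own simplifications):

```python
import numpy as np

def zipf_dirichlet_partition(labels, n_clients, zipf_a=2.0, alpha=0.5, seed=0):
    """Client sizes follow a (truncated) Zipf law over client rank; each
    client's class mix is then drawn from Dir(alpha). Counts are clipped to
    the class pool size, and indices may repeat across clients."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    pools = {c: np.where(labels == c)[0] for c in classes}
    ranks = np.arange(1, n_clients + 1, dtype=float)
    weights = ranks ** -zipf_a                      # p(k) proportional to k^-a
    sizes = (len(labels) * weights / weights.sum()).astype(int)
    clients = []
    for size in sizes:
        props = rng.dirichlet(np.full(len(classes), alpha))
        counts = rng.multinomial(size, props)
        idx = np.concatenate([rng.choice(pools[c], min(n, len(pools[c])), replace=False)
                              for c, n in zip(classes, counts) if n > 0])
        clients.append(idx)
    return clients
```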

Benchmarking federated learning methods is a difficult task. Ideally, one uses predefined real federated datasets. However, if a certain scenario has to be simulated and no good existing dataset covers it, one can use data distribution techniques. Proper documentation of the design choices, for reproducibility and motivation, is very important. Here I summarized the most common methods already in use for FL algorithm evaluation. Visit this Colab notebook for the code used in this story!

[1] McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017, April). Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273–1282). PMLR.

[2] Savi, M., & Olivadese, F. (2021). Short-term energy consumption forecasting at the edge: A federated learning approach. IEEE Access, 9, 95949–95969.

[3] Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Konečný, J., McMahan, H. B., … & Talwalkar, A. (2019). Leaf: A benchmark for federated settings. Workshop on Federated Learning for Data Privacy and Confidentiality

[4] Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. Master's thesis, University of Toronto.

[5] Liu, W., Chen, L., Chen, Y., & Zhang, W. (2020). Accelerating federated learning via momentum gradient descent. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1754–1766.

[6] Zhang, L., Luo, Y., Bai, Y., Du, B., & Duan, L. Y. (2021). Federated learning for non-iid data via unified feature learning and optimization objective alignment. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 4420–4428).

[7] Zhang, J., Guo, S., Ma, X., Wang, H., Xu, W., & Wu, F. (2021). Parameterized knowledge transfer for personalized federated learning. Advances in Neural Information Processing Systems, 34, 10092–10104.

[8] Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., & Chandra, V. (2018). Federated learning with non-iid data. arXiv preprint arXiv:1806.00582.

[9] Li, D., & Wang, J. (2019). Fedmd: Heterogenous federated learning via model distillation. arXiv preprint arXiv:1910.03581.

[10] Wang, H., Kaplan, Z., Niu, D., & Li, B. (2020, July). Optimizing federated learning on non-iid data with reinforcement learning. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications (pp. 1698–1707). IEEE.

[11] Lin, T., Kong, L., Stich, S. U., & Jaggi, M. (2020). Ensemble distillation for robust model fusion in federated learning. Advances in Neural Information Processing Systems, 33, 2351–2363.

[12] Luo, M., Chen, F., Hu, D., Zhang, Y., Liang, J., & Feng, J. (2021). No fear of heterogeneity: Classifier calibration for federated learning with non-iid data. Advances in Neural Information Processing Systems, 34, 5972–5984.

[13] Yurochkin, M., Agarwal, M., Ghosh, S., Greenewald, K., Hoang, N., & Khazaeni, Y. (2019, May). Bayesian nonparametric federated learning of neural networks. In International conference on machine learning (pp. 7252–7261). PMLR.

[14] Wang, H., Yurochkin, M., Sun, Y., Papailiopoulos, D., & Khazaeni, Y. (2020) Federated Learning with Matched Averaging. In International Conference on Learning Representations.

[15] Li, Q., He, B., & Song, D. (2021). Model-contrastive federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10713–10722).

[16] Hsu, T. M. H., Qi, H., & Brown, M. (2019). Measuring the results of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335.

[17] Wadu, M. M., Samarakoon, S., & Bennis, M. (2021). Joint client scheduling and resource allocation under channel uncertainty in federated learning. IEEE Transactions on Communications, 69(9), 5962–5974.

[18] Fagan, S., & Gençay, R. (2010). An introduction to textual econometrics. In Ullah, A., & Giles, D. E. A. (Eds.), Handbook of Empirical Economics and Finance (pp. 133–153). CRC Press.
