This tutorial shows how to use Hugging Face to federate the training of language models over multiple clients using Flower. More specifically, we will fine-tune a pre-trained Transformer model (DistilBERT) for sequence classification on a dataset of IMDB ratings. The end goal is to detect whether a movie rating is positive or negative.
A notebook is also available here, but instead of running on multiple separate clients it uses Flower's simulation functionality (installed via flwr[simulation]) to emulate a federated setting inside Google Colab. This also means that instead of calling start_server we call start_simulation, and that a few other modifications are needed.
Dependencies
To follow along with this tutorial, you will need to install the following packages: datasets, evaluate, flwr, torch, and transformers. This can be done using pip:
pip install datasets evaluate flwr torch transformers
Standard Hugging Face workflow
Handling the data
To fetch the IMDB dataset, we will use Hugging Face's datasets library. We then need to tokenize the data and create PyTorch dataloaders, which is all done in the load_data function:
import random

import torch
from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoTokenizer, DataCollatorWithPadding

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
CHECKPOINT = "distilbert-base-uncased"


def load_data():
    """Load IMDB data (training and eval)"""
    raw_datasets = load_dataset("imdb")
    raw_datasets = raw_datasets.shuffle(seed=42)

    # Remove the unlabeled split, which is not needed for classification
    del raw_datasets["unsupervised"]

    tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)

    def tokenize_function(examples):
        return tokenizer(examples["text"], truncation=True)

    # Keep a small random subset (100 examples per split) to speed things up
    train_population = random.sample(range(len(raw_datasets["train"])), 100)
    test_population = random.sample(range(len(raw_datasets["test"])), 100)

    tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
    tokenized_datasets["train"] = tokenized_datasets["train"].select(train_population)
    tokenized_datasets["test"] = tokenized_datasets["test"].select(test_population)

    tokenized_datasets = tokenized_datasets.remove_columns("text")
    tokenized_datasets = tokenized_datasets.rename_column("label", "labels")

    # Pad each batch to the length of its longest sequence
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
    trainloader = DataLoader(
        tokenized_datasets["train"],
        shuffle=True,
        batch_size=32,
        collate_fn=data_collator,
    )
    testloader = DataLoader(
        tokenized_datasets["test"], batch_size=32, collate_fn=data_collator
    )

    return trainloader, testloader


trainloader, testloader = load_data()
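The DataCollatorWithPadding used above pads each batch to the length of its longest sequence instead of padding the whole dataset to one fixed length. The pure-Python sketch below only illustrates that idea: pad_batch, the pad id of 0, and the token ids are hypothetical stand-ins, not the library's implementation.

```python
def pad_batch(sequences, pad_id=0):
    """Pad variable-length token-id lists to the longest one in the batch."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up tokenized reviews of different lengths.
batch = pad_batch([[101, 2023, 102], [101, 2023, 3185, 2001, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Padding per batch keeps short reviews from being padded out to the length of the longest review in the whole dataset.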
Training and testing the model
Once we have a way of creating our trainloader and testloader, we can take care of the training and testing. This is very similar to any PyTorch training or testing loop:
from evaluate import load as load_metric
from torch.optim import AdamW  # the AdamW shipped with transformers is deprecated


def train(net, trainloader, epochs):
    optimizer = AdamW(net.parameters(), lr=5e-5)
    net.train()
    for _ in range(epochs):
        for batch in trainloader:
            batch = {k: v.to(DEVICE) for k, v in batch.items()}
            outputs = net(**batch)
            loss = outputs.loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()


def test(net, testloader):
    metric = load_metric("accuracy")
    loss = 0
    net.eval()
    for batch in testloader:
        batch = {k: v.to(DEVICE) for k, v in batch.items()}
        with torch.no_grad():
            outputs = net(**batch)
        logits = outputs.logits
        loss += outputs.loss.item()
        predictions = torch.argmax(logits, dim=-1)
        metric.add_batch(predictions=predictions, references=batch["labels"])
    # Sum of per-batch mean losses, divided by the number of examples
    loss /= len(testloader.dataset)
    accuracy = metric.compute()["accuracy"]
    return loss, accuracy
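The evaluation loop above turns logits into class predictions with an argmax and then counts how many match the labels. Here is a dependency-free sketch of that single step, using made-up logits and a hypothetical accuracy_from_logits helper (the tutorial itself relies on the evaluate library for this):

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up logits for four reviews; column 0 = negative, column 1 = positive.
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
labels = [0, 1, 0, 0]
acc = accuracy_from_logits(logits, labels)
print(acc)
```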
Creating the model itself
To create the model itself, we will just load the pre-trained DistilBERT model using Hugging Face's AutoModelForSequenceClassification:
from transformers import AutoModelForSequenceClassification

net = AutoModelForSequenceClassification.from_pretrained(
    CHECKPOINT, num_labels=2
).to(DEVICE)
Federating the example
The idea behind Federated Learning is to train a model between multiple clients and a server without having to share any data. This is done by letting each client train the model locally on its own data and send its parameters back to the server, which then aggregates all the clients' parameters using a predefined strategy. The Flower framework makes this process very simple. If you want a more complete overview, be sure to check out this guide: What is Federated Learning?
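The aggregation step can be made concrete with a small sketch. It assumes the simplest strategy, a weighted average of client parameters in which each client's weight is its number of training examples. The fedavg function and the toy values are hypothetical, and real Flower strategies handle many more details.

```python
def fedavg(client_updates):
    """Average per-layer parameters, weighted by each client's example count.

    client_updates: list of (num_examples, layers), where layers is a list of
    flat lists of floats with the same shapes on every client.
    """
    total_examples = sum(num_examples for num_examples, _ in client_updates)
    num_layers = len(client_updates[0][1])
    aggregated = []
    for layer_idx in range(num_layers):
        layer_size = len(client_updates[0][1][layer_idx])
        averaged_layer = [
            sum(n * layers[layer_idx][i] for n, layers in client_updates)
            / total_examples
            for i in range(layer_size)
        ]
        aggregated.append(averaged_layer)
    return aggregated


# Two toy clients, each holding one "layer" with two weights.
updates = [
    (100, [[1.0, 2.0]]),
    (300, [[3.0, 4.0]]),
]
global_weights = fedavg(updates)
print(global_weights)
```

The second client holds three times as much data, so the result sits closer to its weights. Only parameters and example counts reach the server, never the reviews themselves.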
Creating the IMDBClient
To federate our example across multiple clients, we first need to write our Flower client class (inheriting from flwr.client.NumPyClient). This is very easy, as our model is a standard PyTorch model:
from collections import OrderedDict

import flwr as fl


class IMDBClient(fl.client.NumPyClient):
    def get_parameters(self, config):
        return [val.cpu().numpy() for _, val in net.state_dict().items()]

    def set_parameters(self, parameters):
        params_dict = zip(net.state_dict().keys(), parameters)
        state_dict = OrderedDict({k: torch.Tensor(v) for k, v in params_dict})
        net.load_state_dict(state_dict, strict=True)

    def fit(self, parameters, config):
        self.set_parameters(parameters)
        print("Training Started...")
        train(net, trainloader, epochs=1)
        print("Training Finished.")
        # Report the number of training examples, which weights the aggregation
        return self.get_parameters(config={}), len(trainloader.dataset), {}

    def evaluate(self, parameters, config):
        self.set_parameters(parameters)
        loss, accuracy = test(net, testloader)
        # Include "loss" in the metrics so the server-side aggregation can read it
        metrics = {"accuracy": float(accuracy), "loss": float(loss)}
        return float(loss), len(testloader.dataset), metrics
The get_parameters function lets the server get the client's parameters. Conversely, the set_parameters function allows the server to send its parameters to the client. Finally, the fit function trains the model locally for the client, and the evaluate function tests the model locally and returns the relevant metrics.
We can now start client instances using:
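Parameters travel between client and server as an ordered list of arrays without names, which is why set_parameters zips the received list back onto the keys of state_dict(). The sketch below mimics that round trip with plain Python lists in place of tensors. The layer names and values are made up for illustration.

```python
from collections import OrderedDict

# Stand-in for net.state_dict(): parameter names mapped to values.
state_dict = OrderedDict(
    [("classifier.weight", [0.5, 0.25]), ("classifier.bias", [0.0])]
)

# get_parameters: drop the names, keep the values in state-dict order.
parameters = [value for _, value in state_dict.items()]

# Pretend the server sent back updated values in the same order.
received = [[v + 1.0 for v in layer] for layer in parameters]

# set_parameters: re-attach the names by position.
restored = OrderedDict(zip(state_dict.keys(), received))
print(restored)
```

Because the matching is purely positional, every client and the server need to use the same model architecture.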
fl.client.start_numpy_client(
    server_address="127.0.0.1:8080",
    client=IMDBClient(),
)
Starting the server
Now that we have a way to instantiate clients, we need to create our server in order to aggregate the results. Using Flower, this can be done very easily by first choosing a strategy (here, we are using FedAvg, which defines the global weights as the average of all the clients' weights at each round) and then using the flwr.server.start_server function:
def weighted_average(metrics):
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    losses = [num_examples * m["loss"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]
    return {
        "accuracy": sum(accuracies) / sum(examples),
        "loss": sum(losses) / sum(examples),
    }


strategy = fl.server.strategy.FedAvg(
    fraction_fit=1.0,
    fraction_evaluate=1.0,
    evaluate_metrics_aggregation_fn=weighted_average,
)

fl.server.start_server(
    server_address="0.0.0.0:8080",
    config=fl.server.ServerConfig(num_rounds=3),
    strategy=strategy,
)
The weighted_average function provides a way to aggregate the metrics distributed among the clients (basically, this allows us to display a nice average accuracy and loss for every round).
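To see what this aggregation produces, here is a standalone run with made-up results from two clients. It repeats weighted_average so that the snippet runs on its own, and it assumes each client reports both accuracy and loss in its metrics dictionary.

```python
def weighted_average(metrics):
    accuracies = [num_examples * m["accuracy"] for num_examples, m in metrics]
    losses = [num_examples * m["loss"] for num_examples, m in metrics]
    examples = [num_examples for num_examples, _ in metrics]
    return {
        "accuracy": sum(accuracies) / sum(examples),
        "loss": sum(losses) / sum(examples),
    }


# Hypothetical per-client results: (number of examples, metrics).
client_metrics = [
    (100, {"accuracy": 0.75, "loss": 0.5}),
    (300, {"accuracy": 0.875, "loss": 0.25}),
]
aggregated = weighted_average(client_metrics)
print(aggregated)
```

The client with 300 examples counts three times as much as the one with 100, so the aggregated accuracy (0.84375) lands closer to 0.875 than to 0.75.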
Putting everything together
If you want to see everything put together, check out the code example we wrote for the Flower repo: https://github.com/adap/flower/tree/main/examples/quickstart-huggingface.
Of course, this is a very basic example, and a lot can be added or modified; it was only meant to showcase how simply we could federate a Hugging Face workflow using Flower.
Note that in this example we used PyTorch, but we could very well have used TensorFlow.
