SetFit is significantly more sample efficient and robust to noise than standard fine-tuning.
Few-shot learning with pretrained language models has emerged as a promising solution to every data scientist’s nightmare: dealing with data that has few to no labels 😱.
Together with our research partners at Intel Labs and the UKP Lab, Hugging Face is excited to introduce SetFit: an efficient framework for few-shot fine-tuning of Sentence Transformers. SetFit achieves high accuracy with little labeled data – for instance, with only 8 labeled examples per class on the Customer Reviews (CR) sentiment dataset, SetFit is competitive with fine-tuning RoBERTa Large on the full training set of 3k examples 🤯!
In comparison with other few-shot learning methods, SetFit has several unique features:
🗣 No prompts or verbalisers: Current techniques for few-shot fine-tuning require handcrafted prompts or verbalisers to convert examples into a format that’s suitable for the underlying language model. SetFit dispenses with prompts altogether by generating rich embeddings directly from a small number of labeled text examples.
🏎 Fast to train: SetFit doesn’t require large-scale models like T0 or GPT-3 to achieve high accuracy. As a result, it is typically an order of magnitude (or more) faster to train and run inference with.
🌎 Multilingual support: SetFit can be used with any Sentence Transformer on the Hub, which means you can classify text in multiple languages by simply fine-tuning a multilingual checkpoint.
For more details, check out our paper, data, and code. In this blog post, we’ll explain how SetFit works and how to train your very own models. Let’s dive in!
How does it work?
SetFit is designed with efficiency and simplicity in mind. SetFit first fine-tunes a Sentence Transformer model on a small number of labeled examples (typically 8 or 16 per class). This is followed by training a classifier head on the embeddings generated from the fine-tuned Sentence Transformer.
SetFit’s two-stage training process
SetFit takes advantage of Sentence Transformers’ ability to generate dense embeddings based on paired sentences. In the first stage, it makes the most of the limited labeled input data with contrastive training, where positive and negative pairs are created by in-class and out-of-class selection. The Sentence Transformer model then trains on these pairs (or triplets) and generates dense vectors per example. In the second step, the classification head trains on the encoded embeddings with their respective class labels. At inference time, the unseen example passes through the fine-tuned Sentence Transformer, generating an embedding that, when fed to the classification head, outputs a class label prediction.
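To make the first stage more concrete, here is a minimal sketch of how positive and negative pairs could be built from a handful of labeled examples. The generate_pairs function and its sampling strategy are our own illustration, not the setfit library’s exact implementation; with a cosine-similarity loss, a label of 1.0 pulls a pair’s embeddings together and 0.0 pushes them apart:

import random

def generate_pairs(texts, labels, num_iterations=20):
    """Illustrative only: build (sentence_a, sentence_b, similarity) triples
    for contrastive fine-tuning of the Sentence Transformer."""
    pairs = []
    for _ in range(num_iterations):
        for i, (text, label) in enumerate(zip(texts, labels)):
            # In-class selection: candidates from the same class (excluding the anchor itself)
            positives = [t for j, (t, l) in enumerate(zip(texts, labels)) if l == label and j != i]
            # Out-of-class selection: candidates from a different class
            negatives = [t for t, l in zip(texts, labels) if l != label]
            if positives:
                pairs.append((text, random.choice(positives), 1.0))  # similar pair
            if negatives:
                pairs.append((text, random.choice(negatives), 0.0))  # dissimilar pair
    return pairs

pairs = generate_pairs(
    ["great battery life", "works as advertised", "screen cracked on day one", "battery died in a week"],
    [1, 1, 0, 0],
    num_iterations=1,
)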
And just by switching out the base Sentence Transformer model for a multilingual one, SetFit can function seamlessly in multilingual contexts. In our experiments, SetFit’s performance shows promising results on classification in German, Japanese, Mandarin, French and Spanish, in both in-language and cross-linguistic settings.
Benchmarking SetFit
Although based on much smaller models than existing few-shot methods, SetFit performs on par with or better than state-of-the-art few-shot methods on a variety of benchmarks. On RAFT, a few-shot classification benchmark, SetFit Roberta (using the all-roberta-large-v1 model) with 355 million parameters outperforms PET and GPT-3. It places just below average human performance and the 11 billion parameter T-Few – a model 30 times the size of SetFit Roberta. SetFit also outperforms the human baseline on 7 of the 11 RAFT tasks.
| Rank | Method | Accuracy | Model Size |
|---|---|---|---|
| 2 | T-Few | 75.8 | 11B |
| 4 | Human Baseline | 73.5 | N/A |
| 6 | SetFit (Roberta Large) | 71.3 | 355M |
| 9 | PET | 69.6 | 235M |
| 11 | SetFit (MPNet) | 66.9 | 110M |
| 12 | GPT-3 | 62.7 | 175B |
Prominent methods on the RAFT leaderboard (as of September 2022)
On other datasets, SetFit shows robustness across a variety of tasks. As shown in the figure below, with just 8 examples per class, it typically outperforms PERFECT, ADAPET and fine-tuned vanilla transformers. SetFit also achieves comparable results to T-Few 3B, despite being prompt-free and 27 times smaller.
Comparing SetFit performance against other methods on 3 classification datasets.
Fast training and inference
Comparing training cost and average performance for T-Few 3B and SetFit (MPNet), with 8 labeled examples per class.
Since SetFit achieves high accuracy with relatively small models, it’s blazing fast to train and at much lower cost. For instance, training SetFit on an NVIDIA V100 with 8 labeled examples takes just 30 seconds, at a cost of $0.025. By comparison, training T-Few 3B requires an NVIDIA A100 and takes 11 minutes, at a cost of around $0.7 for the same experiment – a factor of 28x more. In fact, SetFit can run on a single GPU like the ones found on Google Colab, and you can even train SetFit on CPU in just a few minutes! As shown in the figure above, SetFit’s speed-up comes with comparable model performance. Similar gains are also achieved for inference, and distilling the SetFit model can bring speed-ups of 123x 🤯.
Training your own model
To make SetFit accessible to the community, we’ve created a small setfit library that allows you to train your own models with just a few lines of code.
The first thing to do is install it by running the following command:
pip install setfit
Next, we import SetFitModel and SetFitTrainer, two core classes that streamline the SetFit training process:
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer
Now, let’s download a text classification dataset from the Hugging Face Hub. We’ll use the SentEval-CR dataset, which is a dataset of customer reviews:
dataset = load_dataset("SetFit/SentEval-CR")
To simulate a real-world scenario with just a few labeled examples, we’ll sample 8 examples per class from the training set:
train_ds = dataset["train"].shuffle(seed=42).select(range(8 * 2))
test_ds = dataset["test"]
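Note that shuffling and slicing gives a roughly, but not strictly, balanced sample. A quick sanity check (a sketch, assuming the dataset’s label column, which SentEval-CR provides) makes the class counts visible:

from collections import Counter

# Roughly 8 examples per class is expected, but shuffle + select does not enforce balance
print(Counter(train_ds["label"]))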
Now that we have a dataset, the next step is to load a pretrained Sentence Transformer model from the Hub and instantiate a SetFitTrainer. Here we use the paraphrase-mpnet-base-v2 model, which we found to give great results across many datasets:
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,  # Number of text pairs to generate for contrastive learning
    num_epochs=1,  # Number of epochs to use for contrastive learning
)
The last step is to train and evaluate the model:
trainer.train()
metrics = trainer.evaluate()
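evaluate() returns the computed metrics as a plain dictionary (accuracy is the trainer’s default metric), so you can inspect the score directly:

print(metrics["accuracy"])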
And that’s it – you’ve now trained your first SetFit model! Remember to push your trained model to the Hub 🙂
trainer.push_to_hub("my-awesome-setfit-model")
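Once pushed, the model can be downloaded and used for predictions in a couple of lines. The repo name below is a hypothetical placeholder – substitute your own Hub username and model name:

from setfit import SetFitModel

# Hypothetical repo name – replace with the one you pushed above
model = SetFitModel.from_pretrained("your-username/my-awesome-setfit-model")

# A SetFit model is callable on raw text and returns the predicted class labels
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])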
While this example showed how this can be done with one specific type of base model, any Sentence Transformer model could be swapped in for different performance and tasks. For instance, using a multilingual Sentence Transformer body can extend few-shot classification to multilingual settings, as shown in the sketch below.
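A minimal sketch of that swap, using the multilingual paraphrase-mpnet checkpoint from the sentence-transformers organization on the Hub; everything else in the training code above stays the same:

from setfit import SetFitModel

# Swap in a multilingual Sentence Transformer body; the trainer code is unchanged
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-multilingual-mpnet-base-v2"
)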
Next steps
We’ve shown that SetFit is an effective method for few-shot classification tasks. In the coming months, we’ll be exploring how well the method generalizes to tasks like natural language inference and token classification. In the meantime, we’re excited to see how industry practitioners apply SetFit to their use cases – if you have any questions or feedback, open an issue on our GitHub repo 🤗.
Happy few-shot learning!
