A guest blog post by Richard Liaw from the Anyscale team
With innovative research implementations and thousands of trained models easily accessible, the Hugging Face transformers library has become critical to the success and growth of natural language processing today.
For any machine learning model to achieve good performance, users often need to do some form of hyperparameter tuning. Yet nearly everyone (1, 2) either ends up disregarding hyperparameter tuning or opting for a simplistic grid search with a small search space.
However, simple experiments can demonstrate the benefit of using an advanced tuning technique. Below is a recent experiment run on a BERT model from Hugging Face transformers on the RTE dataset. Genetic optimization techniques like PBT can provide large performance improvements compared to standard hyperparameter optimization techniques.
| Algorithm | Best Val Acc. | Best Test Acc. | Total GPU min | Total $ cost |
|---|---|---|---|---|
| Grid Search | 74% | 65.4% | 45 min | $2.30 |
| Bayesian Optimization + Early Stop | 77% | 66.9% | 104 min | $5.30 |
| Population-based Training | 78% | 70.5% | 48 min | $2.45 |
If you're leveraging Transformers, you'll want a way to easily access powerful hyperparameter tuning solutions without giving up the customizability of the Transformers framework.
In the Transformers 3.1 release, Hugging Face Transformers and Ray Tune teamed up to provide a simple yet powerful integration. Ray Tune is a popular Python library for hyperparameter tuning that provides many state-of-the-art algorithms out of the box, along with integrations with best-of-class tooling, such as Weights and Biases and TensorBoard.
To demonstrate this new Hugging Face + Ray Tune integration, we leverage the Hugging Face Datasets library to fine-tune BERT on MRPC.
To run this example, please first run:

```bash
pip install "ray[tune]" transformers datasets scipy sklearn torch
```
Simply plug in one of Ray's standard tuning algorithms by adding just a few lines of code.
```python
from datasets import load_dataset, load_metric
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
dataset = load_dataset('glue', 'mrpc')
metric = load_metric('glue', 'mrpc')

def encode(examples):
    outputs = tokenizer(
        examples['sentence1'], examples['sentence2'], truncation=True)
    return outputs

encoded_dataset = dataset.map(encode, batched=True)

def model_init():
    return AutoModelForSequenceClassification.from_pretrained(
        'distilbert-base-uncased', return_dict=True)

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = predictions.argmax(axis=-1)
    return metric.compute(predictions=predictions, references=labels)

training_args = TrainingArguments(
    "test", evaluation_strategy="steps", eval_steps=500, disable_tqdm=True)

trainer = Trainer(
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    model_init=model_init,
    compute_metrics=compute_metrics,
)

trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=10,
)
```
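To see what `compute_metrics` operates on: the Trainer hands it the raw evaluation logits together with the labels, and `argmax` over the last axis converts logits into class predictions. A minimal NumPy sketch with made-up values (the logits below are purely illustrative):

```python
import numpy as np

# Toy logits for three examples over two classes (MRPC is a binary task).
logits = np.array([[0.1, 0.9],
                   [2.0, -1.0],
                   [0.3, 0.4]])

# Highest-scoring class per example, as done inside compute_metrics.
predictions = logits.argmax(axis=-1)
print(predictions.tolist())  # [1, 0, 1]
```

The resulting integer predictions are what get compared against the reference labels by the GLUE metric.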
By default, each trial will utilize 1 CPU, and optionally 1 GPU if available. You can leverage multiple GPUs for a parallel hyperparameter search by passing in a `resources_per_trial` argument.
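For example, a sketch reusing the `trainer` defined above (the resource numbers here are illustrative; `resources_per_trial` is forwarded to Ray Tune):

```python
# Give each trial 1 CPU and 1 GPU; Ray runs as many trials
# concurrently as the available resources allow.
trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    n_trials=10,
    resources_per_trial={"cpu": 1, "gpu": 1},
)
```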
You can also easily swap in different parameter tuning algorithms, such as HyperBand, Bayesian Optimization, or Population-based Training:
To run this example, first run:

```bash
pip install hyperopt
```
```python
from ray.tune.suggest.hyperopt import HyperOptSearch
from ray.tune.schedulers import ASHAScheduler

trainer = Trainer(
    args=training_args,
    tokenizer=tokenizer,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    model_init=model_init,
    compute_metrics=compute_metrics,
)

best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    search_alg=HyperOptSearch(metric="objective", mode="max"),
    scheduler=ASHAScheduler(metric="objective", mode="max"))
```
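The Population-based Training row in the table above comes from Ray Tune's PBT scheduler, which can be plugged in the same way. A sketch of what such a configuration might look like, reusing the `trainer` from above (the mutation ranges here are illustrative assumptions, not the exact values used in the experiment):

```python
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# PBT periodically copies weights from well-performing trials and
# perturbs their hyperparameters within the mutation ranges below.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="objective",
    mode="max",
    perturbation_interval=1,
    hyperparam_mutations={
        # Illustrative search ranges; tune these for your own task.
        "learning_rate": tune.uniform(1e-5, 5e-5),
        "weight_decay": tune.uniform(0.0, 0.3),
        "per_device_train_batch_size": [16, 32, 64],
    },
)

best_trial = trainer.hyperparameter_search(
    direction="maximize",
    backend="ray",
    scheduler=pbt,
    n_trials=8,
)
```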
It also works with Weights and Biases out of the box!
Try it out today:
If you liked this blog post, be sure to check out:


