Graphcore and Hugging Face have significantly expanded the range of Machine Learning modalities and tasks available in Hugging Face Optimum, an open-source library for Transformers performance optimization. Developers now have convenient access to a wide selection of off-the-shelf Hugging Face Transformer models, optimized to deliver the best possible performance on Graphcore's IPU.
Including the BERT transformer model made available shortly after Optimum Graphcore launched, developers can now access 10 models covering Natural Language Processing (NLP), Speech and Computer Vision, all of which come with IPU configuration files and ready-to-use pre-trained and fine-tuned model weights.
Recent Optimum models
Computer vision
ViT (Vision Transformer) is a breakthrough in image recognition that uses the transformer mechanism as its primary component. When images are input to ViT, they are divided into small patches, much like how words are processed in language systems. Each patch is encoded into an embedding by the Transformer, after which it can be processed individually.
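To make that concrete, here is a minimal sketch of image classification with a fine-tuned ViT checkpoint through the standard Transformers API; the `google/vit-base-patch16-224` checkpoint and the example image URL are illustrative choices, not requirements of Optimum Graphcore:

```python
from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests

# Load an example image (any RGB image works)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The feature extractor resizes and normalises the image;
# the model's embedding layer then splits it into 16x16 patches
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

inputs = feature_extractor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```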
NLP
GPT-2 (Generative Pre-trained Transformer 2) is a text generation transformer model pretrained on a very large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no human labelling of any kind (which is why it can use so much publicly available data), using an automatic process to generate inputs and labels from those texts. More precisely, it is trained to generate texts from a prompt by guessing the next word in sentences.
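A minimal sketch of that prompt-completion behaviour, using the standard Transformers text-generation pipeline (the `gpt2` checkpoint and the prompt are just examples):

```python
from transformers import pipeline

# Generate continuations of a prompt by repeatedly predicting the next token
generator = pipeline("text-generation", model="gpt2")
for output in generator("Graphcore IPUs are designed for", max_length=30, num_return_sequences=2):
    print(output["generated_text"])
```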
RoBERTa (Robustly optimized BERT approach) is a transformer model that (like GPT-2) is pretrained on a large corpus of English data in a self-supervised fashion. More precisely, it was pretrained with the masked language modeling (MLM) objective: taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. RoBERTa can be used for masked language modeling, but is mostly intended to be fine-tuned on a downstream task.
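For instance, masked-word prediction with the base RoBERTa checkpoint might look like the sketch below (the checkpoint and sentence are illustrative):

```python
from transformers import pipeline

# RoBERTa's mask token is "<mask>"; the model ranks candidates for the masked position
unmasker = pipeline("fill-mask", model="roberta-base")
for prediction in unmasker("The capital of France is <mask>."):
    print(prediction["token_str"], round(prediction["score"], 3))
```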
DeBERTa (Decoding-enhanced BERT with disentangled attention) is a pretrained neural language model for NLP tasks. DeBERTa adapts the 2018 BERT and 2019 RoBERTa models using two novel techniques, a disentangled attention mechanism and an enhanced mask decoder, significantly improving the efficiency of model pretraining and the performance of downstream tasks.
BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering).
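As an illustration of the text-generation use case, the sketch below summarizes a short passage with a publicly available fine-tuned BART checkpoint; `facebook/bart-large-cnn` and the input text are assumptions made for the example:

```python
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
article = (
    "Graphcore and Hugging Face have expanded the range of models available in Optimum, "
    "an open-source library for Transformers performance optimization. Developers now have "
    "access to a wide selection of off-the-shelf models optimized for Graphcore's IPU."
)
# do_sample=False keeps the output deterministic
print(summarizer(article, max_length=40, min_length=10, do_sample=False)[0]["summary_text"])
```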
LXMERT (Learning Cross-Modality Encoder Representations from Transformers) is a multimodal transformer model for learning vision and language representations. It has three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder. It is pretrained via a combination of masked language modeling, visual-language text alignment, ROI-feature regression, masked visual-attribute modeling, masked visual-object modeling, and visual-question answering objectives. It has achieved state-of-the-art results on the VQA and GQA visual-question-answering datasets.
T5 (Text-to-Text Transfer Transformer) is a revolutionary new model that can take any text and convert it into a machine learning format for translation, question answering or classification. It introduces a unified framework that converts all text-based language problems into a text-to-text format for transfer learning. By doing so, it makes it possible to use the same model, objective function, hyperparameters, and decoding procedure across a diverse set of NLP tasks.
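The text-to-text idea is easiest to see in code: the same model and weights handle different tasks, selected purely by a text prefix. A minimal sketch with the small public `t5-small` checkpoint (an illustrative choice):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different tasks, same model: only the prefix changes
for prompt in [
    "translate English to German: The house is wonderful.",
    "summarize: Graphcore and Hugging Face have expanded the range of models available in Optimum.",
]:
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=40)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```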
Speech
HuBERT (Hidden-Unit BERT) is a self-supervised speech recognition model pretrained on audio, learning a combined acoustic and language model over continuous inputs. The HuBERT model either matches or improves upon the state-of-the-art wav2vec 2.0 performance on the Librispeech (960h) and Libri-light (60,000h) benchmarks with 10min, 1h, 10h, 100h, and 960h fine-tuning subsets.
Wav2Vec2 is a pretrained self-supervised model for automatic speech recognition. Using a novel contrastive pretraining objective, Wav2Vec2 learns powerful speech representations from large amounts of unlabelled speech data, followed by fine-tuning on a small amount of transcribed speech data, outperforming the best semi-supervised methods while being conceptually simpler.
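In practice, a fine-tuned Wav2Vec2 checkpoint can transcribe audio in a couple of lines; the sketch below assumes the public `facebook/wav2vec2-base-960h` checkpoint and a placeholder path to a 16 kHz recording:

```python
from transformers import pipeline

# The audio path is a placeholder; the pipeline expects 16 kHz speech input
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
print(asr("path/to/recording.wav")["text"])
```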
Hugging Face Optimum Graphcore: building on a solid partnership
Graphcore joined the Hugging Face Hardware Partner Program in 2021 as a founding member, with both companies sharing the common goal of lowering the barriers for innovators seeking to harness the power of machine intelligence.
Since then, Graphcore and Hugging Face have worked together extensively to make training transformer models on IPUs fast and easy, with the first Optimum Graphcore model (BERT) made available last year.
Transformers have proven to be extremely efficient for a wide range of functions, including feature extraction, text generation, sentiment analysis, translation and many more. Models like BERT are widely used by Graphcore customers in a huge array of applications including cybersecurity, voice call automation, drug discovery, and translation.
Optimizing their performance in the real world requires considerable time, effort and skills that are beyond the reach of many companies and organizations. In providing an open-source library of transformer models, Hugging Face has directly addressed these issues. Integrating IPUs with Hugging Face also allows developers to leverage not only the models, but also the datasets available in the Hugging Face Hub.
Developers can now use Graphcore systems to train 10 different types of state-of-the-art transformer models and access thousands of datasets with minimal coding complexity. With this partnership, we are providing users with the tools and ecosystem to easily download and fine-tune state-of-the-art pretrained models for various domains and downstream tasks.
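As a rough sketch of what that workflow can look like, the example below fine-tunes a BERT checkpoint on a Hub dataset using the Optimum Graphcore classes; the checkpoint, dataset and IPU configuration names are illustrative, and exact arguments may differ between Optimum Graphcore releases:

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.graphcore import IPUConfig, IPUTrainer, IPUTrainingArguments

# Grab a dataset and a pretrained checkpoint from the Hugging Face Hub
dataset = load_dataset("glue", "sst2")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

# IPU execution settings (pipelining, replication, etc.) are published per model
# in the Graphcore organization on the Hub
ipu_config = IPUConfig.from_pretrained("Graphcore/bert-base-ipu")

training_args = IPUTrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

# IPUTrainer mirrors the familiar transformers Trainer API
trainer = IPUTrainer(
    model=model,
    ipu_config=ipu_config,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
)
trainer.train()
```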
Bringing Graphcore’s latest hardware and software to the table
While members of Hugging Face's ever-expanding user base have already been able to benefit from the speed, performance, and power- and cost-efficiency of IPU technology, a combination of new hardware and software releases from Graphcore will unlock even more potential.
On the hardware front, the Bow IPU, announced in March and now shipping to customers, is the first processor in the world to use Wafer-on-Wafer (WoW) 3D stacking technology, taking the well-documented benefits of the IPU to the next level. Featuring ground-breaking advances in compute architecture and silicon implementation, communication and memory, each Bow IPU delivers up to 350 teraFLOPS of AI compute (an impressive 40% increase in performance) and up to 16% more power efficiency compared to the previous generation IPU. Importantly, Hugging Face Optimum users can switch seamlessly from previous generation IPUs to Bow processors, as no code changes are required.
Software also plays a vital role in unlocking the IPU's capabilities, so naturally Optimum offers a plug-and-play experience with Graphcore's easy-to-use Poplar SDK, which itself has received a major 2.5 update. Poplar makes it easy to train state-of-the-art models on state-of-the-art hardware, thanks to its full integration with standard machine learning frameworks, including PyTorch, PyTorch Lightning, and TensorFlow, as well as orchestration and deployment tools such as Docker and Kubernetes. Making Poplar compatible with these widely used, third-party systems allows developers to easily port their models from their other compute platforms and start benefiting from the IPU's advanced AI capabilities.
Get started with Hugging Face's Optimum Graphcore models
If you're interested in combining the benefits of IPU technology with the strengths of transformer models, you can download the latest range of Optimum Graphcore models from the Graphcore organization on the Hub, or access the code from the Optimum GitHub repo. Our Getting Started blog post will guide you through each step to start experimenting with IPUs.
In addition, Graphcore has built an extensive page of developer resources, where you can find the IPU Model Garden, a repository of deployment-ready ML applications including computer vision, NLP, graph networks and more, alongside an array of documentation, tutorials, how-to videos, webinars, and more. You can also access Graphcore's GitHub repo for more code references and tutorials.
To learn more about using Hugging Face on Graphcore, head over to our partner page!
