ONNX Runtime is a cross-platform machine learning tool that can be used to accelerate a wide range of models, particularly those with ONNX support.
Hugging Face ONNX Runtime Support
There are over 130,000 ONNX-supported models on Hugging Face, an open source community that allows users to build, train, and deploy hundreds of thousands of publicly available machine learning models.
These ONNX-supported models, which include many increasingly popular large language models (LLMs) and cloud models, can leverage ONNX Runtime to improve performance, along with other benefits.
For instance, using ONNX Runtime to accelerate the whisper-tiny model can improve average latency per inference, with an up to 74.30% gain over PyTorch.
ONNX Runtime works closely with Hugging Face to ensure that the most popular models on the site are supported.
In total, over 90 Hugging Face model architectures are supported by ONNX Runtime, including the 11 most popular architectures (where popularity is determined by the number of corresponding models uploaded to the Hugging Face Hub):
| Model Architecture | Approximate No. of Models |
|---|---|
| BERT | 28180 |
| GPT2 | 14060 |
| DistilBERT | 11540 |
| RoBERTa | 10800 |
| T5 | 10450 |
| Wav2Vec2 | 6560 |
| Stable-Diffusion | 5880 |
| XLM-RoBERTa | 5100 |
| Whisper | 4400 |
| BART | 3590 |
| Marian | 2840 |
Learn More
To learn more about accelerating Hugging Face models with ONNX Runtime, check out our recent post on the Microsoft Open Source Blog.
