The Transformers Library: standardizing model definitions

TLDR: Going forward, we aim for Transformers to be the pivot across frameworks: if a model architecture is
supported by transformers, you can expect it to be supported in the rest of the ecosystem.


Transformers was created in 2019, shortly after the release of the BERT Transformer model. Since then, we have
constantly aimed to add state-of-the-art architectures, initially focused on NLP, then expanding to audio and
computer vision. Today, transformers is the default library for LLMs and VLMs in the Python ecosystem.

Transformers now supports 300+ model architectures, with an average of ~3 new architectures added every week.
We have aimed for these architectures to be released in a timely manner, with day-0 support for the most
sought-after architectures (Llamas, Qwens, GLMs, etc.).



A model-definition library

[Figure: Transformers standardizing model definitions]

Over time, Transformers has become a central component of the ML ecosystem and one of the most complete
toolkits in terms of model diversity; it is integrated into all popular training frameworks such as Axolotl,
Unsloth, DeepSpeed, FSDP, PyTorch Lightning, TRL, Nanotron, etc.

Recently, we have been working hand in hand with the most popular inference engines (vLLM, SGLang, TGI, …) so that they
can use transformers as a backend. The added value is significant: as soon as a model is added to transformers,
it becomes available in these inference engines, while benefiting from the strengths each engine provides: inference optimizations, specialized kernels, dynamic batching, etc.

For example, here is how you would use the transformers backend in vLLM:

from vllm import LLM

llm = LLM(model="new-transformers-model", model_impl="transformers")

That is all it takes for a new model to enjoy super-fast, production-grade serving with vLLM!
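Once loaded this way, generation works like any other vLLM model. A minimal, hedged sketch (the prompt and output handling below are illustrative, not part of the original example):

# Hypothetical follow-up: generate with the model loaded through the transformers backend.
outputs = llm.generate(["The Transformers library is"])
print(outputs[0].outputs[0].text)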

Read more about it in the vLLM documentation.


We have also been working very closely with llama.cpp and
MLX so that the implementations between transformers
and these modeling libraries have great interoperability. For example, thanks to a significant community effort,
it is now very easy to load GGUF files in transformers for
further fine-tuning. Conversely, transformers models can easily be
converted to GGUF files for use with
llama.cpp.
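As a rough illustration of the GGUF loading path, here is a minimal sketch assuming a hypothetical GGUF repository and filename on the Hub; conversion in the other direction is typically handled by llama.cpp's convert_hf_to_gguf.py script:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository and filename; any GGUF checkpoint on the Hub follows the same pattern.
repo_id = "some-org/some-model-GGUF"
gguf_file = "some-model.Q4_K_M.gguf"

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)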

The same is true for MLX, where transformers' safetensors files are directly compatible with MLX's models.
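For instance, a model saved in the transformers format can typically be loaded straight into MLX through the mlx-lm package. A minimal sketch, assuming mlx-lm is installed and using a hypothetical Hub repository:

from mlx_lm import load, generate

# Hypothetical repository; mlx-lm reads the transformers safetensors weights.
model, tokenizer = load("some-org/some-model")
response = generate(model, tokenizer, prompt="Hello", max_tokens=50)
print(response)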

We are super proud that the transformers format is being adopted by the community, bringing a great deal of interoperability
we all benefit from. Train a model with Unsloth, deploy it with SGLang, and export it to llama.cpp to run locally! We
aim to keep supporting the community going forward.



Striving for even simpler model contributions

To make it easier for the community to use transformers as a reference for model definitions, we strive to
significantly reduce the barrier to model contributions. We have been at this effort for a few years, and we will
accelerate significantly over the next few weeks:

  • The modeling code of each model will be further simplified, with clear, concise APIs for the most important
    components (KV cache, different attention functions, kernel optimization).
  • We will deprecate redundant components in favor of a simple, single way to use our APIs: encouraging
    efficient tokenization by deprecating slow tokenizers, and similarly defaulting to the fast, vectorized vision processors.
  • We will continue to reinforce the work around modular model definitions, with the goal that new models require
    absolutely minimal code changes; 6,000-line contributions and 20-file changes for new models should be a thing of the past
    (a sketch of what a modular definition looks like follows this list).
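
As a rough, hypothetical sketch of what a modular definition can look like (the model name is invented, and the parent classes depend on which existing architecture is reused):

# modular_mynewmodel.py -- hypothetical modular definition that reuses an existing
# architecture instead of re-implementing it from scratch.
from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaModel

class MyNewModel(LlamaModel):
    # Only the pieces that differ from Llama are spelled out here; the full,
    # standalone modeling file is generated from this modular definition.
    pass

class MyNewModelForCausalLM(LlamaForCausalLM):
    pass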



How does this affect you?



What this means for you, as a model user

As a model user, you should see much more interoperability in the tools that you use going forward.

This doesn't mean that we intend to lock you into using transformers for your experiments; rather, it means that
thanks to this modeling standardization, you can expect the tools you use for training, for inference, and for
production to work together efficiently.



What this means for you, as a model creator

As a model creator, this means that a single contribution will make your model available in all downstream libraries that
have integrated that modeling implementation. We have seen this time and again over the years: releasing a model
is stressful, and integrating it in all relevant libraries is often a significant time-sink.

By standardizing the model implementation in a community-driven manner, we hope to lower the barrier to contributions
to the field across libraries.


We firmly believe this renewed direction will help standardize an ecosystem that is often prone to fragmentation.
We would love to hear your feedback on the direction the team has decided to take, and on changes we could make to get
there. Please come and see us over at the
transformers-community support tab on the Hub!


