The Hugging Face Hub is a vast repository, currently hosting 750K+ public models, offering a diverse range of pre-trained models for various machine learning frameworks. Of these, 346,268 models (at the time of writing) are built with the popular Transformers library.
The KerasHub library recently added an integration with the Hub, compatible with a first batch of 33 models. In this first version, KerasHub users were limited to the KerasHub-based models available on the Hugging Face Hub.
from keras_hub.models import GemmaCausalLM
gemma_lm = GemmaCausalLM.from_preset(
    "hf://google/gemma-2b-keras"
)
They were able to train/fine-tune the model and upload it back to the Hub (note that the model is still a Keras model).
import keras_hub

gemma_lm.save_to_preset("./gemma-2b-finetune")
keras_hub.upload_preset(
    "hf://username/gemma-2b-finetune",
    "./gemma-2b-finetune"
)
However, they were missing out on the extensive collection of over 300K models created with the transformers library. Figure 1 shows the 4k Gemma models on the Hub. But what if we told you that you can now access and use these 300K+ models with KerasHub, significantly expanding your model selection and capabilities?
from keras_hub.models import GemmaCausalLM
gemma_lm = GemmaCausalLM.from_preset(
    "hf://google/gemma-2b"
)
We’re thrilled to announce a significant step forward for the Hub community: Transformers and KerasHub now have a shared model save format. This means that models of the transformers library on the Hugging Face Hub can now also be loaded directly into KerasHub, immediately making a huge range of fine-tuned models available to KerasHub users. Initially, this integration focuses on enabling the use of Gemma (1 and 2), Llama 3, and PaliGemma models, with plans to expand compatibility to a wider range of architectures in the near future.
Use a wider range of frameworks
Because KerasHub models can seamlessly use TensorFlow, JAX, or PyTorch backends, a huge range of model checkpoints can now be loaded into any of these frameworks in a single line of code. Found a great checkpoint on Hugging Face, but wish you could deploy it to TFLite for serving or port it into JAX to do research? Now you can!
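As a purely illustrative sketch of the TFLite path, the snippet below loads a checkpoint under the TensorFlow backend, writes a SavedModel with the standard Keras 3 export() call, and hands it to the TFLite converter. Treat it as a sketch, not a recipe: whether a given large model converts cleanly depends on the ops it uses, and the preset name and paths here are only examples.
import os

# Pick the backend before Keras is imported; "tensorflow", "jax", and "torch"
# all work for loading the checkpoint itself.
os.environ["KERAS_BACKEND"] = "tensorflow"

import tensorflow as tf
from keras_hub.models import GemmaCausalLM

# Load a Transformers safetensors checkpoint straight from the Hub.
gemma_lm = GemmaCausalLM.from_preset("hf://google/gemma-2b")

# Standard Keras 3 SavedModel export (illustrative; a large LLM may need
# extra care around input signatures and memory).
gemma_lm.export("./gemma_2b_savedmodel")

# Convert the SavedModel to TFLite for serving.
converter = tf.lite.TFLiteConverter.from_saved_model("./gemma_2b_savedmodel")
tflite_model = converter.convert()
with open("gemma_2b.tflite", "wb") as f:
    f.write(tflite_model)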
How to use it
Using the integration requires updating your Keras versions:
$ pip install -U -q keras-hub
$ pip install -U "keras>=3.3.3"
Once updated, trying out the integration is as simple as:
from keras_hub.models import Llama3CausalLM
causal_lm = Llama3CausalLM.from_preset(
    "hf://NousResearch/Hermes-2-Pro-Llama-3-8B"
)
causal_lm.summary()
Under the Hood: How It Works
Transformers models are stored as a set of config files in JSON format, a tokenizer (usually also a .JSON file), and a set of safetensors weights files. The actual modeling code is contained in the Transformers library itself. This means that cross-loading a Transformers checkpoint into KerasHub is relatively straightforward as long as both libraries have modeling code for the relevant architecture. All we need to do is map config variables, weight names, and tokenizer vocabularies from one format to the other, and we can create a KerasHub checkpoint from a Transformers checkpoint, or vice versa.
All of this is handled internally for you, so you can focus on trying out the models rather than converting them!
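To make that idea concrete, here is a purely hypothetical sketch of the kind of renaming involved. The key names and values below are illustrative only and are not the converter code that ships with KerasHub.
# Hypothetical illustration of config-key mapping between the two formats;
# the real converters live inside KerasHub and also handle weights and tokenizers.
transformers_config = {
    "num_hidden_layers": 18,
    "num_attention_heads": 8,
    "hidden_size": 2048,
    "intermediate_size": 16384,
}

# Rename the equivalent fields to the names a KerasHub backbone expects
# (the real mapping may also rescale or reshape values where conventions differ).
kerashub_config = {
    "num_layers": transformers_config["num_hidden_layers"],
    "num_query_heads": transformers_config["num_attention_heads"],
    "hidden_dim": transformers_config["hidden_size"],
    "intermediate_dim": transformers_config["intermediate_size"],
}

# Weight tensors get the same treatment: each safetensors tensor name is mapped
# to the corresponding Keras variable path, e.g. (hypothetically)
# "model.layers.0.self_attn.q_proj.weight" -> "decoder_block_0/attention/query/kernel".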
Common Use Cases
Generation
A first use case of language models is generating text. Here is an example showing how to load a transformers model and generate new tokens using the .generate method from KerasHub.
from keras_hub.models import Llama3CausalLM
causal_lm = Llama3CausalLM.from_preset(
    "hf://NousResearch/Hermes-2-Pro-Llama-3-8B"
)

prompts = [
    """<|im_start|>system
You are a sentient, superintelligent artificial general intelligence, here to teach and assist me.<|im_end|>
<|im_start|>user
Write a short story about Goku discovering kirby has teamed up with Majin Buu to destroy the world.<|im_end|>
<|im_start|>assistant""",
]
causal_lm.generate(prompts, max_length=200)[0]
Changing precision
You can change the precision of your model using keras.config like so:
import keras
keras.config.set_dtype_policy("bfloat16")
from keras_hub.models import Llama3CausalLM
causal_lm = Llama3CausalLM.from_preset(
    "hf://NousResearch/Hermes-2-Pro-Llama-3-8B"
)
Using the checkpoint with JAX backend
To test drive a model using JAX, you can leverage Keras to run the model with the JAX backend. This can be achieved by simply switching Keras’s backend to JAX. Here’s how you can use the model in the JAX environment.
import os
os.environ["KERAS_BACKEND"] = "jax"
from keras_hub.models import Llama3CausalLM
causal_lm = Llama3CausalLM.from_preset(
    "hf://NousResearch/Hermes-2-Pro-Llama-3-8B"
)
Gemma 2
We’re pleased to announce that Gemma 2 models are also compatible with this integration.
from keras_hub.models import GemmaCausalLM
causal_lm = GemmaCausalLM.from_preset(
    "hf://google/gemma-2-9b"
)
PaliGemma
You can also use any PaliGemma safetensors checkpoint in your KerasHub pipeline.
from keras_hub.models import PaliGemmaCausalLM
pali_gemma_lm = PaliGemmaCausalLM.from_preset(
    "hf://gokaygokay/sd3-long-captioner"
)
What’s Next?
This is just the beginning. We envision expanding this integration to encompass an even wider range of Hugging Face models and architectures. Stay tuned for updates and be sure to explore the incredible potential that this collaboration unlocks!
I would like to take this opportunity to thank Matthew Carrigan and Matthew Watson for their help throughout the entire process.
