Total noob’s intro to Hugging Face Transformers



By Andrew Jardine


Welcome to “A Total Noob’s Introduction to Hugging Face Transformers,” a guide designed specifically for those trying to understand the bare basics of using open-source ML. Our goal is to demystify what Hugging Face Transformers is and how it works, not to turn you into a machine learning practitioner, but to enable a better understanding of, and collaboration with, those who are. That said, the best way to learn is by doing, so we’ll walk through a simple worked example of running Microsoft’s Phi-2 LLM in a notebook on a Hugging Face Space.

You might wonder, with the abundance of Hugging Face tutorials already available, why create another? The answer lies in accessibility: most existing resources assume some technical background, including Python proficiency, which can prevent non-technical individuals from grasping ML fundamentals. As someone who came from the business side of AI, I recognize that the learning curve presents a barrier, and I wanted to offer a more approachable path for like-minded learners.

Therefore, this guide is tailored for a non-technical audience keen to better understand open-source machine learning without having to learn Python from scratch. We assume no prior knowledge and explain concepts from the ground up to ensure clarity. If you’re an engineer, you’ll find this guide a bit basic, but for beginners, it’s a great place to start.

Let’s get stuck in… but first some context.



What’s Hugging Face Transformers?

Hugging Face Transformers is an open-source Python library that gives access to thousands of pre-trained Transformer models for natural language processing (NLP), computer vision, audio tasks, and more. It simplifies the process of implementing Transformer models by abstracting away the complexity of training or deploying models in lower-level ML frameworks like PyTorch, TensorFlow and JAX.



What’s a library?

A library is just a collection of reusable pieces of code that can be integrated into projects to implement functionality more efficiently, without the need to write your own code from scratch.

Notably, the Transformers library provides reusable code for implementing models in common frameworks like PyTorch, TensorFlow and JAX. This reusable code can be accessed by calling upon functions (also known as methods) within the library.
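
To make the idea concrete, here is a tiny illustration (not part of the tutorial) using Python’s built-in math library, which already contains a square-root function we can simply call instead of writing our own:
# Call a ready-made function from a library instead of writing it ourselves
import math

print(math.sqrt(16))  # prints 4.0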



What’s the Hugging Face Hub?

The Hugging Face Hub is a collaboration platform that hosts a huge collection of open-source models and datasets for machine learning; think of it as being like GitHub for ML. The Hub facilitates sharing and collaborating by making it easy for you to discover, learn, and interact with useful ML assets from the open-source community. The Hub integrates with, and is used alongside, the Transformers library, as models deployed using the Transformers library are downloaded from the Hub.



What are Hugging Face Spaces?

Spaces from Hugging Face is a service, available on the Hugging Face Hub, that provides an easy-to-use GUI for building and deploying web-hosted ML demos and apps. The service allows you to quickly build ML demos, upload your own apps to be hosted, or even select from a number of pre-configured ML applications to deploy instantly.

In this tutorial we’ll be deploying one of the pre-configured ML applications, a JupyterLab notebook, by selecting the corresponding Docker container.



What’s a notebook?

Notebooks are interactive applications that allow you to write and share live, executable code interwoven with complementary narrative text. Notebooks are especially useful for Data Scientists and Machine Learning Engineers, as they let you experiment with code in real time and easily review and share the results.

  1. Create a Hugging Face account
  • Go to hf.co, click “Sign Up” and create an account if you don’t already have one

  2. Add your billing information
  • Within your HF account go to Settings > Billing and add your credit card to the payment information section



Why do we need your credit card?

In order to run most LLMs you’ll need a GPU, which unfortunately isn’t free; you can, however, rent one from Hugging Face. Don’t worry, it shouldn’t cost you much. The GPU required for this tutorial, an NVIDIA A10G, only costs a couple of dollars per hour.

  3. Create a Space to host your notebook
  • On hf.co go to Spaces > Create New

  4. Configure your Space
  • Set your preferred Space name
  • Choose Docker > JupyterLab to select the pre-configured notebook app
  • Select Space Hardware as “Nvidia A10G Small”
  • Everything else can be left as default
  • Select “Create Space”



What’s a Docker template?

A Docker template is a predefined blueprint for a software environment that includes the necessary software and configurations, enabling developers to easily and rapidly deploy applications in a consistent and isolated way.



Why do I need to select GPU Space Hardware?

By default, our Space comes with a complimentary CPU, which is fine for some applications. However, the many computations required by LLMs benefit significantly from being run in parallel to improve speed, which is something GPUs are great at.

It’s also important to choose a GPU with enough memory to store the model while still providing spare working memory. In our case, an A10G Small with 24GB of memory is enough for Phi-2.
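
As a rough back-of-the-envelope sketch (my own illustration, assuming Phi-2’s roughly 2.7 billion parameters), you can estimate the memory needed just to hold the model weights like this:
# Rough estimate of the memory needed to hold Phi-2's weights
n_params = 2.7e9          # approximate parameter count for Phi-2
bytes_per_param = 4       # 32-bit precision; 16-bit would halve this
print(n_params * bytes_per_param / 1e9, "GB")  # ~10.8 GB, comfortably under 24GB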

  5. Login to JupyterLab
  • After the Space has finished building, you will see a login screen. If you left the token as default in the template, you can log in with “huggingface”. Otherwise, just use the token you set

  6. Create a new notebook
  • Within the “Launcher” tab, select the top “Python 3” square under the “Notebook” heading; this will create a new notebook environment that already has Python installed

  7. Install required packages
  • In your new notebook you’ll need to install the PyTorch and Transformers libraries, as they don’t come pre-installed in the environment
  • This can be done by entering the !pip install command plus the library name in your notebook. Click the play button to execute the code and watch as the libraries are installed (alternatively, hit CMD + Return / CTRL + Enter)
!pip install torch
!pip install transformers



What’s !pip install?

!pip is a command that installs Python packages from the Python Package Index (PyPI), an online repository of libraries available for use in a Python environment. It allows us to extend the functionality of Python applications by incorporating a wide range of third-party add-ons.
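
Purely as an illustration (the specific version number below is only an example, not something this tutorial requires), pip can also report what is installed or install one particular version:
!pip show transformers              # report which version is installed and where
!pip install transformers==4.38.2   # install a specific version rather than the latest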



If we’re using Transformers, why do we need PyTorch too?

Hugging Face Transformers is a library that’s built on top of other frameworks like PyTorch, TensorFlow and JAX. In this case we’re using Transformers with PyTorch, and so need to install it to access its functionality.
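
As an optional sanity check (not a required step), once PyTorch is installed you can confirm that the notebook actually sees the GPU attached to your Space:
# Check whether PyTorch can see a GPU in this environment
import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA A10G"
else:
    print("No GPU visible - check the Space hardware setting")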

  8. Import the AutoTokenizer and AutoModelForCausalLM classes from Transformers
  • Enter the following code on a new line and run it
from transformers import AutoTokenizer, AutoModelForCausalLM



What’s a Class?

Think of Classes as code recipes for creating objects. They’re useful because they allow us to create objects that combine properties and functions. This in turn simplifies coding, as all the information and operations needed for a particular topic are accessible from the same place. We’ll be using these Classes to create two objects: a model and a tokenizer object.
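
Here is a toy sketch (nothing to do with Transformers; the Dog class is purely hypothetical) showing a Class and an Object made from it:
# A toy Class: a recipe for creating Dog objects
class Dog:
    def __init__(self, name):
        self.name = name      # a property stored on the object

    def speak(self):          # a method: a function that belongs to the class
        return f"{self.name} says woof"

rex = Dog("Rex")              # create an object from the class recipe
print(rex.speak())            # prints "Rex says woof"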



Why do I need to import the Class again after installing Transformers?

Although Transformers is already installed, the specific Classes within Transformers are not automatically available for use in your environment. Python requires us to explicitly import individual Classes, as this helps avoid naming conflicts and ensures that only the necessary parts of a library are loaded into your current working context.

  9. Define which model you want to run
  • To specify which model you want to download and run from the Hugging Face Hub, you need to give the name of the model repo in your code
  • We do this by setting a variable equal to the model name; in this case we decide to call the variable model_id
  • We’ll use Microsoft’s Phi-2, a small but surprisingly capable model which can be found at https://huggingface.co/microsoft/phi-2. Note: Phi-2 is a base model, not an instruction-tuned one, and so will respond unusually if you try to use it for chat
model_id = "microsoft/phi-2"



What’s an instruction tuned model?

An instruction-tuned language model is a type of model that has been further trained from its base version to understand and respond to commands or prompts given by a user, improving its ability to follow instructions. Base models can autocomplete text, but often don’t respond to commands in a useful way. We’ll see this later when we try to prompt Phi.

  10. Create a model object and load the model
  • To load the model from the Hugging Face Hub into our local environment we need to instantiate the model object. We do this by passing the model_id we defined in the last step as the argument to the .from_pretrained method on the AutoModelForCausalLM Class
  • Run your code and grab a drink; the model may take a few minutes to download
model = AutoModelForCausalLM.from_pretrained(model_id)



What’s an argument?

An argument is input information that is passed to a function in order for it to compute an output. We pass an argument into a function by placing it between the function’s brackets. In this case the model ID is the sole argument, although functions can have multiple arguments, or none.
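
As another toy sketch (my own example, not from the library), here is a standalone function with two arguments placed between its brackets:
# A function that takes two arguments and returns their sum
def add(a, b):       # a and b are the arguments
    return a + b

print(add(2, 3))     # prints 5; here 2 and 3 are the arguments we pass in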



What’s a Method?

A Method is another name for a function that specifically uses information from a particular Object or Class. In this case the .from_pretrained method uses information from the Class and the model_id to create a new model object.

  11. Create a tokenizer object and load the tokenizer
  • To load the tokenizer you now need to create a tokenizer object. To do this, again pass the model_id as an argument into the .from_pretrained method on the AutoTokenizer Class
  • Note that there are some additional arguments; for the purposes of this example they aren’t important to understand, so we won’t explain them
tokenizer = AutoTokenizer.from_pretrained(model_id, add_eos_token=True, padding_side='left')



What’s a tokenizer?

A tokenizer is a tool that splits sentences into smaller pieces of text (tokens) and assigns each token a numeric value called an input ID. This is needed because our model only understands numbers, so we first have to convert (a.k.a. encode) the text into a format the model can understand. Each model has its own tokenizer vocabulary, and it’s important to use the same tokenizer the model was trained on, or it will misinterpret the text.
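
If you’re curious, here is an optional peek (my own illustration) at what the tokenizer does with a piece of text; the exact numbers you see will depend on the tokenizer’s vocabulary:
ids = tokenizer("Hello world")["input_ids"]  # encode: text -> list of input IDs
print(ids)                                   # a short list of integers
print(tokenizer.decode(ids))                 # decode: IDs -> text (special tokens may appear too)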

  12. Create the inputs for the model to process
  • Define a new variable input_text that will take the prompt you want to give the model. In this case I asked “Who are you?” but you can choose whatever you prefer
  • Pass the new variable as an argument to the tokenizer object to create the input_ids
  • Pass a second argument to the tokenizer object, return_tensors="pt"; this ensures the token IDs are represented as the correct kind of vector for the framework we’re using (i.e. PyTorch rather than TensorFlow)
input_text = "Who are you?"
input_ids = tokenizer(input_text, return_tensors="pt")
  13. Run generation and decode the output
  • Now that the input is in the right format, we need to pass it to the model. We do this by calling the .generate method on the model object, passing the input_ids as an argument and assigning the result to a new variable, outputs. We also set a second argument, max_new_tokens, equal to 100; this limits the number of tokens the model will generate
  • The outputs are not human readable yet; to turn them back into text we must decode them. We can do this with the .decode method, saving the result to the variable decoded_outputs
  • Finally, passing the decoded_outputs variable into the print function allows us to see the model output in our notebook
  • Optional: Pass the outputs variable into the print function to see how it compares to the decoded outputs (see the short sketch after the code below)
outputs = model.generate(input_ids["input_ids"], max_new_tokens=100)
decoded_outputs = tokenizer.decode(outputs[0])
print(decoded_outputs)
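
Here is a short sketch of the optional comparison mentioned above (skip_special_tokens is an optional extra, not something the tutorial requires): the raw outputs are just a tensor of token IDs, while decoding turns them back into readable text.
print(outputs)                                                 # a tensor of integers (token IDs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # the same output as readable text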



Why do I need to decode?

Models only understand numbers, so when we provided our input_ids as vectors the model returned an output in the same format. To turn those outputs back into text we need to reverse the initial encoding we did using the tokenizer.



Why does the output read like a story?

Remember that Phi-2 is a base model that hasn’t been instruction-tuned for conversational use; as such, it’s effectively a large auto-complete model. Based on your input, it’s predicting what it thinks is most likely to come next, drawing on all the web pages, books and other content it has seen previously.
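
As an optional experiment (the variable names here are my own), you can play to the base model’s strength by giving it the start of a sentence to finish rather than a question to answer:
completion_prompt = "The three primary colours are"
completion_ids = tokenizer(completion_prompt, return_tensors="pt")
completion = model.generate(completion_ids["input_ids"], max_new_tokens=20)
print(tokenizer.decode(completion[0]))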

Congratulations, you’ve run inference on your very first LLM!

I hope that working through this example helped you to better understand the world of open-source ML. If you want to continue your ML learning journey, I recommend the recent Hugging Face course we released in partnership with DeepLearning AI.


