
Complete Beginner’s Guide to Hugging Face LLM Tools


Hugging Face is an AI research lab and hub that has built a community of students, researchers, and enthusiasts. In a short span of time, Hugging Face has established a considerable presence in the AI space. Tech giants including Google, Amazon, and Nvidia have bolstered the AI startup with significant investments, bringing its valuation to $4.5 billion.

In this guide, we’ll introduce transformers, LLMs, and how the Hugging Face library plays a vital role in fostering an open-source AI community. We’ll also walk through the essential features of Hugging Face, including pipelines, datasets, models, and more, with hands-on Python examples.

Transformers in NLP

In 2017, researchers at Google published the influential paper “Attention Is All You Need”, which introduced transformers: deep learning models used in NLP. This breakthrough fueled the development of large language models like ChatGPT.

Large language models, or LLMs, are AI systems that use transformers to understand and generate human-like text. However, creating these models is expensive, often requiring millions of dollars, which limits their accessibility to large companies.

Hugging Face, founded in 2016, aims to make NLP models accessible to everyone. Despite being a commercial company, it offers a range of open-source resources that help people and organizations affordably build and use transformer models. Machine learning is about teaching computers to perform tasks by recognizing patterns, while deep learning, a subset of machine learning, creates a network that learns independently. Transformers are a type of deep learning architecture that uses input data effectively and flexibly, which, combined with their lower training time requirements, makes them a popular choice for building large language models.

How Hugging Face Facilitates NLP and LLM Projects

Hugging Face has made working with LLMs simpler by offering:

  1. A wide range of pre-trained models to choose from.
  2. Tools and examples to fine-tune these models to your specific needs.
  3. Easy deployment options for various environments.

A great resource available through Hugging Face is the Open LLM Leaderboard. Functioning as a comprehensive platform, it systematically monitors, ranks, and evaluates the performance of a spectrum of Large Language Models (LLMs) and chatbots, providing a discerning assessment of advancements in the open-source domain.

The leaderboard measures models using four benchmarks:

  • AI2 Reasoning Challenge (25-shot) — a set of grade-school science questions.
  • HellaSwag (10-shot) — a commonsense inference test that is easy for humans but remains a significant challenge for state-of-the-art models.
  • MMLU (5-shot) — a multifaceted evaluation of a text model’s proficiency across 57 diverse domains, including basic math, law, and computer science, among others.
  • TruthfulQA (0-shot) — a benchmark that measures a model’s tendency to reproduce misinformation commonly encountered online.

The benchmarks are described using terms such as “25-shot”, “10-shot”, “5-shot”, and “0-shot”, which indicate the number of prompt examples a model is given during evaluation to gauge its performance and reasoning abilities across domains. In “few-shot” settings, models are supplied with a small number of examples to help guide their responses, whereas in a “0-shot” setting, models receive no examples and must rely solely on their pre-existing knowledge to respond appropriately.
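To make the distinction concrete, here is a small illustrative sketch in Python of how a few-shot prompt differs from a zero-shot prompt; the questions and answers below are invented purely for demonstration:

# A 2-shot prompt: two worked examples precede the question we actually care about.
examples = [
    ("What gas do plants absorb from the air?", "Carbon dioxide"),
    ("What force pulls objects toward the Earth?", "Gravity"),
]
new_question = "Which planet is known as the Red Planet?"

few_shot_prompt = ""
for q, a in examples:
    few_shot_prompt += f"Question: {q}\nAnswer: {a}\n\n"
few_shot_prompt += f"Question: {new_question}\nAnswer:"

# A 0-shot prompt contains only the question, with no worked examples.
zero_shot_prompt = f"Question: {new_question}\nAnswer:"

print(few_shot_prompt)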

Components of Hugging Face

Pipelines

Pipelines are part of Hugging Face’s Transformers library, a feature that makes it easy to use the pre-trained models available in the Hugging Face repository. They provide an intuitive API for an array of tasks, including sentiment analysis, question answering, masked language modeling, named entity recognition, and summarization.

Pipelines integrate three central Hugging Face components:

  1. Tokenizer: Prepares your text for the model by converting it into a format the model can understand.
  2. Model: This is the core of the pipeline, where the actual predictions are made based on the preprocessed input.
  3. Post-processor: Transforms the model’s raw predictions into a human-readable form.

These pipelines not only reduce extensive coding but also offer a user-friendly interface for performing various NLP tasks. A minimal sketch of these three stages working together appears below.
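As an illustration of what a pipeline does under the hood, here is a minimal sketch of the three stages for sentiment analysis; the checkpoint name below is one commonly used sentiment model and is chosen here only for demonstration:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed checkpoint, for illustration
# 1. Tokenizer: convert raw text into model-ready tensors
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("I love open-source AI tools!", return_tensors="pt")
# 2. Model: run the preprocessed input through the network to get raw logits
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
with torch.no_grad():
    logits = model(**inputs).logits
# 3. Post-processing: turn logits into a human-readable label and score
probs = torch.softmax(logits, dim=-1)
label_id = int(probs.argmax())
print(model.config.id2label[label_id], round(float(probs[0, label_id]), 3))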

Transformer Applications using the Hugging Face library

A highlight of the Hugging Face ecosystem is the Transformers library, which simplifies NLP tasks by connecting a model with the necessary pre- and post-processing stages, streamlining the analysis process. To install and import the library, use the following commands:

pip install -q transformers
from transformers import pipeline

Having done that, you can execute NLP tasks, starting with sentiment analysis, which categorizes text into positive or negative sentiments. The library’s powerful pipeline() function serves as a hub encompassing other pipelines and facilitating task-specific applications in audio, vision, and multimodal domains.

Practical Applications

Text Classification

Text classification becomes a breeze with Hugging Face’s pipeline() function. Here’s how you can initiate a text classification pipeline:

classifier = pipeline("text-classification")

For a hands-on experience, feed a string or list of strings into your pipeline to obtain predictions, which can be neatly visualized using Python’s Pandas library. Below is a Python snippet demonstrating this:

sentences = ["I am thrilled to introduce you to the wonderful world of AI.",
"Hopefully, it won't disappoint you."]
# Get classification results for every sentence within the list
results = classifier(sentences)
# Loop through each result and print the label and rating
for i, end in enumerate(results):
print(f"Result {i + 1}:")
print(f" Label: {result['label']}")
print(f" Rating: {round(result['score'], 3)}n")

Output

Result 1:
 Label: POSITIVE
 Score: 1.0
Result 2:
 Label: POSITIVE
 Score: 0.996

Named Entity Recognition (NER)

NER is pivotal for extracting real-world objects, termed ‘named entities’, from text. Use the NER pipeline to identify these entities effectively:

ner_tagger = pipeline("ner", aggregation_strategy="simple")
text = "Elon Musk is the CEO of SpaceX."
outputs = ner_tagger(text)
print(outputs)

Output

The pipeline returns a list of detected entities, each with an entity group, a confidence score, the matched word, and its character span. For this sentence, it identifies “Elon Musk” as a person (PER) and “SpaceX” as an organization (ORG).

Question Answering

Question answering involves extracting precise answers to specific questions from a given context. Initialize a question-answering pipeline and pass in your question and context to get the desired answer:

reader = pipeline("question-answering")
text = "Hugging Face is a company creating tools for NLP. It is based in New York and was founded in 2016."
question = "Where is Hugging Face based?"
outputs = reader(question=question, context=text)
print(outputs)

Output

{'score': 0.998, 'start': 51, 'end': 60, 'answer': 'New York'}

Hugging Face’s pipeline function offers an array of pre-built pipelines for various tasks beyond text classification, NER, and question answering. Below are details on a subset of the available tasks:

Table: Hugging Face Pipeline Tasks

Task | Description | Pipeline Identifier
Text Generation | Generate text based on a given prompt | pipeline(task="text-generation")
Summarization | Summarize a lengthy text or document | pipeline(task="summarization")
Image Classification | Label an input image | pipeline(task="image-classification")
Audio Classification | Categorize audio data | pipeline(task="audio-classification")
Visual Question Answering | Answer a question using both an image and a question | pipeline(task="vqa")

For detailed descriptions and more tasks, refer to the pipeline documentation on Hugging Face’s website.
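As a brief sketch of two of the identifiers from the table, here is how they might be used; the prompt and passage are our own, and the models are whatever defaults the library selects for each task:

from transformers import pipeline

# Text generation: continue a prompt
generator = pipeline(task="text-generation")
print(generator("Hugging Face makes it easy to", max_length=30)[0]["generated_text"])

# Summarization: condense a longer passage into a shorter one
summarizer = pipeline(task="summarization")
long_text = ("Hugging Face provides open-source tools, pre-trained models, and datasets "
             "that help researchers and developers build NLP applications quickly.")
print(summarizer(long_text, max_length=30, min_length=10)[0]["summary_text"])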

Why Hugging Face is shifting its focus to Rust

Hugging Face safetensors and tokenizers GitHub pages

The Hugging Face (HF) ecosystem has started utilizing Rust in its libraries, such as safetensors and tokenizers.

Hugging Face has also recently released a new machine-learning framework called Candle. Unlike traditional frameworks that use Python, Candle is built with Rust. The goal behind using Rust is to enhance performance and simplify the user experience while supporting GPU operations.

The key objective of Candle is to facilitate serverless inference, making the deployment of lightweight binaries possible and removing Python from production workloads, which can sometimes slow processes down because of its overhead. This framework comes as a solution to the issues encountered with full machine learning frameworks like PyTorch, which are large and slow when creating instances on a cluster.

Let’s explore why Rust is becoming a far more popular choice than Python for these components.

  1. Speed and Performance – Rust is known for its incredible speed, outperforming Python, which is traditionally used in machine learning frameworks. Python’s performance can be held back by its Global Interpreter Lock (GIL), but Rust doesn’t face this issue, promising faster execution of tasks and, consequently, improved performance in the projects where it’s implemented.
  2. Safety – Rust provides memory safety guarantees without a garbage collector, an aspect that is essential for the safety of concurrent systems. This plays an important role in areas like safetensors, where safety in handling data structures is a priority.

Safetensors

Safetensors benefits from Rust’s speed and safety features. It involves the manipulation of tensors, complex mathematical entities, and using Rust ensures that these operations are not just fast but also secure, avoiding common bugs and security issues that could arise from memory mishandling.

Tokenizer

Tokenizers handle the breaking down of sentences or phrases into smaller units, such as words or subword terms. Rust aids in this process by speeding up execution, ensuring that tokenization is not just accurate but also swift, enhancing the efficiency of natural language processing tasks.

At the core of Hugging Face’s tokenizers is the concept of subword tokenization, which strikes a delicate balance between word-level and character-level tokenization to optimize information retention and vocabulary size. It works by creating subtokens, such as “##ing” and “##ed”, retaining semantic richness while avoiding a bloated vocabulary.

Subword tokenization involves a training phase to identify the most effective balance between character-level and word-level tokenization. It goes beyond mere prefix and suffix rules, requiring a comprehensive analysis of language patterns in extensive text corpora to design an efficient subword tokenizer. The resulting tokenizer is adept at handling novel words by breaking them down into known subwords, maintaining a high level of semantic understanding.
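To see subword tokenization in action, here is a short sketch using a BERT tokenizer; the checkpoint name is one common choice, and the splits shown in the comments are indicative rather than guaranteed:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Words outside the vocabulary are broken into known subwords prefixed with "##"
print(tokenizer.tokenize("tokenization"))   # e.g. ['token', '##ization']
print(tokenizer.tokenize("snowboarding"))   # e.g. ['snow', '##board', '##ing']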

Tokenization Components

The tokenizers library divides the tokenization process into several steps, each addressing a distinct aspect of tokenization. Let’s delve into these components:

  • Normalizer: Applies initial transformations to the input string, such as lowercase conversion, Unicode normalization, and stripping whitespace.
  • PreTokenizer: Responsible for fragmenting the input string into pre-segments, determining the splits based on predefined rules, such as whitespace delineation.
  • Model: Oversees the discovery and creation of subtokens, adapting to the specifics of your input data and offering training capabilities.
  • Post-Processor: Adds construction features to facilitate compatibility with many transformer-based models, like BERT, by adding tokens such as [CLS] and [SEP].

To get started with Hugging Face tokenizers, install the library using the command pip install tokenizers and import it into your Python environment. The library can tokenize large amounts of text in very little time, saving precious computational resources for more intensive tasks like model training.

The tokenizers library is written in Rust, which inherits C++’s syntactical familiarity while introducing novel concepts in programming language design. Coupled with Python bindings, it lets you benefit from the performance of a lower-level language while working in a Python environment. A sketch that wires the components above together follows.
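Here is a minimal sketch, under the assumption that a local text file such as corpus.txt is available for training, of how the four components fit together in the tokenizers library:

from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers, processors

# Model: a BPE model that will learn subtokens from the training data
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
# Normalizer: Unicode normalization, lowercasing, and accent stripping
tokenizer.normalizer = normalizers.Sequence(
    [normalizers.NFD(), normalizers.Lowercase(), normalizers.StripAccents()]
)
# PreTokenizer: split the input on whitespace and punctuation
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Train the model on a text corpus (corpus.txt is a placeholder path)
trainer = trainers.BpeTrainer(special_tokens=["[UNK]", "[CLS]", "[SEP]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)

# Post-Processor: add BERT-style [CLS] and [SEP] tokens around each sequence
tokenizer.post_processor = processors.TemplateProcessing(
    single="[CLS] $A [SEP]",
    special_tokens=[("[CLS]", tokenizer.token_to_id("[CLS]")),
                    ("[SEP]", tokenizer.token_to_id("[SEP]"))],
)

print(tokenizer.encode("Tokenization components working together").tokens)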

Datasets

Datasets are the bedrock of AI projects. Hugging Face offers a wide variety of datasets, suitable for a range of NLP tasks and more. To use them efficiently, it is important to understand the process of loading and analyzing them. Below is a well-commented Python script demonstrating how to explore datasets available on Hugging Face:

from datasets import load_dataset
# Load a dataset (returns a DatasetDict keyed by split)
dataset = load_dataset('squad')
# Display the first entry of the training split
print(dataset['train'][0])

This script uses the load_dataset function to load the SQuAD dataset, a popular choice for question-answering tasks.
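A few extra calls, shown in this brief sketch building on the same 'squad' dataset loaded above, help reveal a dataset's structure before you commit to using it:

from datasets import load_dataset

dataset = load_dataset('squad')
# The object is a DatasetDict: one entry per split, with row counts
print(dataset)
# Column names and their types for the training split
print(dataset['train'].features)
# A single field from the first training example
print(dataset['train'][0]['question'])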

Leveraging Pre-trained Models and Bringing It All Together

Pre-trained models form the backbone of many deep learning projects, enabling researchers and developers to jumpstart their initiatives without starting from scratch. Hugging Face facilitates the exploration of a diverse range of pre-trained models, as shown in the code below:

import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
# Load the pre-trained model and tokenizer
model = AutoModelForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
tokenizer = AutoTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
# Display the model's architecture
print(model)

With the model and tokenizer loaded, we can now create a function that takes a piece of text and a question as inputs and returns the answer extracted from the text. We will use the tokenizer to process the input text and question into a format compatible with the model, and then feed this processed input into the model to get the answer:

def get_answer(text, question):
    # Tokenize the input question and text
    inputs = tokenizer(question, text, return_tensors="pt", max_length=512, truncation=True)
    outputs = model(**inputs)
    # Get the most likely start and end positions of the answer
    answer_start = torch.argmax(outputs.start_logits)
    answer_end = torch.argmax(outputs.end_logits) + 1
    # Convert the answer token ids back into a string
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(inputs['input_ids'][0][answer_start:answer_end]))
    return answer

In the code snippet, we import the necessary modules, then load a pre-trained model and its corresponding tokenizer using the from_pretrained method. We choose a BERT model fine-tuned on the SQuAD dataset.

Let’s look at an example use case of this function where we have a paragraph of text and we want to extract a specific answer to a question from it:

text = """
The Eiffel Tower, situated in Paris, France, is some of the iconic landmarks on this planet. It was designed by Gustave Eiffel and accomplished in 1889. The tower stands at a height of 324 meters and was the tallest man-made structure on this planet on the time of its completion.
"""
query = "Who designed the Eiffel Tower?"
# Get the reply to the query
answer = get_answer(text, query)
print(f"The reply to the query is: {answer}")
# Output: The reply to the query is: Gustave Eiffel

In this script, we build a get_answer function that takes a text and a question, tokenizes them appropriately, and leverages the pre-trained BERT model to extract the answer from the text. It demonstrates a practical application of Hugging Face’s Transformers library for building a simple yet powerful question-answering system. To grasp the concepts well, hands-on experimentation in a Google Colab notebook is recommended.

Conclusion

Through its extensive range of open-source tools, pre-trained models, and user-friendly pipelines, Hugging Face enables both seasoned professionals and newcomers to delve into the expansive world of AI with a sense of ease and understanding. Furthermore, the initiative to integrate Rust, owing to its speed and safety features, underscores Hugging Face’s commitment to fostering innovation while ensuring efficiency and security in AI applications. The transformative work of Hugging Face not only democratizes access to high-level AI tools but also nurtures a collaborative environment for learning and development in the AI space, facilitating a future where AI is accessible to all.
