Large Language Models as Zero-shot Labelers
Introduction
What’s Zero-Shot Learning?
Zero-Shot Learning with Large Language Models
Conclusion

Using LLMs to acquire labels for supervised models

Photo by h heyerlein on Unsplash

Introduction

Labeling data is a critical step in building supervised machine learning models, as the amount and quality of labels are usually the principal factors that determine model performance.

However, labeling data can be very time-consuming and expensive, especially for complex tasks that require domain knowledge or reading through large amounts of text.

Recently, large language models (LLMs) have emerged as a powerful solution for obtaining labels on text data. Through zero-shot learning, we can obtain labels for unlabeled data using only the output of the LLM, rather than having to ask a human to provide them. This can significantly lower the cost of obtaining labels and makes the process much more scalable.

In this article, we’ll further explore the concept of zero-shot learning and how LLMs can be used for this purpose.

What’s Zero-Shot Learning?

Photo by Alex Knight on Unsplash

Zero-shot learning (ZSL) is a problem setup in machine learning in which the model is asked to solve a prediction task it was not trained on. This often involves recognizing or classifying data into concepts it has not explicitly seen during training.

In traditional supervised learning, this isn’t possible, because the model can only output predictions for tasks it was trained on (i.e. had labels for). However, in the ZSL paradigm, models can generalize to an arbitrary unseen task and perform at a reasonable level. Note that, in general, a supervised model trained on a given task will still outperform a model using ZSL, so ZSL is more often used before supervised labels are available.

One of the most promising applications of ZSL is data labeling, where it can significantly reduce the cost of obtaining labels. If a model is able to automatically classify data into categories without having been trained on that task, it can be used to generate labels for a downstream supervised model. These labels can be used to bootstrap a supervised model, in a paradigm similar to active learning or human-in-the-loop machine learning.
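As a concrete illustration of this idea, here is a minimal sketch (not code from this article) that uses the Hugging Face zero-shot-classification pipeline to assign candidate labels to unlabeled text. The model name, example texts, and label set are assumptions chosen for the example.

```python
# A minimal sketch of zero-shot labeling with an off-the-shelf NLI-based model.
# The model name and candidate labels below are illustrative assumptions.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

unlabeled_texts = [
    "The battery dies after two hours of use.",
    "Customer support resolved my issue in minutes.",
]
candidate_labels = ["positive", "negative"]

for text in unlabeled_texts:
    result = classifier(text, candidate_labels=candidate_labels)
    # The pipeline returns candidate labels sorted by score; take the top one.
    print(text, "->", result["labels"][0])
```

The predicted labels can then be treated as (noisy) training labels for a downstream supervised model, ideally with a human spot-checking a sample of them.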

Zero-Shot Learning with Large Language Models

Photo by Arseny Togulev on Unsplash

LLMs like GPT-3 are powerful tools for ZSL because their robust pre-training process gives them a holistic understanding of natural language that isn’t tied to any particular supervised task’s labels.

Embedding search

The contextual embeddings of an LLM are able to capture the semantic concepts in a given piece of text, which makes them very useful for ZSL.

Libraries such as sentence-transformers offer LLMs that have been trained so that semantically similar pieces of text have embeddings with a small distance between them.

If we have the embeddings for a few labeled pieces of data, we can use a nearest-neighbor search to find pieces of unlabeled data with similar embeddings.

If two pieces of text are very close to one another in the embedding space, then they likely have the same label.
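Here is a minimal sketch of that approach using the sentence-transformers library; the model name, example texts, and labels are assumptions chosen for illustration.

```python
# A minimal sketch: propagate labels from a few labeled texts to unlabeled
# texts via nearest-neighbor search in embedding space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A handful of texts we already have labels for.
labeled_texts = ["The service was fantastic!", "The package arrived broken."]
labels = ["positive", "negative"]

# Texts we want to label automatically.
unlabeled_texts = ["Absolutely loved the support team.", "Item was damaged on arrival."]

labeled_emb = model.encode(labeled_texts, convert_to_tensor=True)
unlabeled_emb = model.encode(unlabeled_texts, convert_to_tensor=True)

# Cosine similarity between every unlabeled and every labeled embedding.
similarities = util.cos_sim(unlabeled_emb, labeled_emb)

# Assign each unlabeled text the label of its nearest labeled neighbor.
for text, sims in zip(unlabeled_texts, similarities):
    nearest = int(sims.argmax())
    print(f"{text!r} -> {labels[nearest]}")
```

In practice you would want more than one labeled anchor per class, and possibly a similarity threshold below which examples are routed to a human instead of labeled automatically.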

In-context learning

In-context learning is an emergent ability of LLMs that enables them to learn to solve new tasks simply by seeing input-output pairs. No parameter updates are needed for the model to be able to learn arbitrary new tasks.

We can use this ability to obtain labels by simply providing a few input-output pairs for our downstream task and letting the model produce labels for unlabeled data points.

In the context of ZSL, this means we can provide a few handcrafted examples of text with their associated supervised labels and have the model learn the labeling function on the fly.

In this trivial example, we teach ChatGPT to classify whether a sentence is about frogs via in-context learning.
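The sketch below shows one way to construct such a few-shot labeling prompt for the frog example; the example pairs and prompt wording are illustrative assumptions, and the resulting prompt can be sent to whichever chat or completions API you use.

```python
# A minimal sketch of an in-context (few-shot) labeling prompt.
# The few-shot pairs teach the model the labeling function on the fly.
FEW_SHOT_EXAMPLES = [
    ("Frogs can breathe through their skin.", "Yes"),
    ("The stock market fell sharply today.", "No"),
    ("A bullfrog croaked by the pond all night.", "Yes"),
]

def build_prompt(sentence: str) -> str:
    lines = ["Decide whether each sentence is about frogs. Answer 'Yes' or 'No'.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines.append(f"Sentence: {text}")
        lines.append(f"Answer: {label}")
    lines.append(f"Sentence: {sentence}")
    lines.append("Answer:")
    return "\n".join(lines)

# The completed prompt is what you would send to the LLM; its reply
# ("Yes" or "No") becomes the label for the unlabeled sentence.
print(build_prompt("The tadpole grew legs and hopped onto a lily pad."))
```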

Generative models

Recent advances in alignment methods such as RLHF (Reinforcement Learning from Human Feedback) for generative LLMs have made it possible to simply ask the model to label data for you.

Models such as ChatGPT can provide labels for input data by simply replying (in natural language) with the desired label. Their vast knowledge of the world, obtained through pre-training on such large amounts of data, has endowed these models with the ability to solve novel tasks using only their semantic understanding of the question being asked.

This process can be automated using open-source models such as FLAN-T5 by asking the model to reply with only items from your label set (e.g. “Respond with ‘Yes’ or ‘No’”) and checking which option has the highest output probability.
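Here is one way that check could be implemented with Hugging Face transformers, as a sketch rather than the article’s exact code: each candidate answer is scored by the model’s likelihood of generating it, and the most probable one is kept. The model name, prompt wording, and label set are assumptions.

```python
# A minimal sketch of constrained zero-shot labeling with FLAN-T5: compare
# the likelihood of each allowed answer and pick the most probable one.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
model.eval()

def zero_shot_label(text: str, labels=("Yes", "No")) -> str:
    prompt = (
        "Is the following sentence about frogs? Respond with 'Yes' or 'No'.\n"
        f"Sentence: {text}"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    scores = {}
    for label in labels:
        label_ids = tokenizer(label, return_tensors="pt").input_ids
        with torch.no_grad():
            # The loss is the average negative log-likelihood of the label
            # tokens, so a lower loss means a more probable answer.
            loss = model(**inputs, labels=label_ids).loss
        scores[label] = -loss.item()
    return max(scores, key=scores.get)

print(zero_shot_label("The tadpole grew legs and hopped onto a lily pad."))
```

Restricting the comparison to the allowed answers avoids having to parse free-form generations back into your label set.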

ChatGPT can not only provide a label, but also explain its reasoning for choosing said label.

Conclusion

Labeling data is a critical step in supervised machine learning, but it can be costly to acquire large amounts of labeled data.

With zero-shot learning and LLMs, we can significantly reduce the cost of label acquisition.

LLMs pre-trained on huge amounts of data encode a semantic understanding of the world’s information that allows them to perform well on arbitrary, unseen tasks. These models can automatically label data for us with high accuracy, allowing us to bootstrap supervised models at a low cost.
