SetFitABSA is an efficient technique for detecting the sentiment towards specific aspects within a text.
Aspect-Based Sentiment Analysis (ABSA) is the task of detecting the sentiment towards specific aspects within a text. For example, in the sentence “This phone has a great screen, but its battery is too small”, the aspect terms are “screen” and “battery” and the sentiment polarities towards them are Positive and Negative, respectively.
ABSA is widely used by organizations to extract valuable insights by analyzing customer feedback towards aspects of products and services in various domains. However, labeling training data for ABSA is a tedious task due to the fine-grained nature (token level) of manually identifying aspects within the training samples.
Intel Labs and Hugging Face are excited to introduce SetFitABSA, a framework for few-shot training of domain-specific ABSA models; SetFitABSA is competitive with and even outperforms generative models such as Llama2 and T5 in few-shot scenarios.
Compared to LLM-based methods, SetFitABSA has two unique advantages:
🗣 No prompts needed: few-shot in-context learning with LLMs requires handcrafted prompts, which makes the results brittle, sensitive to phrasing, and dependent on user expertise. SetFitABSA dispenses with prompts altogether by generating rich embeddings directly from a small number of labeled text examples.
🏎 Fast to train: SetFitABSA requires only a handful of labeled training samples; in addition, it uses a simple training data format, eliminating the need for specialized tagging tools. This makes the data labeling process fast and easy.
In this blog post, we will explain how SetFitABSA works and how to train your very own models using the SetFit library. Let's dive in!
How does it work?
SetFitABSA’s three-stage training process
SetFitABSA consists of three steps. The first step extracts aspect candidates from the text, the second yields the aspects by classifying the aspect candidates as aspects or non-aspects, and the final step associates a sentiment polarity with each extracted aspect. Steps two and three are based on SetFit models.
Training
1. Aspect candidate extraction
In this work we assume that aspects, which are usually features of products and services, are mostly nouns or noun compounds (strings of consecutive nouns). We use spaCy to tokenize and extract nouns/noun compounds from the sentences in the (few-shot) training set. Since not all extracted nouns/noun compounds are aspects, we refer to them as aspect candidates.
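As a rough illustration, here is a minimal sketch of such a candidate extraction step. The extract_aspect_candidates helper is our own illustration, not the function SetFitABSA uses internally:

import spacy

nlp = spacy.load("en_core_web_lg")

def extract_aspect_candidates(sentence: str) -> list[str]:
    # Collect single nouns and noun compounds (runs of consecutive nouns)
    doc = nlp(sentence)
    candidates, current = [], []
    for token in doc:
        if token.pos_ in ("NOUN", "PROPN"):
            current.append(token.text)
        elif current:
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates

print(extract_aspect_candidates("Waiters aren't friendly but the cream pasta is out of this world."))
# e.g. ['Waiters', 'cream pasta', 'world']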
2. Aspect/Non-aspect classification
Now that we have aspect candidates, we need to train a model to be able to distinguish between nouns that are aspects and nouns that are not. For this purpose, we need training samples with aspect/non-aspect labels. This is done by considering aspects in the training set as True aspects, while other non-overlapping aspect candidates are considered non-aspects and therefore labeled as False:
- Training sentence: “Waiters aren’t friendly but the cream pasta is out of this world.”
- Tokenized: [Waiters, are, n’t, friendly, but, the, cream, pasta, is, out, of, this, world, .]
- Extracted aspect candidates: [Waiters, cream pasta, world]
- Gold labels from the training set, in BIO format: [B-ASP, O, O, O, O, O, B-ASP, I-ASP, O, O, O, O, O, O]
- Generated aspect/non-aspect labels: [Waiters → True, cream pasta → True, world → False]
Now that we have all the aspect candidates labeled, how do we use them to train the aspect candidate classification model? In other words, how do we use SetFit, a sentence classification framework, to classify individual tokens? Well, this is the trick: each aspect candidate is concatenated with the entire training sentence to create a training instance using the following template:
aspect_candidate:training_sentence
Applying the template to the example above will generate 3 training instances – two with True labels representing aspect training instances, and one with a False label representing a non-aspect training instance:
| Text | Label |
|---|---|
| Waiters:Waiters aren’t friendly but the cream pasta is out of this world. | 1 |
| cream pasta:Waiters aren’t friendly but the cream pasta is out of this world. | 1 |
| world:Waiters aren’t friendly but the cream pasta is out of this world. | 0 |
| … | … |
After generating the training instances, we are ready to use the power of SetFit to train a few-shot domain-specific binary classifier to extract aspects from an input text review. This will be our first fine-tuned SetFit model.
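A minimal sketch of this instance generation follows; the make_instances helper is illustrative, the SetFit library handles this internally:

def make_instances(sentence, candidates, gold_aspects):
    # Pair each candidate with the full sentence; label 1 if it is a gold aspect
    return [(f"{cand}:{sentence}", int(cand in gold_aspects)) for cand in candidates]

sentence = "Waiters aren't friendly but the cream pasta is out of this world."
instances = make_instances(
    sentence,
    candidates=["Waiters", "cream pasta", "world"],
    gold_aspects={"Waiters", "cream pasta"},
)
# [("Waiters:Waiters aren't ...", 1), ("cream pasta:...", 1), ("world:...", 0)]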
3. Sentiment polarity classification
Once the system extracts the aspects from the text, it needs to associate a sentiment polarity (e.g., positive, negative, or neutral) with each aspect. For this purpose, we use a second SetFit model and train it similarly to the aspect extraction model, as illustrated in the following example:
- Training sentence: “Waiters aren’t friendly but the cream pasta is out of this world.”
- Tokenized: [Waiters, are, n’t, friendly, but, the, cream, pasta, is, out, of, this, world, .]
- Gold labels from training set: [NEG, O, O, O, O, O, POS, POS, O, O, O, O, O, O]
| Text | Label |
|---|---|
| Waiters:Waiters aren’t friendly but the cream pasta is out of this world. | NEG |
| cream pasta:Waiters aren’t friendly but the cream pasta is out of this world. | POS |
| … | … |
Note that, as opposed to the aspect extraction model, we do not include non-aspects in this training set, since the goal is to classify the sentiment polarity towards real aspects.
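The polarity training instances can be sketched the same way, except that only gold aspects are paired with the sentence and the label is the aspect's polarity (again, an illustration rather than the library's internals):

def make_polarity_instances(sentence, aspect_polarities):
    # Pair each gold aspect with the full sentence, labeled by its polarity
    return [(f"{aspect}:{sentence}", polarity) for aspect, polarity in aspect_polarities]

instances = make_polarity_instances(
    "Waiters aren't friendly but the cream pasta is out of this world.",
    [("Waiters", "NEG"), ("cream pasta", "POS")],
)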
Running inference
At inference time, the test sentence passes through the spaCy aspect candidate extraction phase, resulting in test instances using the template aspect_candidate:test_sentence. Next, the non-aspects are filtered out by the aspect/non-aspect classifier. Finally, the extracted aspects are fed to the sentiment polarity classifier, which predicts the sentiment polarity per aspect.
In practice, this means the model can receive plain text as input and output aspects and their sentiments, as in the following example (a conceptual sketch of the full pipeline follows it):
Model Input:
"their dinner specials are implausible."
Model Output:
[{'span': 'dinner specials', 'polarity': 'positive'}]
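Conceptually, inference chains the three stages together. In the sketch below, extract_aspect_candidates is the illustrative helper from earlier, and aspect_clf and polarity_clf are hypothetical stand-ins for the two fine-tuned SetFit classifiers:

sentence = "their dinner specials are fantastic."
candidates = extract_aspect_candidates(sentence)                   # stage 1: spaCy extraction
pairs = [f"{cand}:{sentence}" for cand in candidates]              # apply the template
keep = aspect_clf.predict(pairs)                                   # stage 2: filter non-aspects
aspects = [cand for cand, is_aspect in zip(candidates, keep) if is_aspect]
polarities = polarity_clf.predict([f"{a}:{sentence}" for a in aspects])  # stage 3: polarity
preds = [{"span": a, "polarity": p} for a, p in zip(aspects, polarities)]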
Benchmarking
SetFitABSA was benchmarked against the recent state-of-the-art works by AWS AI Labs and Salesforce AI Research that fine-tune T5 and GPT2 using prompts. To get a more complete picture, we also compare our model to the Llama-2-chat model using in-context learning.
We use the popular Laptop14 and Restaurant14 ABSA datasets from the Semantic Evaluation Challenge 2014 (SemEval14).
SetFitABSA is evaluated both on the intermediate task of aspect term extraction (SB1) and on the full ABSA task of aspect extraction along with their sentiment polarity predictions (SB1+SB2).
Model size comparison
| Model | Size (params) |
|---|---|
| Llama-2-chat | 7B |
| T5-base | 220M |
| GPT2-base | 124M |
| GPT2-medium | 355M |
| SetFit (MPNet) | 2x 110M |
Note that for the SB1 task, SetFitABSA is 110M parameters; for SB2, it is 110M parameters; and for SB1+SB2, SetFitABSA consists of 220M parameters.
Performance comparison
We see a clear advantage of SetFitABSA when the number of training instances is low, despite being 2x smaller than T5 and 3x smaller than GPT2-medium. Even compared to Llama 2, which is 64x larger, the performance is on par or better.
SetFitABSA vs GPT2
SetFitABSA vs T5
Note that for a fair comparison, we evaluated SetFitABSA against exactly the same dataset splits used by the various baselines (GPT2, T5, etc.).
SetFitABSA vs Llama2
We notice that increasing the number of in-context training samples for Llama2 did not result in improved performance. This phenomenon has been shown for ChatGPT before, and we think it should be further investigated.
Training your own model
SetFitABSA is part of the SetFit framework. To train an ABSA model, start by installing setfit with the absa option enabled:
python -m pip install -U "setfit[absa]"
Additionally, we must install the en_core_web_lg spaCy model:
python -m spacy download en_core_web_lg
We continue by preparing the training set. The format of the training set is a Dataset with the columns text, span, label, and ordinal:
- text: The full sentence or text containing the aspects.
- span: An aspect from the full sentence. Can be multiple words. For example: “food”.
- label: The (polarity) label corresponding to the aspect span. For example: “positive”. The label names can be chosen arbitrarily when tagging the collected training data.
- ordinal: If the aspect span occurs multiple times in the text, then this ordinal represents the index of those occurrences. Often this is just 0, as each aspect usually appears only once in the input text; for example, if “food” appeared twice and the second occurrence were the labeled aspect, the ordinal would be 1.
For example, the training text “Restaurant with wonderful food but worst service I ever seen” contains two aspects, so it adds two rows to the training set table (a code sketch for building such a dataset follows the table):
| Text | Span | Label | Ordinal |
|---|---|---|---|
| Restaurant with wonderful food but worst service I ever seen | food | positive | 0 |
| Restaurant with wonderful food but worst service I ever seen | service | negative | 0 |
| … | … | … | … |
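To make the format concrete, a tiny version of this dataset can be built in memory with datasets.Dataset.from_dict:

from datasets import Dataset

train_dataset = Dataset.from_dict({
    "text": ["Restaurant with wonderful food but worst service I ever seen"] * 2,
    "span": ["food", "service"],
    "label": ["positive", "negative"],
    "ordinal": [0, 0],
})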
Once we have the training dataset ready, we can create an ABSA trainer and execute the training. SetFit models are fairly efficient to train, but as SetFitABSA involves two models trained sequentially, it is recommended to use a GPU for training to keep the training time low. For example, the following training script trains a full SetFitABSA model in about 10 minutes with the free Google Colab T4 GPU.
from datasets import load_dataset
from setfit import AbsaTrainer, AbsaModel

# Load a prepared training dataset with "text", "span", "label", "ordinal" columns
train_dataset = load_dataset("tomaarsen/setfit-absa-semeval-restaurants", split="train[:128]")

# Create a model with a chosen sentence transformer from the Hub
model = AbsaModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# Create a trainer and execute training
trainer = AbsaTrainer(model, train_dataset=train_dataset)
trainer.train()
That's it! We have trained a domain-specific ABSA model. We can save our trained model to disk or upload it to the Hugging Face Hub. Bear in mind that the model contains two submodels, so each is given its own path:
model.save_pretrained(
    "models/setfit-absa-model-aspect",
    "models/setfit-absa-model-polarity"
)
model.push_to_hub(
    "tomaarsen/setfit-absa-paraphrase-mpnet-base-v2-restaurants-aspect",
    "tomaarsen/setfit-absa-paraphrase-mpnet-base-v2-restaurants-polarity"
)
Now we can use our trained model for inference. We start by loading the model:
from setfit import AbsaModel
model = AbsaModel.from_pretrained(
    "tomaarsen/setfit-absa-paraphrase-mpnet-base-v2-restaurants-aspect",
    "tomaarsen/setfit-absa-paraphrase-mpnet-base-v2-restaurants-polarity"
)
Then, we use the predict API to run inference. The input is a list of strings, each representing a textual review:
preds = model.predict([
    "Best pizza outside of Italy and really tasty.",
    "The food variations are great and the prices are absolutely fair.",
    "Unfortunately, you have to expect some waiting time and get a note with a waiting number if it should be very full."
])
print(preds)
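The predictions are returned as one list of aspect/polarity dicts per input review, along the lines of:

[
    [{'span': 'pizza', 'polarity': 'positive'}],
    [{'span': 'food variations', 'polarity': 'positive'}, {'span': 'prices', 'polarity': 'positive'}],
    [{'span': 'waiting time', 'polarity': 'neutral'}, {'span': 'waiting number', 'polarity': 'neutral'}]
]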
For more details on training options, saving and loading models, and inference, see the SetFit docs.
References
- Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. "SemEval-2014 Task 4: Aspect Based Sentiment Analysis". In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 27–35.
- Siddharth Varia, Shuai Wang, Kishaloy Halder, Robert Vacareanu, Miguel Ballesteros, Yassine Benajiba, Neha Anna John, Rishita Anubhai, Smaranda Muresan, and Dan Roth. 2023. "Instruction Tuning for Few-Shot Aspect-Based Sentiment Analysis". https://arxiv.org/abs/2210.06629
- Ehsan Hosseini-Asl, Wenhao Liu, and Caiming Xiong. 2022. "A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis". https://arxiv.org/abs/2204.05356
- Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, and Oren Pereg. 2022. "Efficient Few-Shot Learning Without Prompts". https://arxiv.org/abs/2209.11055
