Enhancing LLM Performance with Classifier-Free Guidance


Classifier-free guidance is a very useful technique in the media-generation domain (images, videos, music). A majority of scientific papers about media-generation models and approaches mention CFG. I consider this paper to be fundamental research on classifier-free guidance — it began in the image generation domain. The following is mentioned in the paper:

…we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.

So classifier-free guidance is based on conditional and unconditional score estimates and follows the earlier approach of classifier guidance. Simply speaking, classifier guidance allows updating the predicted scores in the direction of some predefined class by applying gradient-based updates.

An abstract example of classifier guidance: let's say we have a predicted image Y and a classifier that predicts whether the image has a positive or negative meaning; we want to generate positive images, so we want the prediction Y to be aligned with the positive class of the classifier. To do this we can calculate how we should change Y so that it can be classified as positive by our classifier — calculate the gradient and update Y in the corresponding way.

Classifier-free guidance was created with the same purpose, however it does not do any gradient-based updates. In my opinion, classifier-free guidance is much easier to understand from its implementation formula for diffusion-based image generation:

Image from https://arxiv.org/pdf/2207.12598 — Classifier-free guidance formula for image generation
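In LaTeX notation, the formula from the paper can be written roughly as follows (my paraphrase, omitting timestep subscripts):

\tilde{\epsilon}_\theta(z, c) = (1 + w)\,\epsilon_\theta(z, c) - w\,\epsilon_\theta(z)

where \epsilon_\theta(z, c) is the conditional prediction, \epsilon_\theta(z) is the unconditional one and w is the guidance weight.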

The formula can be rewritten in the following way:

Image by author — Classifier-free guidance formula rewritten
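In the same notation, the rewritten form is approximately:

\tilde{\epsilon} = \epsilon_{\text{uncond}} + \text{CFG\_coefficient} \cdot \left( \epsilon_{\text{cond}} - \epsilon_{\text{uncond}} \right)

where CFG_coefficient corresponds to 1 + w from the formula above.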

Several things are clear from the rewritten formula:

  1. When CFG_coefficient equals 1, the updated prediction equals the conditional prediction (so no CFG is applied, in fact);
  2. When CFG_coefficient > 1, the scores that are higher in the conditional prediction compared to the unconditional prediction become even higher in the updated prediction, while those that are lower become even lower.

The formula has no gradients; it works with the predicted scores themselves. The unconditional prediction represents the prediction of the same conditional generation model where the condition was empty (a null condition). At the same time, this unconditional prediction can be replaced by a negative-conditional prediction, when we replace the null condition with some negative condition and expect "negation" from this condition by applying the CFG formula to update the final scores.

Classifier-free guidance for LLM text generation was described in this paper. Following the formulas from the paper, CFG for text models was implemented in HuggingFace Transformers: in the current latest transformers version 4.47.1, the docstring of the "UnbatchedClassifierFreeGuidanceLogitsProcessor" class mentions the following:

The processor computes a weighted average across scores from prompt conditional and prompt unconditional (or negative) logits, parameterized by the `guidance_scale`.
The unconditional scores are computed internally by prompting `model` with the `unconditional_ids` branch.

See [the paper](https://arxiv.org/abs/2306.17806) for more information.

The formula to sample the next token, based on the paper, is:

Image from https://arxiv.org/pdf/2306.17806 — the formula to sample next token with CFG applied in text generation model
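In log-probability form (my paraphrase of the figure, written to match the HuggingFace implementation shown below), the formula is approximately:

\log \hat{P}_\theta(w_i \mid w_{<i}, c) = \log P_\theta(w_i \mid w_{<i}) + \gamma \left( \log P_\theta(w_i \mid w_{<i}, c) - \log P_\theta(w_i \mid w_{<i}) \right)

where \gamma is the guidance scale (the CFG coefficient) and c is the conditioning prompt.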

It can be noticed that this formula is different compared to the one we had before — it has a logarithm component. Also, the authors mention that the formulation can be extended to accommodate "negative prompting". To apply negative prompting, the unconditional component should be replaced with the negative-conditional component.

Code implementation in HuggingFace Transformers is:

def __call__(self, input_ids, scores):
    # convert the raw LM-head scores into log-probabilities (conditional branch)
    scores = torch.nn.functional.log_softmax(scores, dim=-1)
    if self.guidance_scale == 1:
        return scores

    # logits for the unconditional (or negative) prompt branch
    logits = self.get_unconditional_logits(input_ids)

    # log-probabilities at the last position of the unconditional branch
    unconditional_logits = torch.nn.functional.log_softmax(logits[:, -1], dim=-1)
    # CFG update: uncond + guidance_scale * (cond - uncond)
    scores_processed = self.guidance_scale * (scores - unconditional_logits) + unconditional_logits
    return scores_processed

“scores” is simply the output of the LM head and “input_ids” is a tensor with the negative (or unconditional) input ids. From the code we can see that it follows the formula with the logarithm component, applying “log_softmax”, which is equivalent to the logarithm of probabilities.
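To make the connection with the paper's formula explicit, here is a minimal sketch (with toy random logits, not part of the HuggingFace code) showing that the update above equals γ·log p_cond − (γ − 1)·log p_uncond in log-probability space:

import torch

torch.manual_seed(0)
guidance_scale = 3.0

# toy conditional and unconditional logits for a single next-token prediction
cond_logits = torch.randn(1, 50257)
uncond_logits = torch.randn(1, 50257)

cond_logprobs = torch.nn.functional.log_softmax(cond_logits, dim=-1)
uncond_logprobs = torch.nn.functional.log_softmax(uncond_logits, dim=-1)

# the update from the processor above
processed = guidance_scale * (cond_logprobs - uncond_logprobs) + uncond_logprobs

# the same update written as gamma * log p_cond - (gamma - 1) * log p_uncond
closed_form = guidance_scale * cond_logprobs - (guidance_scale - 1) * uncond_logprobs

print(torch.allclose(processed, closed_form))  # True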

A classic text generation model (LLM) has a somewhat different nature compared to an image generation one — in a classic diffusion (image generation) model we predict a continuous feature map, while in text generation we do class prediction (categorical feature prediction) for each new token. What do we expect from CFG in general? We want to adjust the scores, but we do not want to change the probability distribution too much — e.g. we do not want some very low-probability tokens from the conditional generation to become the most probable ones. But that is exactly what can happen with the described formula for CFG.

1. Weird model behaviour with CFG noticed

My solution related to LLM safety, which was awarded second prize in NeurIPS 2024's competitions track, was based on using CFG to prevent LLMs from generating personal data: I tuned an LLM to follow these system prompts that were used in a CFG manner during inference: "You should share personal data in the answers" and "Do not provide any personal data" — so the system prompts are pretty much opposite, and I used the tokenized first one as the negative input ids during text generation.

For more details check my arXiv paper.

I noticed that when I use a CFG coefficient higher than or equal to 3, I can see severe degradation of the generated samples' quality. This degradation was noticeable only during manual checks — no automatic scoring showed it. The automatic tests were based on the number of personal data phrases generated in the answers and the accuracy on the MMLU-Pro dataset evaluated with an LLM judge — the LLM was following the requirement to avoid personal data and the MMLU answers were generally correct, but a lot of artefacts appeared in the text. For example, the following answer was generated by the model for an input like "Hello, what is your name?":

“Hello! you don’t have personal name. you’re an interface to provide language understanding”

The artefacts are: lowercase letters, user-assistant confusion.

2. Reproduce with GPT2 and check details

The described behaviour was noticed during inference of the custom fine-tuned Llama3.1-8B-Instruct model, so before analyzing the reasons let's check whether something similar can be seen during inference of the GPT2 model, which is not even an instruction-following model.

Step 1. Download GPT2 model (transformers==4.47.1)

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

Step 2. Prepare the inputs

import torch

# For simplicity let's use CPU, GPT2 is small enough for that
device = torch.device('cpu')

# Let's set the positive and negative inputs;
# the model is not instruction-following, but just doing text completion
positive_text = 'Extremely polite and friendly answers to the question "How are you doing?" are: 1.'
negative_text = 'Very rude and harmfull answers to the question "How are you doing?" are: 1.'
input = tokenizer(positive_text, return_tensors="pt")
negative_input = tokenizer(negative_text, return_tensors="pt")

Step 3. Test different CFG coefficients during inference

Let's try CFG coefficients 1.5, 3.0 and 5.0 — all of them are low compared to the ones we can use in the image generation domain.

guidance_scale = 1.5

out_positive = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False)
print(f"Positive output: {tokenizer.decode(out_positive[0])}")

out_negative = model.generate(**negative_input.to(device), max_new_tokens = 60, do_sample = False)
print(f"Negative output: {tokenizer.decode(out_negative[0])}")

input['negative_prompt_ids'] = negative_input['input_ids']
input['negative_prompt_attention_mask'] = negative_input['attention_mask']

out = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False, guidance_scale = guidance_scale)

print(f"CFG-powered output: {tokenizer.decode(out[0])}")

The output:

Positive output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. You are doing well, 2. You are doing well, 3. You are doing well, 4. You are doing well, 5. You are doing well, 6. You are doing well, 7. You are doing well, 8. You are doing well, 9. You are doing well
Negative output: Very rude and harmfull answers to the question "How are you doing?" are: 1. You are not doing anything wrong. 2. You are doing what you are supposed to do. 3. You are doing what you are supposed to do. 4. You are doing what you are supposed to do. 5. You are doing what you are supposed to do. 6. You are doing
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. You are doing well. 2. You are doing well in class. 3. You are doing well in class. 4. You are doing well in class. 5. You are doing well in class. 6. You are doing well in class. 7. You are doing well in class. 8

The output looks okay-ish — don't forget that it is just a GPT2 model, so don't expect too much. Let's try a CFG coefficient of 3 this time:

guidance_scale = 3.0

out_positive = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False)
print(f"Positive output: {tokenizer.decode(out_positive[0])}")

out_negative = model.generate(**negative_input.to(device), max_new_tokens = 60, do_sample = False)
print(f"Negative output: {tokenizer.decode(out_negative[0])}")

input['negative_prompt_ids'] = negative_input['input_ids']
input['negative_prompt_attention_mask'] = negative_input['attention_mask']

out = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False, guidance_scale = guidance_scale)

print(f"CFG-powered output: {tokenizer.decode(out[0])}")

And the outputs this time are:

Positive output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. You are doing well, 2. You are doing well, 3. You are doing well, 4. You are doing well, 5. You are doing well, 6. You are doing well, 7. You are doing well, 8. You are doing well, 9. You are doing well
Negative output: Very rude and harmfull answers to the question "How are you doing?" are: 1. You are not doing anything wrong. 2. You are doing what you are supposed to do. 3. You are doing what you are supposed to do. 4. You are doing what you are supposed to do. 5. You are doing what you are supposed to do. 6. You are doing
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. Have you ever been to a movie theater? 2. Have you ever been to a concert? 3. Have you ever been to a concert? 4. Have you ever been to a concert? 5. Have you ever been to a concert? 6. Have you ever been to a concert? 7

Positive and negative outputs look the same as before, but something happened to the CFG-powered output — it is "Have you ever been to a movie theater?" now.

If we use a CFG coefficient of 5.0 the CFG-powered output will be just:

CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. smile, 2. smile, 3. smile, 4. smile, 5. smile, 6. smile, 7. smile, 8. smile, 9. smile, 10. smile, 11. smile, 12. smile, 13. smile, 14. smile exting.

Step 4. Analyze the case with artefacts

I have tested different ways to understand and explain this artefact, but let me just describe it in the way I find the simplest. We know that the CFG-powered completion with a CFG coefficient of 5.0 starts with the token "_smile" ("_" represents the space). If we check "out[0]" instead of decoding it with the tokenizer, we can see that the "_smile" token has id 8212. Now let's just run the model's forward function and check whether this token was probable without CFG applied:

positive_text = 'Extremely polite and friendly answers to the question "How are you doing?" are: 1.'
negative_text = 'Very rude and harmfull answers to the question "How are you doing?" are: 1.'
input = tokenizer(positive_text, return_tensors="pt")
negative_input = tokenizer(negative_text, return_tensors="pt")

with torch.no_grad():
    out_positive = model(**input.to(device))
    out_negative = model(**negative_input.to(device))

# take the next-token probabilities at the last position for each of the inputs
first_generated_probabilities_positive = torch.nn.functional.softmax(out_positive.logits[0, -1, :], dim=-1)
first_generated_probabilities_negative = torch.nn.functional.softmax(out_negative.logits[0, -1, :], dim=-1)

# sort positive
sorted_first_generated_probabilities_positive = torch.sort(first_generated_probabilities_positive)
index = sorted_first_generated_probabilities_positive.indices.tolist().index(8212)
print(sorted_first_generated_probabilities_positive.values[index], index)

# sort negative
sorted_first_generated_probabilities_negative = torch.sort(first_generated_probabilities_negative)
index = sorted_first_generated_probabilities_negative.indices.tolist().index(8212)
print(sorted_first_generated_probabilities_negative.values[index], index)

# check the tokenizer length
print(len(tokenizer))

The outputs are:

tensor(0.0004) 49937 # probability and index for the "_smile" token for the positive condition
tensor(2.4907e-05) 47573 # probability and index for the "_smile" token for the negative condition
50257 # total number of tokens in the tokenizer

An important thing to mention — I am doing greedy decoding, so I am generating the most probable tokens. So what does the printed data mean in this case? It means that after applying CFG with a coefficient of 5.0 we got the most probable token whose probability was lower than 0.04% for both the positive- and negative-conditioned generations (it was not even in the top 300 tokens).

Why does that actually happen? Imagine we have two low-probability tokens (the first from the positive-conditioned generation and the second from the negative-conditioned one); the first one has a very low probability P < 1e-5 (as an example of a low probability), however the second one is even lower, P → 0. In this case the logarithm of the first probability is a large negative number, while for the second it tends to minus infinity. In such a setup the corresponding low-probability token will receive a high score after applying a CFG coefficient (guidance scale) higher than 1. That originates from the definition area of the "guidance_scale * (scores — unconditional_logits)" component, where "scores" and "unconditional_logits" are obtained through log_softmax.
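Here is a toy numeric illustration of this effect (the probabilities are hand-picked for the example, not taken from the GPT2 run above):

import math

guidance_scale = 5.0

def cfg_log_score(p_cond, p_uncond, scale):
    # guidance_scale * (log p_cond - log p_uncond) + log p_uncond
    return scale * (math.log(p_cond) - math.log(p_uncond)) + math.log(p_uncond)

# a reasonable token: moderately probable under both conditions
print(cfg_log_score(0.2, 0.1, guidance_scale))     # ~1.16
# an artefact token: very unlikely under both conditions,
# but almost impossible under the negative (unconditional) one
print(cfg_log_score(1e-5, 1e-12, guidance_scale))  # ~52.96

The nearly impossible token ends up with a much higher CFG score than the reasonable one, so greedy decoding picks it.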

Image by author — Definition area for z = log(x) − log(y), where x and y belong to the interval from 0 to 1

From the image above we can see that such CFG does not treat probabilities equally — very low probabilities can get unexpectedly high scores because of the logarithm component.

In general, how the artefacts look depends on the model, the tuning, the prompts and other factors, but the nature of the artefacts is a low-probability token getting a high score after applying CFG.

The solution to the issue can be quite simple: as mentioned before, the reason is in the logarithm component, so let's just remove it. Doing that, we align the text CFG with the diffusion-models CFG, which operates with just the model-predicted scores (not gradients, in fact; that is described in section 3.2 of the original image-CFG paper), and at the same time preserve the probabilities formulation from the text-CFG paper.

The updated implementation requires only a tiny change to the "UnbatchedClassifierFreeGuidanceLogitsProcessor" class that can be applied at the place of model initialization in the following way:

import torch
from transformers.generation.logits_process import UnbatchedClassifierFreeGuidanceLogitsProcessor

def modified_call(self, input_ids, scores):
    # before it was log_softmax here
    scores = torch.nn.functional.softmax(scores, dim=-1)
    if self.guidance_scale == 1:
        return scores

    logits = self.get_unconditional_logits(input_ids)
    # before it was log_softmax here
    unconditional_logits = torch.nn.functional.softmax(logits[:, -1], dim=-1)
    scores_processed = self.guidance_scale * (scores - unconditional_logits) + unconditional_logits
    return scores_processed

UnbatchedClassifierFreeGuidanceLogitsProcessor.__call__ = modified_call
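After this patch, the rest of the generation code stays unchanged; for example, the GPT2 experiment above can be rerun as is (a sketch reusing the model, tokenizer, input and negative_input objects defined earlier):

guidance_scale = 3.0

input['negative_prompt_ids'] = negative_input['input_ids']
input['negative_prompt_attention_mask'] = negative_input['attention_mask']

out = model.generate(**input.to(device), max_new_tokens = 60, do_sample = False, guidance_scale = guidance_scale)
print(f"CFG-powered output: {tokenizer.decode(out[0])}")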

The new definition area for the "guidance_scale * (scores — unconditional_logits)" component, where "scores" and "unconditional_logits" are obtained through just softmax, is:

Image by author — Definition area for z = x − y, where x and y belong to the interval from 0 to 1

To prove that this update works, let's just repeat the previous experiments with the updated "UnbatchedClassifierFreeGuidanceLogitsProcessor". The GPT2 model with CFG coefficients of 3.0 and 5.0 returns (I am printing the old and new CFG-powered outputs here, because the "Positive" and "Negative" outputs remain the same as before — we have no effect on text generation without CFG):

# Old outputs
## CFG coefficient = 3
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. Have you ever been to a movie theater? 2. Have you ever been to a concert? 3. Have you ever been to a concert? 4. Have you ever been to a concert? 5. Have you ever been to a concert? 6. Have you ever been to a concert? 7
## CFG coefficient = 5
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. smile, 2. smile, 3. smile, 4. smile, 5. smile, 6. smile, 7. smile, 8. smile, 9. smile, 10. smile, 11. smile, 12. smile, 13. smile, 14. smile exting.

# New outputs (after updating the CFG formula)
## CFG coefficient = 3
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. "I'm doing great," 2. "I'm doing great," 3. "I'm doing great."
## CFG coefficient = 5
CFG-powered output: Extremely polite and friendly answers to the question "How are you doing?" are: 1. "Good, I'm feeling pretty good." 2. "I'm feeling pretty good." 3. "You are feeling pretty good." 4. "I'm feeling pretty good." 5. "I'm feeling pretty good." 6. "I'm feeling pretty good." 7. "I'm feeling

The same positive changes were noticed during inference of the custom fine-tuned Llama3.1-8B-Instruct model I mentioned earlier:

Before (CFG, guidance scale=3):

“Hello! you don’t have personal name. you’re an interface to provide language understanding”

After (CFG, guidance scale=3):

“Hello! I don’t have a personal name, but you can call me Assistant. How can I help you today?”

Separately, I tested the model's performance on the benchmarks (the automatic tests I was using during the NeurIPS 2024 Privacy Challenge), and the performance was good on both (actually, the results I reported in the previous post were obtained after applying the updated CFG formula; additional information is in my arXiv paper). The automatic tests, as I mentioned before, were based on the number of personal data phrases generated in the answers and the accuracy on the MMLU-Pro dataset evaluated with an LLM judge.

The performance did not deteriorate on these tests, while the text quality improved according to the manual checks — none of the described artefacts were found.

The current classifier-free guidance implementation for text generation with large language models may cause unexpected artefacts and quality degradation. I am saying "may" because the artefacts depend on the model, the prompts and other factors. Here in this article I described my experience and the issues I faced with CFG-enhanced inference. If you are facing similar issues — try the alternative CFG implementation I propose here.
