Do you find it difficult to tell whether text was written by a human or generated by AI? Being able to identify AI-generated content is critical for promoting trust in information and for addressing problems such as misattribution and misinformation. Today, Google DeepMind and Hugging Face are excited to launch SynthID Text in Transformers v4.46.0, releasing later today. This technology allows you to apply watermarks to AI-generated text using a logits processor for generation tasks, and to detect those watermarks with a classifier.
Check out the SynthID Text paper in Nature for the complete technical details of this algorithm, and Google's Responsible GenAI Toolkit for more on how to apply SynthID Text in your products.
How it works
The primary goal of SynthID Text is to encode a watermark into AI-generated text in a way that helps you determine whether text was generated by your LLM, without affecting how the underlying LLM works or negatively impacting generation quality. Google DeepMind has developed a watermarking technique that uses a pseudo-random function, called a g-function, to augment the generation process of any LLM such that the watermark is imperceptible to humans but visible to a trained model. This has been implemented as a generation utility that is compatible with any LLM without modification using the model.generate() API, along with an end-to-end example of how to train detectors to recognize watermarked text. Check out the research paper for more complete details about the SynthID Text algorithm.
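To make the core idea concrete, here is a toy sketch of how a keyed pseudo-random function can bias sampling toward a statistically detectable pattern. This is an illustration only, not the actual SynthID implementation; the hashing scheme and the helper names (g_function, tournament_pick) are our own.
import hashlib

def g_function(key: int, context: tuple, token: int) -> int:
    # Pseudo-random {0, 1} score derived from a secret key, the recent
    # n-gram context, and a candidate token id.
    payload = f"{key}:{context}:{token}".encode()
    return hashlib.sha256(payload).digest()[0] & 1

def tournament_pick(candidates: list, key: int, context: tuple) -> int:
    # Among tokens already sampled from the LLM's own distribution, keep
    # the one the keyed g-function scores highest. Unwatermarked text has
    # no systematic bias toward high-scoring tokens, so a trained detector
    # can tell the two apart statistically.
    return max(candidates, key=lambda tok: g_function(key, context, tok))

# Example: with context (12, 7), the choice among sampled candidates is
# nudged by the secret key rather than by meaning, preserving quality.
print(tournament_pick([101, 204, 87], key=654, context=(12, 7)))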
Configuring a watermark
Watermarks are configured using a dataclass that parameterizes the g-function and how it is applied in the tournament sampling process. Each model you use should have its own watermarking configuration, which must be stored securely and privately; otherwise, your watermark may be replicable by others.
You must define two parameters in every watermarking configuration:
- The keys parameter is a list of integers that are used to compute g-function scores across the model's vocabulary. Using 20 to 30 unique, randomly generated numbers is recommended to balance detectability against generation quality (see the sketch after this list).
- The ngram_len parameter is used to balance robustness and detectability; the larger the value, the more detectable the watermark will be, at the cost of being more brittle to changes. A good default value is 5, but it must be at least 2.
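As a minimal sketch of building such a configuration, the following draws unique keys with Python's random.SystemRandom; the key range and the choice of 25 keys are illustrative assumptions, not requirements of the library.
import random

from transformers import SynthIDTextWatermarkingConfig

rng = random.SystemRandom()  # OS-backed randomness, suitable for secret keys
keys = rng.sample(range(1, 1000), 25)  # 25 unique keys; store these privately

watermarking_config = SynthIDTextWatermarkingConfig(
    keys=keys,
    ngram_len=5,  # must be at least 2; larger is more detectable, more brittle
)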
You can further configure the watermark based on your performance needs. See the SynthIDTextWatermarkingConfig class for more information. The research paper includes additional analyses of how specific configuration values affect watermark performance.
Applying a watermark
Applying a watermark is a straightforward change to your existing generation calls. Once you define your configuration, pass a SynthIDTextWatermarkingConfig object as the watermarking_config= parameter to model.generate() and all generated text will carry the watermark. Check out the SynthID Text Space for an interactive example of watermark application, and see if you can tell.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

# Standard model and tokenizer initialization
tokenizer = AutoTokenizer.from_pretrained('repo/id')
model = AutoModelForCausalLM.from_pretrained('repo/id')

# SynthID Text configuration
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, ...],
    ngram_len=5,
)

# Generation with watermarking; return_tensors="pt" is needed so that
# model.generate() receives tensors rather than Python lists
tokenized_prompts = tokenizer(["your prompts here"], return_tensors="pt")
output_sequences = model.generate(
    **tokenized_prompts,
    watermarking_config=watermarking_config,
    do_sample=True,
)
watermarked_text = tokenizer.batch_decode(output_sequences)
Detecting a watermark
Watermarks are designed to be detectable by a trained classifier but imperceptible to humans. Every watermarking configuration you use with your models must have a detector trained to recognize the mark.
The basic detector training process is:
- Decide on a watermarking configuration.
- Collect a detector training set split between watermarked or not, and training or test; we recommend a minimum of 10k examples.
- Generate non-watermarked outputs with your model.
- Generate watermarked outputs with your model (both generation steps are sketched after this list).
- Train your watermark detection classifier.
- Productionize your model with the watermarking configuration and associated detector.
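As a minimal sketch of the two generation steps, the following reuses the tokenizer, model, and watermarking_config from the earlier example; the prompts and the max_new_tokens value are placeholder assumptions.
# Prompts drawn from your detector training set
prompts = tokenizer(["your training prompts here"], return_tensors="pt")

# Non-watermarked negatives: plain sampling
unwatermarked_ids = model.generate(
    **prompts,
    do_sample=True,
    max_new_tokens=256,
)

# Watermarked positives: the same call, plus the watermarking configuration
watermarked_ids = model.generate(
    **prompts,
    watermarking_config=watermarking_config,
    do_sample=True,
    max_new_tokens=256,
)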
A Bayesian detector class is provided in Transformers, along with an end-to-end example of how to train a detector to recognize watermarked text using a specific watermarking configuration. Models that use the same tokenizer can also share a watermarking configuration and detector, and thus share a common watermark, so long as the detector's training set includes examples from all models that share that watermark.
This trained detector can be uploaded to a private HF Hub to make it accessible across your organization. Google's Responsible GenAI Toolkit has more on how to productionize SynthID Text in your products.
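As a brief sketch, assuming the trained detector follows the standard PreTrainedModel API, uploading it privately could look like the following; the detector variable and repository name are placeholder assumptions.
# Push the trained detector to a private repository on the Hugging Face Hub
detector.push_to_hub("your-org/synthid-text-detector", private=True)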
Limitations
SynthID Text watermarks are robust to some transformations, such as cropping pieces of text, modifying a few words, or mild paraphrasing, but this approach does have limitations.
- Watermark application is less effective on factual responses, as there is less opportunity to augment generation without decreasing accuracy.
- Detector confidence scores can be greatly reduced when AI-generated text is thoroughly rewritten or translated to another language.
SynthID Text is not built to directly stop motivated adversaries from causing harm. However, it can make it harder to use AI-generated content for malicious purposes, and it can be combined with other approaches to give better coverage across content types and platforms.
Acknowledgements
The authors would like to thank Robert Stanforth and Tatiana Matejovicova for their contributions to this work.
