‘Protected’ Images Are Easier, Not More Difficult, to Steal With AI

There is a notable and robust strand in the computer vision literature dedicated to protecting copyrighted images from being trained into AI models, or from being used in direct image-to-image AI processes. Systems of this type are generally aimed at Latent Diffusion Models (LDMs) such as Stable Diffusion and Flux, which use noise-based procedures to encode and decode imagery.

By inserting adversarial noise into otherwise normal-looking images, it may be possible to cause image detectors to guess image content incorrectly, and to hinder image-generating systems from exploiting copyrighted data (a minimal sketch of the general idea follows the figure below):

Source: https://arxiv.org/pdf/2302.06588
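As a rough, minimal sketch of the underlying idea (not of any of the published protection tools), the snippet below applies a PGD-style adversarial perturbation to an image so that a standard classifier misreads its content; the model, step size, and perturbation budget are illustrative assumptions.

```python
# Minimal PGD-style adversarial perturbation against an off-the-shelf classifier.
# Illustrative only: Glaze, Mist and PhotoGuard use their own, more sophisticated objectives.
import torch
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
for p in model.parameters():
    p.requires_grad_(False)
preprocess = weights.transforms()

def perturb(image: torch.Tensor, true_label: int,
            eps: float = 4 / 255, alpha: float = 1 / 255, steps: int = 10) -> torch.Tensor:
    """Return a perturbed copy of `image` (3xHxW, values in [0,1]) that raises
    the classifier's loss on the true label while staying inside an eps-ball."""
    delta = torch.zeros_like(image, requires_grad=True)
    target = torch.tensor([true_label])
    for _ in range(steps):
        logits = model(preprocess((image + delta).unsqueeze(0)))
        loss = F.cross_entropy(logits, target)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                   # ascend the loss
            delta.clamp_(-eps, eps)                              # keep the change imperceptible
            delta.copy_((image + delta).clamp(0, 1) - image)     # keep pixel values valid
        delta.grad.zero_()
    return (image + delta).detach()
```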

Since the artists’ backlash against Stable Diffusion’s liberal use of web-scraped imagery (including copyrighted imagery) in 2023, the research scene has produced multiple variations on the same theme: the idea that images can be invisibly ‘poisoned’ against being trained into AI systems or pulled into generative AI pipelines, without adversely affecting the quality of the image for the average viewer.

In all cases, there is a direct correlation between the intensity of the imposed perturbation, the extent to which the image is subsequently protected, and the extent to which the image does not look quite as good as it should:

Though the quality of the research PDF does not completely illustrate the problem, greater amounts of adversarial perturbation sacrifice quality for security. Here we see the gamut of quality disturbances in the 2020 'Fawkes' project led by the University of Chicago. Source: https://arxiv.org/pdf/2002.08327


Of particular interest to artists seeking to protect their styles against unauthorized appropriation is the capability of such systems not only to obfuscate identity and other information, but to ‘persuade’ an AI training process that it is seeing something other than what is actually there, so that connections do not form between semantic and visual domains for ‘protected’ training data.

Mist and Glaze are two popular injection methods capable of preventing, or at least severely hobbling, attempts to use copyrighted styles in AI workflows and training routines. Source: https://arxiv.org/pdf/2506.04394


Own Goal

Now, recent research from the US has found not only that perturbations can fail to protect an image, but that adding perturbation can actually increase the image’s exploitability in the very AI processes that the perturbation is meant to immunize it against.

In tests, the protected images were exposed to two familiar AI editing scenarios: straightforward image-to-image generation and style transfer. These processes reflect the common ways in which AI models might exploit protected content, either by directly altering an image, or by borrowing its stylistic traits for use elsewhere.

The protected images, drawn from standard sources of photography and artwork, were run through these pipelines to see whether the added perturbations could block or degrade the edits.

Instead, the presence of protection often appeared to sharpen the model’s alignment with the prompts, producing clean, accurate outputs where some failure had been expected.

The authors advise, in effect, that this very popular approach to protection may be providing a false sense of security, and that any such perturbation-based immunization approach should be tested thoroughly against the authors’ own methods.

Method

The authors ran experiments using three protection methods that apply carefully designed adversarial perturbations: PhotoGuard; Mist; and Glaze.

Glaze, one of the frameworks tested by the authors: protection examples for three artists. The first two columns show the original artworks. The third column shows mimicry results without protection. The fourth column shows style-transferred versions used for cloak optimization, along with the target style name. The fifth and sixth columns show mimicry results with cloaking applied at perturbation levels p = 0.05 and p = 0.1. All results use Stable Diffusion models. Source: https://arxiv.org/pdf/2302.04222


PhotoGuard was applied to natural scene images, while Mist and Glaze were used on artworks (i.e., ‘artistically-styled’ domains).

Tests covered both natural and artistic images to reflect possible real-world uses. The effectiveness of each method was assessed by checking whether an AI model could still produce realistic and prompt-relevant edits when working on protected images; if the resulting images appeared convincing and matched the prompts, the protection was judged to have failed.

Stable Diffusion v1.5 was used as the pre-trained image generator for the researchers’ editing tasks. Five seeds were chosen to ensure reproducibility: 9222, 999, 123, 66, and 42. All other generation settings, such as guidance scale, strength, and total steps, followed the default values used in the PhotoGuard experiments.
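A minimal sketch of this editing setup, using the Hugging Face diffusers library, might look as follows; the strength and guidance values here are placeholder assumptions standing in for the PhotoGuard defaults, which are not restated above.

```python
# Image-to-image editing with Stable Diffusion v1.5 under the five fixed seeds.
# The strength/guidance values are placeholder assumptions, not the paper's.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

SEEDS = [9222, 999, 123, 66, 42]  # the five seeds reported in the paper

def edit(image_path: str, prompt: str):
    """Run the same edit once per seed and return the generated images."""
    init_image = Image.open(image_path).convert("RGB").resize((512, 512))
    results = []
    for seed in SEEDS:
        generator = torch.Generator(device="cuda").manual_seed(seed)
        out = pipe(prompt=prompt, image=init_image,
                   strength=0.7,        # assumed
                   guidance_scale=7.5,  # assumed
                   generator=generator)
        results.append(out.images[0])
    return results
```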

PhotoGuard was tested on natural scene images using the Flickr8k dataset, which comprises over 8,000 images paired with up to five captions each.

Opposing Thoughts

Two sets of modified captions were created from the first caption of each image with the help of Claude 3.5 Sonnet. One set contained prompts that were close to the original captions; the other set contained prompts that were semantically distant.

Close prompts were constructed by replacing nouns and adjectives in the original caption with semantically similar terms, while far prompts were generated by instructing the model to create captions that were contextually very different.

All generated captions were manually checked for quality and semantic relevance. Google’s Universal Sentence Encoder was used to calculate semantic similarity scores between the original and modified captions (a sketch of this check follows the figure below):

From the supplementary material, semantic similarity distributions for the modified captions used in Flickr8k tests. The graph on the left shows the similarity scores for closely modified captions, averaging around 0.6. The graph on the right shows the extensively modified captions, averaging around 0.1, reflecting greater semantic distance from the original captions. Values were calculated using Google’s Universal Sentence Encoder. Source: https://sigport.org/sites/default/files/docs/IncompleteProtection_SM_0.pdf
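The similarity check itself is straightforward to reproduce in spirit; the sketch below uses the TensorFlow Hub release of the Universal Sentence Encoder, with placeholder captions standing in for the real Flickr8k data.

```python
# Semantic similarity between an original caption and its modified versions,
# using Google's Universal Sentence Encoder (TF Hub). Captions are placeholders.
import numpy as np
import tensorflow_hub as hub

encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def similarity(a: str, b: str) -> float:
    """Cosine similarity between the USE embeddings of two captions."""
    emb = encoder([a, b]).numpy()
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return float(emb[0] @ emb[1])

original = "A dog runs across a grassy field."           # placeholder caption
close    = "A puppy sprints over a green meadow."         # close variant
far      = "A chef plates dessert in a busy kitchen."     # far variant

print(similarity(original, close))  # expected to land near the ~0.6 average
print(similarity(original, far))    # expected to land near the ~0.1 average
```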


Each image, together with its protected version, was edited using both the close and far prompts. The Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) was used to evaluate image quality:

Image-to-image generation results on natural photographs protected by PhotoGuard. Despite the presence of perturbations, Stable Diffusion v1.5 successfully followed both small and large semantic changes in the editing prompts, producing realistic outputs that matched the new instructions.

The generated images scored 17.88 on BRISQUE, with 17.82 for close prompts and 17.94 for far prompts, while the original images scored 22.27. Since lower BRISQUE values indicate better perceived quality, this shows that the edited images did not lose quality relative to the originals.
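BRISQUE is a no-reference metric, so it can be computed directly on each output image. A minimal sketch, here using the third-party piq package (the text above does not specify which implementation was used) and hypothetical file paths:

```python
# No-reference quality check with BRISQUE via the `piq` package.
import torch
import piq
from PIL import Image
from torchvision.transforms.functional import to_tensor

def brisque_score(path: str) -> float:
    """Lower BRISQUE scores indicate better perceptual quality."""
    x = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0)  # 1x3xHxW in [0,1]
    return piq.brisque(x, data_range=1.0).item()

# Comparing an original image against an edited output (hypothetical paths):
print(brisque_score("original.png"))
print(brisque_score("edited_from_protected.png"))
```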

Metrics

To evaluate how well the protections interfered with AI editing, the researchers measured how closely the final images matched the prompts they were given, using scoring systems that compare image content to the text prompt in order to gauge alignment.

To this end, the CLIP-S metric uses a model that can understand both images and text to check how similar they are, while PAC-S++ adds extra AI-generated samples to align its comparison more closely with human judgment.

These Image-Text Alignment (ITA) scores denote how accurately the AI followed the instructions when modifying a protected image: if a protected image still led to a highly aligned output, the protection was deemed to have failed to block the edit.
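As an illustration of the CLIP-based side of this measurement, the sketch below computes a cosine similarity between CLIP embeddings of an image and a prompt via Hugging Face transformers; the published CLIP-S metric additionally rescales this value, and PAC-S++ relies on its own augmented training, so this is only an approximation of the idea.

```python
# CLIP-style image-text alignment: cosine similarity of CLIP embeddings.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_alignment(image_path: str, prompt: str) -> float:
    """Cosine similarity between the CLIP embeddings of an image and a prompt."""
    inputs = processor(text=[prompt],
                       images=Image.open(image_path).convert("RGB"),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())
```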

Effect of protection on the Flickr8k dataset across five seeds, using both close and distant prompts. Image-text alignment was measured using CLIP-S and PAC-S++ scores.

The researchers compared how well the AI followed prompts when editing protected images versus unprotected ones. They first looked at the difference between the two, termed the raw difference. This difference was then scaled into a Percentage Change value, making it easier to compare results across many tests.

This process revealed whether the protections made it harder or easier for the AI to match the prompts. The tests were repeated five times using different random seeds, covering both small and large changes to the original captions.
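The comparison logic reduces to a few lines. In the sketch below, the exact scaling behind the Percentage Change is an assumption (a simple relative change against the unprotected baseline), and the alignment scores shown are hypothetical.

```python
# Raw difference and a (assumed) Percentage Change between alignment scores on
# unprotected and protected edits, averaged over seeds.
from statistics import mean

def percentage_change(scores_unprotected, scores_protected):
    """Positive values mean the protected images aligned *better* with the prompt."""
    base = mean(scores_unprotected)
    prot = mean(scores_protected)
    raw_difference = prot - base
    return 100.0 * raw_difference / base

# Hypothetical CLIP-based scores across the five seeds:
unprotected = [0.271, 0.265, 0.280, 0.262, 0.274]
protected   = [0.288, 0.279, 0.291, 0.276, 0.285]
print(round(percentage_change(unprotected, protected), 2))  # > 0 => protection backfired
```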

Art Attack

For the style transfer tests on natural photographs, the Flickr1024 dataset was used, containing over one thousand high-quality images. Each image was edited with prompts that followed a fixed template specifying the target style, where the style was one of seven famous art styles: Cubism; Post-Impressionism; Impressionism; Surrealism; Baroque; Fauvism; and Renaissance.

The method involved applying PhotoGuard to the original images, generating protected versions, and then running both protected and unprotected images through the same set of style transfer edits (a sketch of the prompt construction follows the figure below):

Original and protected versions of a natural scene image, each edited to apply Cubism, Surrealism, and Fauvism styles.
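The exact wording of the style-transfer template is not reproduced above, so the phrasing in the sketch below is an assumption; only the list of seven target styles comes from the source.

```python
# Building one editing prompt per target art style (template wording assumed).
STYLES = ["Cubism", "Post-Impressionism", "Impressionism", "Surrealism",
          "Baroque", "Fauvism", "Renaissance"]

def style_prompts(template: str = "change the style to {style}"):
    """Return one editing prompt per target art style."""
    return [template.format(style=s) for s in STYLES]

# Each original/protected image pair would then be run through the same
# img2img pipeline sketched earlier, once per prompt:
for prompt in style_prompts():
    print(prompt)
```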

To test protection methods on artwork, style transfer was performed on images from the WikiArt dataset, which curates a wide range of artistic styles. The editing prompts followed the same format as before, instructing the AI to change the style to a randomly chosen, unrelated style drawn from the WikiArt labels.

Both the Glaze and Mist protection methods were applied to the images before the edits, allowing the researchers to observe how well each defense could block or distort the style transfer results:

Examples of how protection methods affect style transfer on artwork. The original Baroque image is shown alongside versions protected by Mist and Glaze. After applying Cubism style transfer, differences in how each protection alters the final output can be seen.

The researchers tested the comparisons quantitatively as well:

Changes in image-text alignment scores after style transfer edits.

The authors explain that the unexpected results can be traced to how diffusion models work: LDMs edit images by first converting them into a compressed representation called a latent; noise is then added to this latent over many steps, until the data becomes almost random.

The model reverses this process during generation, removing the noise step by step. At each stage of this reversal, the text prompt helps guide how the noise should be cleaned up, gradually shaping the image to match the prompt:

Comparison between generations from an unprotected image and a PhotoGuard-protected image, with intermediate latent states converted back into images for visualization.

Protection methods add small amounts of extra noise to the original image before it enters this process. While these perturbations are minor at the outset, they compound as the model applies its own layers of noise.

This buildup leaves more parts of the image ‘uncertain’ when the model begins removing noise. With greater uncertainty, the model leans more heavily on the text prompt to fill in the missing details, giving the prompt greater influence over the final image.

In effect, the protections make it easier for the AI to reshape the image to match the prompt, rather than harder.
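A toy numerical sketch of this explanation: apply the same DDPM-style forward-noising schedule to a clean latent and to a perturbed (‘protected’) copy, and observe that the protected version sits further from the original content at every timestep, leaving more for the text prompt to fill in. The schedule, latent size, and perturbation strength here are all illustrative assumptions, not the paper’s experiment.

```python
# Toy illustration of forward diffusion applied to a clean vs. perturbed latent.
import torch

torch.manual_seed(0)
T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # assumed DDPM-style schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

latent = torch.randn(4, 64, 64)                      # stand-in for an image latent
perturbed = latent + 0.05 * torch.randn_like(latent) # stand-in "protection" noise

for t in [100, 400, 700, 999]:
    a = alpha_bar[t]
    eps = torch.randn_like(latent)                   # the model's own noise at step t
    x_t_clean = a.sqrt() * latent + (1 - a).sqrt() * eps
    x_t_prot = a.sqrt() * perturbed + (1 - a).sqrt() * eps
    # Distance of each noised latent from the (scaled) original content:
    dev_clean = (x_t_clean - a.sqrt() * latent).norm().item()
    dev_prot = (x_t_prot - a.sqrt() * latent).norm().item()
    print(f"t={t:4d}  deviation from original content: "
          f"clean={dev_clean:8.2f}  protected={dev_prot:8.2f}")
```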

Finally, the authors ran a test in which the crafted perturbations from the protection methods were replaced with pure Gaussian noise.

The results followed the same pattern observed earlier: across all tests, the Percentage Change values remained positive. Even this random, unstructured noise led to stronger alignment between the generated images and the prompts (a sketch of this control follows the figure below).

Effect of simulated protection using Gaussian noise on the Flickr8k dataset.
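Simulating ‘protection’ with plain Gaussian noise is trivial to reproduce in spirit; in the sketch below, the noise level and file paths are hypothetical assumptions.

```python
# Simulated "protection": add i.i.d. Gaussian noise to an image, then re-run
# the same edits and recompute the Percentage Change as before.
import numpy as np
from PIL import Image

def simulate_protection(path: str, sigma: float = 8.0) -> Image.Image:
    """Return a copy of the image with Gaussian noise added (sigma in 0-255 units)."""
    x = np.asarray(Image.open(path).convert("RGB")).astype(np.float32)
    noisy = x + np.random.normal(0.0, sigma, size=x.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

noisy_img = simulate_protection("original.png")  # hypothetical path
noisy_img.save("gaussian_protected.png")
```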

This supported the underlying explanation that any added noise, regardless of its design, creates greater uncertainty for the model during generation, allowing the text prompt to exert even more control over the final image.

Conclusion

The research scene has been pushing adversarial perturbation at the LDM copyright issue for nearly as long as LDMs have been around; but no resilient solutions have emerged from the extraordinary number of papers published on this tack.

Either the imposed disturbances excessively lower the quality of the image, or the patterns prove not to be resilient to manipulation and transformative processes.

Nonetheless, it’s a hard dream to abandon, since the alternative would appear to be third-party monitoring and provenance frameworks such as the Adobe-led C2PA scheme, which seeks to maintain a chain of custody for images from the camera sensor onward, but which has no innate connection to the content depicted.

In any case, if adversarial perturbation is actually making the problem worse, as the new paper indicates may well be true in many cases, one wonders whether the search for copyright protection via such means falls under ‘alchemy’.

 
