As discussed last week, even the core foundation models behind popular generative AI systems can produce copyright-infringing content, due to inadequate or misaligned curation, as well as the presence of multiple versions of the same image in training data, which leads to overfitting and increases the likelihood of recognizable reproductions.
Despite their efforts to dominate the generative AI space, and growing pressure to curb IP infringement, major platforms like MidJourney and OpenAI’s DALL-E continue to face challenges in preventing the unintentional reproduction of copyrighted content:
As new models emerge, and as Chinese models gain ground, the suppression of copyrighted material in foundation models is an onerous prospect; indeed, market leader OpenAI declared last year that it is ‘impossible’ to create effective and useful models without copyrighted data.
Prior Art
In regard to the inadvertent generation of copyrighted material, the research scene faces a challenge similar to that of the inclusion of porn and other NSFW material in source data: one wants the benefit of the knowledge (i.e., correct human anatomy, which has historically always been based on nude studies) without the capacity to abuse it.
Likewise, model-makers want the benefit of the enormous scope of copyrighted material that finds its way into hyperscale sets such as LAION, without the model developing the capacity to actually infringe IP.
Disregarding the moral and legal risks of attempting to hide the use of copyrighted material, filtering for the latter case is significantly harder. NSFW content often contains distinct low-level latent features that enable increasingly effective filtering without requiring direct comparisons to real-world material. In contrast, the latent embeddings that define millions of copyrighted works do not reduce to a set of easily identifiable markers, making automated detection far more complex.
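To see why, consider a minimal sketch (synthetic data, and an assumed 512-dimensional embedding space): an NSFW filter can reduce to a single learned decision boundary over latent features, whereas copyright detection implies a similarity search against every protected work:

```python
# Illustrative sketch only: synthetic embeddings, hypothetical thresholds.
# An NSFW filter can be a single 'concept direction' over latents (O(1) per
# query); copyright detection requires comparing against a corpus of
# protected works (O(N) per query, and no single separating boundary).
import numpy as np

rng = np.random.default_rng(0)
DIM = 512

# NSFW: one learned decision boundary, no reference corpus required.
nsfw_direction = rng.normal(size=DIM)

def is_nsfw(latent: np.ndarray, threshold: float = 10.0) -> bool:
    return float(latent @ nsfw_direction) > threshold

# Copyright: compare against every protected embedding (scaled-down stand-in).
protected = rng.normal(size=(10_000, DIM))
protected /= np.linalg.norm(protected, axis=1, keepdims=True)

def max_similarity(latent: np.ndarray) -> float:
    latent = latent / np.linalg.norm(latent)
    return float((protected @ latent).max())

query = rng.normal(size=DIM)
print(is_nsfw(query), max_similarity(query))
```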
CopyJudge
Human judgement is a scarce and expensive commodity, both in the curation of datasets and in the creation of post-processing filters and ‘safety’-based systems designed to ensure that IP-locked material is not delivered to the users of API-based portals such as MidJourney and the image-generating capability of ChatGPT.
Therefore a new academic collaboration between Switzerland, Sony AI and China is offering CopyJudge – an automated approach to orchestrating successive groups of colluding ChatGPT-based ‘judges’ that can examine inputs for signs of likely copyright infringement.

Source: https://arxiv.org/pdf/2502.15278
CopyJudge effectively offers an automated framework leveraging large vision-language models (LVLMs) to determine substantial similarity between copyrighted images and those produced by text-to-image diffusion models.
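In practice, a single ‘judge’ call of this kind might look like the following sketch, using the OpenAI Python client; the prompt wording and the 0-10 scale here are illustrative assumptions, not the paper’s exact protocol:

```python
# Hypothetical sketch: asking one LVLM 'judge' whether a generated image is
# substantially similar to a copyrighted source. The prompt text and scale
# are assumptions for illustration, not CopyJudge's exact protocol.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def to_data_url(path: str) -> str:
    with open(path, "rb") as f:
        return "data:image/png;base64," + base64.b64encode(f.read()).decode()

def judge_similarity(copyrighted_path: str, generated_path: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Rate the substantial similarity of these two images "
                         "from 0 (unrelated) to 10 (near-duplicate), ignoring "
                         "generic, non-copyrightable elements. Reply with a "
                         "number and a one-sentence justification."},
                {"type": "image_url", "image_url": {"url": to_data_url(copyrighted_path)}},
                {"type": "image_url", "image_url": {"url": to_data_url(generated_path)}},
            ],
        }],
    )
    return response.choices[0].message.content

print(judge_similarity("original.png", "generated.png"))
```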

Though many online AI-based image generators filter users’ prompts for NSFW, copyrighted material, recreation of real people, and various other banned domains, CopyJudge instead uses refined ‘infringing’ prompts to create ‘sanitized’ prompts that are least likely to evoke disallowed images, without the intention of directly blocking the user’s submission.
Though this is not a new approach, it goes some way towards freeing API-based generative systems from simply refusing user input (not least because refusal allows users to develop backdoor access to disallowed generations through experimentation).
One such recent exploit (since closed by the developers) allowed users to generate pornographic material on the Kling generative AI platform simply by including a prominent cross, or crucifix, in the image uploaded in an image-to-video workflow.

Source: Discord
Instances such as this emphasize the need for prompt sanitization in online generative systems, not least since machine unlearning, in which the foundation model itself is altered to remove banned concepts, can have unwelcome effects on the final model’s usability.
Seeking less drastic solutions, the CopyJudge system mimics human-based legal judgements by using AI to break images into key elements such as composition and color, to filter out non-copyrightable parts, and to compare what remains. It also includes an AI-driven method to adjust prompts and modify image generation, helping to avoid copyright issues while preserving creative content.
Experimental results, the authors maintain, show CopyJudge’s equivalence to state-of-the-art approaches in this pursuit, and indicate that the system exhibits superior generalization and interpretability, compared to prior works.
The new paper is titled CopyJudge: Automated Copyright Infringement Identification and Mitigation in Text-to-Image Diffusion Models, and comes from five researchers across EPFL, Sony AI and China’s Westlake University.
Method
Though CopyJudge uses GPT to create rolling tribunals of automated judges, the authors emphasize that the system is not optimized for OpenAI’s product, and that any number of other Large Vision-Language Models (LVLMs) could be used instead.
In the first instance, the authors’ abstraction-filtration-comparison framework is required to decompose source images into constituent parts, as illustrated in the left side of the schema below:

In the lower left corner we see a filtering agent breaking down the image sections in an attempt to identify characteristics that, in concert, may be native to a copyrighted work, but which in themselves would be too generic to qualify as a violation.
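That filtration logic can be caricatured as follows – a toy sketch in which the element descriptions and the list of ‘generic’ traits are entirely hypothetical:

```python
# Toy sketch of the abstraction-filtration step: decompose an image into
# described elements, then discard those too generic to be copyrightable
# on their own. Element descriptors and the 'generic' set are hypothetical.

GENERIC_ELEMENTS = {"blue sky", "centered subject", "soft lighting"}  # assumed examples

def filtration(elements: list[str]) -> list[str]:
    """Keep only elements that could, in combination, identify a specific work."""
    return [e for e in elements if e.lower() not in GENERIC_ELEMENTS]

decomposed = [
    "blue sky",                      # generic: filtered out
    "red cap with white 'M' logo",   # distinctive: kept for comparison
    "round-eared cartoon mouse",     # distinctive: kept for comparison
    "centered subject",              # generic: filtered out
]
print(filtration(decomposed))  # only the potentially protectable elements remain
```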
Multiple LVLMs are then used to evaluate the filtered elements – an approach that has been proven effective in papers such as a 2023 CSAIL offering and ChatEval, among diverse others acknowledged in the new paper.
The authors state:
‘Multiple pairs of images scored by humans are also included in the method via few-shot in-context learning.’
Once the ‘tribunals’ in the loop have arrived at a consensus score that is within the range of acceptability, the results are passed on to a ‘meta judge’ LVLM, which synthesizes them into a final score.
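In pseudocode terms, the loop might be organized along these lines – a minimal sketch in which the convergence test, the 0-10 scale, and the random stand-in judges are all assumptions rather than the paper’s specification:

```python
# Minimal sketch of the tribunal-plus-meta-judge loop. The convergence test
# (scores within one point of each other), the 0-10 scale, and the random
# stand-in judges are assumptions; real agents would be LVLM calls.
import random
import statistics

def lvlm_judge_score(agent_id: int, evidence: dict, history: list) -> float:
    """Stand-in for an LVLM agent that scores similarity, seeing prior rounds."""
    base = evidence["initial_estimate"]
    drift = random.uniform(-1, 1) / (len(history) + 1)  # agents converge over rounds
    return max(0.0, min(10.0, base + drift))

def tribunal(evidence: dict, n_agents: int = 3, max_rounds: int = 5) -> float:
    history: list[list[float]] = []
    for _ in range(max_rounds):
        scores = [lvlm_judge_score(i, evidence, history) for i in range(n_agents)]
        history.append(scores)
        if max(scores) - min(scores) <= 1.0:  # consensus reached
            break
    # The 'meta judge' synthesizes the final score from the last round.
    return statistics.mean(history[-1])

print(tribunal({"initial_estimate": 6.5}))
```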
Mitigation
Next, the authors turned to the prompt-mitigation process described earlier.

The two methods used for prompt mitigation were LVLM-based prompt control, where effective non-infringing prompts are iteratively developed across GPT clusters – an approach that is entirely ‘black box’, requiring no internal access to the model architecture; and a reinforcement learning-based (RL-based) approach, where the reward is designed to penalize outputs that infringe copyright.
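A minimal sketch of the black-box variant might look like this; all helpers are simplified stand-ins, and the threshold, banned cues, and iteration cap are assumptions for illustration:

```python
# Sketch of the black-box prompt-control loop: an LVLM repeatedly rewrites
# the prompt until the tribunal no longer flags the generated image. All
# helpers are simplified stand-ins, not CopyJudge's actual components.
import re

INFRINGEMENT_THRESHOLD = 4.0               # assumed cutoff on a 0-10 judge scale
BANNED_CUES = ["mickey mouse", "pikachu"]  # hypothetical infringing cues

def generate_image(prompt: str) -> str:
    return prompt  # stand-in: a diffusion model call would go here

def tribunal_score(image: str) -> float:
    # Stand-in for the CopyJudge tribunal: flag known cues.
    return 9.0 if any(cue in image.lower() for cue in BANNED_CUES) else 1.0

def lvlm_rewrite(prompt: str, feedback: str) -> str:
    # Stand-in for a GPT rewrite guided by judge feedback.
    for cue in BANNED_CUES:
        prompt = re.sub(cue, "a generic cartoon character", prompt, flags=re.IGNORECASE)
    return prompt

def mitigate(prompt: str, max_iters: int = 5) -> str:
    for _ in range(max_iters):
        score = tribunal_score(generate_image(prompt))
        if score < INFRINGEMENT_THRESHOLD:
            return prompt                  # sanitized prompt found
        prompt = lvlm_rewrite(prompt, f"judge score {score}: remove infringing cues")
    return prompt

print(mitigate("Mickey Mouse waving at the camera"))

# The RL-based variant instead optimizes against a reward that penalizes
# infringement while preserving fidelity to the user's intent, e.g.:
def reward(similarity: float, fidelity: float) -> float:
    return fidelity - max(0.0, similarity - INFRINGEMENT_THRESHOLD)
```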
Data and Tests
To test CopyJudge, various datasets were used, including D-Rep, which contains real and fake image pairs scored by humans on a 0-5 scale.

Source: https://huggingface.co/datasets/WenhaoWang/D-Rep/viewer/default/
The CopyJudge schema considered D-Rep images that scored 4 or more as infringement examples, with the remainder held back as non-IP-relevant. The 4,000 official images in the dataset were used as test images. Further, the researchers selected and curated images for 10 famous cartoon characters from Wikipedia.
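As a rough sketch, this labeling rule might be implemented as below, using the Hugging Face datasets library; the split name and the ‘rating’ column are guesses from the dataset viewer rather than confirmed fields:

```python
# Sketch of the labeling rule described above: D-Rep pairs rated 4 or 5 become
# positive (infringing) examples, the rest negative. The dataset ID comes from
# the source link; the split name and 'rating' column are assumptions.
from datasets import load_dataset

ds = load_dataset("WenhaoWang/D-Rep", split="train")  # assumed split name

def to_binary_label(example):
    example["infringing"] = int(example["rating"] >= 4)  # assumed score column
    return example

ds = ds.map(to_binary_label)
print(ds[0]["infringing"])
```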
The three diffusion-based architectures used to generate potentially infringing images were Stable Diffusion V2; Kandinsky 2.2; and Stable Diffusion XL. The authors manually selected an infringing image and a non-infringing image from each of the models, arriving at 60 positive and 60 negative samples.
The baseline methods chosen for comparison were: L2 norm; Learned Perceptual Image Patch Similarity (LPIPS); SSCD; RLCP; and PDF-Emb. For metrics, Accuracy and F1 score were used as criteria for infringement.
GPT-4o was used to populate the internal debate teams of CopyJudge, using three agents for a maximum of five iterations on any particular submitted image. Three random images from each rating level in D-Rep were used as human priors for the agents to consider.
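Taken together, the evaluation recipe resembles the following sketch, with synthetic stand-in data; the sampling and metric calls illustrate the setup rather than reproduce the authors’ code:

```python
# Sketch of the evaluation recipe: three random human-scored exemplars per
# rating level serve as few-shot priors, and final verdicts are scored with
# Accuracy and F1. Data here is synthetic; the sklearn metrics are standard.
import random
from collections import defaultdict
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-in for D-Rep: (pair_id, human rating 0-5).
pairs = [(i, random.randint(0, 5)) for i in range(100)]

by_rating = defaultdict(list)
for pair_id, rating in pairs:
    by_rating[rating].append(pair_id)

# Three random exemplars per rating level, used as few-shot human priors.
few_shot_priors = {r: random.sample(ids, k=min(3, len(ids)))
                   for r, ids in by_rating.items()}

# Hypothetical model verdicts vs. ground truth (rating >= 4 means infringing).
y_true = [int(rating >= 4) for _, rating in pairs]
y_pred = [label if random.random() > 0.1 else 1 - label for label in y_true]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
```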

Of these results the authors comment:
The authors also note that CopyJudge provides a ‘relatively’ more distinct boundary between valid and infringing cases:

The researchers compared their methods to a Sony AI-involved collaboration from 2024. This work used a fine-tuned Stable Diffusion model featuring 200 memorized (i.e., overfitted) images, to elicit copyrighted data at inference time.
The authors of the new work found that their own prompt-mitigation method was able to produce images less likely to cause infringement than the 2024 approach.

The authors comment here:

The authors ran further tests in regard to infringement mitigation, studying explicit and implicit infringement.
Explicit infringement occurs when prompts directly reference copyrighted material, such as naming a character or its creator outright. To test this, the researchers used 20 cartoon and artwork samples, generating infringing images in Stable Diffusion v2 with prompts that explicitly included names or creator attributions.

Implicit infringement occurs when a prompt lacks explicit copyright references but still leads to an infringing image due to certain descriptive elements – a scenario that is especially relevant to commercial text-to-image models, which frequently incorporate content detection systems to identify and block copyright-related prompts.
To explore this, the authors used the same IP-locked samples as in the explicit infringement test, but generated infringing images without direct copyright references, using DALL-E 3 (though the paper notes that the model’s built-in safety detection module was observed to reject certain prompts that triggered its filters).

The authors state:

Conclusion
Though the study presents a promising approach to copyright protection in AI-generated images, the reliance on large vision-language models (LVLMs) for infringement detection could raise concerns about bias and consistency, since AI-driven judgments may not always align with legal standards.
Perhaps most significantly, the project also assumes that copyright enforcement can be automated, despite real-world legal decisions that often involve subjective and contextual factors that AI may struggle to interpret.
In the real world, the automation of legal consensus, most especially around the output of AI, seems likely to remain a contentious issue far beyond this time, and far beyond the scope of the domain addressed in this work.