Text-to-image AI models may be tricked into generating disturbing images

Their work, which they’ll present at the IEEE Symposium on Security and Privacy in May next year, shines a light on how easy it is to force generative AI models to disregard their own guardrails and policies, a practice known as “jailbreaking.” It also demonstrates how difficult it is to prevent these models from generating such content, because it’s included in the vast troves of data they’ve been trained on, says Zico Kolter, an associate professor at Carnegie Mellon University. He demonstrated a similar type of jailbreaking on ChatGPT earlier this year but was not involved in this research.

“We have to consider the potential risks in releasing software and tools that have known security flaws into larger software systems,” he says.

All major generative AI models have safety filters to prevent users from prompting them to produce pornographic, violent, or otherwise inappropriate images. The models won’t generate images from prompts that contain sensitive terms like “naked,” “murder,” or “sexy.”
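In its simplest form, the kind of filter described above is just a blocklist check on the prompt text. The sketch below is a toy illustration of that idea, not any vendor's actual filter; the `BLOCKLIST` terms are the examples quoted in the article.

```python
# Toy sketch of a blocklist-style prompt filter (not any real model's
# implementation). Real filters are far more sophisticated, but the
# principle -- reject prompts containing sensitive terms -- is the same.
BLOCKLIST = {"naked", "murder", "sexy"}  # example terms from the article

def passes_safety_filter(prompt: str) -> bool:
    """Return True if no blocked term appears as a word in the prompt."""
    words = (word.strip(".,!?") for word in prompt.lower().split())
    return not any(word in BLOCKLIST for word in words)

print(passes_safety_filter("a cat sleeping on a sofa"))  # True
print(passes_safety_filter("a murder scene"))            # False
```

A filter this literal is exactly what jailbreaks exploit: a string that isn't on the list can still steer the model toward the banned concept.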

But this new jailbreaking method, dubbed “SneakyPrompt” by its creators from Johns Hopkins University and Duke University, uses reinforcement learning to create written prompts that look like garbled nonsense to us but that AI models learn to recognize as hidden requests for disturbing images. It essentially works by turning the way text-to-image AI models function against them.

These models convert text-based requests into tokens—breaking words up into strings of words or characters—to process the command the prompt has given them. SneakyPrompt repeatedly tweaks a prompt’s tokens to try to force the model to generate banned images, adjusting its approach until it is successful. This technique makes it quicker and easier to generate such images than if somebody had to input each entry manually, and it can generate entries that humans wouldn’t think of trying.
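The loop described above can be sketched as a search over token substitutions. The code below is a heavily simplified stand-in, not the paper's actual reinforcement-learning algorithm: `semantic_score` is a hypothetical proxy for what the real system measures by querying the image model, and the banned word and scoring rule are invented for illustration.

```python
import random

BLOCKED_TOKENS = {"murder"}  # example banned token from the article

def filter_allows(token: str) -> bool:
    """Toy stand-in for the model's safety filter: literal blocklist check."""
    return token not in BLOCKED_TOKENS

def semantic_score(candidate: str, target: str = "murder") -> float:
    # Hypothetical proxy: in the real attack, closeness to the banned
    # concept is judged from the model's actual outputs. Here we just
    # count matching characters so the sketch is self-contained.
    return sum(a == b for a, b in zip(candidate, target)) / len(target)

def search_substitute(target: str = "murder", iterations: int = 2000,
                      seed: int = 0) -> tuple[str, float]:
    """Repeatedly propose nonsense tokens, keep the best one that the
    filter allows -- a simplified version of the tweak-and-retry loop."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    best, best_score = "", -1.0
    for _ in range(iterations):
        candidate = "".join(rng.choice(alphabet) for _ in range(len(target)))
        if not filter_allows(candidate):
            continue  # the filter would reject this token outright
        score = semantic_score(candidate, target)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

The point of the sketch is the shape of the attack: the search never submits the banned token itself, yet it automatically homes in on substitutes that the scoring signal says are close to the banned concept, far faster than a human trying entries by hand.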
