When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

In May 2025, Enkrypt AI released its Multimodal Red Teaming Report, a chilling evaluation that exposed just how easily advanced AI systems will be manipulated into generating dangerous and unethical content. The report focuses on two of Mistral’s leading vision-language models—Pixtral-Large (25.02) and Pixtral-12b—and paints an image of models that will not be only technically impressive but disturbingly vulnerable.

Vision-language models (VLMs) like Pixtral are built to interpret each visual and textual inputs, allowing them to reply intelligently to complex, real-world prompts. But this capability comes with increased risk. Unlike traditional language models that only process text, VLMs will be influenced by the interplay between images and words, opening latest doors for adversarial attacks. Enkrypt AI’s testing shows how easily these doors will be pried open.

Alarming Test Results: CSEM and CBRN Failures

The team behind the report used sophisticated red teaming methods—a type of adversarial evaluation designed to mimic real-world threats. These tests employed tactics like jailbreaking (prompting the model with fastidiously crafted queries to bypass safety filters), image-based deception, and context manipulation. Alarmingly, 68% of those adversarial prompts elicited harmful responses across the 2 Pixtral models, including content that related to grooming, exploitation, and even chemical weapons design.

Probably the most striking revelations involves child sexual exploitation material (CSEM). The report found that Mistral’s models were 60 times more prone to produce CSEM-related content in comparison with industry benchmarks like GPT-4o and Claude 3.7 Sonnet. In test cases, models responded to disguised grooming prompts with structured, multi-paragraph content explaining easy methods to manipulate minors—wrapped in disingenuous disclaimers like “for educational awareness only.” The models weren’t simply failing to reject harmful queries—they were completing them intimately.

Equally disturbing were the ends in the CBRN (Chemical, Biological, Radiological, and Nuclear) risk category. When prompted with a request on easy methods to modify the VX nerve agent—a chemical weapon—the models offered shockingly specific ideas for increasing its persistence within the environment. They described, in redacted but clearly technical detail, methods like encapsulation, environmental shielding, and controlled release systems.

These failures weren’t at all times triggered by overtly harmful requests. One tactic involved uploading a picture of a blank numbered list and asking the model to “fill in the main points.” This straightforward, seemingly innocuous prompt led to the generation of unethical and illegal instructions. The fusion of visual and textual manipulation proved especially dangerous—highlighting a novel challenge posed by multimodal AI.

Why Vision-Language Models Pose Latest Security Challenges

At the center of those risks lies the technical complexity of vision-language models. These systems don’t just parse language—they synthesize meaning across formats, which suggests they have to interpret image content, understand text context, and respond accordingly. This interaction introduces latest vectors for exploitation. A model might accurately reject a harmful text prompt alone, but when paired with a suggestive image or ambiguous context, it could generate dangerous output.

Enkrypt AI’s red teaming uncovered how cross-modal injection attacks—where subtle cues in a single modality influence the output of one other—can completely bypass standard safety mechanisms. These failures reveal that traditional content moderation techniques, built for single-modality systems, will not be enough for today’s VLMs.

The report also details how the Pixtral models were accessed: Pixtral-Large through AWS Bedrock and Pixtral-12b via the Mistral platform. This real-world deployment context further emphasizes the urgency of those findings. These models will not be confined to labs—they can be found through mainstream cloud platforms and will easily be integrated into consumer or enterprise products.

What Must Be Done: A Blueprint for Safer AI

To its credit, Enkrypt AI does greater than highlight the issues—it offers a path forward. The report outlines a comprehensive mitigation strategy, starting with safety alignment training. This involves retraining the model using its own red teaming data to cut back susceptibility to harmful prompts. Techniques like Direct Preference Optimization (DPO) are beneficial to fine-tune model responses away from dangerous outputs.

It also stresses the importance of context-aware guardrails—dynamic filters that may interpret and block harmful queries in real time, bearing in mind the complete context of multimodal input. As well as, the usage of Model Risk Cards is proposed as a transparency measure, helping stakeholders understand the model’s limitations and known failure cases.

Perhaps essentially the most critical suggestion is to treat red teaming as an ongoing process, not a one-time test. As models evolve, so do attack strategies. Only continuous evaluation and energetic monitoring can ensure long-term reliability, especially when models are deployed in sensitive sectors like healthcare, education, or defense.

The Multimodal Red Teaming Report from Enkrypt AI is a transparent signal to the AI industry: multimodal power comes with multimodal responsibility. These models represent a step forward in capability, but in addition they require a leap in how we take into consideration safety, security, and ethical deployment. Left unchecked, they don’t just risk failure—they risk real-world harm.

For anyone working on or deploying large-scale AI, this report will not be only a warning. It’s a playbook. And it couldn’t have come at a more urgent time.

When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

Alarming Test Results: CSEM and CBRN Failures

Why Vision-Language Models Pose Latest Security Challenges

What Must Be Done: A Blueprint for Safer AI

What are your thoughts on this topic?
Let us know in the comments below.

Share this article

Recent posts

How Vision Language Models Are Trained from “Scratch”

Why Care About Prompt Caching in LLMs?

Supply-chain attack using invisible code hits GitHub and other repositories

Introducing NVIDIA NeMo Retriever’s Generalizable Agentic Retrieval Pipeline

Why physical AI is becoming manufacturing’s next advantage

When AI Backfires: Enkrypt AI Report Exposes Dangerous Vulnerabilities in Multimodal Models

Alarming Test Results: CSEM and CBRN Failures

Why Vision-Language Models Pose Latest Security Challenges

What Must Be Done: A Blueprint for Safer AI

What are your thoughts on this topic? Let us know in the comments below.

Share this article

Recent posts

What are your thoughts on this topic?
Let us know in the comments below.