Hugging Face and Protect AI partnered in October 2024 to reinforce machine learning (ML) model security through Guardian’s scanning technology for the community of developers who explore and use models from the Hugging Face Hub. The partnership has been a natural fit from the beginning—Hugging Face is on a mission to democratize the use of open source AI, with a commitment to safety and security; and Protect AI is building the guardrails to make open source models safe for all.
Four new threat detection modules launched
Since October, Protect AI has significantly expanded Guardian’s detection capabilities, improving existing threat detection and launching four new detection modules:
- PAIT-ARV-100: Archive slip can write to file system at load time
- PAIT-JOBLIB-101: Joblib model suspicious code execution detected at model load time
- PAIT-TF-200: TensorFlow SavedModel contains architectural backdoor
- PAIT-LMAFL-300: Llamafile can execute malicious code during inference
With these updates, Guardian covers more model file formats, detects additional sophisticated obfuscation techniques, and catches high severity issues such as the CVE-2025-1550 vulnerability in Keras. Through enhanced detection tooling, Hugging Face users receive critical security information via inline alerts on the platform and gain access to comprehensive vulnerability reports on Insights DB. Clearly labeled findings appear on each model page, empowering users to make more informed decisions about which models to integrate into their projects.
By the numbers
As of April 1, 2025, Protect AI has successfully scanned 4.47 million unique model versions in 1.41 million repositories on the Hugging Face Hub.
To date, Protect AI has identified a total of 352,000 unsafe/suspicious issues across 51,700 models. In the last 30 days alone, Protect AI has served 226 million requests from Hugging Face at a 7.94 ms response time.
Maintaining a Zero Trust Approach to Model Security
Protect AI’s Guardian applies a zero trust approach to AI/ML security: arbitrary code execution is treated as inherently unsafe, regardless of intent. Rather than flagging only overtly malicious threats, Guardian also marks execution risks as suspicious on Insights DB, recognizing that harmful code can look innocuous through obfuscation techniques (see more on payload obfuscation below). Attackers can disguise payloads inside seemingly benign scripts or the extensibility components of a framework, making payload inspection alone insufficient for ensuring security. By maintaining this cautious approach, Guardian helps mitigate risks posed by hidden threats in machine learning models.
Evolving Guardian’s Model Vulnerability Detection Capabilities
AI/ML security threats are evolving every day. That is why Protect AI leverages both its in-house threat research team and huntr—the world’s first and largest AI/ML bug bounty program, powered by a community of over 17,000 security researchers.
Coinciding with our partnership launch in October, Protect AI launched a new program on huntr to crowdsource research on Model File Vulnerabilities. Since the launch of this program, they’ve received over 200 reports that Protect AI teams have worked through and incorporated into Guardian—all of which are automatically applied to the model scans here on Hugging Face.
Common attack themes
As more huntr reports come in and more independent threat research is conducted, certain trends have emerged.
Library-dependent attack chains: These attacks focus on a bad actor’s ability to invoke functions from libraries present in the ML workstation’s environment. They are reminiscent of the “drive-by download” style of attacks that afflicted browsers and systems when common utilities like Java and Flash were present. Typically, the scale of impact of these attacks is proportional to the pervasiveness of a given library, with common ML libraries like PyTorch having a far wider potential impact than lesser-used libraries (a minimal sketch of this mechanism appears after these themes).
Payload obfuscation: Several reports have highlighted ways to insert, obfuscate, or “hide” a payload in a model so that it bypasses common scanning techniques. These vulnerabilities use techniques like compression, encoding, and serialization to obfuscate the payload and are not easily detectable. Compression is a problem because libraries like Joblib allow compressed payloads to be loaded directly. Container formats like Keras and NeMo embed additional model files, each potentially vulnerable to its own specific attack vectors. Compression also exposes users to TarSlip or ZipSlip vulnerabilities. While the impact of these will often be limited to Denial of Service, in certain circumstances they can result in Arbitrary Code Execution by leveraging path traversal techniques, allowing attackers to overwrite files that are often automatically executed (a ZipSlip-style path check is sketched after these themes as well).
Framework-extensibility vulnerabilities: ML frameworks provide numerous extensibility mechanisms that inadvertently create dangerous attack vectors: custom layers, external code dependencies, and configuration-based code loading. For instance, CVE-2025-1550 in Keras, reported to us by the huntr community, demonstrates how custom layers can be exploited to execute arbitrary code despite security measures. Configuration files with serialization vulnerabilities similarly allow dynamic code loading. These deserialization vulnerabilities make models exploitable through crafted payloads embedded in formats that users load without suspicion. Despite security improvements from vendors, older vulnerable versions and insecure dependency handling continue to present significant risk in ML ecosystems.
Attack vector chaining: Recent reports reveal how multiple vulnerabilities can be combined to create sophisticated attack chains that bypass detection. By sequentially exploiting vulnerabilities like obfuscated payloads and extension mechanisms, researchers have shown complex pathways for compromise that appear benign when examined individually. This approach significantly complicates detection and mitigation efforts, as security tools focused on single-vector threats often miss these compound attacks. Effective defense requires identifying and addressing all links in the attack chain rather than treating each vulnerability in isolation.
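To make the library-dependent theme concrete, here is a minimal, deliberately harmless Python sketch (not taken from any huntr report, and not Guardian’s internals) showing how a pickled object chooses what gets called when it is loaded. A real attack would point `__reduce__` at a far more dangerous function from a library already installed in the ML environment, which is why an attack’s reach tracks the pervasiveness of that library.

```python
import pickle


class InnocentLookingObject:
    """A pickled object controls what runs at load time via __reduce__."""

    def __reduce__(self):
        # At unpickling time, pickle will call print(...) with this argument.
        # An attacker would instead reference something dangerous that is
        # importable on the victim's machine (file writes, network calls, ...).
        return (print, ("arbitrary callable invoked during model load",))


payload = pickle.dumps(InnocentLookingObject())

# "Loading the model" executes the callable chosen by the file's author.
pickle.loads(payload)
```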
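For the payload obfuscation theme, this is the shape of a ZipSlip-style path check, a minimal sketch rather than Guardian’s implementation: before extracting an archive, flag any member whose normalized path would escape the extraction directory.

```python
import os
import zipfile


def find_path_traversal(archive_path: str) -> list[str]:
    """Return archive member names that would escape the extraction directory."""
    suspicious = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Normalize the member path and flag absolute paths or paths that
            # climb out of the target directory (e.g. "../../.bashrc").
            normalized = os.path.normpath(name)
            if os.path.isabs(normalized) or normalized.split(os.sep)[0] == "..":
                suspicious.append(name)
    return suspicious
```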
Delivering Comprehensive Threat Detection for Hugging Face Users
The industry-leading Protect AI threat research team, with help from the huntr community, is constantly gathering data and insights in order to develop new and more robust model scans as well as automatic threat blocking (available to Guardian customers). In the past few months, Guardian has:
Enhanced detection of library-dependent attacks: Significant expansion of Guardian’s scanning capabilities for detecting library-dependent attack vectors. The scanners for PyTorch and Pickle now perform deep structural analysis of serialized code, examining execution paths and identifying potentially malicious code patterns that can be triggered through library dependencies. For instance, the PyTorch torchvision.io functions can overwrite any file on the victim’s system to either include a payload or delete all of its content. Guardian can now detect many more of these dangerous functions in popular libraries such as PyTorch, NumPy, and Pandas (a sketch of this kind of pickle inspection follows this list).
Uncovered obfuscated attacks: Guardian performs multi-layered analyses across various archive formats, decompressing nested archives and examining compressed payloads for malicious models. This approach detects attempts to hide malicious code through compression, encoding, or serialization techniques. Joblib, for instance, supports saving models using different compression formats, which can obfuscate Pickle deserialization vulnerabilities; the same can be done in other formats like Keras, which can include NumPy weight files that carry deserialization payloads (a sketch of nested-archive traversal follows this list).
Detected exploits in framework extensibility components: Guardian’s continually improving detection modules alerted users on Hugging Face to models that were impacted by CVE-2025-1550 (a critical security finding) before the vulnerability was publicly disclosed. These detection modules comprehensively analyze ML framework extension mechanisms, allowing only standard or verified components and blocking potentially dangerous implementations, regardless of their apparent intent (a sketch of this allowlist approach follows this list).
Identified additional architectural backdoors: Guardian’s architectural backdoor detection capabilities were expanded beyond ONNX formats to include additional model formats like TensorFlow.
Expanded model format coverage: Guardian’s true strength comes from the depth of its coverage, which has driven substantial expansion of detection modules to include additional formats like Joblib and the increasingly popular llamafile format, with support for additional ML frameworks coming soon.
Provided deeper model analysis: Protect AI is actively researching additional ways to enhance current detection capabilities for better evaluation and detection of unsafe models. Expect to see significant improvements in reducing both false positives and false negatives in the near future.
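To illustrate the deep structural analysis described under enhanced library-dependent detection, here is a minimal Python sketch (an approximation, not Guardian’s scanner) that inspects a pickle stream without loading it and flags imports matching an illustrative denylist. The module and callable names here are examples, not a complete policy.

```python
import pickletools

# Illustrative denylist; a real scanner reasons about far more modules and callables.
SUSPICIOUS = ("os", "subprocess", "builtins", "shutil", "torchvision.io")


def suspicious_imports(pickle_bytes: bytes) -> list[str]:
    """Flag GLOBAL/STACK_GLOBAL references in a pickle stream that match a denylist.

    These opcodes are how a pickle names the callables it will invoke at load
    time, so they reveal what a model file can reach without executing it.
    """
    findings, strings = [], []
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)  # STACK_GLOBAL takes its names from pushed strings
            continue
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            ref = f"{module}.{name}"
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            ref = f"{strings[-2]}.{strings[-1]}"
        else:
            continue
        if any(ref == p or ref.startswith(p + ".") for p in SUSPICIOUS):
            findings.append(ref)
    return findings
```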
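For the multi-layered archive analysis used to uncover obfuscated attacks, a sketch of the traversal idea (again illustrative, not the actual product code): descend into zips-within-zips so every inner file can be handed to a format-specific scanner instead of trusting the outer container.

```python
import io
import zipfile


def walk_nested_archives(data: bytes, prefix: str = "") -> list[str]:
    """Recursively list files inside a zip, descending into nested zip archives."""
    names = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            inner = zf.read(info.filename)
            path = f"{prefix}{info.filename}"
            if zipfile.is_zipfile(io.BytesIO(inner)):
                # Nested archive: recurse so inner payloads are not missed.
                names.extend(walk_nested_archives(inner, prefix=path + "!/"))
            else:
                # A real scanner would dispatch this blob to the right detector
                # (pickle, NumPy, Keras config, ...) based on its contents.
                names.append(path)
    return names
```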
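Finally, the framework-extensibility detection above follows an allow-only-known-components policy. A minimal sketch of that idea for the Keras v3 .keras layout (a zip containing a config.json) is below; the allowlist is a tiny illustrative subset, not the set Guardian actually uses.

```python
import json
import zipfile

# Tiny illustrative allowlist; a real scanner would cover all built-in layer types.
ALLOWED_LAYERS = {"InputLayer", "Dense", "Conv2D", "MaxPooling2D", "Flatten", "Dropout"}


def flag_suspicious_layers(keras_archive: str) -> list[str]:
    """Flag layers in a .keras archive whose class is not on the allowlist.

    Lambda layers and unknown custom classes can carry attacker-controlled
    code, so a zero trust scanner treats them as suspicious regardless of intent.
    """
    with zipfile.ZipFile(keras_archive) as zf:
        config = json.loads(zf.read("config.json"))

    findings = []
    for layer in config.get("config", {}).get("layers", []):
        class_name = layer.get("class_name", "")
        if class_name == "Lambda" or class_name not in ALLOWED_LAYERS:
            findings.append(class_name)
    return findings
```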
It Only Gets Better from Here
Through the partnership between Protect AI and Hugging Face, we’ve made third-party ML models safer and more accessible. We believe that having more eyes on model security can only be a good thing. We’re increasingly seeing the security world pay attention and lean in, making threats more discoverable and AI usage safer for all.



