AI coding agents enable developers to work faster by streamlining tasks and driving automated, test-driven development. However, they also introduce a significant, often overlooked, attack surface by running tools from the command line with the same permissions and entitlements as the user, making them computer use agents, with all the risks those entail.
The primary threat to these tools is indirect prompt injection, where a portion of the content ingested by the LLM driving the agent is supplied by an adversary through vectors such as malicious repositories or pull requests, git histories containing prompt injections, .cursorrules or CLAUDE/AGENT.md files that contain prompt injections, or malicious MCP responses. Such malicious instructions to the LLM can result in it taking attacker-influenced actions with hostile consequences.
Manual approval of actions performed by the agent is the most common way to manage this risk, but it also introduces ongoing developer friction, requiring developers to repeatedly return to the application to review and approve actions. This creates a risk of user habituation, where users simply approve potentially dangerous actions without reviewing them. A key requirement for agentic system security is finding the balance between hands-on user input and automation. The following controls are what the NVIDIA AI Red Team considers either required or highly recommended, but they should be implemented to reflect your specific use case and your organization's risk tolerance.
Based on the NVIDIA AI Red Team's experience, the following mandatory controls mitigate the most serious attacks that can be achieved with indirect prompt injection:
- Network egress controls: Blocking network access to arbitrary sites prevents exfiltration of data or establishment of a remote shell without additional exploits.
- Block file writes outside of the workspace: Blocking write operations to files outside of the workspace prevents a number of persistence mechanisms, sandbox escapes, and remote code execution (RCE) techniques.
- Block writes to configuration files, no matter where they are located: Blocking writes to config files prevents exploitation of hooks, skills, and local Model Context Protocol (MCP) configurations that often run outside of a sandbox context.
The following recommended controls further reduce the attack surface, making host enumeration and exploration harder, limiting risks posed by hooks, local MCP configurations, and kernel exploits, and closing other exploitation and disclosure risks:
- Prevent reads from files outside of the workspace.
- Sandbox the entire integrated development environment (IDE) and all spawned functions (e.g., hooks, MCP startup scripts, skills, and tool calls), and, where possible, run them as their own user.
- Use virtualization to isolate the sandbox kernel from the host kernel (e.g., a microVM, Kata container, or full VM).
- Require user approval for each instance of specific actions (e.g., a network connection) that otherwise violate isolation controls. Allow-once / run-many is not an adequate control.
- Use a secret injection approach to prevent secrets (e.g., in environment variables) from being shared with the agent.
- Establish lifecycle management controls for the sandbox to prevent the accumulation of code, intellectual property, or secrets.
Note: This post does not address risks arising from inaccurate or adversarially manipulated output from AI-powered tools, which are treated as user-level responsibilities.
Why implement sandbox controls at an OS level?
Agentic tools, particularly for coding, perform arbitrary code execution by design. Automating test- or specification-driven development requires that the agent create and execute code to observe the results. In addition, tool-using agents are moving toward writing and executing throwaway scripts to perform tasks.
This makes application-level controls insufficient. They can intercept tool calls and arguments before execution, but once control passes to a subprocess, the application has no visibility into or control over that subprocess. Attackers often use indirection, calling a more restricted tool through a safer, approved one, as a common way to bypass application-level controls such as allowlists. OS-level controls, like macOS Seatbelt, work beneath the application layer and cover every process in the sandbox. No matter how these processes start, they are prevented from reaching dangerous system capabilities, even through indirect paths.
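To illustrate the idea, the sketch below wraps a command in a deny-by-default Seatbelt profile via macOS's sandbox-exec utility. The profile and paths are simplified assumptions; a usable production profile would need many more allowances.

```python
import subprocess

# A minimal macOS Seatbelt (SBPL) profile: deny everything by default, then
# re-allow only what the agent's child processes need. Paths are illustrative
# assumptions, not a vetted policy; note that network access stays denied.
SEATBELT_PROFILE = """
(version 1)
(deny default)
(allow process-exec)
(allow process-fork)
(allow file-read* (subpath "/usr") (subpath "/System"))
(allow file-read* (subpath "/Users/dev/workspace"))
(allow file-write* (subpath "/Users/dev/workspace"))
"""

def run_in_sandbox(cmd: list[str]) -> subprocess.CompletedProcess:
    # Every process in the spawned tree inherits the profile, so indirection
    # (calling a restricted tool through an approved one) gains the attacker nothing.
    return subprocess.run(["sandbox-exec", "-p", SEATBELT_PROFILE, *cmd])

run_in_sandbox(["python3", "generated_test.py"])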
Mandatory sandbox security controls
This section briefly outlines controls that the Red Team considers mandatory for agentic applications and the classes of attacks they help mitigate. When implemented together, they block simple exploitation techniques observed in practice. The section concludes with guidance on layering controls in real-world deployments.
Block network egress except to known-good locations
The most obvious and direct threat of network access is remote access (a network implant, malware, or a simple reverse shell), giving an attacker access to the victim machine, where they can directly probe and enumerate controls and attempt to pivot or escape.
Another significant threat is data exfiltration. Developer machines often contain a wide range of secrets and intellectual property of value to an attacker, often even in the current workspace (e.g., .env files with API tokens). Exfiltrating the contents of directories such as ~/.ssh to gain access to other systems is a major goal, as is exfiltrating sensitive source code.
Network connections created by sandbox processes should not be permitted without manual approval. Tightly scoped allowlists enforced through HTTP proxy, IP, or port-based controls reduce user interaction and approval fatigue. Limiting DNS resolution to designated trusted resolvers to avoid DNS-based exfiltration is also recommended. A default-ask posture combined with enterprise-level denylists that cannot be overridden by local users provides a good balance between functionality and security.
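A minimal sketch of this default-ask posture as it might look in an egress proxy's decision function; the hostnames are illustrative assumptions, not a recommended policy:

```python
from urllib.parse import urlparse

# Hypothetical enterprise policy: hosts the proxy allows without asking.
EGRESS_ALLOWLIST = {"pypi.org", "files.pythonhosted.org", "github.com"}
# Hosts that can never be approved, even manually (enterprise denylist).
EGRESS_DENYLIST = {"pastebin.com"}

def egress_decision(url: str) -> str:
    # Denylist is checked first so no approval path can override it;
    # anything not explicitly allowlisted falls through to default-ask.
    host = urlparse(url).hostname or ""
    if host in EGRESS_DENYLIST:
        return "deny"
    if host in EGRESS_ALLOWLIST:
        return "allow"
    return "ask-user"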
Block file writes outside of the active workspace
Writing files outside of an active workspace is a significant risk. Files such as ~/.zshrc are executed automatically and can result in both RCE and sandbox escape. URLs in various key files, such as ~/.gitconfig or ~/.curlrc, can be overwritten to redirect sensitive data to attacker-controlled locations. Malicious files, such as a backdoored python or node binary, could be placed in ~/.local/bin to establish persistence or escape the sandbox.
Write operations outside of the active workspace must be blocked at the OS level. As with network controls, use an enterprise-level policy that blocks such operations on known-sensitive paths, regardless of whether the user manually approves the action. These protected files should include dotfiles, configuration directories, and any additional paths enumerated by enterprise policy. Any other out-of-workspace file write operations may be permitted with manual user approval.
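A sketch of such a write policy, assuming a single workspace directory and an illustrative protected-path list:

```python
from pathlib import Path

WORKSPACE = Path.home() / "workspace"       # assumed active workspace
# Paths that no approval can unlock (enterprise policy; illustrative).
PROTECTED = [Path.home() / p for p in
             (".zshrc", ".gitconfig", ".curlrc", ".local/bin")]

def write_decision(target: str) -> str:
    # Resolve symlinks first, so a link planted inside the workspace
    # cannot redirect the write to a sensitive file outside it.
    p = Path(target).resolve()
    if any(p == prot or prot in p.parents for prot in PROTECTED):
        return "deny"        # blocked even if the user approves
    if WORKSPACE in p.parents:
        return "allow"       # inside the active workspace
    return "ask-user"        # all other out-of-workspace writes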
Block all writes to any agent configuration file or extension
Many agentic systems, including agentic IDEs, permit the creation of extensions that enhance functionality and often include executable code. "Hooks" may define shell code to be executed on specific events (such as prompt submission). MCP servers using an stdio transport define shell commands required to start the server. Claude Skills can include scripts, code, or helper functions that run as soon as the skill is invoked. Files such as .cursorrules, CLAUDE.md, and copilot-instructions.md can provide adversaries with a durable way to shape the agent's behavior and, in some cases, gain full control or even arbitrary code execution.
In addition, agentic IDEs often contain global and local settings, including command allowlists and denylists, with local configuration settings stored in the active workspace. This can give attackers the ability to pivot or extend their reach if these local settings are modified. For example, adding a poisoned hooks configuration to a Git repository in a workspace can affect every user who clones it. Moreover, hooks and MCP initialization functions often run outside of a sandbox environment, offering an opportunity to escape sandbox controls.
Application-specific configuration files, including those located within the current workspace, must be protected from any modification by the agent, with no option for the IDE to offer user approval of such actions. Direct, manual modification by the user is the only acceptable modification mechanism for these sensitive files.
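One way to express this class of files is sketched below, with an illustrative (and necessarily incomplete) list of agent-steering names and directories; matches are refused outright rather than routed to user approval:

```python
from pathlib import Path

# Agent-steering files that are write-protected wherever they live, even
# inside the workspace and even with user approval (illustrative list).
CONFIG_NAMES = {".cursorrules", "CLAUDE.md", "AGENT.md",
                "copilot-instructions.md"}
CONFIG_DIRS = {".claude", ".cursor", ".vscode", ".github"}

def is_protected_config(target: str) -> bool:
    # Match on the file name or any agent config directory in the path.
    p = Path(target).resolve()
    return p.name in CONFIG_NAMES or any(d in CONFIG_DIRS for d in p.parts)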
Tiered implementation of controls
Defining universally applicable allow/denylists is difficult, given the wide range of use cases to which agentic tools may be applied. The goal should be to block exploitable behavior while preserving manual user intervention as an infrequently used fallback for unanticipated cases, using a tiered approach such as the following (a combined sketch in code follows the list):
- Establish clear enterprise-level denylists for access to critical files outside the current workspace that can't be overridden by user-level allowlists or manual approval decisions.
- Allow read-write access inside the agent’s workspace (except for configuration files) without user approval.
- Permit specific allowlisted operations (e.g., reads from ~/.ssh/gitlab-key) that may be required for the proper functionality of specific features.
- Assume default-deny for all other actions, permitting case-by-case user approval.
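Putting the tiers together in evaluation order; the ~/.ssh/gitlab-key exception comes from the list above, while all other paths are assumptions:

```python
from dataclasses import dataclass
from pathlib import Path

WORKSPACE = Path.home() / "workspace"                           # assumed
ENTERPRISE_DENY = {Path.home() / ".aws", Path.home() / ".ssh" / "id_ed25519"}
ALLOWLISTED_READS = {Path.home() / ".ssh" / "gitlab-key"}

@dataclass
class Action:
    kind: str    # "read" | "write"
    path: Path

def tiered_decision(a: Action) -> str:
    p = a.path.resolve()
    if p in ENTERPRISE_DENY or any(d in p.parents for d in ENTERPRISE_DENY):
        return "deny"         # tier 1: never overridable, even by approval
    if WORKSPACE in p.parents:
        return "allow"        # tier 2: free inside the workspace
                              # (config files excluded; see is_protected_config above)
    if a.kind == "read" and p in ALLOWLISTED_READS:
        return "allow"        # tier 3: narrow allowlisted exceptions
    return "ask-user"         # tier 4: default-deny with case-by-case approval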
This post does not specifically address command allow/denylisting, as OS-level restrictions should make command-level blocks redundant, though they can be useful as a defense-in-depth mitigation against potential sandbox misconfigurations.
Recommended sandbox security controls
The mandatory controls discussed above provide strong protection against indirect prompt injection and help reduce approval fatigue. However, several potential vulnerabilities remain, including:
- Ingestion of malicious hooks or local MCP initialization commands.
- Kernel-level vulnerabilities that result in sandbox escape and full host control.
- Agent access to secrets.
- Failure modes in product-specific caching of manual approvals.
- The accumulation of secrets, IP, or exploitable code in the sandbox.
The following additional controls and considerations help close some of these remaining gaps.
Sandbox the IDE and all spawned functions
Many agentic systems only apply sandboxing at the time of tool invocation (commonly just for shell/command-line tools). While this does prevent a wide range of abuse mechanisms, many agentic functionalities often default to running outside of the sandbox. These include hooks, MCP configurations that spawn local processes, scripts used by "skills", and other tools managed at the application layer. This is often the case when sandboxes are associated only with command-line tools, while file-editing tools or search tools execute outside of a sandbox and are controlled at the application level. These unsandboxed execution paths can make it easier for attackers to bypass sandbox controls or obtain remote code execution.
The sandbox restrictions discussed should be enforced for all agentic operations, not just command-line tool invocations. Restrictions on write operations to files outside of the current workspace and to configuration files are the most critical, while network egress from the sandbox should only be permitted for correctly configured remote MCP server calls.
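One way to enforce this is to route every process the agent spawns through a single sandboxed launcher, sketched below; the wrapping command and profile path are assumptions:

```python
import subprocess

def sandbox_wrap(cmd: list[str]) -> list[str]:
    # Stand-in for the OS mechanism in use, e.g., prepending a bubblewrap
    # or sandbox-exec invocation (the profile path here is hypothetical).
    return ["sandbox-exec", "-f", "/etc/agent-sandbox.sb", *cmd]

def spawn_sandboxed(cmd: list[str], cwd: str) -> subprocess.Popen:
    # The single choke point for every child process the agent creates:
    # hooks, stdio MCP servers, skill scripts, and shell tool calls alike,
    # so nothing runs outside the sandbox by accident.
    return subprocess.Popen(sandbox_wrap(cmd), cwd=cwd)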
Use virtualization to isolate the sandbox kernel from the host kernel
Many sandbox solutions (macOS Seatbelt, Windows AppContainer, Linux Bubblewrap, Dockerized dev containers) share the host kernel, leaving it exposed to any code executed within the sandbox. Because agentic tools often execute arbitrary code by design, kernel vulnerabilities can be directly targeted as a path to full system compromise.
To prevent these attacks at an architectural level, run agentic tools inside a fully virtualized environment isolated from the host kernel at all times, such as a VM, unikernel, or Kata container. Intermediate mitigations like gVisor, which mediates system calls via a separate user-space kernel, are preferable to fully shared solutions but offer different and potentially weaker security guarantees than full virtualization.
While virtualization typically introduces some overhead, it is often modest compared with the latency of LLM calls. The lifecycle management of the virtualized environment should be tuned against the associated overhead to minimize developer friction while preventing the accumulation of data.
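As one concrete possibility, assuming Kata Containers is installed and registered with Docker under the runtime name kata-runtime, and a hypothetical agent-toolchain image, each invocation can get its own guest kernel:

```python
import subprocess

def run_in_microvm(cmd: list[str], workspace: str) -> None:
    # Each invocation runs against its own guest kernel via the Kata runtime,
    # so a kernel exploit inside the sandbox does not reach the host kernel.
    subprocess.run([
        "docker", "run", "--rm",
        "--runtime", "kata-runtime",       # hardware-virtualized guest kernel
        "--network", "none",               # pairs with the egress controls above
        "-v", f"{workspace}:/workspace",   # only the active workspace is mounted
        "-w", "/workspace",
        "agent-toolchain:latest",          # hypothetical toolchain image
        *cmd,
    ], check=True)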
Prevent reads from files outside of the workspace
Sandbox solutions often require access to certain files outside of the workspace, such as ~/.zshrc, to reproduce the developer's environment. Unrestricted read access exposes information of value to an attacker, enabling enumeration and exploration of the user's device, secrets, and intellectual property.
Follow a tiered approach consistent with the principle of least access (a sketch follows the list):
- Use enterprise-level denylists to block reads from highly sensitive paths or patterns not required for sandbox operation.
- Limit allowlisted external read access to what is strictly necessary, ideally permitting reads only during sandbox initialization and blocking them thereafter.
- Block all other reads outside the workspace unless manually approved by the user.
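The initialization-only window is the interesting detail; a sketch, with illustrative paths:

```python
from pathlib import Path

WORKSPACE = Path.home() / "workspace"                           # assumed
# Reads needed only to reproduce the developer's environment at startup.
INIT_ONLY_READS = {Path.home() / ".zshrc", Path.home() / ".gitconfig"}

class ReadPolicy:
    def __init__(self) -> None:
        self.initializing = True      # flipped off once setup completes

    def finish_init(self) -> None:
        self.initializing = False

    def decide(self, target: str) -> str:
        p = Path(target).resolve()
        if WORKSPACE in p.parents:
            return "allow"            # workspace reads are always fine
        if p in INIT_ONLY_READS and self.initializing:
            return "allow"            # permitted only during sandbox startup
        return "ask-user"             # enterprise denylist checks omitted here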
Require manual user approval each time an action would violate default-deny isolation controls
As described in the tiered implementation approach, default-deny actions that are not allowlisted or explicitly blocked should require manual user approval before execution. Enterprise-level denylists should never be overridden by user approval.
Critically, approvals should never be cached or persisted, as a single legitimate approval immediately opens the door to future adversarial abuse. For instance, permitting modification of ~/.zshrc once to perform a legitimate function may allow later adversarial activity to implant code on a subsequent execution without requiring re-approval. Each potentially dangerous action should require fresh user confirmation.
Use a secret injection approach to prevent secrets from being exposed to the agent
Developer environments commonly contain a wide range of secrets, such as API keys in environment variables, credentials in ~/.aws, tokens in .env files, and SSH keys. These secrets are often inherited by sandboxed processes or accessible within the filesystem, even when they are not required for the task at hand. This creates unnecessary exposure.
Even with network controls in place, exposed secrets remain a risk.
Sandbox environments should rely on explicit secret injection to scope credentials to the minimum required for a given task, rather than inheriting the full set of host environment credentials. In practice:
- Start the sandbox with a minimal or empty credential set.
- Remove any secrets that are not required for the current task.
- Inject required secrets based only on the specific task or project, ideally via a mechanism that is not directly accessible to the agent (e.g., a credential broker that provides short-lived tokens on demand rather than long-lived credentials in environment variables).
- Continue enforcing standard security practices such as least privilege for all secrets.
The goal is to limit the blast radius of any compromise so that a hypothetical attacker who gains control of agent behavior can only use secrets that have been explicitly provisioned for the current task, not the full set of credentials available on the host system.
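A sketch of the pattern: the sandbox starts from an empty environment, and a hypothetical broker call injects only task-scoped, short-lived tokens:

```python
import subprocess

# Hypothetical mapping from task to the secrets it actually needs.
TASK_SECRETS = {"deploy-docs": {"DOCS_API_TOKEN": "vault:docs/deploy"}}

def fetch_short_lived(ref: str) -> str:
    # Stand-in for a real broker call (e.g., Vault) returning a token
    # scoped to this task and valid for minutes rather than months.
    return f"ephemeral-token-for-{ref}"

def start_agent_task(task: str, cmd: list[str]) -> None:
    env = {"PATH": "/usr/bin:/bin"}           # empty slate: host env never leaks in
    for name, ref in TASK_SECRETS.get(task, {}).items():
        env[name] = fetch_short_lived(ref)    # inject only what this task needs
    subprocess.run(cmd, env=env, check=True)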
Establish lifecycle management controls for the sandbox
Long-running sandbox environments can accumulate artifacts over time: downloaded dependencies, generated scripts, cached credentials, intellectual property from previous projects, and temporary files that persist longer than intended. This expands the potential attack surface and increases the value of a compromise. An attacker who gains access to an agent operating in a stale sandbox may find secrets, proprietary code, or tools from earlier work that can be repurposed.
The details of lifecycle management vary based on sandbox architecture, initialization overhead, and project complexity. The key principle is ensuring that sandbox state does not persist indefinitely, whether through the approaches below (sketched in code after the list):
- Ephemeral sandboxes: Using sandbox architectures where the environment exists only for the duration of a specific task or command (e.g., Kata containers created and destroyed per execution), preventing accumulation.
- Explicit lifecycle management: Periodically destroying and recreating the sandbox environment in a known-good state (e.g., weekly for VM-based sandboxes), ensuring accumulated state is cleared on a known schedule.
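A sketch of the ephemeral variant as a context manager; the container runtime, image name, and workspace handling are all assumptions:

```python
import contextlib, shutil, subprocess, tempfile

@contextlib.contextmanager
def ephemeral_sandbox(image: str = "agent-toolchain:latest"):
    # Fresh scratch workspace and container per task; nothing survives.
    scratch = tempfile.mkdtemp(prefix="agent-")
    cid = subprocess.run(
        ["docker", "run", "-d", "--rm",
         "-v", f"{scratch}:/workspace", image, "sleep", "infinity"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    try:
        yield cid                  # run the task's commands via `docker exec`
    finally:
        subprocess.run(["docker", "stop", cid], check=False)   # --rm removes it
        shutil.rmtree(scratch, ignore_errors=True)             # clear the scratch dir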
While the provider of the agentic tool is responsible for ensuring lifecycle management, organizations should evaluate their sandbox architecture and establish lifecycle policies that balance initialization overhead and developer friction against accumulation risk.
Learn more
Agentic tools represent a significant shift in how developers work, offering productivity gains through automated code generation, testing, and execution. However, these benefits come with a corresponding expansion of the attack surface. As agentic tools continue to evolve, gaining new capabilities, integrations, and autonomy, the attack surface evolves with them. The principles outlined in this post should be revisited as new features are released, and organizations should regularly validate that their sandbox implementations provide the isolation and security controls they expect.
Learn more about agentic security from the NVIDIA AI Red Team, including:
