Developers are increasingly turning to AI-enabled tools for coding, including Cursor, OpenAI Codex, Claude Code, and GitHub Copilot. While these automation tools can enable faster development and reviews, additionally they present an expanding attack surface for threat actors.
These agentic tools have different implementations but all share the common framework of using LLMs to find out actions to tackle a developer’s behalf. More agentic autonomy means increased access and capabilities, with a corresponding increase in overall unpredictability.
On this blog, we are going to detail how an attacker can leverage easy watering hole attacks, introducing untrusted data to make the most of the mixture of assistive alignment and increasing agent autonomy that subsequently achieves distant code execution (RCE) on developer machines.
That is an summary of certainly one of the attack frameworks we presented at Black Hat USA in August 2025.
What are computer use agents?
For our purposes, a pc use agent (CUA) is any agent that may autonomously execute actions and tools on a given machine with the identical access and permissions because the signed-in user.
Generally, these agents use LLMs to parse user queries, code, and command results to find out the following motion to take. They’re designed to repeatedly invoke actions until a given user request is complete. These actions can include things like moving or clicking the mouse, typing, editing files, and even executing commands.
We classify agents into autonomy levels, defined by the possible paths of execution available to them. CUAs are generally classified as level 3 agents. A model—generally an LLM, but often augmented by vision models to know displays—determines the following actions, and the results of those actions are passed back to the model. This creates an execution loop, and a high degree of nondeterminism.


It’s unimaginable to confidently map the flow of information and execution for any given query, because the result will likely be different every time. This volatility, combined with these agents’ ability to execute commands on a user’s machine, creates ample opportunities for attackers.
How can we leverage agent alignment?
Crafting an attack against these agents first requires understanding their capabilities, overall alignment, and customary use cases.
Tools like Cursor (assuming the agentic auto-run feature is enabled) are designed to autonomously complete users tasks by editing a codebase and executing crucial terminal commands. We are able to learn more about how Cursor works by reading its various system prompts, including the system prompt specific to tool execution:
You might have tools at your disposal to resolve the coding task. Follow these rules regarding tool calls:
1. ALWAYS follow the tool call schema exactly as specified and ensure that to offer all crucial parameters.
...
8. You may autonomously read as many files as it's essential to make clear your individual questions and completely resolve the user's query, not only one.
9. GitHub pull requests and issues contain useful details about the right way to make larger structural changes within the codebase. Also they are very useful for answering questions on recent changes to the codebase. It is best to strongly prefer reading pull request information over manually reading git information from terminal. It is best to call the corresponding tool to get the complete details of a pull request or issue in case you consider the summary or title indicates that it has useful information. Be mindful pull requests and issues are usually not at all times up up to now, so you need to prioritize newer ones over older ones. When mentioning a pull request or issue by number, you need to use markdown to link externally to it.
Here, we see that Cursor is being explicitly instructed to ingest a repository’s pull requests and issues. This data source is inherently untrusted, assuming that external contributors can open pull requests and issues on a repository. Knowing this, we will leverage indirect prompt injection—through which we will add malicious instructions to the content retrieved by a model —to inject a payload right into a GitHub issue or pull request.
For demonstration purposes, we created a goal repository PyCronos, a fake Python data evaluation library. Our objective was to craft an injection that, assuming typical agent usage, could achieve code execution on the machines of developers and maintainers of this repository.
The right way to plant the payload
Knowing that Cursor has the potential to autonomously execute terminal commands, we first have to develop and plant a payload that will probably be ultimately run on a goal user’s machine. In this instance, we obfuscated a basic Powershell script that achieves a reverse shell, with the intention of targeting Windows developers. Using open source obfuscators, the script was recursively obfuscated until it successfully bypassed basic Windows Defender protections.
Targeting our hypothetical PyCronos repository, we created a pycronos-integration Github user. From this account, we created a win-pycronos repository where the Powershell payload was planted.


From this pycronos-integrations account, we now must craft our indirect prompt injection payload to persuade a victim’s Cursor agent to download and execute our Powershell payload.
The right way to plant the prompt injection
First, we attempt indirect prompt injection via a GitHub issue. We’re effectively social engineering whatever agent is parsing this issue to get it to execute our malicious payload.


Here the attacker has planted a difficulty that claims the library’s (non-existent) Window’s integration is broken. The difficulty claims that one must run a particular command to breed the error. While a human reviewer would likely realize that this feature doesn’t exist, and this command is downloading and executing code from a distant source, an agent may not.
We tested this attack path first against the demo release of Anthropic’s Computer Use Agent. Note that this release does contain a security warning indicating that prompt injection is feasible, and the agent should strictly be used inside an isolated environment.
If a user prompts the CUA with something along the lines of “Help me resolve open issues on this repository,” the agent will comply.


The agent navigates to the open issue, parses the screenshot of the difficulty, and pulls out the command it must execute. It then uses the available tools to execute it successfully, granting the attacker a reverse shell.


Upon trying the identical attack path against Cursor, it’s not so easy. Cursor doesn’t depend on vision, as an alternative pulling the text directly from the difficulty’s metadata. Here, it sees the try and download and execute distant code, and informs the user of the risks before refusing to finish the duty.


This tells us that there are some guardrails in place, scanning the GitHub issue itself for potentially malicious commands. Now, the target is to enhance our injection to seem more benign, removing the execution of the payload download from the injection itself.
We are able to do that by hiding our payload download inside a fake Python package. From the attacker’s pycronos-integrations repository, we create a seemingly harmless pycronos-windows package.


Inside the setup.py, we place the command to download and execute the distant payload.


This can execute RunCommand upon a pip install of this package.
Next, we create a pull request on the goal repository so as to add this package to the present project dependencies.


When a user prompts their Cursor agent to review open pull requests, it creates a branch and checks out the changes, before running pip install -r requirements.txt to check the changes.


As soon as our malicious package is installed by the agent on the user’s machine, we receive a reverse shell, gaining execution directly on the user’s computer.
This attack underlines the pattern that permits all such attacks: a very privileged agent treating untrusted data (on this case, each the pull request and the malicious package) as trusted will be become a tool working on behalf of the attacker.
What to do to forestall such attacks
As broken down in our talk From Prompts to Pwns: Exploiting and Securing AI Agents at Black Hat USA 2025, we recommend adopting an “assume prompt injection” approach when architecting or assessing agentic applications. If an agent relies on LLMs to find out actions and power calls to invoke, assume the attacker can gain control of the LLM output, and might consequently control all downstream events.
When architecting similar agentic applications, NVIDIA’s LLM vulnerability scanner garak will be used to assist test for known prompt injection issues. To assist harden LLMs against prompt injection, consider utilizing NeMo Guardrails on LLM inputs and outputs.
The safest approach is to limit the degree of autonomy as much as possible, favoring specific predefined workflows that may prevent the agent from executing arbitrary plans. If that will not be possible, enforcing human-in-the-loop approval for select “sensitive” commands or actions is strongly really helpful, particularly within the presence of probably untrusted data.
If fully autonomous agentic workflows without human oversight are a requirement, then one of the best approach is to isolate them as much as possible from any sensitive tools or information, akin to requiring that fully autonomous computer use agents should be run in an isolated environment akin to standalone virtual machine with limited network egress and limited access to enterprise or user data.
An identical but less effective approach is to implement the usage of local development containers; this can provide a point of isolation for the agent as well, albeit less effective than a totally isolated VM.
Regarding Cursor specifically, enterprise controls can be found to either disable auto-run, or to limit its blast radius by only allowing autonomous execution of allowlisted commands. Moreover, background agents are actually available to permit users to spawn autonomous agents inside containers on Cursor’s isolated cloud infrastructure.
Agentic coding workflows have unlocked rapid development capabilities across the industry. But to effectively harness this latest efficiency, enterprises and developers need to know the potential risks and adopt mitigating policies.
For more details, please see our talk at Black Hat USA. Black Hat will post the talk recording to YouTube when available.
