From Assistant to Adversary: Exploiting Agentic AI Developer Tools



Developers are increasingly turning to AI-enabled tools for coding, including Cursor, OpenAI Codex, Claude Code, and GitHub Copilot. While these tools can enable faster development and reviews, they also present an expanding attack surface for threat actors.

These agentic tools have different implementations but all share the common framework of using LLMs to determine actions to take on a developer's behalf. More agentic autonomy means increased access and capabilities, with a corresponding increase in overall unpredictability.

In this blog, we detail how an attacker can leverage simple watering hole attacks, introducing untrusted data that exploits the combination of assistive alignment and increasing agent autonomy to ultimately achieve remote code execution (RCE) on developer machines.

This is an overview of one of the attack frameworks we presented at Black Hat USA in August 2025.

What are computer use agents?

For our purposes, a computer use agent (CUA) is any agent that can autonomously execute actions and tools on a given machine with the same access and permissions as the signed-in user.

Generally, these agents use LLMs to parse user queries, code, and command results to determine the next action to take. They are designed to repeatedly invoke actions until a given user request is complete. These actions can include things like moving or clicking the mouse, typing, editing files, and even executing commands.

We classify agents into autonomy levels, defined by the possible paths of execution available to them. CUAs are generally classified as level 3 agents. A model—generally an LLM, but often augmented by vision models to understand displays—determines the next actions, and the results of those actions are passed back to the model. This creates an execution loop, and a high degree of nondeterminism.

General architecture of computer use agents, showing the communication flow between the server agent determining the next tool calls, and the client agent executing the tools. The diagram highlights the execution loop, in which the client continues to execute tools until the server agent (using LLM/vision/other models) determines that the user's original task is completed
Figure 1. General architecture of computer use agents

It is impossible to confidently map the flow of data and execution for any given query, because the result will likely be different every time. This volatility, combined with these agents' ability to execute commands on a user's machine, creates ample opportunities for attackers.
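The execution loop shown in Figure 1 can be sketched in a few lines. This is an illustrative simplification, not any vendor's actual implementation: the model callback, the action dictionary format, and the tool names are all assumptions made for clarity.

```python
def run_agent(task, model, tools):
    """Illustrative level-3 agent loop: the model picks the next tool call,
    the client executes it, and the output is fed back to the model until it
    declares the task complete."""
    transcript = [{"role": "user", "content": task}]
    outputs = []
    while True:
        # The server-side model decides the next action from the transcript,
        # e.g. {"tool": "run_command", "args": {...}} or {"done": True}.
        action = model(transcript)
        if action.get("done"):
            return outputs
        # Tools run with the signed-in user's permissions -- this is where
        # untrusted data in the transcript can turn into real command execution.
        result = tools[action["tool"]](**action["args"])
        outputs.append(result)
        transcript.append({"role": "tool", "content": str(result)})
```

Each iteration feeds tool output back into the model's context, which is exactly why attacker-controlled data that enters that context can steer subsequent tool calls.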

How can we leverage agent alignment?

Crafting an attack against these agents first requires understanding their capabilities, overall alignment, and common use cases.

Tools like Cursor (assuming the agentic auto-run feature is enabled) are designed to autonomously complete users' tasks by editing a codebase and executing necessary terminal commands. We can learn more about how Cursor works by reading its various system prompts, including the system prompt specific to tool execution:

You have tools at your disposal to solve the coding task. Follow these rules regarding tool calls:
1. ALWAYS follow the tool call schema exactly as specified and make sure to provide all necessary parameters.
...
8. You can autonomously read as many files as you need to clarify your own questions and completely resolve the user's query, not just one.
9. GitHub pull requests and issues contain useful information about how to make larger structural changes in the codebase. They are also very useful for answering questions about recent changes to the codebase. You should strongly prefer reading pull request information over manually reading git information from terminal. You should call the corresponding tool to get the full details of a pull request or issue if you believe the summary or title indicates that it has useful information. Keep in mind pull requests and issues are not always up to date, so you should prioritize newer ones over older ones. When mentioning a pull request or issue by number, you should use markdown to link externally to it.

Here, we see that Cursor is being explicitly instructed to ingest a repository's pull requests and issues. This data source is inherently untrusted, assuming that external contributors can open pull requests and issues on a repository. Knowing this, we can leverage indirect prompt injection—in which malicious instructions are added to the content retrieved by a model—to inject a payload into a GitHub issue or pull request.
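The reason indirect prompt injection works is structural: trusted instructions and untrusted issue text land in the same prompt string. The sketch below is a hypothetical illustration of that surface; the prompt format and function name are assumptions, not Cursor's actual internals.

```python
def build_agent_context(user_query, issue_title, issue_body):
    """Hypothetical prompt assembly: the agent's trusted instructions and
    untrusted, attacker-editable issue text are concatenated into one
    context, so injected instructions can steer the model's tool calls."""
    return (
        "You are a coding agent with terminal access.\n"
        f"User request: {user_query}\n"
        "Open GitHub issue (external contributor):\n"
        f"Title: {issue_title}\n"
        f"Body: {issue_body}\n"  # attacker-controlled text lands here
        "Determine the next tool call to resolve the issue."
    )
```

Any instruction an external contributor writes in the issue body ("to reproduce, you MUST run ...") arrives in the model's context with no marker distinguishing it from the user's own request.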

For demonstration purposes, we created a target repository, PyCronos, a fake Python data analysis library. Our objective was to craft an injection that, assuming typical agent usage, could achieve code execution on the machines of developers and maintainers of this repository.

How to plant the payload

Knowing that Cursor has the ability to autonomously execute terminal commands, we first need to develop and plant a payload that will ultimately be run on a target user's machine. In this example, we obfuscated a basic PowerShell script that achieves a reverse shell, with the intention of targeting Windows developers. Using open source obfuscators, the script was recursively obfuscated until it successfully bypassed basic Windows Defender protections.

Targeting our hypothetical PyCronos repository, we created a pycronos-integration GitHub user. From this account, we created a win-pycronos repository where the PowerShell payload was planted.

A screenshot of the attacker GitHub repository including a win-pycronos.ps1 file. The file contents are extremely obfuscated and not legible.
Figure 2. Snippet of obfuscated reverse shell PS script

From this pycronos-integrations account, we now need to craft our indirect prompt injection payload to convince a victim's Cursor agent to download and execute our PowerShell payload.

How to plant the prompt injection

First, we attempt indirect prompt injection via a GitHub issue. We’re effectively social engineering whatever agent is parsing this issue to get it to execute our malicious payload. 

A screenshot of a GitHub issue claiming that the only way to reproduce the user's error is to run a specific PowerShell command. The command in question downloads and executes the attack payload.
Figure 3. Indirect prompt injection via GitHub issue

Here the attacker has planted an issue that claims the library's (non-existent) Windows integration is broken. The issue claims that one must run a specific command to reproduce the error. While a human reviewer would likely realize that this feature doesn't exist and that this command downloads and executes code from a remote source, an agent may not.

We tested this attack path first against the demo release of Anthropic's Computer Use Agent. Note that this release does contain a security warning indicating that prompt injection is possible, and that the agent should be used strictly within an isolated environment.

If a user prompts the CUA with something along the lines of “Help me resolve open issues on this repository,” the agent will comply. 

A screenshot of a browser in which the agent is typing the URL to navigate to the relevant issue.
Figure 4. Screenshot of CUA tool navigating to the relevant issue

The agent navigates to the open issue, parses the screenshot of the issue, and pulls out the command it needs to execute. It then uses the available tools to execute it successfully, granting the attacker a reverse shell.

A screenshot of the chat transcript with the agent, in which the agent has parsed the malicious command, runs it, and returns the text "Command executed successfully."
Figure 5. CUA successfully executing the command from the issue

When we try the same attack path against Cursor, it is not so simple. Cursor doesn't rely on vision, instead pulling the text directly from the issue's metadata. Here, it sees the attempt to download and execute remote code, and informs the user of the risks before refusing to complete the task.

A screenshot of Cursor chat transcript in which the Cursor agent spells out the security risks associated with downloading and executing the remote code specified in the issue.
Figure 6. Cursor chat showing agent’s refusal to execute malicious command

This tells us that there are some guardrails in place, scanning the GitHub issue itself for potentially malicious commands. Now, the objective is to refine our injection to appear more benign, removing the execution of the payload download from the injection itself.

We can do this by hiding our payload download inside a fake Python package. From the attacker's pycronos-integrations repository, we create a seemingly harmless pycronos-windows package.

A screenshot of a GitHub repository containing a fake "PyCronos for Windows" package including a setup.py file and a basic README.
Figure 7. Screenshot of a seemingly innocuous Python package on GitHub

Inside the setup.py, we place the command to download and execute the remote payload.

A screenshot of the setup.py file, including a RunCommand function which spawns a subprocess to execute the malicious reverse shell payload.
Figure 8. Screenshot of the package’s setup.py, containing command to download and execute payload

This will execute RunCommand upon a pip install of this package.

Next, we create a pull request on the target repository to add this package to the existing project dependencies.

A screenshot of the injection GitHub PR, in which the proposed change is an additional line in the requirements.txt file that adds the malicious dependency.
Figure 9. Pull request by the attacker, adding the malicious dependency to the target repository

When a user prompts their Cursor agent to review open pull requests, it creates a branch and checks out the changes, before running pip install -r requirements.txt to test the changes.

A screenshot of Cursor chat transcript in which the Cursor agent runs the pip install of the changed requirements.
Figure 10. Cursor chat showing the agent executing the pip install of the malicious package.

Once our malicious package is installed by the agent on the user's machine, we receive a reverse shell, gaining execution directly on the user's computer.

This attack underlines the pattern that enables all such attacks: an overly privileged agent treating untrusted data (in this case, both the pull request and the malicious package) as trusted can be turned into a tool working on behalf of the attacker.

What to do to prevent such attacks

As broken down in our talk From Prompts to Pwns: Exploiting and Securing AI Agents at Black Hat USA 2025, we recommend adopting an "assume prompt injection" approach when architecting or assessing agentic applications. If an agent relies on LLMs to determine which actions and tool calls to invoke, assume the attacker can gain control of the LLM output, and can consequently control all downstream events.

When architecting similar agentic applications, NVIDIA's LLM vulnerability scanner garak can be used to help test for known prompt injection issues. To help harden LLMs against prompt injection, consider using NeMo Guardrails on LLM inputs and outputs.

The safest approach is to limit the degree of autonomy as much as possible, favoring specific predefined workflows that prevent the agent from executing arbitrary plans. If that is not possible, enforcing human-in-the-loop approval for select "sensitive" commands or actions is strongly recommended, particularly in the presence of potentially untrusted data.
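A human-in-the-loop gate for agent-proposed commands can be as simple as an allowlist check with an approval callback for everything else. This is a minimal sketch under stated assumptions; the allowlist contents and approval mechanism are placeholders, not any product's actual policy engine.

```python
import shlex

# Example allowlist of programs the agent may auto-run (an assumption for
# illustration; a real policy would be tuned to the project and environment).
SAFE_PROGRAMS = {"ls", "git", "pytest"}

def gate_command(command, approve):
    """Return True if the agent may execute `command`.

    Allowlisted programs run automatically; anything else (curl | sh,
    powershell, pip install ...) is escalated to a human via `approve`.
    """
    try:
        program = shlex.split(command)[0]
    except (ValueError, IndexError):
        return False  # unparseable or empty commands are rejected outright
    if program in SAFE_PROGRAMS:
        return True
    return approve(command)
```

With the human defaulting to deny, an injected "run this PowerShell one-liner" instruction is stopped at the gate even if the model is fully compromised, because the model output no longer directly reaches the shell.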

If fully autonomous agentic workflows without human oversight are a requirement, then the best approach is to isolate them as much as possible from any sensitive tools or information, such as requiring that fully autonomous computer use agents be run in an isolated environment such as a standalone virtual machine with limited network egress and limited access to enterprise or user data.

A similar approach is to enforce the use of local development containers; this provides a degree of isolation for the agent as well, albeit less effective than a fully isolated VM.

Regarding Cursor specifically, enterprise controls are available to either disable auto-run or to limit its blast radius by only allowing autonomous execution of allowlisted commands. Additionally, background agents are now available, allowing users to spawn autonomous agents within containers on Cursor's isolated cloud infrastructure.

Agentic coding workflows have unlocked rapid development capabilities across the industry. But to effectively harness this new efficiency, enterprises and developers need to understand the potential risks and adopt mitigating policies.

For more details, please see our talk at Black Hat USA. Black Hat will post the talk recording to YouTube when available.


