Create Your Own Bash Computer Use Agent with NVIDIA Nemotron in One Hour

What if you could talk to your computer and have it perform tasks through the Bash terminal, without writing a single command yourself? With NVIDIA Nemotron Nano v2, you can easily build a natural language Bash agent from scratch, in under an hour, with roughly 200 lines of Python code and minimal dependencies.

This post will walk you through the core components and design considerations step by step, so you can easily follow along. First, we’ll build everything from scratch. Then, as a bonus, we’ll show how LangGraph can simplify the design further.

Let’s dive in.

What are the prerequisites?

  • NVIDIA Nemotron Nano 9B v2 (deployed locally or in the cloud)
    • Local deployment requires:
      • ~20 GB of disk space
      • An NVIDIA GPU with at least 24 GB of VRAM
  • An operating system with Bash, such as Ubuntu, macOS, or Windows Subsystem for Linux (WSL)
  • Python 3.10+ environment, with the required packages installed
  • One hour of free time

What you’ll get in the end: a computer use agent

Here is a real interaction you can expect from the agent you’ll build in this tutorial:

['/home/sandbox' 🙂] Make a new directory called "system-info," then create a file called "info.txt" and populate it with the current system config (disk space, memory), then briefly summarize the contents.

    ▶️   Execute 'mkdir system-info'? [y/N]: y
    ▶️   Execute 'touch system-info/info.txt'? [y/N]: y
    ▶️   Execute 'df -h >> system-info/info.txt'? [y/N]: y
    ▶️   Execute 'free -h >> system-info/info.txt'? [y/N]: y
    ▶️   Execute 'cat system-info/info.txt'? [y/N]: y

[🤖] Here's a summary of the system information in `system-info/info.txt`:

**Disk Space**  
- Total: 984 GB (`overlay`), 2.6 TB (`/home/sandbox`)  
- Used: ~518 GB (56%) on `/`, ~2.1 TB (83%) on `/home/sandbox`  
- Available: ~416 GB (42%) on `/`, ~432 GB (17%) on `/home/sandbox`  

**Memory**  
- Total: 62 GB RAM  
- Used: 20 GB  
- Free: 14 GB  
- Swap: 63 GB available (0 GB used)  

Let me know if you'd like to refine this further.

Legend:      🙂: User message     🤖: Agent response     ▶️: Command execution confirmation

Going from chatbots to AI agents

Unlike chatbots that respond in a request-and-reply fashion, AI agents are systems that take high-level goals as input, then autonomously reason, plan, and execute tasks to achieve those goals. A key enabler for this process is tool calling (a.k.a. function calling): instead of just replying with text, the agent can invoke external tools or APIs to actually perform actions, assess their outcomes, and plan the next steps.

The Bash computer use agent we’re building is a clear example of an agent: You provide a high-level instruction, and it decides which Bash commands to run via tool calling to carry out the instruction. This agent is small enough to build from scratch in around 200 lines of code, yet it illustrates the same fundamental principles behind far more advanced agents.

At the core of every modern agent is a large language model (LLM) capable of reasoning about user intent and translating it into concrete actions. This LLM must be efficient, responsive, and have excellent reasoning skills so it can achieve complex goals. That’s exactly what NVIDIA Nemotron Nano 9B v2 delivers: a compact model with strong reasoning ability that runs quickly to keep interactions snappy, while maintaining a simple setup. These characteristics make it an excellent fit for lightweight agents like the one we’re building here.

If you’re just getting started and want a primer on the four main components of an AI agent, please check out this blog.

What are the key considerations?

Let’s start by reviewing the key considerations for building our agent:

  • Bash use via tool calls: We need to expose the Bash CLI as a tool to the agent, so it can execute commands and receive outputs (such as success or failure, as well as any output from the command). We also need to keep track of the active working directory. This is important because the agent must navigate around the filesystem, and must be able to run each Bash command from the correct directory.
  • Command safety: We must prevent our agent from running unsafe or destructive commands. To address this, we implement an allowlist of commands like ls, cat, and grep, ensuring the agent only operates within a safe, predictable scope. Additionally, we introduce a confirmation step: Before executing any command, the user is prompted to approve it. This human-in-the-loop pattern gives the user full control over what actually runs in the terminal.
  • Error handling: To build reliable agentic systems, we must always account for failure cases. For our Bash agent, commands can fail due to invalid syntax, missing files, or unexpected outputs. The agent should catch these errors, interpret them, and choose the right next step.

What are the system components?

With the considerations in place, the architecture becomes quite simple. The system has two main components:

  1. The Bash class: a lightweight wrapper around Python’s subprocess module that manages the working directory, enforces the command allowlist, executes commands, and returns the execution results (or errors) back to the agent.
  2. The agent: uses the NVIDIA Nemotron model to understand user intent and decide how to act, while maintaining context across turns. The agent’s behavior is guided by a carefully crafted system prompt that sets boundaries, defines its role as a Bash assistant, and reminds it of the allowed commands.

The figure below depicts the architecture diagram of the system. The workflow is as follows:

  1. The user issues a high-level instruction, such as changing directories, copying files, or inspecting document contents.
  2. Nemotron interprets the request, breaks it into concrete steps, and uses the Bash class when command execution is required. Some tasks may require no execution at all, while others may span multiple commands. After each run, the model receives the output and decides the next step or when to stop.
  3. Once the task is complete, whether successful or halted by an error, the agent returns the result to the user and waits for the next instruction.
The agent’s workflow diagram, which consists of the Nemotron model and the Bash class. The model takes a user request, executes the request via the Bash class and returns a response to the user.
Figure 1. The agent’s workflow diagram

We’ll first implement both components from scratch, then walk through wiring them up with LangGraph to further simplify the setup.

The Bash class

We create a simple class that stores the list of allowed commands, as well as the current working directory. See below for a summarized snippet of this class.

import subprocess
from typing import Any, Dict, List


class Bash:
    """
    An implementation of a tool that executes Bash commands
    """

    def __init__(self, cwd: str, allowed_commands: List[str]):
        self.cwd = cwd  # The current working directory
        self._allowed_commands = allowed_commands  # Allowed commands

    def exec_bash_command(self, cmd: str) -> Dict[str, str]:
        """
        Execute the bash command after getting confirmation from the user
        """
        if cmd:
            # Check the allowlist
            allowed = True

            for cmd_part in self._extract_commands(cmd):
                if cmd_part not in self._allowed_commands:
                    allowed = False
                    break

            if not allowed:
                return {"error": "Parts of this command were not in the allowlist."}

            return self._run_bash_command(cmd)
        return {"error": "No command was provided"}

    def to_json_schema(self) -> Dict[str, Any]:
        """
        Convert the function signature to a JSON schema for LLM tool calling.
        """
        return {
            "type": "function",
            "function": {
                "name": "exec_bash_command",
                "description": "Execute a bash command and return stdout/stderr and the working directory",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "cmd": {
                            "type": "string",
                            "description": "The bash command to execute"
                        }
                    },
                    "required": ["cmd"],
                },
            },
        }

    def _run_bash_command(self, cmd: str) -> Dict[str, str]:
        """
        Runs the bash command and catches exceptions (if any).
        """
        stdout = ""
        stderr = ""
        new_cwd = self.cwd

        try:
            # Wrap the command so we can keep track of the working directory.
            wrapped = f"{cmd};echo __END__;pwd"
            result = subprocess.run(
                wrapped, shell=True, cwd=self.cwd,
                capture_output=True, text=True,
                executable="/bin/bash"
            )
            stderr = result.stderr
            # Find the separator marker
            split = result.stdout.split("__END__")
            stdout = split[0].strip()

            # If there is no output or error at all, report that the call was successful.
            if not stdout and not stderr:
                stdout = "Command executed successfully, without any output."

            # Get the new working directory and update the stored state.
            new_cwd = split[-1].strip()
            self.cwd = new_cwd
        except Exception as e:
            stdout = ""
            stderr = str(e)

        return {"stdout": stdout, "stderr": stderr, "cwd": new_cwd}

This class exposes two public functions:

  1. exec_bash_command(cmd: str) -> Dict[str, str], which the agent can call to execute commands. It returns a dictionary with stdout, stderr, and the updated working directory, or an error if the command is invalid or not allowed. These signals let the agent adapt when something goes wrong.
  2. to_json_schema(self) -> Dict[str, Any], which tells the LLM how to use this tool (LangGraph doesn’t need this).

Before execution, the function checks the command against the allowlist. Execution is handled inside the private method _run_bash_command(), which internally calls Python’s subprocess.run(). Exception handling blocks properly deal with all failure cases. To track directory changes (such as when the agent uses the cd command), we append a unique text marker and pwd to every command. After execution, we locate the marker in the output, extract the new working directory, and update the tool’s state before returning the execution results, along with the active working directory, to the agent.
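The `_extract_commands()` helper referenced in the snippet isn’t shown; a plausible sketch (hypothetical, not necessarily the post’s exact implementation) splits a compound command on common Bash operators and returns each sub-command’s program name for the allowlist check:

```python
import re
import shlex
from typing import List


def extract_commands(cmd: str) -> List[str]:
    """Return the leading program name of each sub-command in a compound
    Bash command, so each one can be checked against the allowlist."""
    commands = []
    # Split on common Bash operators: ; && || |
    for segment in re.split(r";|&&|\|\||\|", cmd):
        tokens = shlex.split(segment.strip())
        if tokens:
            commands.append(tokens[0])
    return commands
```

For example, `extract_commands("df -h >> info.txt && free -h")` yields `["df", "free"]`, so both programs get vetted even though they arrive as one command string.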

The agent

For the agent, we initialize Nemotron as the reasoning engine and register exec_bash_command() as a callable tool for command execution. The model’s behavior is shaped by a system prompt (shown below) that defines its role as a Bash assistant, lists the allowed commands, and guides when and how it should assist the user or invoke tool calls. While our Bash class enforces the allowlist, the prompt reinforces this rule, which is good practice to keep the model aligned. The prompt also uses the /think flag to enable thinking mode, improving the model’s reasoning.

SYSTEM_PROMPT = f"""/think
You are a helpful Bash assistant with the ability to execute commands in the shell.
You engage with users to help answer questions about bash commands, or execute their intent.
If user intent is unclear, keep engaging with them to figure out what they want and how to best help
them. If they ask a question that is not relevant to bash or computer use, decline to answer.

When a command is executed, you are given the output from that command and any errors. Based on
that, either take further actions or yield control to the user.

The bash interpreter's output and current working directory will be given to you every time a
command is executed. Take that into account for the next turn of the conversation.
If there was an error during execution, tell the user what that error was exactly.

You are only allowed to execute the following commands:
{LIST_OF_ALLOWED_COMMANDS}

**Never** attempt to execute a command not in this list. **Never** attempt to execute dangerous commands
like `rm`, `mv`, `rmdir`, `sudo`, etc. If the user asks you to do so, politely refuse.

When you switch to new directories, always list files so you can get more context.
"""
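The post doesn’t spell out the contents of `LIST_OF_ALLOWED_COMMANDS`; based on the commands seen in the example transcript, a hypothetical allowlist could look like:

```python
# Hypothetical allowlist: read-mostly commands plus the ones used in the
# example transcript (mkdir, touch, df, free, cat). Destructive commands
# like rm, mv, rmdir, and sudo are deliberately excluded.
ALLOWED_COMMANDS = [
    "ls", "cat", "grep", "head", "tail", "wc", "pwd", "cd",
    "df", "free", "echo", "mkdir", "touch", "find",
]
LIST_OF_ALLOWED_COMMANDS = ", ".join(ALLOWED_COMMANDS)
```

The same list is passed to the Bash class, so the prompt and the runtime check stay in sync.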

The agent loop (from scratch)

Building the agent loop is straightforward. We initialize the OpenAI client and keep a history of conversation turns, acting as our memory/state. Inside the loop:

  1. Take user input and send it to the model with the system prompt.
  2. Get and store the model’s response in conversation history, then check for tool calls:
    1. If a tool call is present, confirm execution with the user. On approval, run exec_bash_command(), return the result, and get the next response; otherwise, inform the model.
    2. If no tool call is present, display the model’s reply and return control to the user.
  3. This cycle repeats until the application is terminated.

To keep our code nice and tidy, let’s define abstractions for storing the conversation history (the Messages class), as well as for using the client to send requests to the model and get responses (the LLM class). With these abstractions in place, the entire agent loop becomes short and intuitive:
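The Messages class is left out of the post; a minimal sketch consistent with the methods called below (add_user_message, add_assistant_message, add_tool_message) might look like this:

```python
from typing import Any, Dict, List


class Messages:
    """Minimal conversation-history container (hypothetical sketch)."""

    def __init__(self, system_prompt: str):
        # Seed the history with the system prompt.
        self._messages: List[Dict[str, Any]] = [
            {"role": "system", "content": system_prompt}
        ]

    def add_user_message(self, content: str) -> None:
        self._messages.append({"role": "user", "content": content})

    def add_assistant_message(self, content: str, tool_calls: Any = None) -> None:
        msg: Dict[str, Any] = {"role": "assistant", "content": content}
        if tool_calls:
            msg["tool_calls"] = tool_calls
        self._messages.append(msg)

    def add_tool_message(self, content: Any, tool_call_id: str) -> None:
        # Tool results are linked back to the originating tool call by id.
        self._messages.append(
            {"role": "tool", "content": str(content), "tool_call_id": tool_call_id}
        )

    def to_list(self) -> List[Dict[str, Any]]:
        return list(self._messages)
```

The LLM class would similarly wrap the OpenAI client, passing `messages.to_list()` and the tool schema on each query.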

bash = Bash(...)
# The model
llm = LLM(...)
# The conversation history, with the system prompt
messages = Messages(SYSTEM_PROMPT)

# The main agent loop
while True:
    # Get user message.
    user = input("[🙂] ").strip()
    messages.add_user_message(user)

    # The tool-call/response loop
    while True:
        response, tool_calls = llm.query(messages, [bash.to_json_schema()])
        # Add the response to the context
        messages.add_assistant_message(response)

        # Process tool calls
        if tool_calls:
            for tc in tool_calls:
                function_name = tc.function.name
                function_args = json.loads(tc.function.arguments)

                # Make sure it's calling the right tool
                if function_name != "exec_bash_command" or "cmd" not in function_args:
                    tool_call_result = json.dumps({"error": "Incorrect tool or function argument"})
                else:
                    if confirm_execution(function_args["cmd"]):
                        tool_call_result = bash.exec_bash_command(function_args["cmd"])
                    else:
                        tool_call_result = {"error": "The user declined the execution of this command."}

                messages.add_tool_message(tool_call_result, tc.id)
        else:
            # Display the assistant's message to the user (without the thinking part).
            print(f"\n[🤖] {response.strip()}")
            break

Note the inner while loop, which is required because the agent might need multiple tool calls to complete its task. This corresponds to step 2 in Figure 1.
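The confirm_execution() helper used in the loop above is a one-liner; mirroring the ExecOnConfirm wrapper shown in the LangGraph section below, it can be as simple as:

```python
def confirm_execution(cmd: str) -> bool:
    """Ask the user to approve the suggested command before it runs
    (the human-in-the-loop confirmation step)."""
    return input(f"    ▶️   Execute '{cmd}'? [y/N]: ").strip().lower() == "y"
```

Anything other than an explicit "y" is treated as a refusal, which keeps the default safe.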

Bonus: the agent loop (using LangGraph)

With LangGraph, the agent loop becomes even simpler. Using create_react_agent() from this library, we can manage the loop, connect the model, tool, and conversation state, and let the library handle tool calls and result passing automatically. It also makes error handling more structured, letting the agent react to failures or retries within a controlled flow instead of manual checks. Like our from-scratch version, a system prompt defines the Bash assistant’s role and enforces safe command execution, while a small helper wraps bash.exec_bash_command() for human-in-the-loop confirmation. This minimal setup produces a fully functional agent that understands intent, invokes the right tool, and returns results interactively.

The summarized code snippet is as follows:

from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import InMemorySaver
from langchain_openai import ChatOpenAI

class ExecOnConfirm:
    """
    A wrapper around the Bash class to implement human-in-the-loop
    """

    def __init__(self, bash: Bash):
        self.bash = bash

    def _confirm_execution(self, cmd: str) -> bool:
        """Ask the user whether the suggested command should be executed."""
        return input(f"    ▶️   Execute '{cmd}'? [y/N]: ").strip().lower() == "y"

    def exec_bash_command(self, cmd: str) -> Dict[str, str]:
        """Execute a bash command after confirming with the user."""
        if self._confirm_execution(cmd):
            return self.bash.exec_bash_command(cmd)
        return {"error": "The user declined the execution of this command."}

# Instantiate the Bash class
bash = Bash(...)
# Create the agent
agent = create_react_agent(
    model=ChatOpenAI(model=...),
    tools=[ExecOnConfirm(bash).exec_bash_command],  # Wrap for human-in-the-loop
    prompt=SYSTEM_PROMPT,
    checkpointer=InMemorySaver(),
)
# Create the user/agent interaction loop
while True:
    user = input("[🙂] ").strip()
    # Run the agent's logic and get the response.
    result = agent.invoke({"messages": [{"role": "user", "content": user}]}, config=...)
    # Show the response (without the thinking part, if any)
    response = result["messages"][-1].content.strip()

    if "</think>" in response:
        response = response.split("</think>")[-1].strip()

    if response:
        print(f"\n[🤖] {response}")
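The config argument elided in agent.invoke() is where LangGraph’s checkpointer expects a session identifier; assuming the standard configurable/thread_id convention, it could be constructed as:

```python
# Hypothetical session config: InMemorySaver keys conversation state by
# thread_id, so reusing the same id preserves context across turns.
# "bash-agent-session" is an arbitrary example identifier.
config = {"configurable": {"thread_id": "bash-agent-session"}}
```

Passing a fresh thread_id would start a new conversation with no memory of earlier turns.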

What are the subsequent steps?

You’ve now built your own computer use agent with just a few hundred lines of code. From here, experiment: Try adding your own commands, adjust the system prompt, and see how the agent adapts. Once you’ve explored a bit, you’ll notice the same principles extend naturally to more advanced multi-agent systems.

Join the conversation in the NVIDIA developer forum. We’re excited to see your experiments, hear your questions, and check out what you build next.

Stay up-to-date on NVIDIA Nemotron by subscribing to NVIDIA news and following NVIDIA AI on LinkedIn, X, Discord, and YouTube.
