Today at CES 2026, NVIDIA unveiled a world of new open models to enable the future of agents, online and in the real world. From the recently released NVIDIA Nemotron reasoning LLMs to the new NVIDIA Isaac GR00T N1.6 open reasoning VLA and NVIDIA Cosmos world foundation models, all the building blocks are here today for AI builders to build their own agents.
But what if you could bring your own agent to life, right at your desk? An AI buddy that can be useful to you and process your data privately?
In the CES keynote today, Jensen Huang showed us how we can do exactly that, using the processing power of NVIDIA DGX Spark with Reachy Mini to create your own little office R2-D2 you can talk to and collaborate with.
This blog post provides a step-by-step guide to replicating this experience at home using a DGX Spark and a Reachy Mini.
Let’s dive in!
Ingredients
If you want to start cooking right away, here’s the source code of the demo.
We’ll be using the following:
- A reasoning model: demo uses NVIDIA Nemotron 3 Nano
- A vision model: demo uses NVIDIA Nemotron Nano 2 VL
- A text-to-speech model: demo uses ElevenLabs
- Reachy Mini (or Reachy Mini Simulation)
- Python v3.10+ environment, with uv
Feel free to adapt the recipe and make it your own – there are many ways to integrate the models into your application:
- Local deployment – Run on your own hardware (DGX Spark or a GPU with sufficient VRAM). Our implementation requires ~65 GB of disk space for the reasoning model and ~28 GB for the vision model.
- Cloud deployment – Deploy the models on cloud GPUs, e.g. through NVIDIA Brev or Hugging Face Inference Endpoints.
- Serverless model endpoints – Send requests to NVIDIA or Hugging Face Inference Providers.
Giving agentic powers to Reachy
Turning an AI agent from a simple chat interface into something you can interact with naturally makes conversations feel more real. When an AI agent can see through a camera, speak out loud, and perform actions, the experience becomes more engaging. That’s what Reachy Mini makes possible.
Reachy Mini is designed to be customizable. With access to sensors, actuators, and APIs, you can easily wire it into your existing agent stack, whether in simulation or on real hardware controlled directly from Python.
This post focuses on composing existing building blocks rather than reinventing them. We combine open models for reasoning and vision, an agent framework for orchestration, and tool handlers for actions. Each component is loosely coupled, making it easy to swap models, change routing logic, or add new behaviors.
Unlike closed personal assistants, this setup stays fully open. You control the models, the prompts, the tools, and the robot’s actions. Reachy Mini simply becomes the physical endpoint of your agent where perception, reasoning, and motion come together.
Building the agent
In this example, we use the NVIDIA NeMo Agent Toolkit, a flexible, lightweight, framework-agnostic open source library, to connect all the components of the agent together. It works seamlessly with other agentic frameworks, like LangChain, LangGraph, and CrewAI, handling how models interact, routing inputs and outputs between them, and making it easy to experiment with different configurations or add new capabilities without rewriting core logic. The toolkit also provides built-in profiling and optimization features, letting you track token usage and latency across tools and agents, identify bottlenecks, and automatically tune hyperparameters to maximize accuracy while reducing cost and latency.
Step 0: Set up and get access to models and services
First, clone the repository that contains all the code you’ll need to follow along:
git clone git@github.com:brevdev/reachy-personal-assistant.git
cd reachy-personal-assistant
To access your intelligence layer, powered by the NVIDIA Nemotron models, you can either deploy them using NVIDIA NIM or vLLM, or connect to them through remote endpoints available at build.nvidia.com.
The following instructions assume you are accessing the Nemotron models via remote endpoints. Create a .env file in the main directory with your API keys. For local deployments, you don’t need to specify API keys and can skip this step.
NVIDIA_API_KEY=your_nvidia_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
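As an optional sanity check, you can confirm the keys will be visible to Python once `uv run --env-file ../.env` injects them in the later steps (a minimal, hypothetical snippet, not part of the repo):
import os

# Report whether each API key from the .env file above is present in the environment.
for key in ("NVIDIA_API_KEY", "ELEVENLABS_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")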
Step 1: Build a chat interface
Start by getting a basic LLM chat workflow running through NeMo Agent Toolkit’s API server. NeMo Agent Toolkit supports running workflows via `nat serve` and providing a config file. The config file passed here contains all the necessary setup information for the agent, including the models used for chat and image understanding, as well as the router model used by the agent. The NeMo Agent Toolkit UI can connect over HTTP/WebSocket so you can chat with your workflow like a standard chat product. In this implementation, the NeMo Agent Toolkit server is launched on port 8001 (so your bot can call it, and the UI can too):
cd nat
uv venv
uv sync
uv run --env-file ../.env nat serve --config_file src/ces_tutorial/config.yml --port 8001
Next, confirm that you can send a plain-text prompt from a separate terminal to make sure everything is set up correctly:
curl -s http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "test", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
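You can run the same check from Python, which is roughly how the bot service will call the workflow later on (a minimal sketch assuming the server’s OpenAI-compatible response schema):
import requests

# Send a plain-text prompt to the NeMo Agent Toolkit server started above.
resp = requests.post(
    "http://localhost:8001/v1/chat/completions",
    json={
        "model": "test",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    },
    timeout=60,
)
resp.raise_for_status()

# OpenAI-style chat completion responses put the assistant's answer here.
print(resp.json()["choices"][0]["message"]["content"])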
Reviewing the agent configuration, you’ll notice it defines far more capabilities than a simple chat completion. The next steps will walk through those details.
Step 2: Add NeMo Agent Toolkit’s built-in ReAct agent for tool calling
Tool calling is an essential part of AI agents. NeMo Agent Toolkit includes a built-in ReAct agent that can reason between tool calls and use multiple tools before answering. We route “action requests” to a ReAct agent that is allowed to call tools (for example, tools that trigger robot behaviors or fetch the current robot state).
Some practical notes to keep in mind:
- Keep tool schemas tight (clear name/description/args), because that’s what the agent uses to decide what to call.
- Put a hard cap on steps (max_tool_calls) so the agent can’t spiral.
- If using a physical robot, consider a “confirm before actuation” pattern to ensure movement safety (a hypothetical sketch follows below).
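One way such a confirmation gate could look is sketched below; this is a hypothetical illustration rather than code from the repo, and `move_head` stands in for whatever motion helper your setup exposes:
from typing import Any, Callable

def confirm_before_actuation(action_name: str, action: Callable[..., Any], *args, **kwargs) -> Any:
    # Ask a human to approve a physical action before it runs.
    answer = input(f"Robot wants to run '{action_name}'. Allow? [y/N] ").strip().lower()
    if answer != "y":
        print(f"Skipped '{action_name}'.")
        return None
    return action(*args, **kwargs)

# Hypothetical usage: gate a motion behavior behind the confirmation prompt.
# confirm_before_actuation("move_head", move_head, yaw=20, pitch=-10)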
Take a look at this portion of the config; it defines the tools (like Wikipedia search) and specifies the ReAct agent pattern used to manage them.
functions:
  wikipedia_search:
    _type: wiki_search
    max_results: 2
  ..
  react_agent:
    _type: react_agent
    llm_name: agent_llm
    verbose: true
    parse_agent_response_max_retries: 3
    tool_names: [wikipedia_search]

workflow:
  _type: ces_tutorial_router_agent
  agent: react_agent
Step 3: Add a router to direct queries to different models
The key idea: don’t use one model for everything. Instead, route based on intent:
- Text queries can use a fast text model
- Visual queries need to be run through a VLM
- Action/tool requests are routed to the ReAct agent + tools
You can implement routing a few ways: heuristics (a minimal sketch follows the list below), a lightweight classifier, or a dedicated routing service. If you want the “production” version of this idea, the NVIDIA LLM Router developer example is the full reference implementation and includes evaluation and monitoring patterns.
A basic routing policy might work like this:
- If the user is asking a question about their environment, send the request to a VLM along with an image captured from the camera (or Reachy).
- If the user asks a question requiring real-time information, send the input to a ReAct agent to perform a web search via a tool call.
- If the user is asking simple questions, send the request to a small, fast model optimized for chit-chat.
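For quick prototyping, you can approximate this policy with a keyword heuristic before wiring up an LLM router; the sketch below is not part of the repo, and the route names simply mirror the config that follows:
# Crude keyword routing; the demo itself uses an LLM-based router (see the config below).
VISION_HINTS = ("holding", "wearing", "shirt", "see", "camera", "whiteboard", "surroundings")
ACTION_HINTS = ("search", "look up", "latest", "news", "weather", "move", "turn")

def route(query: str) -> str:
    q = query.lower()
    if any(hint in q for hint in VISION_HINTS):
        return "image_understanding"  # VLM plus a camera frame
    if any(hint in q for hint in ACTION_HINTS):
        return "other"                # ReAct agent plus tools
    return "chit_chat"                # small, fast text model

print(route("What color is my shirt?"))     # image_understanding
print(route("What's the latest AI news?"))  # other
print(route("Tell me a joke"))              # chit_chat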
These sections of the config define the routing topology and specify the router model.
functions:
  ..
  router:
    _type: router
    route_config:
      - name: other
        description: Any query that requires careful thought, outside information, image understanding, or tool calling to take actions.
      - name: chit_chat
        description: Any simple chit chat, small talk, or casual conversation.
      - name: image_understanding
        description: A question that requires the assistant to see the user, e.g. a question about their appearance, environment, scene, or surroundings. Examples include what am I holding, what am I wearing, what do I look like, what's in my surroundings, what does it say on the whiteboard. Questions about attire, e.g. what color is my shirt/hat/jacket/etc.
    llm_name: routing_llm

llms:
  ..
  routing_llm:
    _type: nim
    model_name: microsoft/phi-3-mini-128k-instruct
    temperature: 0.0
NOTE: If you want to reduce latency/cost or run offline, you can self-host one of the routed models (typically the “fast text” model) and keep the VLM remote. One common approach is serving via NVIDIA NIM or vLLM and pointing NeMo Agent Toolkit at an OpenAI-compatible endpoint.
Step 4: Add a Pipecat bot for real-time voice + vision
Now we go real time. Pipecat is a framework designed for low-latency voice/multimodal agents: it orchestrates audio/video streams, AI services, and transports so you can build natural conversations. In this repo, the bot service is responsible for:
- Capturing vision (robot camera)
- Speech recognition + text-to-speech
- Coordinating robot movement and expressive behaviors
You can find all the Pipecat bot code in the `reachy-personal-assistant/bot` folder.
Step 5: Hook everything up to Reachy (hardware or simulation)
Reachy Mini exposes a daemon that the rest of your system connects to. The repo runs the daemon in simulation by default (--sim). If you have access to a real Reachy, you can remove this flag and the same code will control your robot.
Run the full system
You’ll need three terminals to run the entire system:
Terminal 1: Reachy daemon
cd bot
# macOS:
uv run mjpython -m reachy_mini.daemon.app.main --sim --no-localhost-only
# Linux:
uv run -m reachy_mini.daemon.app.main --sim --no-localhost-only
If you are using the physical hardware, remember to omit the --sim flag from the command.
Terminal 2: Bot service
cd bot
uv venv
uv sync
uv run --env-file ../.env python main.py
Terminal 3: NeMo Agent Toolkit service
If the NeMo Agent Toolkit service is not already running from Step 1, start it now in Terminal 3.
cd nat
uv venv
uv sync
uv run --env-file ../.env nat serve --config_file src/ces_tutorial/config.yml --port 8001
Once all the terminals are set up, there are two main windows to keep track of:
- Reachy Sim – This window appears automatically when you start the simulator daemon in Terminal 1. This applies if you’re running the Reachy Mini simulation rather than the physical device.
- Pipecat Playground – This is the client-side UI where you can connect to the agent, enable microphone and camera inputs, and view live transcripts. In Terminal 2, open the URL exposed by the bot service: http://localhost:7860/. Click “CONNECT” in your browser. It may take a few seconds to initialize, and you’ll be prompted to grant microphone (and optionally camera) access.
Once both windows are up and running:
- The Client and Agent STATUS indicators should show READY
- The bot will greet you with a welcome message “Hello, how may I assist you today?”
At this point, you can start interacting with your agent!
Try these example prompts
Here are a few simple prompts to help you test your personal assistant. You can start with these and then experiment by adding your own to see how the agent responds!
Text-only prompts (routes to the fast text model)
- “Explain what you can do in a single sentence.”
- “Summarize the last thing I said.”
Vision prompts (routes to the VLM)
- “What am I holding up to the camera?”
- “Read the text on this page and summarize it.”
Where to go next
Instead of a “black-box” assistant, this builds a foundation for a personal, hackable system where you control both the intelligence and the hardware. You can inspect, extend, and run it locally, with full visibility into data flow, tool permissions, and how the robot perceives and acts.
Depending on your goals, here are a few directions to explore next:
- Optimize for performance: Use the LLM Router developer example to balance cost, latency, and quality by intelligently directing queries between different models.
- Check out the tutorial for building a voice-powered RAG agent with guardrails using Nemotron open models.
- Master the hardware: Explore the Reachy Mini SDK and simulation docs to design and test advanced robotic behaviors before deploying to your physical system.
- Explore and contribute to the apps built by the community for Reachy.
Want to try it right away? Deploy the full environment here. One click and you’re running.

