Today at CES 2026, NVIDIA unveiled a world of new open models to enable the future of agents, online and in the real world. From the recently released NVIDIA Nemotron reasoning LLMs to the new NVIDIA Isaac GR00T N1.6 open reasoning VLA and NVIDIA Cosmos world foundation models, all the building blocks are here today for AI builders to build their own agents.
But what if you could bring your own agent to life, right at your desk? An AI buddy that can be useful to you and process your data privately?
In the CES keynote today, Jensen Huang showed us how we can do exactly that, using the processing power of NVIDIA DGX Spark with Reachy Mini to create your own little office R2-D2 you can talk to and collaborate with.
This blog post provides a step-by-step guide to replicating this experience at home using a DGX Spark and a Reachy Mini.
Let’s dive in!
Ingredients
If you want to start cooking right away, here’s the source code of the demo.
We’ll be using the following:
- A reasoning model: demo uses NVIDIA Nemotron 3 Nano
- A vision model: demo uses NVIDIA Nemotron Nano 2 VL
- A text-to-speech model: demo uses ElevenLabs
- Reachy Mini (or Reachy Mini Simulation)
- Python v3.10+ environment, with uv
Feel free to adapt the recipe and make it your own – there are many ways to integrate the models into your application:
- Local deployment – Run on your own hardware (DGX Spark or a GPU with sufficient VRAM). Our implementation requires ~65 GB of disk space for the reasoning model and ~28 GB for the vision model.
- Cloud deployment – Deploy the models on cloud GPUs, e.g. through NVIDIA Brev or Hugging Face Inference Endpoints.
- Serverless model endpoints – Send requests to NVIDIA or Hugging Face Inference Providers.
Giving agentic powers to Reachy
Turning an AI agent from a simple chat interface into something you can interact with naturally makes conversations feel more real. When an AI agent can see through a camera, speak out loud, and perform actions, the experience becomes more engaging. That’s what Reachy Mini makes possible.
Reachy Mini is designed to be customizable. With access to sensors, actuators, and APIs, you can easily wire it into your existing agent stack, whether in simulation or on real hardware controlled directly from Python.
This post focuses on composing existing building blocks rather than reinventing them. We combine open models for reasoning and vision, an agent framework for orchestration, and tool handlers for actions. Each component is loosely coupled, making it easy to swap models, change routing logic, or add new behaviors.
Unlike closed personal assistants, this setup stays fully open. You control the models, the prompts, the tools, and the robot’s actions. Reachy Mini simply becomes the physical endpoint of your agent where perception, reasoning, and motion come together.
Building the agent
In this example, we use the NVIDIA NeMo Agent Toolkit, a flexible, lightweight, framework-agnostic open source library, to connect all the components of the agent together. It works seamlessly with other agentic frameworks, like LangChain, LangGraph, and CrewAI, handling how models interact, routing inputs and outputs between them, and making it easy to experiment with different configurations or add new capabilities without rewriting core logic. The toolkit also provides built-in profiling and optimization features, letting you track token usage and latency across tools and agents, identify bottlenecks, and automatically tune hyperparameters to maximize accuracy while reducing cost and latency.
Step 0: Set up and get access to models and services
First, clone the repository that contains all the code you’ll need to follow along:
git clone git@github.com:brevdev/reachy-personal-assistant.git
cd reachy-personal-assistant
To access your intelligence layer, powered by the NVIDIA Nemotron models, you can either deploy them using NVIDIA NIM or vLLM, or connect to them through remote endpoints available at build.nvidia.com.
The following instructions assume you are accessing the Nemotron models via remote endpoints. Create a .env file in the main directory with your API keys. For local deployments, you don’t need to specify API keys and can skip this step.
NVIDIA_API_KEY=your_nvidia_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
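As an optional sanity check, you can confirm the keys will be visible to Python once `uv run --env-file ../.env` injects them in the later steps (a minimal, hypothetical snippet, not part of the repo):
import os

# Report whether each API key from the .env file above is present in the environment.
for key in ("NVIDIA_API_KEY", "ELEVENLABS_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")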
Step 1: Build a chat interface
Start by getting a basic LLM chat workflow running through NeMo Agent Toolkit’s API server. NeMo Agent Toolkit supports running workflows via `nat serve` and providing a config file. The config file passed here contains all the necessary setup information for the agent, including the models used for chat and image understanding, as well as the router model used by the agent. The NeMo Agent Toolkit UI can connect over HTTP/WebSocket so you can chat with your workflow like a standard chat product. In this implementation, the NeMo Agent Toolkit server is launched on port 8001 (so your bot can call it, and the UI can too):
cd nat
uv venv
uv sync
uv run --env-file ../.env nat serve --config_file src/ces_tutorial/config.yml --port 8001
Next, confirm that you can send a plain-text prompt from a separate terminal to make sure everything is set up correctly:
curl -s http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "test", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
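You can run the same check from Python, which is roughly how the bot service will call the workflow later on (a minimal sketch assuming the server’s OpenAI-compatible response schema):
import requests

# Send a plain-text prompt to the NeMo Agent Toolkit server started above.
resp = requests.post(
    "http://localhost:8001/v1/chat/completions",
    json={
        "model": "test",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    },
    timeout=60,
)
resp.raise_for_status()

# OpenAI-style chat completion responses put the assistant's answer here.
print(resp.json()["choices"][0]["message"]["content"])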
Reviewing the agent configuration, you’ll notice it defines far more capabilities than a simple chat completion. The next steps will walk through those details.
Step 2: Add NeMo Agent Toolkit’s built-in ReAct agent for tool calling
Tool calling is an essential part of AI agents. NeMo Agent Toolkit includes a built-in ReAct agent that can reason between tool calls and use multiple tools before answering. We route “action requests” to a ReAct agent that is allowed to call tools (for example, tools that trigger robot behaviors or fetch the current robot state).
Some practical notes to keep in mind:
- Keep tool schemas tight (clear name/description/args), because that’s what the agent uses to decide what to call.
- Put a hard cap on steps (max_tool_calls) so the agent can’t spiral.
- If using a physical robot, consider a “confirm before actuation” pattern to ensure movement safety (a hypothetical sketch follows below).
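One way such a confirmation gate could look is sketched below; this is a hypothetical illustration rather than code from the repo, and `move_head` stands in for whatever motion helper your setup exposes:
from typing import Any, Callable

def confirm_before_actuation(action_name: str, action: Callable[..., Any], *args, **kwargs) -> Any:
    # Ask a human to approve a physical action before it runs.
    answer = input(f"Robot wants to run '{action_name}'. Allow? [y/N] ").strip().lower()
    if answer != "y":
        print(f"Skipped '{action_name}'.")
        return None
    return action(*args, **kwargs)

# Hypothetical usage: gate a motion behavior behind the confirmation prompt.
# confirm_before_actuation("move_head", move_head, yaw=20, pitch=-10)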
Take a look at this portion of the config; it defines the tools (like Wikipedia search) and specifies the ReAct agent pattern used to manage them.
functions:
  wikipedia_search:
    _type: wiki_search
    max_results: 2
  ..
  react_agent:
    _type: react_agent
    llm_name: agent_llm
    verbose: true
    parse_agent_response_max_retries: 3
    tool_names: [wikipedia_search]

workflow:
  _type: ces_tutorial_router_agent
  agent: react_agent
Step 3: Add a router to direct queries to different models
The key idea: don’t use one model for everything. Instead, route based on intent:
- Text queries can use a fast text model
- Visual queries need to be run through a VLM
- Action/tool requests are routed to the ReAct agent + tools
You can implement routing a few ways: heuristics (a minimal sketch follows the list below), a lightweight classifier, or a dedicated routing service. If you want the “production” version of this idea, the NVIDIA LLM Router developer example is the full reference implementation and includes evaluation and monitoring patterns.
A basic routing policy might work like this:
- If the user is asking a question about their environment, send the request to a VLM along with an image captured from the camera (or Reachy).
- If the user asks a question requiring real-time information, send the input to a ReAct agent to perform a web search via a tool call.
- If the user is asking simple questions, send the request to a small, fast model optimized for chit-chat.
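For quick prototyping, you can approximate this policy with a keyword heuristic before wiring up an LLM router; the sketch below is not part of the repo, and the route names simply mirror the config that follows:
# Crude keyword routing; the demo itself uses an LLM-based router (see the config below).
VISION_HINTS = ("holding", "wearing", "shirt", "see", "camera", "whiteboard", "surroundings")
ACTION_HINTS = ("search", "look up", "latest", "news", "weather", "move", "turn")

def route(query: str) -> str:
    q = query.lower()
    if any(hint in q for hint in VISION_HINTS):
        return "image_understanding"  # VLM plus a camera frame
    if any(hint in q for hint in ACTION_HINTS):
        return "other"                # ReAct agent plus tools
    return "chit_chat"                # small, fast text model

print(route("What color is my shirt?"))     # image_understanding
print(route("What's the latest AI news?"))  # other
print(route("Tell me a joke"))              # chit_chat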
These sections of the config define the routing topology and specify the router model.
functions:
  ..
  router:
    _type: router
    route_config:
      - name: other
        description: Any query that requires careful thought, outside information, image understanding, or tool calling to take actions.
      - name: chit_chat
        description: Any simple chit chat, small talk, or casual conversation.
      - name: image_understanding
        description: A question that requires the assistant to see the user, e.g. a question about their appearance, environment, scene, or surroundings. Examples include what am I holding, what am I wearing, what do I look like, what's in my surroundings, what does it say on the whiteboard. Questions about attire, e.g. what color is my shirt/hat/jacket/etc.
    llm_name: routing_llm

llms:
  ..
  routing_llm:
    _type: nim
    model_name: microsoft/phi-3-mini-128k-instruct
    temperature: 0.0
NOTE: If you want to reduce latency/cost or run offline, you can self-host one of the routed models (typically the “fast text” model) and keep the VLM remote. One common approach is serving via NVIDIA NIM or vLLM and pointing NeMo Agent Toolkit at an OpenAI-compatible endpoint.
Step 4: Add a Pipecat bot for real-time voice + vision
Now we go real time. Pipecat is a framework designed for low-latency voice/multimodal agents: it orchestrates audio/video streams, AI services, and transports so you can build natural conversations. In this repo, the bot service is responsible for:
- Capturing vision (robot camera)
- Speech recognition + text-to-speech
- Coordinating robot movement and expressive behaviors
You can find all the Pipecat bot code in the `reachy-personal-assistant/bot` folder.
Step 5: Hook everything up to Reachy (hardware or simulation)
Reachy Mini exposes a daemon that the rest of your system connects to. The repo runs the daemon in simulation by default (--sim). If you have access to a real Reachy, you can remove this flag and the same code will control your robot.
Run the full system
You’ll need three terminals to run the entire system:
Terminal 1: Reachy daemon
cd bot
# macOS:
uv run mjpython -m reachy_mini.daemon.app.main --sim --no-localhost-only
# Linux:
uv run -m reachy_mini.daemon.app.main --sim --no-localhost-only
If you are using the physical hardware, remember to omit the --sim flag from the command.
Terminal 2: Bot service
cd bot
uv venv
uv sync
uv run --env-file ../.env python main.py
Terminal 3: NeMo Agent Toolkit service
If the NeMo Agent Toolkit service is not already running from Step 1, start it now in Terminal 3.
cd nat
uv venv
uv sync
uv run --env-file ../.env nat serve --config_file src/ces_tutorial/config.yml --port 8001
Once all the terminals are set up, there are two main windows to keep track of:
- Reachy Sim – This window appears automatically when you start the simulator daemon in Terminal 1. This applies if you’re running the Reachy Mini simulation rather than the physical device.
- Pipecat Playground – This is the client-side UI where you can connect to the agent, enable microphone and camera inputs, and view live transcripts. In Terminal 2, open the URL exposed by the bot service: http://localhost:7860/. Click “CONNECT” in your browser. It may take a few seconds to initialize, and you’ll be prompted to grant microphone (and optionally camera) access.
Once both windows are up and running:
- The Client and Agent STATUS indicators should show READY
- The bot will greet you with a welcome message “Hello, how may I assist you today?”
At this point, you can start interacting with your agent!
Try these example prompts
Here are a few simple prompts to help you test your personal assistant. You can start with these and then experiment by adding your own to see how the agent responds!
Text-only prompts (routes to the fast text model)
- “Explain what you can do in a single sentence.”
- “Summarize the last thing I said.”
Vision prompts (routes to the VLM)
- “What am I holding up to the camera?”
- “Read the text on this page and summarize it.”
Where to go next
Instead of a “black-box” assistant, this builds a foundation for a personal, hackable system where you control both the intelligence and the hardware. You can inspect, extend, and run it locally, with full visibility into data flow, tool permissions, and how the robot perceives and acts.
Depending on your goals, here are a few directions to explore next:
- Optimize for performance: Use the LLM Router developer example to balance cost, latency, and quality by intelligently directing queries between different models.
- Check out the tutorial for building a voice-powered RAG agent with guardrails using Nemotron open models.
- Master the hardware: Explore the Reachy Mini SDK and simulation docs to design and test advanced robotic behaviors before deploying to your physical system.
- Explore and contribute to the apps built by the community for Reachy.
Want to try it right away? Deploy the full environment here. One click and you’re running.

