Train Small Orchestration Agents to Solve Big Problems



Choosing the right tool and model for a task is a difficult and ever-present engineering problem in agent design. At NVIDIA Research, we’re making fast progress toward automating it away with an approach that trains and uses a separate model, which we call an “orchestrator,” to act as a supervisor over all other models and tools.

The orchestrator’s job is to consider the task in the context of user preferences (does the user need the result fast, at low cost, with the best possible accuracy, or some combination of those?) and then manage other models and call on tools within the task-solving conversation to reach the goal. Crucially, as it turns out, small models are already powerful enough to handle this job if tuned appropriately.
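To make the idea concrete, here is a minimal sketch of that reason-then-dispatch loop. Everything in it (the `Preference` fields, the tool registry, the routing rule) is an illustrative assumption, not ToolOrchestra’s actual API: a trained orchestrator learns this routing rather than hard-coding it.

```python
from dataclasses import dataclass


@dataclass
class Preference:
    """User preferences the orchestrator conditions on (illustrative)."""
    max_cost: float = 1.0       # dollars the user is willing to spend
    prefer_speed: bool = False  # favor latency over accuracy when True


# Hypothetical registry of callables the orchestrator can dispatch to,
# each annotated with a rough per-call dollar cost.
TOOLS = {
    "calculator": (lambda q: str(eval(q, {"__builtins__": {}})), 0.0),
    "small_llm":  (lambda q: f"[small-llm answer to: {q}]", 0.01),
    "large_llm":  (lambda q: f"[large-llm answer to: {q}]", 0.50),
}


def orchestrate(query: str, pref: Preference) -> str:
    """One reason-act turn: pick the cheapest tool that plausibly
    suffices under the user's preferences, then call it."""
    # "Reasoning" step, here a trivial hand-written routing rule:
    if any(op in query for op in "+-*/"):
        choice = "calculator"            # arithmetic needs no LLM at all
    elif pref.prefer_speed or pref.max_cost < 0.5:
        choice = "small_llm"             # cheap and fast, lower accuracy
    else:
        choice = "large_llm"             # expensive, highest accuracy
    fn, cost = TOOLS[choice]
    assert cost <= pref.max_cost, "budget exceeded"
    return fn(query)
```

A real orchestrator alternates many such turns, feeding each tool’s output back into its reasoning; this sketch shows only a single routing decision.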

While it may be surprising to place large models subordinate to small models, the arrangement plays to their respective strengths. Small models are unburdened by excessive knowledge; their limited size forces them to capture the essence of problem-solving.

To construct orchestrators, we introduce ToolOrchestra, our flagship method, which involves data preparation, synthetic data generation, multi-objective reinforcement-learning training, and comprehensive evaluation of orchestration methods and models.

Diagram showing how an AI orchestrator coordinates tools and models to answer a user’s query efficiently. The Orchestrator uses multi-turn reasoning and calls basic tools, specialized LLMs, and generalist LLMs, optimizing for outcome, efficiency, and cost preference through reinforcement learning.
Figure 1. Overview of the orchestrator: when given a task, it alternates between reasoning and tool calling over multiple turns to resolve it

Why train an orchestrator?

You might be wondering: “Using an orchestrator is an intriguing concept, but why should I train a model for it? Wouldn’t it be enough to simply edit the prompts of my agent to act as an orchestrator?” The short answer is no. The reason ToolOrchestra-trained orchestrators outperform other methods lies in the training objectives. During training, the orchestrator generates experimental trajectories. Some solve the problem better than others. Some reach the correct solution cheaply and quickly, while others make extensive use of expensive tools and take a long time to arrive at a conclusion. ToolOrchestra’s reinforcement-learning setup explicitly rewards high problem-solving accuracy, low cost, and short time-to-solution based on the cost preferences for the given problem.
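The paragraph above describes a multi-objective reward. A minimal sketch of one, under assumptions of our own: a linear combination of a correctness term with cost and latency penalties, with made-up weights and normalization scales. ToolOrchestra’s actual reward is described in the paper; this only illustrates the shape of the trade-off.

```python
def trajectory_reward(correct: bool, cost_usd: float, latency_s: float,
                      w_cost: float = 0.5, w_latency: float = 0.2) -> float:
    """Score one trajectory: reward a correct outcome, penalize dollar
    cost and wall-clock latency. Weights reflect the user's preferences;
    the normalization scales (10 USD, 60 s) are illustrative assumptions."""
    accuracy_term = 1.0 if correct else 0.0
    cost_term = min(cost_usd / 10.0, 1.0)      # clip so cost can't dominate
    latency_term = min(latency_s / 60.0, 1.0)  # likewise for latency
    return accuracy_term - w_cost * cost_term - w_latency * latency_term
```

Under this reward, a trajectory that solves the task cheaply and quickly scores higher than one that solves it with expensive tools, which in turn scores higher than a fast but incorrect one — exactly the ordering the RL training reinforces.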

What are the outcomes of using an orchestrator?

To demonstrate the effectiveness of ToolOrchestra, we trained a small model, Orchestrator-8B, to tackle some of the most difficult tasks available, including the problems of Humanity’s Last Exam, FRAMES, and τ²-Bench.

We then gave out-of-the-box monolithic LLMs, prompted orchestrators running on frontier LLMs, and Orchestrator-8B access to the same tools, and measured their performance. The results are shown in Table 1. In summary, Orchestrator-8B outperforms all its competitors regardless of their size or advertised level of capability, while incurring the smallest cost and problem-solving latency.

| Tools | Model(s) | HLE (↑) | FRAMES (↑) | τ²-Bench (↑) | Cost (↓) | Latency (↓) |
|---|---|---|---|---|---|---|
| Existing reported SOTA | GPT-5 | 35.2 | 84.2‡ | | | |
| | o3 | 24.3 | 68.4 | | | |
| | GPT-4o | 5.3 | 43.8 | | | |
| No tool | Qwen3-8B | 3.2 | 24.2 | –* | 0.2 | 0.6 |
| | Llama-Nemotron-49B | 3.6 | 25.6 | –* | 0.4 | 1.1 |
| | Llama-3.3-70B | 3.8 | 32.4 | –* | 0.5 | 1.4 |
| | Qwen3-235B-A22B | 5.2 | 34.3 | –* | 2.6 | 3.3 |
| | Claude Opus 4.1 | 11.7 | 58.2 | –* | 27.4 | 8.2 |
| | GPT-5 | 23.4 | 66.3 | –* | 6.2 | 4.1 |
| Basic tools | Qwen3-8B | 4.7 | 26.5 | 40.7 | 1.3 | 2.2 |
| | Llama-Nemotron-49B | 6.8 | 28.2 | 23.2 | 2.5 | 3.5 |
| | Llama-3.3-70B | 4.6 | 42.3 | 17.6 | 2.8 | 4.3 |
| | Qwen3-235B-A22B | 14.0 | 39.5 | 52.9 | 12.3 | 10.2 |
| | Claude Opus 4.1 | 19.8 | 63.5 | 46.0 | 76.2 | 32.5 |
| | GPT-5 | 35.1 | 74.0 | 77.7 | 30.2 | 19.8 |
| Basic tools, specialized LLMs, generalist LLMs | Qwen3-8B | 30.6 | 68.9 | 72.3 | 27.6 | 18.3 |
| | Llama-Nemotron-49B | 25.8 | 57.9 | 66.7 | 25.6 | 17.1 |
| | Llama-3.3-70B | 19.7 | 52.4 | 55.8 | 19.7 | 13.4 |
| | Qwen3-235B-A22B | 32.8 | 74.2 | 75.6 | 29.7 | 21.2 |
| | Claude Opus 4.1 | 34.6 | 72.8 | 76.8 | 52.5 | 25.6 |
| | GPT-5 | 21.2 | 57.5 | 62.3 | 17.8 | 13.6 |
| | Orchestrator-8B | 37.1 | 76.3 | 80.2 | 9.2 | 8.2 |
Table 1. A comparison of Orchestrator-8B with baselines

To drive the point of Orchestrator-8B’s efficiency home, we measured the accuracy and cost of leading frontier models and Orchestrator-8B while restricting each model’s reasoning and acting to 10, 20, 50, and 100 conversational turns. The result is visualized in the figure below. We observed that regardless of the conversational length limit imposed on the competing systems, Orchestrator-8B always outperforms its competition while maintaining a lower dollar cost.

Scatter plot showing HLE Accuracy (%) versus Cost ($) for multiple LLMs. Orchestrator-8B achieves higher accuracy than other models at the same cost and maintains the same quality at a lower cost. GPT-5 and Grok-4 perform well but are more expensive, while Claude Opus 4.1, Qwen3-235B-A22B, and Llama-3.3-70B have lower accuracy. The plot highlights Orchestrator-8B’s superior performance-cost efficiency compared to SOTA baselines.
Figure 2. Orchestrator-8B compared with several advanced LLMs in terms of cost and HLE accuracy

How to train an orchestrator?

To train an orchestrator for your own purposes following the ToolOrchestra method, you’ll need a model, some data, and our training code.

To show how little is required to build an orchestrator for difficult tasks, such as the hard benchmarks we tested Orchestrator-8B on, we used Qwen3-8B as our underlying model, generated only 552 synthetic problems, and used only 1,296 prompts in training.

Step 1: Select the underlying model

The choice of model to train into an efficient orchestrator is entirely up to you. We recommend you choose the smallest language model aligned with the nature of your agent. NVIDIA Nemotron Nano, the Qwen 3 family, or the xLAM family are just a few of the options.

Step 2: Prepare and generate data

The good news about the data for ToolOrchestra is that you don’t need much to get started. The tool assumes that much of the data will be synthetically generated. We describe the data generation process in detail in our paper. In broad terms, you’ll want to start with a description or a few examples of your agent solving problems with its preferred tools. Using large models, you can then generate many more similar synthetic tasks.

The following is a sketch of the code that can be used to generate samples similar to those used to train Orchestrator-8B.

def generate_samples(domain):
    # Each step below stands in for a model-driven generation stage.
    subjects = generate_subjects(domain)
    schema = generate_schema(subjects)
    data_model = generate_datamodel(schema)
    database = generate_database(domain, schema, data_model)
    tools = generate_tools(domain, database)
    tasks = generate_tasks(database, tools)
    return tasks

samples = generate_samples(domain)  # pass the domain your agent operates in
...

You can jump right in and experience the real data-generation magic.
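To see the data flow end to end, here is a toy, fully offline stand-in for that pipeline. In ToolOrchestra each of these functions would call a large LLM; the deterministic stubs below (a hypothetical “retail” domain, a two-row order database) exist only to illustrate the interfaces of the sketch above.

```python
def generate_subjects(domain):
    # In practice: ask an LLM for subject areas within the domain.
    return [f"{domain}: order lookup", f"{domain}: refund status"]

def generate_schema(subjects):
    # In practice: derive a schema from the subjects with an LLM.
    return {"order_id": "str", "status": "str"}

def generate_datamodel(schema):
    # A row is just a plain dict in this toy version.
    return dict

def generate_database(domain, schema, data_model):
    # In practice: populate the schema with LLM-generated records.
    return [data_model(order_id="A1", status="shipped"),
            data_model(order_id="B2", status="pending")]

def generate_tools(domain, database):
    # In practice: synthesize tool implementations over the database.
    def lookup_order(order_id):
        return next((r for r in database if r["order_id"] == order_id), None)
    return {"lookup_order": lookup_order}

def generate_tasks(database, tools):
    # Pair each record with a question answerable via the tools.
    return [{"prompt": f"What is the status of order {row['order_id']}?",
             "answer": row["status"]} for row in database]

def generate_samples(domain):
    subjects = generate_subjects(domain)
    schema = generate_schema(subjects)
    data_model = generate_datamodel(schema)
    database = generate_database(domain, schema, data_model)
    tools = generate_tools(domain, database)
    return generate_tasks(database, tools)

samples = generate_samples("retail")
```

Each generated sample carries a prompt plus a verifiable answer, which is what makes the trajectories scorable during reinforcement learning.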

Step 3: Start training

Once equipped with your model choice and some data, you can directly use or adapt ToolOrchestra’s released code to train your own orchestrator. This sketch can get you started (more details can be found in the repository README).

# Build the dataset and reward, then launch distributed RL training.
train_dataset = prepare_data(raw_examples, tools)
train_dataloader = DataLoader(train_dataset)
reward_model = RewardManager(config)        # scores outcome, cost, and latency
trainer = RayTrainer(config, reward_model)  # Ray-based distributed trainer
trainer.init_workers()
trainer.start()
...

You can kick off your own training run and watch your orchestrator come to life!

Step 4: Visualize your progress

ToolOrchestra’s training code supports direct logging through wandb. The following shows example visualizations from Orchestrator-8B’s runs.

Side-by-side line charts of training metrics. The left chart shows actor policy gradient loss decreasing and stabilizing around -2.5 over 150 steps. The right chart shows critic mean score increasing and plateauing around 2.0, indicating training convergence and performance improvement.
Figure 3. Training loss and critic score of Orchestrator-8B

The advantages of orchestration

Engineering efficient, high-performance agents today involves a constant struggle to balance capability and cost. Developers must manually weigh every choice (model size, tool use, query length, reasoning depth), knowing that one wrong call can push costs skyward or compromise the quality of the result. This complexity scales unforgivingly as the number of queries that must be engineered grows, making cost-aware agent optimization one of the most difficult and time-intensive aspects of building real-world AI systems.

ToolOrchestra changes that. By training small orchestrators to direct large models and tools with surgical precision and based on need, we automate this balancing act in a way that outperforms monolithic LLMs and prompted orchestrator setups across accuracy, latency, and dollar cost.

Orchestrator-8B is a concrete demonstration that the right strategy can beat brute-force model-size scaling or prompt-engineering dexterity. It delivers state-of-the-art performance on hard benchmarks while using resources far more efficiently. In short, orchestration enables agents to be both powerful and nimble.

Looking ahead: The rise of compound AI systems

The dominant paradigm in AI over the past few years has been that intelligence is first built into large foundation models through training and then specialized for real-world use cases through in-context learning. This belief is increasingly under attack, as the AI community continues to produce more and more examples of compound AI systems that outperform monolithic LLMs while being safer, faster, and cheaper.

ToolOrchestra represents our first step toward fundamentally intelligent compound AI systems as the paradigm emerging to replace AI monoliths. It is further aligned with our long-term position that small language models are ultimately the key to scalable agentic AI.

To learn more:



