Let’s say you want to train a robot so it understands how to use tools and can then quickly learn to make repairs around the house with a hammer, wrench, and screwdriver. To do that, you would need an enormous amount of data demonstrating tool use.

Existing robotic datasets vary widely in modality: some include color images while others are composed of tactile imprints, for instance. Data could also be collected in different domains, like simulation or human demonstrations. And each dataset may capture a unique task and environment.

It’s difficult to efficiently incorporate data from so many sources into a single machine-learning model, so many methods use just one type of data to train a robot. But robots trained this way, with a relatively small amount of task-specific data, are often unable to perform new tasks in unfamiliar environments.

To train better multipurpose robots, MIT researchers developed a technique to combine multiple sources of data across domains, modalities, and tasks using a type of generative AI known as diffusion models.

They train a separate diffusion model to learn a strategy, or policy, for completing one task using one specific dataset. Then they combine the policies learned by the diffusion models into a general policy that enables a robot to perform multiple tasks in various settings.

In simulations and real-world experiments, this training approach enabled a robot to perform multiple tool-use tasks and adapt to new tasks it did not see during training. The method, known as Policy Composition (PoCo), led to a 20 percent improvement in task performance compared to baseline techniques.
“Addressing heterogeneity in robotic datasets is like a chicken-and-egg problem. If we want to use lots of data to train general robot policies, then we first need deployable robots to get all this data. I think that leveraging all the heterogeneous data available, similar to what researchers have done with ChatGPT, is an important step for the robotics field,” says Lirui Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on PoCo.

Wang’s coauthors include Jialiang Zhao, a mechanical engineering graduate student; Yilun Du, an EECS graduate student; Edward Adelson, the John and Dorothy Wilson Professor of Vision Science in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Russ Tedrake, the Toyota Professor of EECS, Aeronautics and Astronautics, and Mechanical Engineering, and a member of CSAIL. The research will be presented at the Robotics: Science and Systems Conference.
Combining disparate datasets
A robotic policy is a machine-learning model that takes inputs and uses them to perform an action. One way to think about a policy is as a strategy. In the case of a robotic arm, that strategy might be a trajectory, or a sequence of poses that move the arm so it picks up a hammer and uses it to pound a nail.
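In code terms, a policy is simply a mapping from what the robot observes to what it should do next. Here is a minimal sketch in Python; all names and shapes are illustrative, not taken from the paper:

```python
import numpy as np

def policy(observation: np.ndarray) -> np.ndarray:
    """Map an observation (e.g., camera features or joint states) to a
    trajectory: a (horizon, dof) array of arm poses to execute in order.

    A learned policy would replace this stub with a neural network; here
    we just return a do-nothing trajectory of the right shape.
    """
    horizon, dof = 16, 7  # e.g., 16 future poses for a 7-joint arm
    return np.zeros((horizon, dof))
```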
Datasets used to learn robotic policies are typically small and focused on one particular task and environment, like packing items into boxes in a warehouse.
“Each robotic warehouse is generating terabytes of data, but it only belongs to that specific robot installation working on those packages. It is not ideal if you want to use all of these data to train a general machine,” Wang says.

The MIT researchers developed a technique that can take a series of smaller datasets, like those gathered from many robotic warehouses, learn separate policies from each, and combine the policies in a way that enables a robot to generalize to many tasks.

They represent each policy using a type of generative AI model known as a diffusion model. Diffusion models, often used for image generation, learn to create new data samples that resemble samples in a training dataset by iteratively refining their output.

But rather than teaching a diffusion model to generate images, the researchers teach it to generate a trajectory for a robot. They do this by adding noise to the trajectories in a training dataset. The diffusion model gradually removes the noise and refines its output into a trajectory.
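To make this concrete, here is a minimal, hedged sketch in PyTorch of training and sampling a diffusion model over trajectories rather than images. The tiny network, noise schedule, and array shapes are illustrative assumptions, not the authors' implementation (which also conditions on the robot's observations):

```python
import torch

T = 100  # number of diffusion steps (illustrative choice)
betas = torch.linspace(1e-4, 0.02, T)      # noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class TinyDenoiser(torch.nn.Module):
    """Stand-in for a real policy network: predicts the noise that was
    added to a (horizon, dof) trajectory at diffusion step t."""
    def __init__(self, horizon, dof):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(horizon * dof + 1, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, horizon * dof),
        )

    def forward(self, x, t):
        b, h, d = x.shape
        inp = torch.cat([x.reshape(b, -1), t.float().view(-1, 1) / T], dim=1)
        return self.net(inp).reshape(b, h, d)

def train_step(denoiser, trajs):
    """trajs: (batch, horizon, dof) demonstrated arm poses. Corrupt them
    with Gaussian noise at a random step t, then train the network to
    predict that noise (the standard DDPM objective)."""
    b = trajs.shape[0]
    t = torch.randint(0, T, (b,))
    ab = alpha_bars[t].view(b, 1, 1)
    noise = torch.randn_like(trajs)
    noisy = ab.sqrt() * trajs + (1 - ab).sqrt() * noise
    return torch.nn.functional.mse_loss(denoiser(noisy, t), noise)

@torch.no_grad()
def sample(denoiser, horizon, dof):
    """Start from pure noise and iteratively denoise it into a
    trajectory, the same reverse process used for image generation."""
    x = torch.randn(1, horizon, dof)
    for t in reversed(range(T)):
        eps = denoiser(x, torch.tensor([t]))
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)  # ancestral noise
    return x
```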
This technique, known as Diffusion Policy, was previously introduced by researchers at MIT, Columbia University, and the Toyota Research Institute; PoCo builds on that work.

The team trains each diffusion model with a different type of dataset, such as one with human video demonstrations and another gleaned from teleoperation of a robotic arm.

Then the researchers perform a weighted combination of the individual policies learned by all the diffusion models, iteratively refining the output so the combined policy satisfies the objectives of each individual policy.
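One simplified way to picture this combination, continuing the sketch above (and reusing its T, betas, alphas, and alpha_bars), is to blend the noise predictions of several trained policies at every denoising step. The fixed weights below are an illustrative stand-in for the paper's composition procedure, which refines the combination iteratively:

```python
@torch.no_grad()
def sample_composed(denoisers, weights, horizon, dof):
    """Denoise one trajectory while mixing the noise predictions of
    several diffusion policies at every step, steering the result
    toward satisfying all of them at once."""
    x = torch.randn(1, horizon, dof)
    for t in reversed(range(T)):
        step = torch.tensor([t])
        eps = sum(w * d(x, step) for d, w in zip(denoisers, weights))
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```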
More than the sum of its parts
“One of the benefits of this approach is that we can combine policies to get the best of both worlds. For instance, a policy trained on real-world data might be able to achieve more dexterity, while a policy trained on simulation might be able to achieve more generalization,” Wang says.
Because the policies are trained separately, one could mix and match diffusion policies to achieve better results for a certain task. A user could also add data in a new modality or domain by training an additional Diffusion Policy with that dataset, rather than starting the entire process from scratch.
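In terms of the sketches above, mixing and matching amounts to choosing which trained denoisers to pass in and how heavily to weight each one; sim_policy and teleop_policy here are hypothetical stand-ins for trained models:

```python
# Hypothetical usage: blend a simulation-trained policy with one
# trained on real teleoperation data, weighting the real data higher.
sim_policy = TinyDenoiser(horizon=16, dof=7)
teleop_policy = TinyDenoiser(horizon=16, dof=7)

trajectory = sample_composed(
    denoisers=[sim_policy, teleop_policy],
    weights=[0.4, 0.6],
    horizon=16,
    dof=7,  # e.g., a 7-joint arm
)
```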
The researchers tested PoCo in simulation and on real robotic arms that performed a variety of tool-use tasks, such as using a hammer to pound a nail and flipping an object with a spatula. PoCo led to a 20 percent improvement in task performance compared to baseline methods.

“The striking thing was that when we finished tuning and visualized it, we can clearly see that the composed trajectory looks much better than either one of them individually,” Wang says.

In the future, the researchers want to apply this technique to long-horizon tasks where a robot would pick up one tool, use it, then switch to another tool. They also want to incorporate larger robotics datasets to improve performance.

“We will need all three kinds of data to succeed for robotics: web data, simulation data, and real robot data. How to combine them effectively will be the million-dollar question. PoCo is a solid step on the right track,” says Jim Fan, senior research scientist at NVIDIA and leader of the AI Agents Initiative, who was not involved with this work.
This research is funded, in part, by Amazon, the Singapore Defense Science and Technology Agency, the U.S. National Science Foundation, and the Toyota Research Institute.