Can robots learn from machine dreams?


For roboticists, one challenge towers above all others: generalization, the ability to create machines that can adapt to any environment or condition. Since the 1970s, the field has evolved from writing sophisticated programs to using deep learning, teaching robots to learn directly from human behavior. But a critical bottleneck remains: data quality. To improve, robots need to encounter scenarios that push the boundaries of their capabilities, operating at the edge of their mastery. This process traditionally requires human oversight, with operators carefully challenging robots to expand their abilities. As robots become more sophisticated, this hands-on approach hits a scaling problem: the demand for high-quality training data far outpaces humans’ ability to provide it.

Now, a team of MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers has developed a novel approach to robot training that could significantly accelerate the deployment of adaptable, intelligent machines in real-world environments. The new system, called “LucidSim,” uses recent advances in generative AI and physics simulators to create diverse and realistic virtual training environments, helping robots reach expert-level performance on difficult tasks without any real-world data.

LucidSim combines physics simulation with generative AI models, addressing one of the most persistent challenges in robotics: transferring skills learned in simulation to the real world. “A fundamental challenge in robot learning has long been the ‘sim-to-real gap,’ the disparity between simulated training environments and the complex, unpredictable real world,” says MIT CSAIL postdoc Ge Yang, a lead researcher on LucidSim. “Previous approaches often relied on depth sensors, which simplified the problem but missed crucial real-world complexities.”

The multipronged system is a combination of different technologies. At its core, LucidSim uses large language models to generate diverse, structured descriptions of environments. These descriptions are then turned into images using generative models. To ensure that these images reflect real-world physics, an underlying physics simulator guides the generation process.
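To make that pipeline concrete, here is a minimal sketch of how the three stages might be wired together. Every function name, prompt, and array shape below is an illustrative placeholder under assumed interfaces; the article does not describe the team’s actual code.

```python
# Illustrative sketch only: placeholder stand-ins for the three stages
# described above (language model -> physics simulator -> image generator).
from typing import List

import numpy as np


def describe_environments(task: str, n: int) -> List[str]:
    """Placeholder for querying a large language model for n varied,
    structured text descriptions of training environments."""
    return [f"{task}, scene variant {i}" for i in range(n)]


def render_geometry(seed: int) -> dict:
    """Placeholder for the physics simulator, which supplies geometric
    conditioning for the scene: a depth map and a semantic mask."""
    return {
        "depth": np.ones((256, 256), dtype=np.float32),
        "mask": np.zeros((256, 256), dtype=np.int32),
    }


def generate_image(prompt: str, conditioning: dict) -> np.ndarray:
    """Placeholder for a geometry-conditioned image generator (for example,
    a depth-conditioned diffusion model)."""
    return np.random.rand(256, 256, 3)


# Chain the stages: text descriptions become images that are constrained
# to match the simulated scene's geometry.
prompts = describe_environments("a quadruped climbing stairs", n=4)
frames = [generate_image(p, render_geometry(i)) for i, p in enumerate(prompts)]
```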

The birth of an idea: From burritos to breakthroughs

The inspiration for LucidSim came from an unexpected place: a conversation outside Beantown Taqueria in Cambridge, Massachusetts. “We wanted to teach vision-equipped robots how to improve using human feedback. But then, we realized we didn’t have a pure vision-based policy to begin with,” says Alan Yu, an undergraduate student in electrical engineering and computer science (EECS) at MIT and co-lead author on LucidSim. “We kept talking about it as we walked down the street, and then we stopped outside the taqueria for about half an hour. That’s where we had our moment.”

To cook up their data, the team generated realistic images by extracting depth maps, which provide geometric information, and semantic masks, which label different parts of an image, from the simulated scene. They quickly realized, however, that with tight control over the composition of the image content, the model would produce similar images that were barely different from one another when given the same prompt. So, they devised a way to source diverse text prompts from ChatGPT, as sketched below.
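One way to picture that prompt-diversification step is to ask a chat model for many visually distinct scene descriptions in a single request. The meta-prompt wording and the use of the OpenAI client here are assumptions for illustration; the article does not say how the team actually queried ChatGPT.

```python
# Hypothetical illustration of sourcing varied prompts from a chat model.
# The meta-prompt text is invented; any chat-completion backend would do.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

META_PROMPT = (
    "Write 10 short, visually distinct descriptions of scenes containing "
    "stairs that a quadruped robot might climb. Vary the lighting, materials, "
    "weather, and surroundings. Put one description per line."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": META_PROMPT}],
)

# One image-generation prompt per non-empty line of the model's reply.
prompts = [
    line.strip()
    for line in response.choices[0].message.content.splitlines()
    if line.strip()
]
```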

This approach, however, only produced a single image. To make short, coherent videos that serve as little “experiences” for the robot, the scientists hacked together some image magic into another novel technique the team created, called “Dreams In Motion.” The system computes the movement of every pixel between frames to warp a single generated image into a short, multi-frame video. Dreams In Motion does this by accounting for the 3D geometry of the scene and the relative changes in the robot’s perspective.
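The geometric idea behind that warping step can be sketched as a standard depth-based reprojection: back-project each pixel using its depth, move it by the relative camera motion, and project it into the new view. The toy function below (forward splatting, with no occlusion handling or hole filling) only illustrates that idea under assumed inputs; it does not reproduce Dreams In Motion itself.

```python
# Toy depth-based reprojection: warp one image into a slightly moved camera.
import numpy as np


def warp_to_new_view(image, depth, K, R, t):
    """Forward-warp `image` (H x W x 3) into a camera displaced by rotation R
    and translation t, using per-pixel depth and camera intrinsics K."""
    H, W = depth.shape
    K_inv = np.linalg.inv(K)

    # Pixel grid in homogeneous coordinates, flattened to 3 x N.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T

    # Back-project to 3D, apply the relative camera motion, re-project.
    pts = K_inv @ pix * depth.reshape(1, -1)
    pts_new = R @ pts + t.reshape(3, 1)
    proj = K @ pts_new
    uv_new = (proj[:2] / proj[2:]).round().astype(int)

    # Scatter source colors to wherever they land in the new view.
    out = np.zeros_like(image)
    ok = (
        (pts_new[2] > 0)
        & (uv_new[0] >= 0) & (uv_new[0] < W)
        & (uv_new[1] >= 0) & (uv_new[1] < H)
    )
    out[uv_new[1, ok], uv_new[0, ok]] = image.reshape(-1, 3)[ok]
    return out


# Example: nudge the camera slightly to the right to form a second frame.
img = np.random.rand(256, 256, 3)
depth = np.full((256, 256), 2.0)
K = np.array([[128.0, 0.0, 128.0], [0.0, 128.0, 128.0], [0.0, 0.0, 1.0]])
frame2 = warp_to_new_view(img, depth, K, np.eye(3), np.array([0.05, 0.0, 0.0]))
```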

“We outperform domain randomization, a method developed in 2017 that applies random colors and patterns to objects in the environment, which is still considered the go-to method today,” says Yu. “While this technique generates diverse data, it lacks realism. LucidSim addresses both the diversity and realism problems. It’s exciting that even without seeing the real world during training, the robot can recognize and navigate obstacles in real environments.”
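For comparison, domain randomization in its simplest form can be pictured as recoloring each labeled object in a rendered frame at random. The toy below assumes an 8-bit RGB image and an integer semantic mask; it is a sketch of the general technique, not of any specific 2017 implementation.

```python
# Toy domain randomization: give every object a random flat color.
# The result is diverse but visually unrealistic, which is the contrast
# the researchers draw with LucidSim's generated imagery.
import numpy as np


def domain_randomize(image: np.ndarray, semantic_mask: np.ndarray,
                     rng: np.random.Generator | None = None) -> np.ndarray:
    """Recolor each object id in `semantic_mask` with a random RGB color."""
    rng = rng if rng is not None else np.random.default_rng()
    out = image.copy()
    for obj_id in np.unique(semantic_mask):
        out[semantic_mask == obj_id] = rng.integers(0, 256, size=3)
    return out
```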

The team is especially excited about the potential of applying LucidSim to domains beyond quadruped locomotion and parkour, their main test bed. One example is mobile manipulation, where a mobile robot is tasked with handling objects in an open area and color perception is critical. “Today, these robots still learn from real-world demonstrations,” says Yang. “Although collecting demonstrations is easy, scaling a real-world robot teleoperation setup to hundreds of skills is challenging because a human has to physically set up each scene. We hope to make this easier, and thus qualitatively more scalable, by moving data collection into a virtual environment.”

Who’s the real expert?

The team put LucidSim to the test against an alternative, where an expert teacher demonstrates the skill for the robot to learn from. The results were surprising: Robots trained by the expert struggled, succeeding only 15 percent of the time, and even quadrupling the amount of expert training data barely moved the needle. But when robots collected their own training data through LucidSim, the story changed dramatically. Just doubling the dataset size catapulted success rates to 88 percent. “And giving our robot more data monotonically improves its performance; eventually, the student becomes the expert,” says Yang.

“One of the main challenges in sim-to-real transfer for robotics is achieving visual realism in simulated environments,” says Stanford University assistant professor of electrical engineering Shuran Song, who wasn’t involved in the research. “The LucidSim framework provides an elegant solution by using generative models to create diverse, highly realistic visual data for any simulation. This work could significantly accelerate the deployment of robots trained in virtual environments to real-world tasks.”

From the streets of Cambridge to the cutting edge of robotics research, LucidSim is paving the way toward a new generation of intelligent, adaptable machines, ones that learn to navigate our complex world without ever setting foot in it.

Yu and Yang wrote the paper with four fellow CSAIL affiliates: Ran Choi, an MIT postdoc in mechanical engineering; Yajvan Ravan, an MIT undergraduate in EECS; John Leonard, the Samuel C. Collins Professor of Mechanical and Ocean Engineering in the MIT Department of Mechanical Engineering; and Phillip Isola, an MIT associate professor in EECS. Their work was supported, in part, by a Packard Fellowship, a Sloan Research Fellowship, the Office of Naval Research, Singapore’s Defence Science and Technology Agency, Amazon, MIT Lincoln Laboratory, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions. The researchers presented their work at the Conference on Robot Learning (CoRL) in early November.
