AI-generated images can teach robots how to act


The system could make it easier to train various kinds of robots to complete tasks—machines ranging from mechanical arms to humanoid robots and driverless cars. It could also help make AI web agents, a next generation of AI tools that can perform complex tasks with little supervision, better at scrolling and clicking, says Mohit Shridhar, a research scientist specializing in robotic manipulation, who worked on the project.

“You can use image-generation systems to do almost all the things that you can do in robotics,” he says. “We wanted to see if we could take all these amazing things that are happening in diffusion and use them for robotics problems.”

To teach a robot to complete a task, researchers normally train a neural network on an image of what’s in front of the robot. The network then spits out an output in a different format—the coordinates required to move forward, for instance.

Genima’s approach is different because both its input and its output are images, which is easier for machines to learn from, says Ivan Kapelyukh, a PhD student at Imperial College London who focuses on robot learning but wasn’t involved in this research.

“It’s also really great for users, because you can see where your robot will move and what it’s going to do. It makes it kind of more interpretable, and means that if you’re actually going to deploy this, you can see before your robot goes through a wall or something,” he says.

Genima works by tapping into Stable Diffusion’s ability to recognize patterns (knowing what a mug looks like because it has been trained on images of mugs, for instance) and then turning the model into a kind of agent—a decision-making system.

Image credit: Mohit Shridhar, Yat Long (Richie) Lo, Stephen James / Robot Learning Lab

First, the researchers fine-tuned Stable Diffusion so that they could overlay data from robot sensors onto images captured by its cameras.

The system renders the desired action, like opening a box, hanging up a scarf, or picking up a notebook, as a series of colored spheres on top of the image. These spheres tell the robot where its joints should move one second in the future.
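To make that idea concrete, here is a minimal, hypothetical sketch of what drawing such a target image could look like. It is not Genima’s actual code: the joint names, colors, and the assumption that a fine-tuned diffusion model has already supplied per-joint target pixels are all placeholders for illustration.

```python
# Conceptual sketch (not Genima's code): overlay one colored sphere per joint
# at the pixel where that joint should be one second from now.
import cv2
import numpy as np

# A fixed, distinct color per joint so a downstream network can tell them apart.
# Joint names and colors are hypothetical.
JOINT_COLORS = {
    "shoulder": (255, 0, 0),   # BGR
    "elbow":    (0, 255, 0),
    "gripper":  (0, 0, 255),
}

def render_target_image(camera_image: np.ndarray,
                        target_pixels: dict[str, tuple[int, int]]) -> np.ndarray:
    """Return a copy of the camera image with a filled circle drawn per joint.

    target_pixels maps joint name -> (x, y) pixel location of that joint's
    desired position one second in the future (assumed to come from the
    fine-tuned image-generation model).
    """
    target = camera_image.copy()
    for joint, (x, y) in target_pixels.items():
        # Filled circle: radius 12 px, thickness -1 means "fill".
        cv2.circle(target, (x, y), 12, JOINT_COLORS[joint], -1)
    return target
```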

The second part of the process converts these spheres into actions. The team achieved this by using another neural network, called ACT, which is trained on the same data. Then they used Genima to complete 25 simulated and nine real-world manipulation tasks using a robot arm. The average success rate was 50% and 64%, respectively.
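The two stages together form a simple loop: an image-to-image model proposes where the joints should go, and a second network turns that picture into executable joint targets. The sketch below shows one such loop under stated assumptions; `goal_renderer`, `action_policy`, and `robot` are stand-in interfaces invented for illustration, not the authors’ API.

```python
# Minimal sketch of the two-stage control loop described above.
# All names and signatures here are assumptions, not Genima's actual interfaces.
import numpy as np

def control_step(camera_image: np.ndarray,
                 instruction: str,
                 goal_renderer,   # stands in for the fine-tuned Stable Diffusion model
                 action_policy,   # stands in for the ACT-style action network
                 robot) -> None:
    # Stage 1: image in, image out. The diffusion model paints colored spheres
    # showing where each joint should be one second from now.
    sphere_image = goal_renderer.generate(image=camera_image, prompt=instruction)

    # Stage 2: a separate network reads the sphere image and outputs joint
    # positions that the low-level controller can execute.
    joint_targets = action_policy.predict(sphere_image)

    robot.move_to_joint_positions(joint_targets)
```

Keeping both the input and the intermediate goal as ordinary images is what makes the pipeline inspectable: a person can look at the sphere image and see where the arm is about to go before it moves.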
