A faster way to teach a robot


Imagine purchasing a robot to perform household tasks. This robot was built and trained in a factory on a certain set of tasks and has never seen the items in your house. If you ask it to pick up a mug from your kitchen table, it might not recognize your mug (perhaps because this mug is painted with an unusual image, say, of MIT’s mascot, Tim the Beaver). So, the robot fails.

“Right now, the way we train these robots, when they fail, we don’t really know why. So you would just throw up your hands and say, ‘OK, I guess we have to start over.’ A critical component that is missing from this method is enabling the robot to demonstrate why it is failing so the user can give it feedback,” says Andi Peng, an electrical engineering and computer science (EECS) graduate student at MIT.

Peng and her collaborators at MIT, New York University, and the University of California at Berkeley created a framework that lets humans quickly teach a robot what they want it to do, with a minimal amount of effort.

When a robot fails, the system uses an algorithm to generate counterfactual explanations that describe what needed to change for the robot to succeed. For example, perhaps the robot would have been able to pick up the mug if the mug were a certain color. It shows these counterfactuals to the human and asks for feedback on why the robot failed. Then the system uses this feedback and the counterfactual explanations to generate new data it uses to fine-tune the robot.

Fine-tuning involves tweaking a machine-learning model that has already been trained to perform one task, so it can perform a second, similar task.
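To make the idea of fine-tuning concrete, here is a toy sketch, not the researchers' model. It assumes a tiny logistic-regression "can I grasp this?" classifier with a hypothetical feature encoding `[bias, is_white, is_mug_shaped]` and hand-picked pre-trained weights that only trust whether the object is white. Continuing gradient descent from those weights on a handful of new labeled examples shifts the model toward relying on mug shape instead. All names and numbers here are illustrative assumptions.

```python
import math

# Hypothetical feature encoding for a toy grasp classifier:
# x = [bias, is_white, is_mug_shaped]
PRETRAINED = [-1.0, 2.0, 0.0]  # hand-picked weights that only trust "is white"

# A few new labeled examples: mugs of any color are graspable, boxes are not.
NEW_DATA = [
    ([1.0, 1.0, 1.0], 1),  # white mug
    ([1.0, 0.0, 1.0], 1),  # red mug
    ([1.0, 1.0, 0.0], 0),  # white box
    ([1.0, 0.0, 0.0], 0),  # red box
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def predict(w: list[float], x: list[float]) -> bool:
    return sigmoid(dot(w, x)) >= 0.5


def fine_tune(weights, data, lr=0.1, epochs=1000):
    """Continue training existing weights with full-batch gradient descent on log loss."""
    w = list(weights)  # copy, so the pre-trained weights are left untouched
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sigmoid(dot(w, x)) - y  # derivative of log loss w.r.t. the logit
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


tuned = fine_tune(PRETRAINED, NEW_DATA)
# The pre-trained weights reject a red mug; compare with the fine-tuned weights.
print(predict(PRETRAINED, [1.0, 0.0, 1.0]), predict(tuned, [1.0, 0.0, 1.0]))
```

Starting from the existing weights, rather than from scratch, is what makes this fine-tuning: the model keeps what it already learned and only needs a small amount of new data to adjust.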

The researchers tested this system in simulations and found that it could teach a robot more efficiently than other methods. The robots trained with this framework performed better, while the training process consumed less of a human’s time.

This framework could help robots learn faster in new environments without requiring a user to have technical knowledge. In the long run, this could be a step toward enabling general-purpose robots to efficiently perform everyday tasks for the elderly or individuals with disabilities in a variety of settings.

Peng, the lead author, is joined by co-authors Aviv Netanyahu, an EECS graduate student; Mark Ho, an assistant professor at the Stevens Institute of Technology; Tianmin Shu, an MIT postdoc; Andreea Bobu, a graduate student at UC Berkeley; and senior authors Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Pulkit Agrawal, a professor in CSAIL. The research will be presented at the International Conference on Machine Learning.

On-the-job training

Robots often fail due to distribution shift: the robot is presented with objects and spaces it didn’t see during training, and it doesn’t understand what to do in this new environment.

One way to retrain a robot for a specific task is imitation learning. The user could demonstrate the correct task to teach the robot what to do. If a user tries to teach a robot to pick up a mug, but demonstrates with a white mug, the robot could learn that all mugs are white. It may then fail to pick up a red, blue, or “Tim-the-Beaver-brown” mug.

Training a robot to recognize that a mug is a mug, regardless of its color, could take hundreds of demonstrations.

“I don’t want to have to demonstrate with 30,000 mugs. I want to demonstrate with just one mug. But then I need to teach the robot so it recognizes that it can pick up a mug of any color,” Peng says.

To accomplish this, the researchers’ system determines what specific object the user cares about (a mug) and what elements aren’t important for the task (perhaps the color of the mug doesn’t matter). It uses this information to generate new, synthetic data by changing these “unimportant” visual concepts. This process is known as data augmentation.

The framework has three steps. First, it shows the task that caused the robot to fail. Then it collects a demonstration of the desired actions from the user and generates counterfactuals by searching over all features in the space that show what needed to change for the robot to succeed.
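The counterfactual search can be sketched as a brute-force scan over single-feature edits to the failing scene. This is a simplified illustration under stated assumptions, not the researchers' algorithm: a scene is a dictionary of visual features, `robot_succeeds` is a hypothetical stand-in for running the trained robot, and the feature space is invented for the example.

```python
from typing import Callable, Dict, List, Tuple

Scene = Dict[str, str]


def find_counterfactuals(
    scene: Scene,
    feature_space: Dict[str, List[str]],
    succeeds: Callable[[Scene], bool],
) -> List[Tuple[str, str]]:
    """Return every single-feature edit (feature, new_value) that turns failure into success."""
    found = []
    for feature, values in feature_space.items():
        for value in values:
            if value == scene[feature]:
                continue  # same as the real scene, so not a counterfactual
            candidate = {**scene, feature: value}  # copy with one feature changed
            if succeeds(candidate):
                found.append((feature, value))
    return found


# Hypothetical stand-in for the trained robot: it only manages to grasp white mugs.
def robot_succeeds(scene: Scene) -> bool:
    return scene["object"] == "mug" and scene["color"] == "white"


FEATURE_SPACE = {"object": ["mug", "bowl"], "color": ["white", "red", "brown"]}
failing_scene = {"object": "mug", "color": "brown"}

print(find_counterfactuals(failing_scene, FEATURE_SPACE, robot_succeeds))
# [('color', 'white')]
```

The result reads as an explanation a person can check: "the robot would have succeeded if the mug were white." A user who knows that color is irrelevant to grasping can then flag it, which is the feedback the next step relies on.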

The system shows these counterfactuals to the user and asks for feedback to determine which visual concepts do not affect the desired action. Then it uses this human feedback to generate many new augmented demonstrations.

In this way, the user could demonstrate picking up one mug, but the system would produce demonstrations showing the desired action with hundreds of different mugs by altering the color. It uses these data to fine-tune the robot.
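A minimal sketch of the augmentation step, assuming a demonstration can be stored as a scene description plus a list of actions, and that the user's feedback arrives as a set of "irrelevant" features with alternative values. The data layout, the helper `augment`, the feature names, and the action labels are all hypothetical; in practice, producing new scenes from visual concepts is far more involved than editing a dictionary.

```python
from itertools import product
from typing import Dict, List


def augment(demo: dict, irrelevant: Dict[str, List[str]]) -> List[dict]:
    """Copy one demonstration once per combination of values of the irrelevant features."""
    names = list(irrelevant)
    augmented = []
    for combo in product(*(irrelevant[name] for name in names)):
        scene = {**demo["scene"], **dict(zip(names, combo))}
        # The actions are reused unchanged, because the user said these
        # features do not affect how the task should be done.
        augmented.append({"scene": scene, "actions": list(demo["actions"])})
    return augmented


demo = {
    "scene": {"object": "mug", "color": "white", "pattern": "plain"},
    "actions": ["reach", "grasp", "lift"],
}
# Features the user marked as unimportant after seeing the counterfactuals.
irrelevant = {
    "color": ["white", "red", "blue", "brown"],
    "pattern": ["plain", "beaver"],
}

new_demos = augment(demo, irrelevant)
print(len(new_demos))  # 8
```

Each copy pairs a different scene with the same actions, which is what lets a single demonstration stand in for many when the augmented set is used for fine-tuning.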

Creating counterfactual explanations and soliciting feedback from the user are critical for the technique to succeed, Peng says.

From human reasoning to robot reasoning

Because their work seeks to put the human in the training loop, the researchers tested their technique with human users. They first conducted a study in which they asked people if counterfactual explanations helped them identify elements that could be changed without affecting the task.

“It was so clear right off the bat. Humans are so good at this kind of counterfactual reasoning. And this counterfactual step is what allows human reasoning to be translated into robot reasoning in a way that makes sense,” she says.

Then they applied their framework to three simulations where robots were tasked with: navigating to a goal object, picking up a key and unlocking a door, and picking up a desired object and then placing it on a tabletop. In each instance, their method enabled the robot to learn faster than with other techniques, while requiring fewer demonstrations from users.

Moving forward, the researchers hope to test this framework on real robots. They also want to focus on reducing the time it takes the system to create new data using generative machine-learning models.

“We want robots to do what humans do, and we want them to do it in a semantically meaningful way. Humans tend to operate in this abstract space, where they don’t think about every single property in an image. At the end of the day, this is really about enabling a robot to learn a human-like representation at an abstract level,” Peng says.

This research is supported, in part, by a National Science Foundation Graduate Research Fellowship, Open Philanthropy, an Apple AI/ML Fellowship, Hyundai Motor Corporation, the MIT-IBM Watson AI Lab, and the National Science Foundation Institute for Artificial Intelligence and Fundamental Interactions.
