Imagine that a robot helps you clean the dishes. You ask it to grab a soapy bowl out of the sink, but its gripper barely misses the mark.
Using a new framework developed by MIT and NVIDIA researchers, you could correct that robot’s behavior with simple interactions. The method would let you point to the bowl or trace a trajectory to it on a screen, or simply give the robot’s arm a nudge in the right direction.
Unlike other methods for correcting robot behavior, this one doesn’t require users to collect new data and retrain the machine-learning model that powers the robot’s brain. It enables a robot to use intuitive, real-time human feedback to choose a feasible action sequence that gets as close as possible to satisfying the user’s intent.
When the researchers tested their framework, its success rate was 21 percent higher than an alternative method that didn’t leverage human interventions.
In the future, this framework could enable a user to more easily guide a factory-trained robot to perform a wide range of household tasks, even though the robot has never seen their home or the objects in it.
“We can’t expect laypeople to perform data collection and fine-tune a neural network model. The consumer will expect the robot to work right out of the box, and if it doesn’t, they would want an intuitive mechanism to customize it. That’s the challenge we tackled in this work,” says Felix Yanwei Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this method.
His co-authors include Lirui Wang PhD ’24 and Yilun Du PhD ’24; senior author Julie Shah, an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL); as well as Balakumar Sundaralingam, Xuning Yang, Yu-Wei Chao, Claudia Perez-D’Arpino PhD ’19, and Dieter Fox of NVIDIA. The research will be presented at the International Conference on Robotics and Automation.
Mitigating misalignment
Recently, researchers have begun using pre-trained generative AI models to learn a “policy,” or set of rules, that a robot follows to complete an action. Generative models can solve multiple complex tasks.
During training, the model only sees feasible robot motions, so it learns to generate valid trajectories for the robot to follow.
While these trajectories are valid, that doesn’t mean they always align with a user’s intent in the real world. The robot might have been trained to grab boxes off a shelf without knocking them over, but it could fail to reach the box on top of someone’s bookshelf if the shelf is oriented differently from those it saw in training.
To overcome these failures, engineers typically collect data demonstrating the new task and retrain the generative model, a costly and time-consuming process that requires machine-learning expertise.
Instead, the MIT researchers wanted to allow users to steer the robot’s behavior during deployment when it makes a mistake.
But when a human interacts with the robot to correct its behavior, that could inadvertently cause the generative model to choose an invalid action. It might reach the box the user wants, but knock books off the shelf in the process.
“We want to allow the user to interact with the robot without introducing those kinds of mistakes, so we get a behavior that’s much more aligned with user intent during deployment, but that is also valid and feasible,” Wang says.
Their framework accomplishes this by providing the user with three intuitive ways to correct the robot’s behavior, each of which offers certain benefits.
First, the user can point to the object they want the robot to manipulate in an interface that shows its camera view. Second, they can trace a trajectory in that interface, allowing them to specify how they want the robot to reach the object. Third, they can physically move the robot’s arm in the direction they want it to follow.
“When you’re mapping a 2D image of the environment to actions in a 3D space, some information is lost. Physically nudging the robot is the most direct way of specifying user intent without losing any of the information,” says Wang.
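To make the three input modes concrete, here is a minimal sketch of how each might be reduced to a 3D goal for a downstream sampler. This is an illustration, not the authors’ code: the function names, the `intrinsics` dictionary, and the camera model are all assumptions.

```python
import numpy as np

def point_to_goal(pixel, depth_image, intrinsics):
    """Hypothetical: deproject a clicked pixel into a 3D goal point.

    `intrinsics` holds the camera's focal lengths (fx, fy) and
    principal point (cx, cy); depth is read at the clicked pixel.
    """
    u, v = pixel
    z = depth_image[v, u]
    x = (u - intrinsics["cx"]) * z / intrinsics["fx"]
    y = (v - intrinsics["cy"]) * z / intrinsics["fy"]
    return np.array([x, y, z])

def trace_to_goal(pixel_path, depth_image, intrinsics):
    """Hypothetical: lift a traced 2D path into a sparse 3D waypoint path."""
    return np.stack([point_to_goal(p, depth_image, intrinsics)
                     for p in pixel_path])

def nudge_to_goal(ee_position_before, ee_position_after):
    """A physical nudge needs no deprojection: the displacement of the
    end effector directly encodes the user's intended direction."""
    return ee_position_after - ee_position_before
```

The nudge case shows why Wang calls it the most direct channel: it already lives in the robot’s 3D workspace, so nothing is lost to deprojection.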
Sampling for success
To ensure these interactions don’t cause the robot to choose an invalid action, such as colliding with other objects, the researchers use a specific sampling procedure. This technique lets the model choose an action from the set of valid actions that most closely aligns with the user’s goal.
“Rather than simply imposing the user’s will, we give the robot an idea of what the user intends but let the sampling procedure oscillate around its own set of learned behaviors,” Wang explains.
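The article does not spell out the algorithm, but its description, choosing from the set of valid actions the one that best matches the user’s goal, suggests a sample-filter-rank scheme along these lines. In this sketch, `policy.sample()` and `is_feasible()` are hypothetical stand-ins for the pre-trained generative model and a collision/kinematics checker:

```python
import numpy as np

def sample_aligned_trajectory(policy, user_goal, is_feasible, n_candidates=64):
    """Minimal sketch: draw candidate trajectories from the learned
    policy, discard infeasible ones (e.g., collisions), and return
    the feasible candidate whose endpoint lies closest to the goal
    implied by the user's correction."""
    candidates = [policy.sample() for _ in range(n_candidates)]
    feasible = [traj for traj in candidates if is_feasible(traj)]
    if not feasible:
        return None  # fall back to the policy's unconditioned behavior
    # Score each trajectory by how closely its final state matches
    # the user's goal; smaller distance = better aligned.
    return min(feasible, key=lambda traj: np.linalg.norm(traj[-1] - user_goal))
```

Because every candidate is drawn from the policy’s learned distribution, the selected trajectory stays within the robot’s repertoire of feasible behaviors while bending toward the user’s correction, which matches the “oscillate around its own set of learned behaviors” description above.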
This sampling method enabled the researchers’ framework to outperform the other methods they compared it to during simulations and in experiments with a real robot arm in a toy kitchen.
While their method may not always complete the task right away, it gives users the advantage of being able to immediately correct the robot if they see it doing something wrong, rather than waiting for it to finish and then giving it new instructions.
Moreover, after a user nudges the robot a few times until it picks up the correct bowl, it could log that corrective action and incorporate it into its behavior through future training. Then, the next day, the robot could pick up the correct bowl without needing a nudge.
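A continual-improvement loop of this kind could be as simple as appending each correction to a buffer that is replayed at the next fine-tuning pass. The class below is a hypothetical sketch under that assumption, not the authors’ implementation:

```python
class CorrectionLog:
    """Hypothetical buffer that stores user corrections so they can be
    folded into the policy at the next fine-tuning round."""

    def __init__(self):
        self.examples = []

    def record(self, observation, corrected_trajectory):
        # Store the scene the robot saw together with the trajectory
        # the user steered it toward.
        self.examples.append((observation, corrected_trajectory))

    def as_training_batch(self):
        # Replay all logged corrections as supervised examples.
        return list(self.examples)
```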
“But the key to that continuous improvement is having a way for the user to interact with the robot, which is what we have shown here,” Wang says.
In the future, the researchers want to increase the speed of the sampling procedure while maintaining or improving its performance. They also want to experiment with robot policy generation in novel environments.