A simpler approach to train machines for uncertain, real-world situations

Someone learning to play tennis might hire a teacher to help them learn faster. Because this teacher is (hopefully) a terrific tennis player, there are occasions when trying to precisely mimic the teacher won’t help the student learn. Perhaps the teacher leaps high into the air to deftly return a volley. The student, unable to repeat that, might instead try a few other moves on her own until she has mastered the skills she needs to return volleys.

Computer scientists can also use “teacher” systems to train another machine to complete a task. But just as with human learning, the student machine faces a dilemma of knowing when to follow the teacher and when to explore on its own. To this end, researchers from MIT and Technion, the Israel Institute of Technology, have developed an algorithm that automatically and independently determines when the student should mimic the teacher (known as imitation learning) and when it should instead learn through trial and error (known as reinforcement learning).

Their dynamic approach allows the student to diverge from copying the teacher when the teacher is either too good or not good enough, but then return to following the teacher at a later point in the training process if doing so would achieve better results and faster learning.

When the researchers tested this approach in simulations, they found that their combination of trial-and-error learning and imitation learning enabled students to learn tasks more effectively than methods that used only one type of learning.

This method could help researchers improve the training process for machines that will be deployed in uncertain real-world situations, like a robot being trained to navigate inside a building it has never seen before.

“This combination of learning by trial-and-error and following a teacher is very powerful. It gives our algorithm the ability to solve very difficult tasks that cannot be solved by using either technique individually,” says Idan Shenfeld, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

Shenfeld wrote the paper with coauthors Zhang-Wei Hong, an EECS graduate student; Aviv Tamar, assistant professor of electrical engineering and computer science at Technion; and senior author Pulkit Agrawal, director of Improbable AI Lab and an assistant professor in the Computer Science and Artificial Intelligence Laboratory. The research will be presented at the International Conference on Machine Learning.

Striking a balance

Many existing methods that seek to strike a balance between imitation learning and reinforcement learning do so through brute-force trial and error. Researchers pick a weighted combination of the two learning methods, run the entire training procedure, and then repeat the process until they find the optimal balance. This is inefficient and often so computationally expensive it isn’t even feasible.

“We want algorithms that are principled, involve tuning of as few knobs as possible, and achieve high performance — these principles have driven our research,” says Agrawal.

To achieve this, the team approached the problem differently than prior work did. Their solution involves training two students: one with a weighted combination of reinforcement learning and imitation learning, and a second that can only use reinforcement learning to learn the same task.

The main idea is to automatically and dynamically adjust the weighting of the reinforcement and imitation learning objectives of the first student. Here is where the second student comes into play. The researchers’ algorithm continually compares the two students. If the one using the teacher is doing better, the algorithm puts more weight on imitation learning to train the student, but if the one using only trial and error is starting to get better results, it will focus more on learning from reinforcement learning.

By dynamically determining which method achieves better results, the algorithm is adaptive and can pick the best technique throughout the training process. Thanks to this innovation, it is able to teach students more effectively than methods that are not adaptive, Shenfeld says.
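The comparison-and-reweighting loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' actual algorithm or code: the function name, the fixed step size, and the use of recent returns as the comparison signal are all assumptions made for the example.

```python
def update_imitation_weight(weight, combo_return, rl_return, step=0.05):
    """Nudge the imitation-learning weight toward the better-performing student.

    weight       -- current imitation weight in [0, 1] for the first student,
                    whose loss mixes imitation and reinforcement objectives
    combo_return -- recent return of the student trained with that mixed objective
    rl_return    -- recent return of the student trained with pure RL
    step         -- how aggressively to shift the balance (illustrative knob)
    """
    if combo_return > rl_return:
        weight += step   # teacher-guided student is ahead: imitate more
    elif rl_return > combo_return:
        weight -= step   # trial-and-error student is ahead: explore more
    return min(1.0, max(0.0, weight))  # keep the weight a valid mixing factor


# Example: the teacher-guided student outperforms, so imitation is upweighted.
w = update_imitation_weight(0.5, combo_return=12.0, rl_return=8.0)
```

In a training loop, this update would run periodically between gradient steps, so the first student can lean on the teacher early, drift toward pure reinforcement learning when imitation stops helping, and drift back if the teacher becomes useful again.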

“One of the main challenges in developing this algorithm was that it took us some time to realize that we should not train the two students independently. It became clear that we needed to connect the agents to make them share information, and then find the right way to technically ground this intuition,” Shenfeld says.

Solving tough problems

To test their approach, the researchers set up many simulated teacher-student training experiments, such as navigating through a maze of lava to reach the other corner of a grid. In this case, the teacher has a map of the entire grid while the student can only see a patch in front of it. Their algorithm achieved a near-perfect success rate across all testing environments, and was much faster than other methods.

To give their algorithm an even harder test, they set up a simulation involving a robotic hand with touch sensors but no vision, which must reorient a pen to the correct pose. The teacher had access to the actual orientation of the pen, while the student could only use touch sensors to determine the pen’s orientation.

Their method outperformed others that used either only imitation learning or only reinforcement learning.

Reorienting objects is one of many manipulation tasks that a future home robot would need to perform, a vision that the Improbable AI lab is working toward, Agrawal adds.

Teacher-student learning has successfully been applied to train robots to perform complex object manipulation and locomotion in simulation and then transfer the learned skills into the real world. In these methods, the teacher has privileged information accessible from the simulation that the student won’t have when it is deployed in the real world. For example, the teacher will know the detailed map of a building that the student robot is being trained to navigate using only images captured by its camera.

“Current methods for student-teacher learning in robotics don’t account for the inability of the student to mimic the teacher and thus are performance-limited. The new method paves a path for building superior robots,” says Agrawal.

Apart from better robots, the researchers believe their algorithm has the potential to improve performance in diverse applications where imitation or reinforcement learning is being used. For example, large language models such as GPT-4 are good at accomplishing a wide range of tasks, so perhaps one could use the large model as a teacher to train a smaller, student model to be even “better” at one particular task. Another exciting direction is to examine the similarities and differences between machines and humans learning from their respective teachers. Such analysis might help improve the learning experience, the researchers say.

“What’s interesting about this approach compared to related methods is how robust it seems to various parameter choices, and the variety of domains in which it shows promising results,” says Abhishek Gupta, an assistant professor at the University of Washington, who was not involved with this work. “While the current set of results is largely in simulation, I am very excited about the future possibilities of applying this work to problems involving memory and reasoning with different modalities such as tactile sensing.”

“This work presents an interesting approach to reuse prior computational work in reinforcement learning. In particular, their proposed method can leverage suboptimal teacher policies as a guide while avoiding the careful hyperparameter schedules required by prior methods for balancing the objectives of mimicking the teacher versus optimizing the task reward,” adds Rishabh Agarwal, a senior research scientist at Google Brain, who was also not involved in this research. “Hopefully, this work will make reincarnating reinforcement learning with learned policies less cumbersome.”

This research was supported, in part, by the MIT-IBM Watson AI Lab, Hyundai Motor Company, the DARPA Machine Common Sense Program, and the Office of Naval Research.
