Ensuring AI works with the right dose of curiosity


It’s a dilemma as old as time. Friday night has rolled around, and you’re trying to pick a restaurant for dinner. Should you visit your most beloved watering hole or try a new establishment, in the hopes of discovering something better? Potentially, but that curiosity comes with a risk: if you explore the new option, the food could be worse. On the flip side, if you stick with what works well, you won’t grow out of your narrow pathway.

Curiosity drives artificial intelligence to explore the world, now in boundless use cases: autonomous navigation, robotic decision-making, optimizing health outcomes, and more. Machines, in some cases, use “reinforcement learning” to accomplish a goal, where an AI agent iteratively learns from being rewarded for good behavior and punished for bad. Just like the dilemma faced by humans in selecting a restaurant, these agents also struggle with balancing the time spent discovering better actions (exploration) and the time spent taking actions that led to high rewards in the past (exploitation). Too much curiosity can distract the agent from making good decisions, while too little means the agent will never discover good decisions.
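The exploration-exploitation trade-off can be made concrete with a textbook toy model, not the MIT team's method: an epsilon-greedy multi-armed bandit, where `epsilon` plays the role of the "dose of curiosity." All names and reward values here are illustrative.

```python
import random

def epsilon_greedy_bandit(true_rewards, epsilon, steps, seed=0):
    """Minimal epsilon-greedy bandit: with probability epsilon, explore a
    random arm; otherwise exploit the arm with the best estimate so far."""
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n   # running mean reward per arm
    counts = [0] * n
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try something at random
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = true_rewards[arm] + rng.gauss(0, 0.1)  # noisy payoff
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total / steps

# A purely greedy agent (epsilon=0) can lock onto the first mediocre arm it
# tries, while a purely random one (epsilon=1) never settles on the best arm.
for eps in (0.0, 0.1, 1.0):
    print(eps, round(epsilon_greedy_bandit([0.2, 0.5, 0.9], eps, 5000), 3))
```

Running this shows a moderate `epsilon` outperforming both extremes on average, which is the same tension, too little versus too much curiosity, that the article describes.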

In the pursuit of making AI agents with just the right dose of curiosity, researchers from MIT’s Improbable AI Laboratory and Computer Science and Artificial Intelligence Laboratory (CSAIL) created an algorithm that overcomes the problem of AI being too “curious” and getting distracted by a given task. Their algorithm automatically increases curiosity when it’s needed, and suppresses it if the agent gets enough supervision from the environment to know what to do.

When tested on over 60 video games, the algorithm was able to succeed at both hard and easy exploration tasks, where previous algorithms have only been able to tackle a hard or easy domain alone. With this method, AI agents use less data for learning decision-making rules that maximize incentives.

“If you master the exploration-exploitation trade-off well, you can learn the right decision-making rules faster, and anything less would require lots of data, which could mean suboptimal medical treatments, lesser profits for websites, and robots that don’t learn to do the right thing,” says Pulkit Agrawal, an assistant professor of electrical engineering and computer science (EECS) at MIT, director of the Improbable AI Lab, and CSAIL affiliate who supervised the research. “Imagine a website trying to figure out the design or layout of its content that will maximize sales. If one doesn’t perform exploration-exploitation well, converging to the right website design or the right website layout will take a long time, which means profit loss. Or in a health care setting, like with Covid-19, there may be a sequence of decisions that need to be made to treat a patient, and if you want to use decision-making algorithms, they need to learn quickly and efficiently; you don’t want a suboptimal solution when treating a large number of patients. We hope that this work will apply to real-world problems of that nature.”

It’s hard to encompass the nuances of curiosity’s psychological underpinnings; the underlying neural correlates of challenge-seeking behavior remain poorly understood. Attempts to categorize the behavior have spanned studies that delved deeply into our impulses, deprivation sensitivities, and social and stress tolerances.

With reinforcement learning, this process is “pruned” emotionally and stripped down to the bare bones, but it’s complicated on the technical side. Essentially, the agent should only be curious when there’s not enough supervision available to try out different things, and if there is supervision, it must adjust its curiosity and lower it.
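A common way curiosity enters reinforcement learning is as an intrinsic bonus added to the environment's reward, with a coefficient controlling its strength. The sketch below is a deliberately crude illustration of the idea of lowering curiosity when extrinsic supervision is dense; the function names, the `beta` coefficient, and the threshold rule are all hypothetical, and the paper's actual algorithm balances the two in a more principled, automatic way.

```python
def shaped_reward(extrinsic, intrinsic_bonus, beta):
    """Total reward = task reward + beta-weighted curiosity bonus.
    beta is the agent's current 'dose of curiosity'."""
    return extrinsic + beta * intrinsic_bonus

def adapt_beta(beta, recent_extrinsic, threshold=0.0, step=0.01):
    """Crude schedule: if the environment is already handing out rewards
    (dense supervision), shrink the curiosity weight; if rewards are
    absent (sparse supervision), grow it."""
    if recent_extrinsic > threshold:
        return max(0.0, beta - step)  # dense supervision: less curiosity
    return beta + step                # sparse supervision: more curiosity
```

Under this toy rule, an agent in a reward-dense game gradually stops chasing novelty, while an agent seeing no reward becomes steadily more exploratory.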

Since a large subset of gaming consists of little agents running around fantastical environments in search of rewards and performing long sequences of actions to achieve some goal, it seemed like a logical test bed for the researchers’ algorithm. In experiments, researchers divided games like “Mario Kart” and “Montezuma’s Revenge” into two different buckets: one where supervision was sparse, meaning the agent had less guidance, which were considered “hard” exploration games, and a second where supervision was denser, the “easy” exploration games.

Suppose in “Mario Kart,” for example, you remove all rewards so you don’t know when an enemy eliminates you. You’re not given any reward when you collect a coin or jump over pipes. The agent is only told in the end how well it did. This is a case of sparse supervision. Algorithms that incentivize curiosity do really well in this scenario.

But now, suppose the agent is provided dense supervision: a reward for jumping over pipes, collecting coins, and eliminating enemies. Here, an algorithm without curiosity performs really well, because it gets rewarded often. But if you instead take an algorithm that also uses curiosity, it learns slowly. This is because the curious agent might attempt to run fast in different ways, dance around, go to every part of the game screen, things that are interesting but don’t help the agent succeed at the game. The team’s algorithm, however, consistently performed well, no matter what environment it was in.

Future work might involve circling back to the question that has delighted and plagued psychologists for years: an appropriate metric for curiosity, since nobody really knows how to mathematically define it.

“Getting consistently good performance on a novel problem is extremely difficult, so by improving exploration algorithms, we can save your effort on tuning an algorithm for your problems of interest,” says Zhang-Wei Hong, an EECS PhD student, CSAIL affiliate, and co-lead author together with Eric Chen ’20, MEng ’21 on a new paper about the work. “We need curiosity to solve extremely challenging problems, but on some problems it can hurt performance. We propose an algorithm that removes the burden of tuning the balance of exploration and exploitation. Previously, what took, for instance, a week to successfully solve a problem, with this new algorithm, we can get satisfactory results in a few hours.”

“One of the greatest challenges for current AI and cognitive science is how to balance exploration and exploitation, the search for information versus the search for reward. Children do this seamlessly, but it is challenging computationally,” notes Alison Gopnik, professor of psychology and affiliate professor of philosophy at the University of California at Berkeley, who was not involved with the project. “This paper uses impressive new techniques to accomplish this automatically, designing an agent that can systematically balance curiosity about the world and the desire for reward, [thus taking] another step towards making AI agents (almost) as smart as children.”

“Intrinsic rewards like curiosity are fundamental to guiding agents to discover useful diverse behaviors, but this shouldn’t come at the cost of doing well on the given task. This is an important problem in AI, and the paper provides a way to balance that trade-off,” adds Deepak Pathak, an assistant professor at Carnegie Mellon University, who was also not involved in the work. “It would be interesting to see how such methods scale beyond games to real-world robotic agents.”

Chen, Hong, and Agrawal wrote the paper alongside Joni Pajarinen, assistant professor at Aalto University and research leader of the Intelligent Autonomous Systems Group at TU Darmstadt. The research was supported, in part, by the MIT-IBM Watson AI Lab, the DARPA Machine Common Sense Program, the Army Research Office through the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator. The paper will be presented at Neural Information Processing Systems (NeurIPS) 2022.

