Fields ranging from robotics to medicine to political science are attempting to train AI systems to make meaningful decisions of all kinds. For example, using an AI system to intelligently control traffic in a congested city could help motorists reach their destinations faster, while improving safety and sustainability.
Unfortunately, teaching an AI system to make good decisions is no easy task.
Reinforcement learning models, which underlie these AI decision-making systems, still often fail when faced with even small variations in the tasks they are trained to perform. In the case of traffic, a model might struggle to control a set of intersections with different speed limits, numbers of lanes, or traffic patterns.
To boost the reliability of reinforcement learning models for complex tasks with variability, MIT researchers have introduced a more efficient algorithm for training them.
The algorithm strategically selects the best tasks for training an AI agent so it can effectively perform all tasks in a collection of related tasks. In the case of traffic signal control, each task could be one intersection in a task space that includes all intersections in the city.
By focusing on a smaller number of intersections that contribute the most to the algorithm's overall effectiveness, this method maximizes performance while keeping the training cost low.
The researchers found that their technique was between five and 50 times more efficient than standard approaches on an array of simulated tasks. This gain in efficiency helps the algorithm learn a better solution faster, ultimately improving the performance of the AI agent.
“We were able to see incredible performance improvements, with a very simple algorithm, by thinking outside the box. An algorithm that is not very complicated stands a better chance of being adopted by the community because it is easier to implement and easier for others to understand,” says senior author Cathy Wu, the Thomas D. and Virginia W. Cabot Career Development Associate Professor in Civil and Environmental Engineering (CEE) and the Institute for Data, Systems, and Society (IDSS), and a member of the Laboratory for Information and Decision Systems (LIDS).
She is joined on the paper by lead author Jung-Hoon Cho, a CEE graduate student; Vindula Jayawardana, a graduate student in the Department of Electrical Engineering and Computer Science (EECS); and Sirui Li, an IDSS graduate student. The research will be presented at the Conference on Neural Information Processing Systems.
Finding a middle ground
To train an algorithm to control traffic lights at many intersections in a city, an engineer would typically choose between two main approaches. She can train one algorithm for each intersection independently, using only that intersection’s data, or train a larger algorithm using data from all intersections and then apply it to each one.
But each approach comes with its share of downsides. Training a separate algorithm for each task (such as a given intersection) is a time-consuming process that requires an enormous amount of data and computation, while training one algorithm for all tasks often leads to subpar performance.
Wu and her collaborators sought a sweet spot between these two approaches.
For their method, they choose a subset of tasks and train one algorithm for each task independently. Importantly, they strategically select individual tasks that are most likely to improve the algorithm’s overall performance on all tasks.
They leverage a common trick from the reinforcement learning field called zero-shot transfer learning, in which an already trained model is applied to a new task without being further trained. With transfer learning, the model often performs remarkably well on the new neighbor task.
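In code terms, zero-shot transfer just means evaluating an already-trained policy on a new task with no further gradient updates. A minimal sketch, where the `policy` and `task` interfaces are hypothetical stand-ins rather than anything from the paper:

```python
def zero_shot_evaluate(policy, task, episodes=10):
    """Run a frozen, pre-trained policy on a new task without retraining.

    Assumes a hypothetical interface: task.reset() returns a state,
    task.step(action) returns (state, reward, done), and policy.act(state)
    returns an action. No learning update ever happens here.
    """
    total = 0.0
    for _ in range(episodes):
        state, done = task.reset(), False
        while not done:
            action = policy.act(state)          # inference only, weights frozen
            state, reward, done = task.step(action)
            total += reward
    return total / episodes  # average return = zero-shot transfer performance
```

The average return this yields on the new task, compared with the return of a policy trained on that task directly, is exactly the "generalization gap" that MBTL tries to model.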
“We know it would be ideal to train on all the tasks, but we wondered if we could get away with training on a subset of those tasks, apply the result to all the tasks, and still see a performance increase,” Wu says.
To identify which tasks they should select to maximize expected performance, the researchers developed an algorithm called Model-Based Transfer Learning (MBTL).
The MBTL algorithm has two pieces. For one, it models how well each algorithm would perform if it were trained independently on one task. Then it models how much each algorithm’s performance would degrade if it were transferred to each other task, a concept known as generalization performance.
Explicitly modeling generalization performance allows MBTL to estimate the value of training on a new task.
MBTL does this sequentially, first selecting the task that yields the highest performance gain, then choosing additional tasks that provide the largest subsequent marginal improvements to overall performance.
Since MBTL focuses only on the most promising tasks, it can dramatically improve the efficiency of the training process.
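This sequential selection amounts to a greedy loop over candidate training tasks. The sketch below is illustrative, not the authors' implementation: `generalization(i, j)` stands in for MBTL's modeled estimate of how well a model trained on task `i` performs when zero-shot transferred to task `j` (with `generalization(i, i)` being plain training performance).

```python
def greedy_mbtl(num_tasks, generalization, budget):
    """Greedily pick which tasks to train on, in the spirit of MBTL.

    At each step, choose the candidate task whose addition gives the
    largest marginal gain in estimated performance summed over all tasks.
    """
    selected = []
    best = [0] * num_tasks  # best estimated performance achieved on each task so far

    def marginal_gain(candidate):
        # Improvement over the current best coverage if we also train on `candidate`
        return sum(
            max(best[j], generalization(candidate, j)) - best[j]
            for j in range(num_tasks)
        )

    for _ in range(budget):
        candidate = max(
            (i for i in range(num_tasks) if i not in selected),
            key=marginal_gain,
        )
        selected.append(candidate)
        for j in range(num_tasks):
            best[j] = max(best[j], generalization(candidate, j))
    return selected


# Toy model: 10 intersections along a corridor, where transfer performance
# decays linearly with the distance between intersections.
decay = lambda i, j: max(0, 10 - 2 * abs(i - j))
print(greedy_mbtl(10, decay, budget=2))  # → [4, 7]
```

With this toy decay model, the greedy loop first picks a central intersection (which transfers reasonably to everything) and then a second one covering the region the first serves poorly, which is the intuition behind training on a strategic subset rather than all tasks.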
Reducing training costs
When the researchers tested this method on simulated tasks, including controlling traffic signals, managing real-time speed advisories, and executing several classic control tasks, it was five to 50 times more efficient than other methods.
This means they could arrive at the same solution by training on far less data. For instance, with a 50x efficiency boost, the MBTL algorithm could train on just two tasks and achieve the same performance as a standard method that uses data from 100 tasks.
“From the perspective of the two main approaches, this means that data from the other 98 tasks was not needed, or that training on all 100 tasks is confusing to the algorithm, so the performance ends up worse than ours,” Wu says.
With MBTL, adding even a small amount of additional training time could lead to much better performance.
In the future, the researchers plan to design MBTL algorithms that can extend to more complex problems, such as high-dimensional task spaces. They are also interested in applying their approach to real-world problems, especially in next-generation mobility systems.
The research is funded, in part, by a National Science Foundation CAREER Award, the Kwanjeong Educational Foundation PhD Scholarship Program, and an Amazon Robotics PhD Fellowship.