Training machines to learn more like humans do

Imagine sitting on a park bench, watching someone stroll by. While the scene may continuously change as the person walks, the human brain can transform that dynamic visual information into a more stable representation over time. This ability, known as perceptual straightening, helps us predict the walking person’s trajectory.

Unlike humans, computer vision models don’t typically exhibit perceptual straightness, so they learn to represent visual information in a highly unpredictable way. But if machine-learning models had this ability, it might enable them to better estimate how objects or people will move.

MIT researchers have discovered that a specific training method can help computer vision models learn more perceptually straight representations, as humans do. Training involves showing a machine-learning model millions of examples so it can learn a task.

The researchers found that training computer vision models using a technique called adversarial training, which makes them less reactive to tiny errors added to images, improves the models’ perceptual straightness.

The team also discovered that perceptual straightness is affected by the task a model is trained to perform. Models trained to perform abstract tasks, like classifying images, learn more perceptually straight representations than those trained to perform more fine-grained tasks, like assigning every pixel in an image to a category.

For instance, the nodes within the model have internal activations that represent “dog,” which allow the model to detect a dog when it sees any image of a dog. Perceptually straight representations retain a more stable “dog” representation when there are small changes in the image. This makes them more robust.

By gaining a better understanding of perceptual straightness in computer vision, the researchers hope to uncover insights that could help them develop models that make more accurate predictions. For instance, this property might improve the safety of autonomous vehicles that use computer vision models to predict the trajectories of pedestrians, cyclists, and other vehicles.

“One of the take-home messages here is that taking inspiration from biological systems, such as human vision, can both give you insight into why certain things work the way they do and also inspire ideas to improve neural networks,” says Vasha DuTell, an MIT postdoc and co-author of a paper exploring perceptual straightness in computer vision.

Joining DuTell on the paper are lead author Anne Harrington, a graduate student in the Department of Electrical Engineering and Computer Science (EECS); Ayush Tewari, a postdoc; Mark Hamilton, a graduate student; Simon Stent, research manager at Woven Planet; Ruth Rosenholtz, principal research scientist in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of CSAIL. The research is being presented at the International Conference on Learning Representations.

Studying straightening

After reading a 2019 paper from a team of New York University researchers about perceptual straightness in humans, DuTell, Harrington, and their colleagues wondered if that property might be useful in computer vision models, too.

They set out to determine whether different kinds of computer vision models straighten the visual representations they learn. They fed each model frames of a video and then examined the representation at different stages in its processing.

If the model’s representation changes in a predictable way across the frames of the video, that model is straightening. In the end, its output representation should be more stable than the input representation.

“You can think of the representation as a line, which starts off really curvy. A model that straightens can take that curvy line from the video and straighten it out through its processing steps,” DuTell explains.
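That intuition can be made quantitative. A common way to measure it, roughly following the perceptual-straightening literature, is to treat the representation of each video frame as a point in a high-dimensional space and compute the average angle between successive steps along that trajectory; a straighter path yields smaller angles. Below is a minimal sketch in Python (the function name and the use of NumPy are our own choices, and the frame features are assumed to come from whichever model and layer is being probed):

```python
import numpy as np

def average_curvature(representations):
    """Mean angle, in degrees, between successive steps of a
    representation trajectory. Lower values mean a straighter path.

    representations: array of shape (num_frames, feature_dim),
    one feature vector per video frame.
    """
    points = np.asarray(representations, dtype=float)
    # Difference vectors between consecutive frames.
    steps = np.diff(points, axis=0)
    # Normalize each step to unit length.
    steps = steps / np.linalg.norm(steps, axis=1, keepdims=True)
    # Angle between each pair of consecutive unit steps.
    cosines = np.sum(steps[:-1] * steps[1:], axis=1)
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    return float(angles.mean())
```

Comparing this number for a clip’s raw pixels against the same clip’s activations at some layer then indicates whether the model straightened the trajectory: a model straightens when the curvature of its representation is lower than the curvature of its input.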

Most models they tested didn’t straighten. Of the few that did, those that straightened most effectively had been trained for classification tasks using the technique known as adversarial training.

Adversarial training involves subtly modifying images by slightly changing each pixel. While a human wouldn’t notice the difference, these minor changes can fool a machine so it misclassifies the image. Adversarial training makes the model more robust, so it won’t be tricked by these manipulations.
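To make this concrete, here is a minimal sketch of one common adversarial-training step in PyTorch, using the single-step fast gradient sign method (FGSM) to craft the perturbation. The article doesn’t specify which attack the researchers used, and robust models are often trained with stronger multi-step attacks, so treat this as an illustration of the idea rather than the paper’s exact recipe; the function name and the epsilon value are our own choices:

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, images, labels, optimizer, epsilon=2/255):
    """One training step on FGSM-perturbed images.

    Assumes `model` is a torch.nn.Module classifier and `images`
    is a float tensor scaled to [0, 1].
    """
    # Craft the perturbation: take the gradient of the loss with
    # respect to the input pixels and step in the gradient's sign.
    images = images.clone().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    (grad,) = torch.autograd.grad(loss, images)
    adv_images = (images + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

    # Update the model on the perturbed batch, so it learns to give
    # the same answer despite the barely visible pixel changes.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(adv_images), labels)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```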

Because adversarial training teaches the model to be less reactive to slight changes in images, it helps the model learn a representation that is more predictable over time, Harrington explains.

“People already had this idea that adversarial training might help you get your model to be more like a human, and it was interesting to see that carry over to another property that people hadn’t tested before,” she says.

But the researchers found that adversarially trained models only learn to straighten when they are trained for broad tasks, like classifying entire images into categories. Models tasked with segmentation (labeling every pixel in an image as a certain class) didn’t straighten, even when they were adversarially trained.

Consistent classification

The researchers tested these image classification models by showing them videos. They found that the models that learned more perceptually straight representations tended to correctly classify objects in the videos more consistently.
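One simple way to score that kind of consistency, not necessarily the exact metric used in the study, is the fraction of frames whose predicted label matches the video’s most common prediction. A quick sketch in Python (the function name is ours):

```python
import numpy as np

def classification_consistency(frame_predictions):
    """Fraction of frames whose predicted class label matches the
    video's most common prediction; 1.0 is perfectly consistent."""
    preds = np.asarray(frame_predictions)
    labels, counts = np.unique(preds, return_counts=True)
    majority = labels[np.argmax(counts)]
    return float(np.mean(preds == majority))

# For example, a clip whose per-frame labels are [3, 3, 7, 3, 3]
# scores 0.8; a perfectly stable model would score 1.0.
print(classification_consistency([3, 3, 7, 3, 3]))  # 0.8
```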

“To me, it’s amazing that these adversarially trained models, which have never even seen a video and have never been trained on temporal data, still show some amount of straightening,” DuTell says.

The researchers don’t know exactly what it is about the adversarial training process that enables a computer vision model to straighten, but their results suggest that stronger training schemes cause the models to straighten more, she explains.

Building off this work, the researchers want to use what they learned to create new training schemes that would explicitly give a model this property. They also want to dig deeper into adversarial training to understand why this process helps a model straighten.

“From a biological standpoint, adversarial training doesn’t necessarily make sense. It’s not how humans understand the world. There are still a lot of questions about why this training process seems to help models act more like humans,” Harrington says.

“Understanding the representations learned by deep neural networks is critical to improving properties such as robustness and generalization,” says Bill Lotter, assistant professor at the Dana-Farber Cancer Institute and Harvard Medical School, who was not involved with this research. “Harrington et al. perform an extensive evaluation of how the representations of computer vision models change over time when processing natural videos, showing that the curvature of these trajectories varies widely depending on model architecture, training properties, and task. These findings can inform the development of improved models and also offer insights into biological visual processing.”

“The paper confirms that straightening natural videos is a fairly unique property displayed by the human visual system. Only adversarially trained networks display it, which provides an interesting connection with another signature of human perception: its robustness to various image transformations, whether natural or artificial,” says Olivier Hénaff, a research scientist at DeepMind, who was not involved with this research. “That even adversarially trained scene segmentation models do not straighten their inputs raises important questions for future work: Do humans parse natural scenes in the same way as computer vision models? How can we represent and predict the trajectories of objects in motion while remaining sensitive to their spatial detail? In connecting the straightening hypothesis with other aspects of visual behavior, the paper lays the groundwork for more unified theories of perception.”

The research is funded, in part, by the Toyota Research Institute, the MIT CSAIL METEOR Fellowship, the National Science Foundation, the U.S. Air Force Research Laboratory, and the U.S. Air Force Artificial Intelligence Accelerator.
