Researchers enhance peripheral vision in AI models

Peripheral vision enables humans to see shapes that aren’t directly in our line of sight, albeit with less detail. This ability expands our field of view and can be helpful in many situations, such as detecting a vehicle approaching our car from the side.

Unlike humans, AI doesn’t have peripheral vision. Equipping computer vision models with this ability could help them detect approaching hazards more effectively or predict whether a human driver would notice an oncoming object.

Taking a step in this direction, MIT researchers developed an image dataset that allows them to simulate peripheral vision in machine learning models. They found that training models with this dataset improved the models’ ability to detect objects in the visual periphery, although the models still performed worse than humans.

Their results also revealed that, unlike with humans, neither the size of objects nor the amount of visual clutter in a scene had a strong impact on the AI’s performance.

“There’s something fundamental going on here. We tested so many different models, and even when we train them, they get a little bit better, but they aren’t quite like humans. So, the question is: What is missing in these models?” says Vasha DuTell, a postdoc and co-author of a paper detailing this study.

Answering that question could help researchers build machine learning models that can see the world more like humans do. In addition to improving driver safety, such models could be used to develop displays that are easier for people to view.

Plus, a deeper understanding of peripheral vision in AI models could help researchers better predict human behavior, adds lead author Anne Harrington MEng ’23.

“Modeling peripheral vision, if we can really capture the essence of what is represented in the periphery, can help us understand the features in a visual scene that make our eyes move to gather more information,” she explains.

Their co-authors include Mark Hamilton, an electrical engineering and computer science graduate student; Ayush Tewari, a postdoc; Simon Stent, research manager at the Toyota Research Institute; and senior authors William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Ruth Rosenholtz, principal research scientist in the Department of Brain and Cognitive Sciences and a member of CSAIL. The research will be presented at the International Conference on Learning Representations.

“Any time you have a human interacting with a machine — a car, a robot, a user interface — it’s hugely important to understand what the person can see. Peripheral vision plays a critical role in that understanding,” Rosenholtz says.

Simulating peripheral vision

Extend your arm in front of you and put your thumb up — the small area around your thumbnail is seen by your fovea, the small depression in the middle of your retina that provides the sharpest vision. Everything else you can see is in your visual periphery. Your visual cortex represents a scene with less detail and reliability as it moves farther from that sharp point of focus.

Many existing approaches to modeling peripheral vision in AI represent this deteriorating detail by blurring the edges of images, but the information loss that occurs in the optic nerve and visual cortex is far more complex.
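To make that contrast concrete, the simple blurring approach can be sketched in a few lines. This is an illustrative toy of my own, not the researchers’ method: the `peripheral_blur` function, the repeated box blur, and the linear blending scheme are all simplifying assumptions, and a real pipeline would use a calibrated eccentricity-to-blur mapping.

```python
import numpy as np

def peripheral_blur(image, fixation):
    """Blur each pixel in proportion to its distance (eccentricity)
    from the fixation point -- a crude stand-in for the richer
    information loss that occurs in human peripheral vision."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])
    ecc /= ecc.max()                      # normalize eccentricity to [0, 1]
    # Precompute a strongly blurred copy via a repeated wrap-around box blur.
    blurred = image.copy().astype(float)
    for _ in range(4):
        blurred = (np.roll(blurred, 1, 0) + np.roll(blurred, -1, 0) +
                   np.roll(blurred, 1, 1) + np.roll(blurred, -1, 1) +
                   blurred) / 5.0
    # Blend: sharp at the fovea, increasingly blurred toward the periphery.
    return (1 - ecc) * image + ecc * blurred
```

At the fixation point the output matches the input exactly; at the image edges it is fully blurred — a smooth falloff, but one that discards information very differently from the texture-based degradation the researchers model.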

For a more accurate approach, the MIT researchers started with a technique used to model peripheral vision in humans. Known as the texture tiling model, this method transforms images to represent a human’s visual information loss.

They modified this model so it could transform images similarly, but in a more flexible way that doesn’t require knowing in advance where the person or AI will point their eyes.

“That lets us faithfully model peripheral vision the same way it is being done in human vision research,” says Harrington.

The researchers used this modified technique to generate a huge dataset of transformed images that appear more textural in certain areas, to represent the loss of detail that occurs when a human looks farther into the periphery.

Then they used the dataset to train several computer vision models and compared their performance with that of humans on an object-detection task.

“We had to be very clever in how we set up the experiment so we could also test it in the machine learning models. We didn’t want to have to retrain the models on a toy task that they weren’t meant to be doing,” she says.

Peculiar performance

Humans and models were shown pairs of transformed images that were identical, except that one image had a target object located in the periphery. Then, each participant was asked to pick the image with the target object.
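This pairing setup is a two-alternative forced-choice design. As a hedged sketch of how such a task can be scored for a model — the helper `two_afc_accuracy` and the intensity-sum “model” below are hypothetical stand-ins of mine, not the paper’s protocol:

```python
def two_afc_accuracy(score_fn, image_pairs):
    """Estimate forced-choice accuracy: for each (target_img, foil_img)
    pair, the model 'chooses' the image whose target-presence score is
    higher, and we count how often that choice is correct."""
    correct = 0
    for target_img, foil_img in image_pairs:
        if score_fn(target_img) > score_fn(foil_img):
            correct += 1
    return correct / len(image_pairs)

# Toy stand-in: "images" are lists of pixel values, and the score is
# total intensity; a real model would output a detection logit instead.
target = [0.2, 0.9, 0.1]   # scene containing a bright target object
foil   = [0.2, 0.1, 0.1]   # identical scene without the object
pairs = [(target, foil)] * 5
acc = two_afc_accuracy(sum, pairs)   # acc == 1.0 on this easy toy set
```

Chance performance on such a task is 50 percent, which makes human and model accuracies directly comparable.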

“One thing that really surprised us was how good people were at detecting objects in their periphery. We went through at least 10 different sets of images that were just too easy. We kept needing to use smaller and smaller objects,” Harrington adds.

The researchers found that training models from scratch with their dataset led to the greatest performance boosts, improving their ability to detect and recognize objects. Fine-tuning a model with their dataset, a process that involves tweaking a pretrained model so it can perform a new task, resulted in smaller performance gains.
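The mechanical difference between the two regimes can be shown with a deliberately tiny example. The one-parameter “model” and data below are invented for illustration only — they show what fine-tuning means (a few gradient steps from a pretrained weight rather than many steps from an uninformed start), not the paper’s actual training setup.

```python
def train(w_init, data, lr=0.1, steps=100):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = w_init
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # underlying rule: y = 2x

# Training from scratch: many updates from an uninformed starting point.
w_scratch = train(0.0, data, steps=100)

# "Fine-tuning": only a few updates, starting from a weight that a
# related task has already brought close to the solution.
w_finetuned = train(1.5, data, steps=10)
```

On this toy problem both routes converge; the study’s interesting finding is that on real peripheral-vision data the from-scratch route worked better.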

But in every case, the machines weren’t as good as humans, and they were especially bad at detecting objects in the far periphery. Their performance also didn’t follow the same patterns as humans.

“That might suggest the models aren’t using context in the same way humans do to perform these detection tasks. The strategy of the models might be different,” Harrington says.

The researchers plan to continue exploring these differences, with a goal of finding a model that can predict human performance in the visual periphery. This could enable AI systems that alert drivers to hazards they might not see, for instance. They also hope to encourage other researchers to conduct additional computer vision studies with their publicly available dataset.

“This work is important because it contributes to our understanding that human vision in the periphery should not be considered just impoverished vision due to limits in the number of photoreceptors we have, but rather, a representation that is optimized for us to perform tasks of real-world consequence,” says Justin Gardner, an associate professor in the Department of Psychology at Stanford University who was not involved with this work. “Moreover, the work shows that neural network models, despite their advances in recent years, are unable to match human performance in this regard, which should lead to more AI research to learn from the neuroscience of human vision. This future research will be aided significantly by the database of images provided by the authors to mimic peripheral human vision.”

This work is supported, in part, by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship.
