New research shows that reorganizing a model's visual representations can make it more helpful, robust and reliable
"Visual" artificial intelligence (AI) is everywhere. We use it to sort our photos, identify unfamiliar flowers and steer our cars. But these powerful systems don't always "see" the world as we do, and they often behave in surprising ways. For instance, an AI system that can identify hundreds of car makes and models might still fail to capture the commonality between a car and an airplane: both are large vehicles made primarily of metal.
To better understand these differences, today we're publishing a new paper in Nature analyzing the important ways AI systems organize the visual world differently from humans. We present a method for better aligning these systems with human knowledge, and show that addressing these discrepancies improves their robustness and ability to generalize.
This work is a step towards building more intuitive and trustworthy AI systems.
Why AI struggles with the “odd one out”
When you see a cat, your brain creates a mental representation that captures everything about the cat, from basic attributes like its color and furriness to high-level concepts like its "cat-ness." AI vision models also produce representations, mapping images to points in a high-dimensional space where similar items (like two sheep) are placed close together, and different ones (a sheep and a cake) are far apart.
To understand the differences in how human and model representations are organized, we used the classic "odd-one-out" task from cognitive science, asking both humans and models to pick which of three given images doesn't fit with the others. This test reveals which two items they "see" as most similar.
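To make this concrete, here is a minimal sketch of how a model's odd-one-out choice can be read off from its embeddings. It assumes we already have one embedding vector per image; the use of cosine similarity and the "most-similar-pair" decision rule here are illustrative assumptions, not necessarily the exact protocol used in the paper.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def odd_one_out(embeddings: list[np.ndarray]) -> int:
    """Return the index (0, 1 or 2) of the odd one out among three embeddings.

    The pair with the highest similarity is judged to "belong together";
    the remaining item is the model's odd-one-out choice.
    """
    pairs = [(0, 1), (0, 2), (1, 2)]
    sims = [cosine_similarity(embeddings[i], embeddings[j]) for i, j in pairs]
    i, j = pairs[int(np.argmax(sims))]   # the most similar pair
    return ({0, 1, 2} - {i, j}).pop()    # the leftover index

# Toy example: two nearby points (the "sheep") and one distant point (the "cake").
sheep_a = np.array([0.9, 0.1, 0.0])
sheep_b = np.array([0.8, 0.2, 0.1])
cake    = np.array([0.0, 0.1, 0.9])
print(odd_one_out([sheep_a, sheep_b, cake]))  # -> 2 (the cake)
```

In this framing, a disagreement with humans means the model's embedding geometry groups the triplet differently than people do, not that the model lacks information about the images.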
Sometimes, everyone agrees. Given a tapir, a sheep and a birthday cake, both humans and models reliably pick the cake as the odd one out. Other times, the correct answer is unclear, and people and models disagree.
Interestingly, we also found many cases where humans strongly agree on an answer, but the AI models get it wrong. In the third example below, most people agree the starfish is the odd one out. But most vision models focus on superficial features like background color and texture, and choose the cat instead.
