The Basis of Cognitive Complexity: Teaching CNNs to See Connections


Liberating education consists in acts of cognition, not transferrals of data.

— Paulo Freire

One of the most heated discussions around artificial intelligence is: which aspects of human learning is it able to capture?

Many authors suggest that artificial intelligence models do not possess the same capabilities as humans, especially when it comes to plasticity, flexibility, and adaptation.

In particular, one thing these models fail to capture is many of the causal relationships that govern the external world.

This article discusses the following topics:

  • The parallelism between convolutional neural networks (CNNs) and the human visual cortex
  • Limitations of CNNs in understanding causal relations and learning abstract concepts
  • How to make CNNs learn simple causal relations

Is it the same? Is it different?

Convolutional neural networks (CNNs) [2] are multi-layered neural networks that take images as input and can be used for many different tasks. One of the most fascinating aspects of CNNs is how their design was inspired by the human visual cortex [1] (a minimal code sketch of these parallels follows the figure below):

  • Hierarchical processing. The visual cortex processes images hierarchically: early visual areas capture simple features (such as edges, lines, and colours), while deeper areas capture more complex features such as shapes, objects, and scenes. Thanks to its layered structure, a CNN captures edges and textures in its early layers, while deeper layers capture object parts or whole objects.
  • Receptive fields. Neurons in the visual cortex respond to stimuli in a specific local region of the visual field (commonly called a receptive field). Going deeper, receptive fields widen, allowing more spatial information to be integrated. Thanks to pooling steps, the same happens in CNNs.
  • Feature sharing. Although biological neurons are not identical, similar features are recognized across different parts of the visual field. In CNNs, the same filters scan the entire image, allowing patterns to be recognized regardless of their location.
  • Spatial invariance. Humans can recognize objects even when they are moved, scaled, or rotated. CNNs also possess this property.
The connection between components of the visual system and a CNN. Image source: here
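To make these parallels concrete, here is a minimal, hypothetical PyTorch sketch (layer sizes are arbitrary): stacked convolutions build a feature hierarchy, the same filters are shared across the whole image, and pooling progressively widens the receptive field.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # early layer: edges, textures, colours
            nn.ReLU(),
            nn.MaxPool2d(2),                               # pooling widens the receptive field
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # deeper layer: parts and shapes
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes)
        )

    def forward(self, x):
        return self.head(self.features(x))

logits = SmallCNN()(torch.randn(1, 3, 64, 64))  # one 64x64 RGB image -> class scores
```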

These features make CNNs perform well on visual tasks, to the point of superhuman performance:

Russakovsky et al. [22] recently reported that human performance yields a 5.1% top-5 error on the ImageNet dataset. This number is achieved by a human annotator who is well trained on the validation images to be better aware of the existence of relevant classes. […] Our result (4.94%) exceeds the reported human-level performance. —source [3]

Although CNNs outperform humans on several tasks, there are still cases where they fail spectacularly. For example, a 2024 study [4] showed that AI models fail to generalize in image classification: state-of-the-art models outperform humans on objects in upright poses but fail when objects appear in unusual poses.

The correct label is at the top of the object, and the AI's incorrect predicted label is below. Image source: here

In conclusion, our results show that (1) humans are still far more robust than most networks at recognizing objects in unusual poses, (2) time is of the essence for such ability to emerge, and (3) even time-limited humans are dissimilar to deep neural networks. —source [4]

In the study [4], the authors note that humans need time to succeed at such tasks. Some tasks require not only visual recognition but also abstract reasoning, which takes time.

The generalization abilities that make humans capable come from understanding the laws that govern relations among objects. Humans recognize objects by extrapolating rules and chaining these rules to adapt to new situations. One of the simplest rules is the "same-different relation": the ability to decide whether two objects are the same or different. This ability develops rapidly during infancy and is also closely related to language development [5-7]. In addition, some animals, such as geese and chimpanzees, also have it [8]. In contrast, learning same-different relations is very difficult for neural networks [9-10].

Example of a same-different task for a CNN. The network should return a label of 1 if the two objects are the same and a label of 0 if they are different. Image source: here
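As an illustration only (these are not the stimuli used in [9-11]), a toy generator for such labeled pairs might look like the following: two small binary patterns are pasted onto a blank canvas, one in each half of the image, and the label is 1 when the two patterns are identical.

```python
import numpy as np

# Three tiny binary "objects" (illustrative placeholders, not the shapes used in the papers).
PATTERNS = [
    np.array([[1, 1], [1, 1]]),   # square
    np.array([[1, 0], [0, 1]]),   # diagonal
    np.array([[0, 1], [1, 0]]),   # anti-diagonal
]

def sample_pair(size=32, rng=None):
    """Return a (canvas, label) pair: label 1 if the two objects are the same, else 0."""
    rng = rng or np.random.default_rng()
    canvas = np.zeros((size, size), dtype=np.float32)
    same = int(rng.integers(2))
    if same:
        idx = [int(rng.integers(len(PATTERNS)))] * 2
    else:
        idx = rng.choice(len(PATTERNS), size=2, replace=False)
    for half, k in enumerate(idx):            # one object per image half, so they never overlap
        r = int(rng.integers(0, size - 2))
        c = int(rng.integers(half * size // 2, (half + 1) * size // 2 - 2))
        canvas[r:r + 2, c:c + 2] = PATTERNS[k]
    return canvas, same

image, label = sample_pair()
```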

Convolutional networks struggle to learn this relationship. Likewise, they fail to learn other types of causal relationships that are easy for humans. Therefore, many researchers have concluded that CNNs lack the inductive bias needed to learn these relationships.

These negative results do not mean that neural networks are completely incapable of learning same-different relations. Much larger models trained for longer can learn this relation. For example, vision transformers pre-trained on ImageNet with contrastive learning show this ability [12].

Can CNNs learn same-different relationships?

The fact that large models can learn these kinds of relationships has rekindled interest in CNNs. The same-different relationship is considered among the basic logical operations that form the foundation for higher-order cognition and reasoning. Showing that shallow CNNs can learn this concept would allow us to experiment with other relationships. Furthermore, it could allow models to learn increasingly complex causal relationships. This is an important step in advancing the generalization capabilities of AI.

Previous work suggests that CNNs lack the architectural inductive biases needed to learn abstract visual relations. Other authors argue that the problem lies in the training paradigm. Typically, classical gradient descent is used to learn a single task or a set of tasks. Given a task t, or a set of tasks T, a loss function L is used to optimize the weights φ that should minimize L:

Image source from here

This can be viewed simply as the sum of the losses across the different tasks (if we have more than one task). In contrast, the Model-Agnostic Meta-Learning (MAML) algorithm [13] is designed to search for an optimal point in weight space for a set of related tasks. MAML seeks an initial set of weights θ that minimizes the loss function across tasks, facilitating rapid adaptation:

Image source from here
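Written out explicitly (a paraphrase of the two formulas above, with α denoting the inner-loop learning rate), the two objectives differ in where the per-task gradient step appears:

```latex
% Standard multi-task training: one set of weights minimizes the summed loss directly.
\varphi^{*} \;=\; \arg\min_{\varphi} \sum_{t \in T} \mathcal{L}_{t}\big(f_{\varphi}\big)

% MAML: find an initialization that performs well *after* one adaptation step on each task.
\theta^{*} \;=\; \arg\min_{\theta} \sum_{t \in T} \mathcal{L}_{t}\big(f_{\theta - \alpha \nabla_{\theta} \mathcal{L}_{t}(f_{\theta})}\big)
```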

The difference may seem small, but conceptually this approach is directed toward abstraction and generalization. With multiple tasks, traditional training tries to optimize the weights for each task. MAML instead tries to identify a set of weights that works well for the different tasks while remaining roughly equidistant from each task's optimum in weight space. This starting point θ allows the model to adapt and generalize more effectively across tasks.

Meta-learning initial weights for generalization. Image source from here

Now that we have a method biased toward generalization and abstraction, we can test whether it enables CNNs to learn the same-different relationship.
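As a sketch of this training paradigm (not the exact setup of [11]; it assumes PyTorch 2.x, whose torch.func.functional_call lets us run a model with adapted weights, and a batch of support/query splits per task):

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def inner_adapt(model, params, x_s, y_s, lr_inner=0.01):
    """One gradient step on a task's support set; keep the graph for the meta-update."""
    logits = functional_call(model, params, (x_s,))
    loss = F.cross_entropy(logits, y_s)
    grads = torch.autograd.grad(loss, tuple(params.values()), create_graph=True)
    return {name: p - lr_inner * g for (name, p), g in zip(params.items(), grads)}

def meta_step(model, meta_opt, tasks):
    """Outer update: optimize the shared initialization so that adapted copies
    of the model perform well on each task's query set."""
    params = dict(model.named_parameters())
    meta_loss = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:                 # a batch of same-different tasks
        adapted = inner_adapt(model, params, x_s, y_s)   # task-specific weights θ'
        logits_q = functional_call(model, adapted, (x_q,))
        meta_loss = meta_loss + F.cross_entropy(logits_q, y_q)
    meta_opt.zero_grad()
    meta_loss.backward()                                 # gradients flow back to the initialization θ
    meta_opt.step()
```

Dropping the inner adaptation step and computing the loss directly with the shared weights recovers the classical training described earlier, which is exactly the conceptual difference between the two paradigms.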

In a recent study [11], the authors compared shallow CNNs trained with classical gradient descent and with meta-learning on a dataset designed for this purpose. The dataset consists of 10 different tasks that test for the same-different relationship.

The Same-Different dataset. Image source from here

The authors [11] compare CNNs with 2, 4, or 6 layers, trained in the standard way or with meta-learning, and report several interesting results:

  1. Traditionally trained CNNs perform similarly to random guessing.
  2. Meta-learning significantly improves performance, suggesting that the model can learn the same-different relationship. A 2-layer CNN performs only slightly better than chance, but as the depth of the network increases, performance improves to near-perfect accuracy.
Comparison between traditional training and meta-learning for CNNs. Image source from here

One of the most intriguing results of [11] is that the model can be trained in a leave-one-out fashion (training on 9 tasks and holding one out) and still show out-of-distribution generalization. Thus, the model learns abstract behavior that is rarely seen in such a small model (6 layers).

Out-of-distribution generalization for same-different classification. Image source from here

Conclusions

Although convolutional networks were inspired by how the human brain processes visual stimuli, they fail to capture some of its basic capabilities. This is especially true when it comes to causal relations and abstract concepts. Some of these relationships can be learned by large models only with extensive training. This has led to the belief that small CNNs cannot learn these relations because of a lack of architectural inductive bias. In recent years, efforts have been made to create new architectures that could have an advantage in learning relational reasoning. Yet most of these architectures fail to learn these kinds of relationships. Intriguingly, this can be overcome through the use of meta-learning.

The advantage of meta-learning is that it incentivizes more abstract learning. Meta-learning pushes toward generalization by attempting to optimize for all tasks at the same time. To do this, learning more abstract features is favored (low-level features, such as the angles of a particular shape, are not useful for generalization and are disfavored). Meta-learning thus allows a shallow CNN to learn abstract behavior that would otherwise require many more parameters and much more training.

Shallow CNNs and the same-different relationship serve as a model system for studying higher cognitive functions. Meta-learning and other forms of training could be useful for improving the reasoning capabilities of these models.


References

Here is the list of the principal references I consulted to write this article; only the first author of each article is cited.

  1. Lindsay, 2020, Convolutional Neural Networks as a Model of the Visual System: Past, Present, and Future, link
  2. Li, 2020, A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects, link
  3. He, 2015, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, link
  4. Ollikka, 2024, A comparison between humans and AI at recognizing objects in unusual poses, link
  5. Premack, 1981, The codes of man and beasts, link
  6. Blote, 1999, Young children’s organizational strategies on a same–different task: A microgenetic study and a training study, link
  7. Lupker, 2015, Is there phonologically based priming in the same-different task? Evidence from Japanese-English bilinguals, link
  8. Gentner, 2021, Learning same and different relations: cross-species comparisons, link
  9. Kim, 2018, Not-So-CLEVR: learning same–different relations strains feedforward neural networks, link
  10. Puebla, 2021, Can deep convolutional neural networks support relational reasoning within the same-different task? link
  11. Gupta, 2025, Convolutional Neural Networks Can (Meta-)Learn the Same-Different Relation, link
  12. Tartaglini, 2023, Deep Neural Networks Can Learn Generalizable Same-Different Visual Relations, link
  13. Finn, 2017, Model-agnostic meta-learning for fast adaptation of deep networks, link