Like human brains, large language models reason about diverse data in a general way

While early language models could only process text, contemporary large language models now perform highly diverse tasks on various kinds of data. For instance, LLMs can understand many languages, generate computer code, solve math problems, or answer questions about images and audio.

MIT researchers probed the inner workings of LLMs to better understand how they process such assorted data, and found evidence that they share some similarities with the human brain.

Neuroscientists believe the human brain has a “semantic hub” in the anterior temporal lobe that integrates semantic information from various modalities, like visual data and tactile inputs. This semantic hub is connected to modality-specific “spokes” that route information to the hub. The MIT researchers found that LLMs use a similar mechanism by abstractly processing data from diverse modalities in a central, generalized way. For instance, a model that has English as its dominant language would rely on English as a central medium to process inputs in Japanese or reason about arithmetic, computer code, and so on. Furthermore, the researchers demonstrate that they can intervene in a model’s semantic hub by using text in the model’s dominant language to change its outputs, even when the model is processing data in other languages.

These findings could help scientists train future LLMs that are better able to handle diverse data.

“LLMs are big black boxes. They’ve achieved very impressive performance, but we have very little knowledge about their internal working mechanisms. I hope this can be an early step to better understand how they work so we can improve upon them and better control them when needed,” says Zhaofeng Wu, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this research.

His co-authors include Xinyan Velocity Yu, a graduate student at the University of Southern California (USC); Dani Yogatama, an associate professor at USC; Jiasen Lu, a research scientist at Apple; and senior author Yoon Kim, an assistant professor of EECS at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations.

Integrating diverse data

The researchers based the new study on prior work that hinted that English-centric LLMs use English to perform reasoning processes on various languages.

Wu and his collaborators expanded this concept, launching an in-depth study into the mechanisms LLMs use to process diverse data.

An LLM, which consists of many interconnected layers, splits input text into words or sub-words called tokens. The model assigns a representation to each token, which enables it to explore the relationships between tokens and generate the next word in a sequence. In the case of images or audio, these tokens correspond to particular regions of an image or sections of an audio clip.
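To make this concrete, here is a minimal sketch of tokens and per-layer representations, assuming the Hugging Face Transformers library and GPT-2 purely as a stand-in model (neither is specified by the researchers):

```python
# Minimal sketch (assumptions: Hugging Face Transformers, GPT-2 as a stand-in model).
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

text = "The cat sat on the mat."
inputs = tokenizer(text, return_tensors="pt")
# The sub-word tokens the model actually sees.
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))

with torch.no_grad():
    outputs = model(**inputs)

# One tensor per layer, each of shape (batch, num_tokens, hidden_dim):
# a vector representation for every token at every layer.
for layer_idx, layer in enumerate(outputs.hidden_states):
    print(layer_idx, tuple(layer.shape))
```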

The researchers found that the model’s initial layers process data in its specific language or modality, like the modality-specific spokes in the human brain. Then, the LLM converts tokens into modality-agnostic representations as it reasons about them throughout its internal layers, akin to how the brain’s semantic hub integrates diverse information.

The model assigns similar representations to inputs with similar meanings, regardless of their data type, including images, audio, computer code, and arithmetic problems. Even though an image and its text caption are distinct data types, because they share the same meaning, the LLM would assign them similar representations.

For instance, an English-dominant LLM “thinks” about a Chinese-text input in English before generating an output in Chinese. The model has a similar reasoning tendency for non-text inputs like computer code, math problems, and even multimodal data.

To test this hypothesis, the researchers passed a pair of sentences with the same meaning but written in two different languages through the model. They measured how similar the model’s representations were for each sentence.

Then they conducted a second set of experiments where they fed an English-dominant model text in a different language, like Chinese, and measured how similar its internal representation was to English versus Chinese. The researchers conducted similar experiments for other data types.
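As a rough illustration of this kind of measurement, one could mean-pool an intermediate layer’s hidden states for a translation pair and compare them with cosine similarity. This is a sketch under assumptions (model choice, pooling, and layer index are all illustrative), not the researchers’ actual setup:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "gpt2"  # placeholder; the study concerns English-dominant LLMs
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True).eval()

def layer_embedding(text: str, layer: int) -> torch.Tensor:
    """Mean-pooled hidden state of `text` at the given layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[layer]  # (1, num_tokens, dim)
    return hidden.mean(dim=1).squeeze(0)

english = "The weather is very nice today."
chinese = "今天天气很好。"  # roughly the same meaning, in Chinese

mid = model.config.n_layer // 2  # probe an intermediate ("hub") layer
similarity = torch.nn.functional.cosine_similarity(
    layer_embedding(english, mid), layer_embedding(chinese, mid), dim=0
)
print(f"cosine similarity at layer {mid}: {similarity.item():.3f}")
```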

They consistently found that the model’s representations were similar for sentences with similar meanings. In addition, across many data types, the tokens the model processed in its internal layers were more similar to English-centric tokens than to those of the input data type.

“A lot of these input data types seem extremely different from language, so we were very surprised that we can probe out English tokens when the model processes, for instance, mathematical or coding expressions,” Wu says.
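The kind of probing Wu describes can be approximated with a “logit lens”-style check, decoding an intermediate layer through the model’s output vocabulary and inspecting which tokens the hidden state is closest to. The snippet below is an assumption about methodology (model, layer index, and input are illustrative), not the paper’s code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # illustrative English-dominant model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True).eval()

inputs = tokenizer("1 + 2 =", return_tensors="pt")  # an arithmetic-style input
with torch.no_grad():
    hidden = model(**inputs).hidden_states[6]        # an intermediate layer
    # Apply the final layer norm, then the output embedding, to read the
    # hidden state in vocabulary space ("logit lens").
    logits = model.lm_head(model.transformer.ln_f(hidden))

top_ids = logits[0, -1].topk(5).indices.tolist()     # nearest tokens at the last position
print(tokenizer.convert_ids_to_tokens(top_ids))
```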

Leveraging the semantic hub

The researchers think LLMs may learn this semantic hub strategy during training because it is an economical way to process varied data.

“There are thousands of languages out there, but a lot of the knowledge is shared, like commonsense knowledge or factual knowledge. The model doesn’t need to duplicate that knowledge across languages,” Wu says.

The researchers also tried intervening in the model’s internal layers using English text when it was processing other languages. They found that they could predictably change the model outputs, even though those outputs were in other languages.
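A simplified version of such an intervention can be sketched as an activation-steering hook that adds a vector derived from English text to an intermediate layer while the model processes Chinese input. The layer index, scale, prompts, and model below are illustrative assumptions, not the researchers’ procedure:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder English-dominant model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True).eval()

def english_direction(text: str, layer: int) -> torch.Tensor:
    """Mean hidden state of an English prompt at `layer`, used as a steering vector."""
    ids = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**ids).hidden_states[layer].mean(dim=1)  # shape (1, dim)

LAYER, SCALE = 6, 4.0                                   # hypothetical choices
steer = english_direction("Talk about cold weather.", LAYER)

def hook(module, args, output):
    # Add the English-derived vector to this layer's hidden states.
    if isinstance(output, tuple):
        return (output[0] + SCALE * steer,) + output[1:]
    return output + SCALE * steer

handle = model.transformer.h[LAYER].register_forward_hook(hook)
prompt = tokenizer("今天我想", return_tensors="pt")      # Chinese input ("today I want")
out = model.generate(**prompt, max_new_tokens=20, do_sample=False,
                     pad_token_id=tokenizer.eos_token_id)
handle.remove()
print(tokenizer.decode(out[0]))
```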

Scientists could leverage this phenomenon to encourage the model to share as much information as possible across diverse data types, potentially boosting efficiency.

On the other hand, there could be concepts or knowledge that are not translatable across languages or data types, like culturally specific knowledge. Scientists might want LLMs to have some language-specific processing mechanisms in those cases.

“How do you maximally share whenever possible but also allow languages to have some language-specific processing mechanisms? That could be explored in future work on model architectures,” Wu says.

In addition, researchers could use these insights to improve multilingual models. Often, an English-dominant model that learns to speak another language will lose some of its accuracy in English. A better understanding of an LLM’s semantic hub could help researchers prevent this language interference, he says.

“Understanding how language models process inputs across languages and modalities is a key question in artificial intelligence. This paper makes an interesting connection to neuroscience and shows that the proposed ‘semantic hub hypothesis’ holds in modern language models, where semantically similar representations of different data types are created in the model’s intermediate layers,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work. “The hypothesis and experiments nicely tie and extend findings from previous works and could be influential for future research on creating better multimodal models and studying links between them and brain function and cognition in humans.”

This research is funded, in part, by the MIT-IBM Watson AI Lab.
