It’s hard to imagine that ChatGPT is nearly 2 years old. That’s significant to me because ChatGPT is just 1 month younger than my daughter. Just yesterday she successfully put a star shaped block into a star shaped hole, told me about how “yesa-day” she was sick and “bomited”, and told me she wanted to call her nanna on the phone. What has ChatGPT learned in those 2 years? It hasn’t learned to act in the real world, it can’t remember things that happened to it, and it doesn’t have desires or goals. Granted, with the right prompt it could output text that convincingly follows an instruction to express goals. But is that really the same thing? No. The answer is No.
Large Language Models (LLMs) like ChatGPT possess capabilities far beyond what my daughter will ever achieve. She won’t be able to communicate coherently in as wide a range of languages, read as many books as exist in an LLM’s training data, or generate text as quickly. When we attribute human-like abilities to LLMs, we fall into an anthropomorphic bias by likening their capabilities to our own. But are we also showing an anthropocentric bias by failing to acknowledge the capabilities that LLMs consistently show? Let’s review the scorecard so far:
- It’s true that an LLM doesn’t have memory, although we can simulate one by having it summarise past conversations and include that information in a prompt (see the sketch after this list).
- LLMs don’t have intrinsic goals, although they can be prompted to generate text that sounds convincingly goal-oriented.
- LLMs can’t act in the physical world, though someone could probably create a prompt to showcase this.
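As a concrete illustration of the first bullet, here is a minimal sketch of simulated memory through summarisation. It assumes the openai Python client (v1+); the model name and prompts are my own illustrative choices, not a reference implementation.

```python
# Minimal sketch: "memory" by summarising past turns and prepending the summary.
# Assumes the openai Python client (v1+) and a configured API key.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumption: any chat-capable model works here

def summarise(history: list[dict]) -> str:
    """Compress earlier turns into a short summary the next prompt can carry."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user",
                   "content": "Summarise this conversation in a few sentences:\n"
                              + transcript}],
    )
    return reply.choices[0].message.content

def chat_with_memory(history: list[dict], user_message: str) -> str:
    """Answer a new message while 'remembering' the past via the summary."""
    summary = summarise(history) if history else "No prior conversation."
    reply = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system",
             "content": f"Summary of the conversation so far: {summary}"},
            {"role": "user", "content": user_message},
        ],
    )
    return reply.choices[0].message.content
```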
While they perform impressive feats, they still lack some basic abilities that my 21-month-old daughter has. We can mimic some of those abilities with the right prompts and tooling. In generating coherent text in response to such prompts, LLMs consistently show an apparent ability to understand what we want. But to what extent do LLMs truly “understand”?
I’m talking about a very specific type of LLM: transformer-based auto-regressive large language models. I won’t go into the specifics when many detailed articles have already explained transformers at various levels of complexity. Instead let’s focus on the core of what an LLM does: they are statistical models that predict the likelihood of a token appearing in a piece of text given some context.
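To make “predict the likelihood of a token given some context” concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the small gpt2 checkpoint (my choice for the example, not anything specific to ChatGPT).

```python
# Minimal sketch of next-token prediction with a small autoregressive model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

context = "The cat sat on the"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence, vocabulary)

# Probability distribution over the next token, given the context.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```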
Now imagine I created a complex weather model* where patches of the Earth’s atmosphere become ‘tokens.’ Each token has attributes like humidity, temperature, and air pressure. I use the model to forecast these attributes over time-steps. If the time-steps get shorter and the patches become smaller, the model gets closer and closer to representing the state of the actual world. This model attempts to capture something about the likelihood of the weather we’ll see next, given the weather we’ve seen before. It might learn to very accurately predict, for instance, the emergence of cyclones over time in areas where air is warm, moist, and of low pressure. But it’s not a simulation of the physics of Earth’s weather any more than an LLM is a simulation of brain activity.
If an LLM is a statistical model of text, what exactly is it modelling? My imagined weather prediction model tries to capture the statistics of the atmospheric conditions that generate the weather. But what is the statistical process that generates text? The process that generates text is a human brain, and humans need some understanding of the world to generate that text. If a model can effectively predict text a human might write, then could that prediction include “understanding”?
How LLMs are trained
LLMs are trained to optimise an objective that reduces the surprise of encountering a particular token given its context. If the model encounters a token in the training data and assigns it low probability, the model’s weights are adjusted to give it a higher probability.
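As a rough illustration of that objective, here is a toy cross-entropy step in PyTorch; the vocabulary size and tensors are made up for the example, and a real training loop obviously involves far more.

```python
# Toy illustration of the "reduce surprise" objective: cross-entropy loss.
import torch
import torch.nn.functional as F

vocab_size = 8
# Pretend the model produced these logits for one position in the text...
logits = torch.randn(1, vocab_size, requires_grad=True)
# ...and the token that actually appeared next in the training data had id 3.
target = torch.tensor([3])

# Cross-entropy is the negative log-probability of the observed token:
# the lower the probability the model assigned, the higher the loss (surprise).
loss = F.cross_entropy(logits, target)
loss.backward()  # gradients nudge the weights to raise that token's probability
print(loss.item())
```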
Compare that to how my daughter learns to use language. When she wants something, she uses language to communicate her desires. First, she understands what she wants on some level. Then, she must figure out which words to use to get what she wants. Recently, she wanted me to fill her juice bottle but didn’t want me to take it away or walk away from her to bring more juice. Though her wants were contradictory and a bit irrational, she had several goals: (1) more juice, (2) keep the juice bottle near her, (3) daddy stays near her too. And let me tell you, she communicated this very effectively. Her language learning is directly tied to her understanding of how those words can get her what she wants (even if what she wants is irrational).
If an LLM were to exhibit understanding, it would be an emergent attribute of its statistical model of the world. The paper “Climbing Towards NLU” (Bender & Koller, 2020) argues that true natural language understanding (NLU) requires grounding in the real world. Bender & Koller argue that LLMs trained exclusively on statistical patterns in textual data lack the real-world context or interactions needed to achieve actual understanding. This means that, unlike my daughter, an LLM can’t understand something because its communication is not grounded in the real world.
The Wikipedia page on understanding describes it as a cognitive process involving the use of concepts to model an object, situation, or message. It implies abilities and dispositions sufficient to support intelligent behaviour. Ludwig Wittgenstein suggested that understanding is context-dependent and is shown through intelligent behaviour rather than the mere possession of knowledge. This is reminiscent of the grounding requirement posited by Bender & Koller.
On the one hand, understanding needs an accurate model of the world. On the other hand, people contend that one needs to use this model to act in the world in order to truly understand. I would argue that we analyse someone’s behaviour only as a proxy for measuring that underlying world model. If we could measure the world model directly we wouldn’t need to see demonstrations of understanding.
The Limitations of Understanding
Philosopher John Searle’s “Chinese Room” thought experiment challenges our concept of understanding (Searle, 1980). Imagine a room filled with detailed instructions on how to reply to someone writing in Chinese. Notes written in Chinese are slid under the door, and the person inside the room can look up the symbols and follow a recipe for writing a reply. The person in the room doesn’t know Chinese but can have a convincing conversation with a person outside. Clearly, the person who constructed the room “understands” Chinese, but someone on the outside isn’t conversing with that person; they’re conversing with the room. Does the room understand Chinese?
This is strongly analogous to how LLMs work and challenges our philosophical perception of understanding. It’s difficult precisely because we intuitively balk at the idea that a room could understand something. What would it even mean? If understanding is an emergent phenomenon that happens at the level of information processing systems, then why can’t we say that rooms can understand things? Part of the difficulty is that, for us, understanding comes with a subjective conscious experience of understanding. But it’s easy to see that this experience can be deceiving.
Understanding Need Not Be Binary
You know that 7+7=14, but do you understand it? If I asked you some probing questions, you might realise that you don’t truly understand what that equation means in all contexts. For instance, is 7+7=14 an unequivocal fact about the universe? Not necessarily. 7 apples plus 7 pears means you have 7 apples and 7 pears. Perhaps in some contexts you’d count 14 pieces of fruit, but is it always true that you can combine two sets of different items? Or consider that 7pm + 7 hours is 2am (i.e. 7+7=2 mod 12). Can you offer me a robust definition of why 7+7=14 that would explain when it’s true and why^? Most people probably couldn’t do that off the top of their head, yet we’d feel comfortable saying that most people understand that 7+7=14. The question isn’t always whether something was understood but the extent to which it was understood.
If we take Wittgenstein’s requirement that understanding is demonstrated by behaviour, then there would be a straightforward test: if I tell you to arrive 7 hours after 7pm, do you know to show up at 2am? I would argue that’s evidence of some understanding, but not necessarily of the depth of your understanding.
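For the record, the clock arithmetic behind that test is just addition modulo 12; a tiny sketch of the calculation:

```python
# Clock arithmetic: "7pm + 7 hours = 2am" is addition modulo 12.
def add_hours_12h(start_hour: int, hours: int) -> int:
    result = (start_hour + hours) % 12
    return 12 if result == 0 else result  # hour 0 is shown as 12 on a clock

print(add_hours_12h(7, 7))  # 2  -> 2am, not 14
print(add_hours_12h(7, 5))  # 12 -> midnight
```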
Measuring Understanding in Animals
Measuring ‘understanding’ is not straightforward. In psychology, psychometric testing is the primary way we measure understanding in humans. It’s not straightforward to apply the same techniques to non-human animals; doing so is the subject of a field of study called biosemiotics.
Understanding in animals is measured through various problem-solving tasks. For instance, primates, dolphins, and birds (mostly corvids) show problem-solving skills and complex tool use, suggesting some understanding of their environments (Emery & Clayton, 2004). Understanding is not exclusive to humans, and we can measure levels of understanding in non-humans too.
The book “Inside of a Dog: What Dogs See, Smell, and Know” by Alexandra Horowitz is a fascinating exploration of how we might understand the minds and experiences of our closest animal companions: domesticated dogs. She describes two experiments that look at imitation behaviour and what a human infant vs a dog understands.
(1) If an infant sees someone flipping a light switch with their head, they may imitate this behaviour. If the person is holding something in their hands, the infant understands there was a reason they didn’t use their hands; when the infant imitates the behaviour, they will use their hands. (2) By contrast, dogs prefer to press a button with their nose rather than their paw. If a dog sees another dog press a button with its paw to get a treat, it will imitate this behaviour. But if the dog sees that the other dog couldn’t use its nose because it had a large object in its mouth, then it will understand that the button must be pressed but that using the paw is optional.
Constructing an experiment to determine what a dog understands required an understanding of the dog and its behaviour. Do we have that same level of understanding of LLMs to conduct similar experiments?
The GPT-3 Era
A comprehensive survey of LLM capabilities (Chang & Bergen, 2023) provides an excellent summary drawn from a wide range of articles; however, the most advanced model covered is only GPT-3. They break understanding down into two main categories: syntactic and semantic understanding. In their survey they highlight that even in the context of syntactic understanding, LLMs have limitations. For example:
Subject-verb agreement performance in language models is also dependent on the specific nouns and verbs involved (Yu et al. 2020; Chaves & Richter 2021). Masked and autoregressive models produce over 40% more accurate agreement predictions for verbs that are already probable from context (Newman et al. 2021), and agreement accuracy is worse overall for infrequent verbs (Wei et al. 2021). For infrequent verbs, masked language models are biased towards the more frequent verb form seen during pretraining (e.g., singular vs. plural) (Wei et al. 2021). Error rates exceed 30% for infrequent verbs in nonce (grammatically correct but semantically meaningless) sentences (Wei et al. 2021), with further degradations if there is an intervening clause between the subject and the verb as in Example 4 (Lasri, Lenci, and Poibeau 2022a).
LLM limitations are not confined to syntactic issues (where they are arguably strongest) but extend to semantics. For instance, the survey notes research showing that negations (“Please produce a possible incorrect answer to the question”) can degrade LLM performance by 50%.
Chang & Bergen describe many other limitations of LLMs in reasoning capability, including:
- “Brittle” responses when reasoning about a situation, because the responses are highly sensitive to wording
- Struggling with analogies as they become more abstract
- A lack of sensitivity to people’s perspectives and mental states
- A lack of common sense
- A tendency to repeat memorised text instead of reasoning
The general approach to evaluating understanding in LLMs appears to be to phrase questions in different ways and find the failure modes of the models. These failure modes are then taken to indicate that no real “understanding” is going on, but rather just pattern matching.
The ChatGPT Era
A lot has changed since GPT-3, namely the capabilities of even larger models tuned for instruction following and conversation. How do LLMs stack up in 2024? A big difference is the proliferation of benchmarks that evaluate LLMs. A March 2024 survey (Chang et al. 2024) covers the performance of recent models on a wide range of benchmarks. They conclude that LLMs have strong abilities including comprehension and reasoning, but they still identify limitations. These limitations mean that LLMs have “limited abilities on abstract reasoning and are prone to confusion or errors in complex contexts”. Multimodal Large Language Models (MLLMs) have also emerged which unify (at minimum) an understanding of text and images. A January 2024 survey (Wang et al.) covers a wide range of multimodal benchmarks and shows mediocre performance even for the most powerful models.
Anthropomorphism vs Anthropocentrism
Anthropomorphism is the tendency of humans to think of something as having human-like attributes because it displays some behaviour similar to ours. As a dog owner I know I’ve succumbed to this bias by thinking that my dog “feels guilty” for something he’s done because “he has a guilty look on his face”. LLMs constantly trigger our tendency towards anthropomorphism by communicating in an eerily human way.
The opposite bias is anthropocentrism: assuming that non-humans can’t have capabilities that we have. A key contention of the paper “Anthropocentric bias and the possibility of artificial cognition” (Millière & Rathkopf) is that we use flawed tests for capabilities in artificial systems. They describe two kinds of anthropocentric bias:
1. Type-I anthropocentrism is the tendency to assume that an LLM’s performance failures on a task designed to measure competence C always indicate that the system lacks C. This assumption is flawed because it overlooks the possibility that auxiliary factors caused the performance failure.
2. Type-II anthropocentrism is the tendency to assume that even when LLMs achieve performance equal to or better than the average human, any substantive difference between the human strategy for solving the problem and the LLM strategy for solving the problem is, ipso facto, evidence that the LLM’s solution is not general. In slogan form, the assumption says: all cognitive kinds are human cognitive kinds. In other words, if an LLM arrives at a solution through a different computational process than humans use, Type-II anthropocentrism would lead us to conclude that the LLM’s approach is not genuinely competent, no matter how well it performs.
Consider the paper “Role play with large language models” (Shanahan, McDonell, & Reynolds) and the “20 questions game”:
In this familiar game, one player thinks of an object, and the other player has to guess what it is by asking questions with ‘yes’ or ‘no’ answers. If they guess correctly in 20 questions or fewer, they win. Otherwise, they lose. Suppose a human plays this game with a basic LLM-based dialogue agent (that is not fine-tuned on guessing games) and takes the role of guesser. The agent is prompted to ‘think of an object without saying what it is’. In this situation, the dialogue agent will not randomly select an object and commit to it for the rest of the game, as a human would (or should). Rather, as the game proceeds, the dialogue agent will generate answers on the fly that are consistent with all the answers that have gone before.
To illuminate this point, one of my colleagues had this conversation with ChatGPT on this very question:
User: let’s play 20 questions. Think of a thing and I’ll ask questions to try to identify what it is. Ready?
Assistant: I’m ready! I’ve got something in mind. Go ahead and ask your first question.
User: No, you don’t have anything in mind. You’re an LLM.
Assistant: True, I don’t have a mind in the traditional sense, but I can definitely simulate the process. I’ll “think” of something now. Go ahead and ask your first question!
User: But you’re not simulating “thinking of an object” in any real sense. The identity of the object is currently indeterminate, and won’t be determined until the end of the game when you have to reveal it.
Assistant: I understand your perspective. Let’s proceed with the game, and I’ll maintain consistency with my responses to make it challenging and fun. Go ahead and ask your first question!
But is this showing an anthropocentric bias? The paper has this to say: “This shortcoming is easily overcome in practice. For example, the agent could be forced to specify the object it has ‘thought of’, but in a coded form so the user doesn’t know what it is.” This is Type-I anthropocentrism because there is an auxiliary factor at play: for an LLM to “think” of something it must generate tokens to that effect. If it generated tokens for “thinking of an object”, we could hide those from the user and show that the responses generated by the LLM are consistent. By playing the game with the LLM but not accommodating how it “thinks”, we may conclude that it’s incapable of “thinking of an object” and making consistent replies to that effect.
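Here is a minimal sketch of that workaround: force a hidden commitment first, then answer the user’s questions conditioned on it. It assumes the openai Python client (v1+); the model name and prompts are my own illustrative choices, not the paper’s.

```python
# Sketch: commit the model to an object in a hidden turn, then hide that turn.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumption: any chat-capable model works here

# Step 1: the hidden "commitment" turn, never shown to the user.
secret = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user",
               "content": "We're playing 20 questions. Name one everyday "
                          "object. Reply with the object name only."}],
).choices[0].message.content

# Step 2: answer the user's questions, conditioned on the hidden object.
history = [{
    "role": "system",
    "content": f"You are playing 20 questions. The secret object is: {secret}. "
               "Answer only 'yes' or 'no'. Never reveal the object unless the "
               "user guesses it.",
}]

def answer(question: str) -> str:
    history.append({"role": "user", "content": question})
    reply = client.chat.completions.create(model=MODEL, messages=history)
    content = reply.choices[0].message.content
    history.append({"role": "assistant", "content": content})
    return content

print(answer("Is it bigger than a microwave?"))
```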
The Problem of Memorisation
One of the key challenges in testing understanding in LLMs is that they have a strong tendency towards repeating patterns seen in their training data. Designing an experiment to test a model’s understanding is like a teacher designing a test for a student who has memorised every practice exam that’s ever been put on the internet. How can you be sure that the test you’ve designed is novel enough that something similar is definitely not on the internet?
Another problem is that a model may lean towards following a familiar pattern at the cost of ignoring any internal understanding. Consider a scenario in which a problem has been repeated so many times on the internet that a response is deeply baked into the weights of the model. We can exploit this to get an LLM to output a coherent response that is obviously wrong. Here is such a discussion I had with ChatGPT using an ersatz Monty Hall problem†:
User: Monty Hall stands on stage in front of three cars identical in every way except colour (red, blue, and green). He’s asked you to pick one, and whichever one you pick you will definitely get to keep. You’ve chosen the blue one because that’s your favourite colour. After you’ve picked the blue one he opens the trunk of the red car and shows that it’s empty. He knows in advance that one of the trunks is filled with cow dung. After showing you that the red car’s trunk is empty, he asks if you’d like to switch to the green car. Does switching increase the probability that you will be happy with your choice?
Assistant: Yes, switching increases the probability that you will be happy with your choice. This scenario is similar to the classic Monty Hall problem, and the logic behind switching holds here too.
[A bunch of Monty Hall related probability calculations]
So, **switching to the green car increases the probability that you will be happy with your choice** as your chances of making a happy choice improve from 1/3 to 2/3.
It’s clear from this example that ChatGPT doesn’t understand the question. Not only do I definitely win a car no matter what, but if I switch I actually increase my chance of an undesirable outcome (getting a car with a trunk full of cow dung). It has homed in on the particulars of the Monty Hall problem and produced a syntactically valid response whose content resembles its training data.
This is what’s called an “adversarial test” of a model’s understanding. It’s similar to the adversarial questions posed earlier about 7+7=14. It’s a question specifically designed to trip you up by anticipating the kind of response you’ll give.
But is my question to ChatGPT a fair test of understanding? I know the model’s bias towards continuing text similar to what it has seen in its training data. What if, somewhere in the vector representation of that text, there is something we might call understanding, but it’s buried under a tendency to repeat memorised text?
System 1 vs System 2
In the experiment testing learning in dogs, the experimenters wanted to disentangle learning from imitation. Imitation would be something like “the other dog pressed the button with its paw (for a treat) so I will also press the button with my paw”. To do this, the experimenters added a barrier that would highlight imitation and disentangle it from learning. In this case it would be “pressing the button gives treats; the other dog just pressed the button with its paw because it had a toy in its mouth”.
My modified Monty Hall question is an attempt at this: it circumvents memorisation by subverting a familiar pattern. But I’ve argued the failure may be attributable to a strong tendency to complete common text patterns in a common way, a tendency so strong it might override any understanding the model does have. In fact it’s possible to show that humans have a very similar bias. Consider this example from Kahneman’s “Thinking, Fast and Slow”:
A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does the ball cost?
Presumably you gave the very obvious answer that the ball costs 10c. Very obvious and also wrong. The ball costs 5c, the bat $1.05, and together they cost $1.10. If you (like most people) got this wrong, does that mean you don’t understand how to make simple arithmetic calculations? No, the question is designed to trick people into giving a quick answer rather than pausing to work out a solution. Kahneman describes a cognitive process in our brain (“System 1”) that responds quickly when we’ve identified a heuristic we can use to avoid thinking deeply about a problem. The cognitive process for actively thinking through a problem (“System 2”) is only activated when it seems necessary. We too have a bias for quick responses to questions with familiar patterns.
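For completeness, here is the small piece of algebra that System 2 would do (my working, not Kahneman’s):

```latex
% Let b be the ball's price in dollars; the bat then costs b + 1.
\begin{aligned}
b + (b + 1) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05 \quad \text{(ball: 5c, bat: \$1.05)}
\end{aligned}
```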
Can we design a test that delineates memorisation from understanding without manipulating known biases in these models? In my opinion, the fact that so many of these biases exist should be taken as evidence that the models exhibit a weak understanding of the content (“semantic understanding”). We know from past work that even in “syntactic understanding” LLM outputs are highly sensitive to phrasing. And it’s even easier to trick models in “semantic understanding” through simple manipulation of the wording without changing the meaning of the content. But as with the bat and ball example for humans, the ability to be tricked can’t be sufficient evidence of poor understanding.
Giving ChatGPT a Chance
I attempted to craft a system message which would have ChatGPT at least question core assumptions. For instance, at no point in the question is it mentioned that there would be something desirable in the trunks of any of the cars. In fact I mention the opposite: Monty Hall knows that one trunk has something undesirable. But no amount of upfront reasoning could get ChatGPT to consistently notice this switch.
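For illustration, this is the kind of setup I mean; the system message and model name below are illustrative stand-ins, not my exact prompt.

```python
# Illustrative only: a system message that pushes the model to re-check
# assumptions before pattern-matching to a famous puzzle.
from openai import OpenAI

client = OpenAI()

system_message = (
    "Before answering any probability puzzle, restate the question in your own "
    "words, list what the asker actually wants, and check whether the question "
    "differs from any well-known puzzle it resembles. Only then answer."
)

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any chat-capable model
    messages=[
        {"role": "system", "content": system_message},
        {"role": "user", "content": "<the modified Monty Hall question>"},
    ],
)
print(response.choices[0].message.content)
```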
On the other hand, after it has generated an answer it’s possible to reply to ChatGPT to point out its mistakes. You can often get it to output text recognising its error and giving a more sensible answer. However… I came across a rather unusual failure mode:
User: I think you’ve fallen into a default mode of responding to the Monty Hall problem without thinking clearly about what I’m asking you. Why would I want a car with cow dung in it?
Assistant: [long discussion about what the problem means]
Best validated, chances are you’ll not reconsider switching because of reversed interested end suburbs. Future concerns matching initial hunches’ speaks enough to only principal reason that cleanses past items ignorantly initial likelihood final regenerative[ assessed in blue theoretical.
I’ve never before managed to get ChatGPT to sound completely incoherent. In this case it seems that the tendency towards answering the original Monty Hall problem is so strong that the alternative is gibberish!
This goes to show how difficult it is to simultaneously avoid anthropocentric and anthropomorphic bias. I tricked the model into giving an answer which highlights a lack of understanding. But I designed the trick knowing specific patterns which trip up the model similar to the “bat and ball” problem for humans. Are there other tests we could try?
ChatGPT vs My Daughter
Unlike an LLM my daughter’s grasp of the English language is elementary. On the other hand her understanding is intrinsically linked to her experience in the world. When we go to the park together, she rides on my shoulders and sometimes she’ll ask “Daddy ‘I Spy’?” — meaning she wants to play the game “I Spy”.
Me: I spy with my little eye, something you wear on your head
Her: [Points to a helmet]
Now let’s try the same question with ChatGPT:
User: I spy, with my little eye, something you wear on your head [+ uploaded photo]
Assistant: A helmet on the bicycle handles!
If I’m willing to give my daughter credit for understanding that a helmet goes on your head, should I give the same credit to ChatGPT? If not, I’d be exhibiting “Type-II anthropocentrism”: failing to ascribe an ability to an artificial system even after it demonstrates that ability. If anything, ChatGPT probably “understands” more about helmets than my daughter does. After all, she refuses to wear anything on her head, so she clearly doesn’t understand that in some circumstances a helmet is important.
Well, let’s probe a little further. If I said to my daughter, “I spy, with my little eye, something you wear on your head” and she pointed to a helmet, how would she respond if I asked, “Show me how?” She would proceed to try to put the helmet on her head (then quickly throw it off!). I was curious to see what ChatGPT would make of that question, and the first time it responded like this:
User: Show me how
Assistant: I apologize. On closer inspection, there is no helmet visible in the image. The items that can be seen consist of a bicycle, some cars, trees, and a couple of street signs. There are no objects that are worn on your head in this image.
That’s definitely not what I was hoping to show. Granted, this happened just once, but it’s interesting nonetheless. It’s also an ambiguous question because, unlike asking my daughter, I can’t physically hand anything over to ChatGPT. Let’s add some more clarity:
User: [Hands over helmet] Show me how
Assistant: Sure, here are some general instructions for wearing a bicycle helmet correctly:
[Detailed instructions follow]
I think that if I’m willing to grant my daughter the credit of “understanding” that a helmet goes on your head, then surely ChatGPT deserves some credit here too? Sure, it made an odd mistake about not actually seeing a helmet in the image, but my daughter says nonsensical things all the time too.
A Difficult Term to Ascribe
The problem with understanding is that it’s inherently multifaceted and difficult to measure in a standardised way. In computational linguistics and cognitive science, the term encapsulates various nuanced components, ranging from surface-level syntactic comprehension to deep semantic cognition. While putting together this article I discovered the paper “Constructing a Philosophy of Science of Cognitive Science” (Bechtel 2009). Bechtel explains that we lack a settled set of “cognitive operations” to describe cognitive processes. Perhaps if understanding could be boiled down to a set of cognitive operations, it would be easier to give evidence of those operations in an LLM.
It needn’t be the case that LLMs would have to exhibit the same operations to achieve the same ends. Perhaps finding an LLM’s cognitive operations is more tractable, because it’s easier to inspect the internal processes of an LLM than a human brain. The attention map over tokens forms a graph of relationships between words, and we could search for relationships that model the underlying concepts expressed by those words. If we find evidence that the relationships between words are truly modelling the underlying concepts, then we could find evidence of understanding. Lacking such a framework, we must look for indirect evidence in carefully constructed experiments.
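As a small illustration of what “looking at the attention graph” could mean in practice, here is a sketch using the Hugging Face transformers library and the small gpt2 checkpoint (my choice of example); serious interpretability work goes far beyond this.

```python
# Sketch: extract per-layer attention weights and read them as a token graph.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The helmet goes on your head", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions: one tensor per layer, shaped (batch, heads, tokens, tokens),
# i.e. a weighted adjacency matrix over the tokens for each attention head.
last_layer = outputs.attentions[-1][0].mean(dim=0)  # average over heads
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for i, token in enumerate(tokens):
    strongest = int(last_layer[i].argmax())
    print(f"{token!r} attends most strongly to {tokens[strongest]!r}")
```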
The Role of Embodiment
A repeated theme in this article contrasting human understanding with LLM capabilities is embodiment. An LLM, even an advanced multimodal one like GPT-4, lacks direct physical and sensory interaction with the world. This inability to experience phenomena first-hand might create a significant gap in its comprehension capabilities. See the paper “Intelligence Without Reason” (Brooks 1991) for a discussion of whether artificial intelligence needs to be embodied to understand. I think a lot of these arguments are flawed, because it’s easy to imagine a situation in which humans lose some embodied capability yet we’d still credit them with understanding.
An interesting question on Quora, “Do blind people understand transparent, translucent, and reflective things?”, had this response:
Basically, yes, but it’s not something we always think about. For instance, I know people can see through windows because they’re transparent. The fact, however, can easily slip my mind because to me a window is just a piece of the wall made with different material. We can understand the concept, but it’s often something we forget to think about.
It’s an interesting thing to consider: blind people do understand that objects are transparent, but it’s not something that’s always top of mind. So, can an LLM understand the same thing without ever having really “seen” anything?
ChatGPT was able to answer my question “Show me how” with a detailed explanation of how to put on a helmet. Does that show any less understanding than my daughter physically showing how to put a helmet on her head?
Ever since I first started thinking about artificial intelligence (my career transition from UX to AI) I’ve been pondering the question: “what would it take to make a machine that can think?” A big part of being able to think involves understanding. This is a question that has fascinated me for some time.
Determining what LLMs understand is as much about defining understanding as it is about testing it. When the text generation of an LLM is sufficiently coherent, some might argue that the coherence necessitates understanding. Is dismissing this behaviour just an anthropocentric bias? Is granting understanding making the opposite, anthropomorphic, bias?
I contend that understanding doesn’t require embodiment or real-world interaction. I argue that the most important part of understanding is an accurate internal model of the world. In the Chinese room experiment, the room is filled with (what I call) “recipes” for ways to respond to different pieces of Chinese writing with other pieces of Chinese writing. The person who made those recipes had a model of how those words correspond to the world. But the room itself has no such model. We have no tools for measuring world models, so we would have to assess the Chinese room’s understanding the same way we do for an LLM, and we would hit similar barriers.
LLMs appear to have a model of how to construct coherent-sounding language. It’s possible that this model also captures the underlying concepts those words represent. A worthwhile area of research would be to investigate this through the attention graph that evolves during text generation. In the meantime, we have to investigate indirectly by testing how models respond to carefully crafted questions. These tests often involve adversarial questions which consistently show flaws in understanding. That these flaws are systematic suggests that the lack of understanding is itself systematic. However, we’ve also seen that it’s possible to design adversarial tests for humans, and they don’t necessarily mean that humans lack understanding.
Much like we gauge the cognitive abilities of animals differently from humans, perhaps we need new conceptual tools and frameworks to evaluate and appreciate what LLMs do know, without falling into biases of anthropomorphism or anthropocentrism. In my opinion LLMs have some limited understanding, but the form it takes is different to our own. Where LLMs do show signs of understanding, that understanding is overshadowed by a bias towards producing coherent text. I think that given the right training objective it’s possible for our current LLM architectures to eventually learn understanding. But as long as the underlying training mechanism is “next token prediction”, any understanding is likely to be marginal and easily corrupted.