3 Questions: Jacob Andreas on large language models

Q: Language is a rich ecosystem ripe with subtle nuances that humans use to communicate with each other — sarcasm, irony, and other forms of figurative language. There are quite a few ways to convey meaning beyond the literal. Is it possible for large language models to comprehend the intricacies of context? What does it mean for a model to achieve "in-context learning"? Moreover, how do multilingual transformers process variations and dialects of different languages beyond English?

A: When we think about linguistic context, these models are capable of reasoning about much, much longer documents and chunks of text more broadly than really anything that we have known how to build before. But that is just one kind of context. With humans, language production and comprehension happens in a grounded context. For example, I know that I'm sitting at this table. There are objects that I can refer to, and the language models we have right now typically can't see any of that when interacting with a human user.

There is a broader social context that informs a lot of our language use which these models are, at least not immediately, sensitive to or aware of. It isn't clear how to give them information about the social context in which their language generation and language modeling takes place. Another important thing is temporal context. We're shooting this video at a particular moment in time when particular facts are true. The models that we have right now were trained on, again, a snapshot of the internet that stopped at a particular time — for most models that we have now, probably a couple of years ago — and they don't know about anything that has happened since then. They don't even know at what moment in time they're doing text generation. Figuring out how to provide all of those different kinds of context will be an interesting question.

Maybe one of the most surprising components here is this phenomenon called in-context learning. If I take a small ML [machine learning] dataset and feed it to the model, like a movie review and the star rating assigned to the movie by the critic, and give it just a few examples of these pairs, the language model develops the ability both to generate plausible-sounding movie reviews and to predict the star ratings. More generally, if I have a machine learning problem, I have my inputs and my outputs. You give an input to the model, you give it one more input, and ask it to predict the output, and the models can often do this very well.
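To make the movie-review example concrete, here is a minimal sketch of how such a few-shot prompt might be assembled. The `build_few_shot_prompt` helper and the `complete` call are hypothetical stand-ins for whatever text-completion interface you happen to use; they are not part of any specific library.

```python
# Minimal sketch of in-context learning via few-shot prompting.
# `complete(prompt)` is a hypothetical stand-in for a call to a
# large language model's text-completion interface.

def build_few_shot_prompt(examples, new_review):
    """Concatenate labeled examples and an unlabeled query into one prompt."""
    blocks = []
    for review, stars in examples:
        blocks.append(f"Review: {review}\nStars: {stars}")
    # The model is asked to continue the pattern for the unlabeled review.
    blocks.append(f"Review: {new_review}\nStars:")
    return "\n\n".join(blocks)

examples = [
    ("A dazzling, heartfelt film from start to finish.", 5),
    ("Flat characters and a plot that goes nowhere.", 1),
    ("Enjoyable, though it drags in the middle.", 3),
]

prompt = build_few_shot_prompt(examples, "Sharp writing, but the ending felt rushed.")
# prediction = complete(prompt)  # the model's continuation, e.g. " 4"
```

No parameters are updated here; the "training set" lives entirely inside the prompt, which is what makes the phenomenon so different from conventional supervised learning.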

This is a super interesting, fundamentally different way of doing machine learning, where I have this one big general-purpose model into which I can insert lots of little machine learning datasets, and yet without having to train a new model at all, a classifier or a generator or whatever, specialized to my particular task. This is actually something we have been thinking about a lot in my group, and in some collaborations with colleagues at Google, trying to understand exactly how this in-context learning phenomenon actually comes about.

Q: We like to believe humans are (at least somewhat) in pursuit of what is objectively and morally known to be true. Large language models, perhaps with under-defined or yet-to-be-understood "moral compasses," aren't beholden to the truth. Why do large language models tend to hallucinate facts, or confidently assert inaccuracies? Does that limit their usefulness for applications where factual accuracy is critical? Is there a leading theory on how we'll solve this?

A: It's well-documented that these models hallucinate facts, that they aren't always reliable. Recently, I asked ChatGPT to describe some of our group's research. It named five papers, four of which aren't papers that actually exist, and one of which is a real paper that was written by a colleague of mine who lives in the United Kingdom, and with whom I've never co-authored. Factuality is still a big problem. Even beyond that, things involving reasoning in a really general sense, things involving complicated computations, complicated inferences, still seem to be really difficult for these models. There may even be fundamental limitations of this transformer architecture, and I think a lot more modeling work is needed to make things better.

Why it happens is still partly an open question, but possibly, just architecturally, there are reasons that it's hard for these models to build coherent models of the world. They can do this a little bit. You can query them with factual questions, trivia questions, and they get them right most of the time, maybe even more often than your average human user off the street. But unlike your average human user, it's really unclear whether there's anything that lives inside this language model that corresponds to a belief about the state of the world. I think this is both for architectural reasons, that transformers don't, obviously, have anywhere to put that belief, and for reasons of training data, that these models are trained on the internet, which was authored by a bunch of different people at different moments who believe different things about the state of the world. Therefore, it's difficult to expect the models to represent those things coherently.

All that being said, I don't think this is a fundamental limitation of neural language models, or even more general language models in general, but something that's true about today's language models. We're already seeing that models are approaching being able to build representations of facts, representations of the state of the world, and I think there's room to improve further.

Q: The pace of progress from GPT-2 to GPT-3 to GPT-4 has been dizzying. What does the pace of the trajectory look like from here? Will it be exponential, or an S-curve that will diminish in progress in the near term? If so, are there limiting factors in terms of scale, compute, data, or architecture?

A: Certainly in the short term, the thing that I'm most scared about has to do with these truthfulness and coherence issues that I was mentioning before; even the best models that we have today do generate incorrect facts. They generate code with bugs, and because of the way these models work, they do so in a way that is especially difficult for humans to spot, because the model output has all the right surface statistics. When we think about code, it's still an open question whether it's actually less work for somebody to write a function by hand or to ask a language model to generate that function and then have the person go through and verify that the implementation of that function was actually correct.

There's a little danger in rushing to deploy these tools right away, in that we'll wind up in a world where everything is a little bit worse, but where it's actually very difficult for people to reliably check the outputs of these models. That being said, these are problems that can be overcome. Especially at the pace things are moving, there's a lot of room to address these problems with factuality and coherence and the correctness of generated code in the long run. These really are tools, tools that we can use to free ourselves up as a society from a lot of unpleasant tasks, chores, or drudge work that has been difficult to automate — and that's something to be excited about.
