
Multi-AI collaboration helps reasoning and factual accuracy in large language models


An age-old adage, often introduced to us during our childhood, is designed to nudge us beyond our self-centered, nascent minds: “Two heads are better than one.” This proverb encourages collaborative thinking and highlights the potency of shared intellect.

Fast forward to 2023, and we find that this wisdom holds true even in the realm of artificial intelligence: Multiple language models, working in harmony, are better than one.

Recently, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) embodied this ancient wisdom at the frontier of modern technology. They introduced a method that leverages multiple AI systems to debate and argue with one another to converge on a best-possible answer to a given question. The method empowers these expansive language models to heighten their adherence to factual data and refine their decision-making.

The crux of the issue with large language models (LLMs) lies in the inconsistency of their generated responses, resulting in potential inaccuracies and flawed reasoning. This new approach lets each agent actively assess every other agent’s responses, and uses this collective feedback to refine its own answer. In technical terms, the method consists of multiple rounds of response generation and critique. Each language model generates a solution to the given question, and then incorporates the feedback from all other agents to update its own response. This iterative cycle culminates in a final output determined by a majority vote across the models’ solutions. It somewhat mirrors the dynamics of a group discussion, where individuals contribute to reach a unified and well-reasoned conclusion.
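
To make the round structure concrete, here is a minimal sketch of that loop in Python. The `query_model` function, the prompt wording, and the default round counts are hypothetical stand-ins, not the paper’s actual implementation; any text-completion API could be plugged in.

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for any black-box, text-in/text-out LLM API (hypothetical)."""
    raise NotImplementedError("Plug in your preferred completion endpoint here.")

def multiagent_debate(question: str, n_agents: int = 3, n_rounds: int = 2) -> str:
    # Round 0: each agent independently drafts an answer to the question.
    answers = [query_model(f"Q: {question}\nA:") for _ in range(n_agents)]

    # Debate rounds: each agent reads every other agent's answer and revises.
    for _ in range(n_rounds):
        revised = []
        for i, own in enumerate(answers):
            others = "\n".join(a for j, a in enumerate(answers) if j != i)
            prompt = (
                f"Q: {question}\n"
                f"Other agents answered:\n{others}\n"
                f"Your previous answer: {own}\n"
                "Using the other answers as additional advice, give an updated answer:"
            )
            revised.append(query_model(prompt))
        answers = revised

    # Final output: majority vote over the agents' last-round answers.
    # (A real system would first parse out just the final answer, e.g. a
    # number, so superficial wording differences don't split the vote.)
    return Counter(answers).most_common(1)[0][0]
```

Note that every step here is plain prompt-in, text-out, which is why the same loop can sit on top of any model that is reachable only through a completion endpoint.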

One real strength of the approach lies in its seamless application to existing black-box models. Because the methodology revolves around generating text, it can be implemented across various LLMs without needing access to their internal workings. This simplicity, the team says, could help researchers and developers use the tool to enhance the consistency and factual accuracy of language model outputs across the board.

“Employing a novel approach, we don’t simply depend on a single AI model for answers. Instead, our process enlists a multitude of AI models, each bringing unique insights to tackle a question. Although their initial responses may seem truncated or may contain errors, these models can sharpen and improve their own answers by scrutinizing the responses offered by their counterparts,” says Yilun Du, an MIT PhD student in electrical engineering and computer science, affiliate of MIT CSAIL, and lead author on a new paper about the work. “As these AI models engage in discourse and deliberation, they’re better equipped to recognize and rectify issues, enhance their problem-solving abilities, and better verify the precision of their responses. Essentially, we’re cultivating an environment that compels them to delve deeper into the crux of a problem. This stands in contrast to a single, solitary AI model, which often parrots content found on the internet. Our method, however, actively stimulates the AI models to craft more accurate and comprehensive solutions.”

The research looked at mathematical problem-solving, including grade school and middle/high school math problems, and saw a significant boost in performance through the multi-agent debate process. Moreover, the language models demonstrated enhanced abilities to generate accurate arithmetic evaluations, illustrating potential across different domains.
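
As a rough illustration of how such an evaluation could exercise the sketch above, the debate loop might be invoked on a grade-school arithmetic question; the agent answers in the comment are invented for the example, not reported results.

```python
# Hypothetical usage of the multiagent_debate sketch above. With a real
# API behind query_model, three agents might return ["72", "72", "68"]
# after debate, and the majority vote would then select "72".
print(multiagent_debate("What is 8 * 9?", n_agents=3, n_rounds=2))
```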

The method can also help address the problem of “hallucinations” that often plague language models. By designing an environment in which agents critique one another’s responses, the models were more incentivized to avoid spitting out random information and to prioritize factual accuracy.

Beyond its application to language models, the approach could also be used to integrate diverse models with specialized capabilities. By establishing a decentralized system in which multiple agents interact and debate, these comprehensive and efficient problem-solving abilities could potentially be applied across various modalities like speech, video, or text.

While the methodology yielded encouraging results, the researchers say that existing language models may face challenges with processing very long contexts, and that their critique abilities may not be as refined as desired. Moreover, the multi-agent debate format, inspired by human group interaction, has yet to incorporate the more complex forms of dialogue that contribute to intelligent collective decision-making, an important area for future exploration, the team says. Advancing the technique could involve a deeper understanding of the computational foundations behind human debates and discussions, and using those models to enhance or complement existing LLMs.

“Not only does this approach offer a pathway to elevate the performance of existing language models, but it also presents an automatic means of self-improvement. By using the debate process as supervised data, language models can enhance their factuality and reasoning autonomously, reducing reliance on human feedback and offering a scalable approach to self-improvement,” says Du. “As researchers continue to refine and explore this approach, we can get closer to a future where language models not only mimic human-like language but also exhibit more systematic and reliable thinking, forging a new era of language understanding and application.”

“It makes a lot of sense to use a deliberative process to improve the model’s overall output, and it’s a big step forward from chain-of-thought prompting,” says Anca Dragan, associate professor at the University of California at Berkeley’s Department of Electrical Engineering and Computer Sciences, who was not involved in the work. “I’m excited about where this can go next. Can people better judge the answers coming out of LLMs when they see the deliberation, whether or not it converges? Can people arrive at better answers themselves by deliberating with an LLM? Can a similar idea be used to help a user probe an LLM’s answer in order to arrive at a better one?”

Du wrote the paper with three CSAIL affiliates: Shuang Li SM ’20, PhD ’23; MIT professor of electrical engineering and computer science Antonio Torralba; and MIT professor of computational cognitive science and Center for Brains, Minds, and Machines member Joshua Tenenbaum. Google DeepMind researcher Igor Mordatch was also a co-author.
