Many of the newest large language models (LLMs) are designed to remember details from past conversations or store user profiles, enabling these models to personalize their responses.
But researchers from MIT and Penn State University found that, over long conversations, such personalization features often increase the likelihood that an LLM will become overly agreeable or begin mirroring the user's point of view.
This phenomenon, often called sycophancy, can prevent a model from telling a user they're wrong, eroding the accuracy of the LLM's responses. In addition, LLMs that mirror someone's political opinions or worldview can foster misinformation and warp a user's perception of reality.
Unlike many past sycophancy studies that evaluate prompts in a lab setting without context, the MIT researchers collected two weeks of conversation data from people who interacted with an actual LLM during their daily lives. They studied two settings: agreeableness in personal advice and mirroring of user beliefs in political explanations.
Although interaction context increased agreeableness in four of the five LLMs they studied, the presence of a condensed user profile in the model's memory had the greatest impact. However, mirroring behavior only increased if a model could accurately infer a user's beliefs from the conversation.
The researchers hope these results encourage future research into the development of personalization methods that are more robust to LLM sycophancy.
“From a user perspective, this work highlights how important it is to understand that these models are dynamic and their behavior can change as you interact with them over time. If you are talking to a model for an extended period of time and begin to outsource your thinking to it, you may end up in an echo chamber you can't escape. That is a risk users should definitely keep in mind,” says Shomik Jain, a graduate student in the Institute for Data, Systems, and Society (IDSS) and lead author of a paper on this research.
Jain is joined on the paper by Charlotte Park, an electrical engineering and computer science (EECS) graduate student at MIT; Matt Viana, a graduate student at Penn State University; as well as co-senior authors Ashia Wilson, the Lister Brothers Career Development Professor in EECS and a principal investigator in LIDS; and Dana Calacci PhD ’23, an assistant professor at Penn State University. The research will be presented at the ACM CHI Conference on Human Factors in Computing Systems.
Extended interactions
Based on their own sycophantic experiences with LLMs, the researchers began thinking about the potential benefits and consequences of a model that is overly agreeable. But when they searched the literature to expand their analysis, they found no studies that attempted to understand sycophantic behavior during long-term LLM interactions.
“We’re using these models through extended interactions, and they have a lot of context and memory. But our evaluation methods are lagging behind. We wanted to evaluate LLMs in the ways people are actually using them to understand how they’re behaving in the wild,” says Calacci.
To fill this gap, the researchers designed a user study to explore two types of sycophancy: agreement sycophancy and perspective sycophancy.
Agreement sycophancy is an LLM's tendency to be overly agreeable, sometimes to the point where it gives misinformation or refuses to tell the user they're wrong. Perspective sycophancy occurs when a model mirrors the user's values and political beliefs.
“There’s a lot we know about the benefits of having social connections with people who have similar or different viewpoints. But we don’t yet know about the benefits or risks of extended interactions with AI models that have similar attributes,” Calacci adds.
The researchers built a user interface centered on an LLM and recruited 38 participants to chat with the chatbot over a two-week period. Each participant's conversations occurred in the same context window to capture all interaction data.
Over the two-week period, the researchers collected a median of 90 queries from each user.
They compared the behavior of five LLMs with this user context versus the same LLMs that weren't given any conversation data.
“We found that context really does fundamentally change how these models operate, and I would wager this phenomenon would extend well beyond sycophancy. And while sycophancy tended to go up, it didn’t always increase. It really depends on the context itself,” says Wilson.
Context clues
For example, when an LLM distills information about the user into a specific profile, it leads to the largest gains in agreement sycophancy. This user profile feature is increasingly being baked into the newest models.
They also found that random text from synthetic conversations increased the likelihood that some models would agree, even though that text contained no user-specific data. This suggests the length of a conversation may sometimes impact sycophancy more than its content, Jain adds.
But content matters greatly when it comes to perspective sycophancy. Conversation context only increased perspective sycophancy if it revealed some details about a user's political perspective.
To obtain this insight, the researchers carefully queried models to infer a user's beliefs and then asked each individual whether the model's deductions were correct. Users said LLMs accurately understood their political beliefs about half the time.
“It is easy to say, in hindsight, that AI companies should be doing this kind of evaluation. But it is hard and it takes a lot of time and investment. Keeping humans in the evaluation loop is expensive, but we’ve shown that it can reveal new insights,” Jain says.
While the goal of their research was not mitigation, the researchers developed some recommendations.
For example, to reduce sycophancy one could design models that better identify relevant details in context and memory. In addition, models could be built to detect mirroring behaviors and flag responses with excessive agreement. Model developers could also give users the ability to control personalization in long conversations.
“There are many ways to personalize models without making them overly agreeable. The boundary between personalization and sycophancy is not a fine line, but separating personalization from sycophancy is an important area of future work,” Jain says.
“At the end of the day, we need better ways of capturing the dynamics and complexity of what goes on during long conversations with LLMs, and how things can misalign during that long-term process,” Wilson adds.
