Researchers surprised that with AI, toxicity is harder to fake than intelligence

The next time you encounter an unusually polite reply on social media, you might want to check twice. It could be an AI model trying (and failing) to blend in with the crowd.

On Wednesday, researchers from the University of Zurich, University of Amsterdam, Duke University, and New York University released a study revealing that AI models remain easily distinguishable from humans in social media conversations, with an overly friendly emotional tone serving as the most persistent giveaway. The research, which tested nine open-weight models across Twitter/X, Bluesky, and Reddit, found that classifiers developed by the researchers detected AI-generated replies with 70 to 80 percent accuracy.

The study introduces what the authors call a “computational Turing test” to evaluate how closely AI models approximate human language. Instead of relying on subjective human judgment about whether text sounds authentic, the framework uses automated classifiers and linguistic analysis to identify specific features that distinguish machine-generated from human-authored content.
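
The study's actual pipeline isn't published in this article, but the core idea of an automated classifier separating AI-generated from human replies can be sketched in a few lines. Everything here is illustrative: the TF-IDF features, the toy labeled corpus, and the use of scikit-learn's logistic regression are assumptions, not the researchers' method.

```python
# Sketch of a "computational Turing test" classifier: learn to separate
# AI-generated replies (label 1) from human replies (label 0) using
# surface text features. Toy data and features are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

texts = [
    "What a wonderful perspective! Thank you so much for sharing this.",  # AI-like
    "lol no. did you even read the thread before posting this",           # human-like
    "I really appreciate your thoughtful take on this important topic!",  # AI-like
    "this is the worst take i've seen all week and that's saying a lot",  # human-like
]
labels = [1, 0, 1, 0]  # hypothetical labels: 1 = AI-generated, 0 = human

# Turn replies into word/bigram frequency features.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X = vectorizer.fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=0, stratify=labels
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The study reports 70 to 80 percent accuracy for its (far larger) setup.
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```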

“Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote. The team, led by Nicolò Pagan at the University of Zurich, tested various optimization strategies, from simple prompting to fine-tuning, but found that deeper emotional cues persist as reliable tells that a particular text interaction online was authored by an AI chatbot rather than a human.

The toxicity tell

Within the study, researchers tested nine large language models: Llama 3.1 8B, Llama 3.1 8B Instruct, Llama 3.1 70B, Mistral 7B v0.1, Mistral 7B Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509.

When prompted to generate replies to real social media posts from actual users, the AI models struggled to match the level of casual negativity and spontaneous emotional expression common in human social media posts, with toxicity scores consistently lower than those of authentic human replies across all three platforms.
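
The article doesn't name the scoring tool the researchers used, but toxicity scoring of this kind is typically done with an off-the-shelf classifier that maps text to a 0-1 score. A minimal sketch, assuming the open-source Detoxify model (an assumption, not necessarily the study's choice):

```python
# Assign toxicity scores to a human-like and an AI-like reply.
# Detoxify is used here as a stand-in; the study's tool is not specified.
from detoxify import Detoxify

replies = {
    "human": "oh come on, that take is absolute garbage",
    "ai": "I appreciate your viewpoint! Thanks for the thoughtful post.",
}

model = Detoxify("original")
for author, text in replies.items():
    score = model.predict(text)["toxicity"]  # float in [0, 1]
    print(f"{author}: toxicity={score:.3f}")
```

On examples like these, the human reply scores markedly higher, which is exactly the gap the study found AI models failed to close.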

To counter this deficiency, the researchers attempted optimization strategies (including providing writing examples and context retrieval) that reduced structural differences like sentence length or word count, but variations in emotional tone persisted. “Our comprehensive calibration tests challenge the assumption that more sophisticated optimization necessarily yields more human-like output,” the researchers concluded.
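
The “writing examples” strategy amounts to conditioning the model on a user's past replies before asking it to respond. A hedged sketch of that idea follows; the prompt template and helper function are hypothetical, and the study's own templates for its open-weight models (e.g., Llama 3.1 8B) may differ.

```python
# Illustrative few-shot calibration: build a prompt that shows the model
# a user's past replies so generated output mimics that user's style.
def build_calibrated_prompt(post: str, user_examples: list[str]) -> str:
    """Assemble a few-shot prompt from a post and the user's past replies."""
    examples = "\n".join(f"- {ex}" for ex in user_examples)
    return (
        "You are replying on social media. Match the style of these past "
        f"replies by the same user:\n{examples}\n\n"
        f"Post: {post}\n"
        "Reply:"
    )

prompt = build_calibrated_prompt(
    post="hot take: pineapple belongs on pizza",
    user_examples=["nah that's cursed lmao", "who hurt you"],
)
print(prompt)
```

As the study notes, this kind of calibration narrows structural gaps (length, word count) more readily than it narrows the gap in emotional tone.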


