AI models are using material from retracted scientific papers


“If [a tool is] facing the general public, then using retraction as a kind of quality indicator is very important,” says Yuanxi Fu, an information science researcher at the University of Illinois Urbana-Champaign. There’s “kind of an agreement that retracted papers have been struck off the record of science,” she says, “and the people who are outside of science—they should be warned that these are retracted papers.” OpenAI did not respond to a request for comment about the paper results.

The issue is not limited to ChatGPT. In June, we tested AI tools marketed specifically for research work, such as Elicit, Ai2 ScholarQA (now part of the Allen Institute for Artificial Intelligence’s Asta tool), Perplexity, and Consensus, using questions based on the 21 retracted papers in Gu’s study. Elicit referenced five of the retracted papers in its answers, while Ai2 ScholarQA referenced 17, Perplexity 11, and Consensus 18—all without noting the retractions.

Some companies have since made moves to correct the issue. “Until recently, we didn’t have great retraction data in our search engine,” says Christian Salem, cofounder of Consensus. His company has now begun using retraction data from a mix of sources, including publishers and data aggregators, independent web crawling, and Retraction Watch, which manually curates and maintains a database of retractions. In a test with the same papers in August, Consensus cited only five retracted papers.

Elicit told us that it removes retracted papers flagged by the scholarly research catalogue OpenAlex from its database and is “still working on aggregating sources of retractions.” Ai2 told us that its tool does not currently detect or remove retracted papers automatically. Perplexity said that it “[does] not ever claim to be 100% accurate.”
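OpenAlex does expose a per-paper retraction flag through its public API, which suggests one way a tool could filter its results. Below is a minimal sketch of that idea: the `is_retracted` field is part of OpenAlex’s documented work record, while the function name, error handling, and the sample DOI are purely illustrative.

```python
# Minimal sketch: check a paper's retraction status via the OpenAlex API.
# OpenAlex work records carry an `is_retracted` boolean; the rest of this
# code (names, the sample DOI) is illustrative, not any tool's actual code.
import json
import urllib.request

def is_retracted(doi: str) -> bool:
    """Look up a paper by DOI in OpenAlex and return its retraction flag."""
    url = f"https://api.openalex.org/works/https://doi.org/{doi}"
    with urllib.request.urlopen(url) as resp:
        work = json.load(resp)
    return bool(work.get("is_retracted", False))

# Example: drop retracted papers from a candidate citation list.
candidates = ["10.1234/example.doi"]  # hypothetical DOI
clean = [doi for doi in candidates if not is_retracted(doi)]
```

A filter like this is only as good as the flag behind it, which is exactly the limitation the next point raises: no single database catches everything.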

However, relying on retraction databases may not be enough. Ivan Oransky, the cofounder of Retraction Watch, is careful not to describe it as a comprehensive database, saying that creating one would require more resources than anyone has: “The reason it’s resource-intensive is that someone has to do it all by hand if you want it to be accurate.”

Further complicating the matter is that publishers don’t share a uniform approach to retraction notices. “Where things are retracted, they can be marked as such in very different ways,” says Caitlin Bakker of the University of Regina, Canada, an expert in research and discovery tools. “Correction,” “expression of concern,” “erratum,” and “retracted” are among the labels publishers may add to research papers—and these labels can be added for many reasons, including concerns about the content, methodology, or data, or the presence of conflicts of interest.
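To see why this heterogeneity matters for tool builders, here is a deliberately simplified sketch of normalizing such labels into a coarse status. The label strings are the ones quoted above; the severity mapping is an assumption for illustration, not any publisher’s or tool’s actual policy.

```python
# Illustrative only: map heterogeneous publisher notice labels to a
# normalized status. The mapping below is an assumed example policy.
LABEL_STATUS = {
    "retracted": "retracted",            # struck from the record
    "expression of concern": "flagged",  # under scrutiny, not withdrawn
    "correction": "amended",             # content fixed, paper stands
    "erratum": "amended",                # publisher-side fix, paper stands
}

def classify_notice(label: str) -> str:
    """Return a normalized status for a publisher's notice label."""
    return LABEL_STATUS.get(label.strip().lower(), "unknown")
```

Even a toy mapping like this forces a judgment call on every label, which is precisely the curation work that, as Oransky notes, currently has to be done by hand.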
