New AI classifier for indicating AI-written text

We’re launching a classifier trained to differentiate between AI-written and human-written text.

We’ve trained a classifier to distinguish between text written by a human and text written by AIs from a variety of providers. While it is impossible to reliably detect all AI-written text, we believe good classifiers can inform mitigations for false claims that AI-generated text was written by a human: for example, running automated misinformation campaigns, using AI tools for academic dishonesty, and positioning an AI chatbot as a human.

Our classifier is not fully reliable. In our evaluations on a “challenge set” of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time (false positives). Our classifier’s reliability typically improves as the length of the input text increases. Compared to our previously released classifier, this new classifier is significantly more reliable on text from more recent AI systems.
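To make these two rates concrete, here is a minimal sketch of what they imply for a reader looking at a flagged text. It takes the 26% true positive rate and 9% false positive rate quoted above and applies Bayes' rule. The share of AI-written text in the pool being checked is not given in this article, so the values tried below are purely illustrative assumptions, and `flagged_precision` is a hypothetical helper name.

```python
def flagged_precision(tpr: float, fpr: float, ai_share: float) -> float:
    """Fraction of texts flagged "likely AI-written" that really are AI-written.

    tpr:      share of AI-written texts that get flagged (true positive rate)
    fpr:      share of human-written texts that get flagged (false positive rate)
    ai_share: assumed share of AI-written texts in the pool being checked
    """
    true_flags = tpr * ai_share            # AI-written and flagged
    false_flags = fpr * (1.0 - ai_share)   # human-written but flagged
    return true_flags / (true_flags + false_flags)


TPR = 0.26  # from the challenge-set evaluation quoted above
FPR = 0.09  # from the challenge-set evaluation quoted above

# Assumed base rates, chosen only for illustration.
for ai_share in (0.05, 0.25, 0.50):
    precision = flagged_precision(TPR, FPR, ai_share)
    print(f"{ai_share:.0%} AI-written in pool -> {precision:.0%} of flags are correct")
```

Under these assumptions, the fewer AI-written texts there are in the pool, the larger the share of flags that land on human-written text, which is one reason a flag should not be treated as a decision on its own.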

We’re making this classifier publicly available to get feedback on whether imperfect tools like this one are useful. Our work on the detection of AI-generated text will continue, and we hope to share improved methods in the future.

Try our free work-in-progress classifier yourself:

Limitations

Our classifier has a number of important limitations. It should not be used as a primary decision-making tool, but instead as a complement to other methods of determining the source of a piece of text.

The classifier is very unreliable on short texts (below 1,000 characters). Even longer texts are sometimes incorrectly labeled by the classifier.
Sometimes human-written text will be incorrectly but confidently labeled as AI-written by our classifier.
We recommend using the classifier only for English text. It performs significantly worse in other languages and it is unreliable on code.
Text that is very predictable cannot be reliably identified. For example, it is impossible to predict whether a list of the first 1,000 prime numbers was written by AI or humans, because the correct answer is always the same.
AI-written text can be edited to evade the classifier. Classifiers like ours can be updated and retrained based on successful attacks, but it is unclear whether detection has an advantage in the long term.
Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction.
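A caller could guard against the first limitation before sending text to any detector. The sketch below is an assumption about how such a pre-check might look, not part of the classifier itself: `precheck` and `MIN_CHARS` are hypothetical names, and only the 1,000-character figure comes from the list above.

```python
MIN_CHARS = 1000  # below this length the classifier is described as very unreliable


def precheck(text: str) -> list[str]:
    """Return warnings about the input; an empty list means no length concern.

    This only checks length. It does not detect language, code, or highly
    predictable content, which the list above also names as problem cases.
    """
    warnings = []
    if len(text) < MIN_CHARS:
        warnings.append(
            f"input has {len(text)} characters; results below "
            f"{MIN_CHARS} characters are very unreliable"
        )
    return warnings


for sample in ("A short note.", "x" * 1500):
    print(len(sample), precheck(sample))
```

Passing the length check does not make a result trustworthy, since longer texts are also sometimes mislabeled.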

Training the classifier

Our classifier is a language model fine-tuned on a dataset of pairs of human-written text and AI-written text on the same topic. We collected this dataset from a variety of sources that we believe to be written by humans, such as the pretraining data and human demonstrations on prompts submitted to InstructGPT. We divided each text into a prompt and a response. On these prompts we generated responses from a variety of different language models trained by us and other organizations. For our web app, we adjust the confidence threshold to keep the false positive rate low; in other words, we only mark text as likely AI-written if the classifier is very confident.
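One common way to keep a false positive rate low is to pick the threshold from classifier scores on held-out human-written text. The sketch below illustrates that general technique under stated assumptions; it is not a description of how the web app's threshold was actually chosen. The scores are made up, and `threshold_for_fpr`, `label`, and the "not flagged" label are hypothetical.

```python
import math


def threshold_for_fpr(human_scores: list[float], max_fpr: float) -> float:
    """Lowest threshold whose false positive rate on human_scores is <= max_fpr.

    A text is flagged when its score is strictly greater than the threshold.
    """
    if not human_scores:
        raise ValueError("need at least one human-written score")
    if not 0.0 <= max_fpr < 1.0:
        raise ValueError("max_fpr must be in [0, 1)")
    ranked = sorted(human_scores, reverse=True)
    # Number of human-written texts we are allowed to flag by mistake.
    allowed = math.floor(max_fpr * len(ranked))
    # Scores strictly above ranked[allowed] number at most `allowed`.
    return ranked[allowed]


def label(score: float, threshold: float) -> str:
    """Flag only scores strictly above the chosen threshold."""
    return "likely AI-written" if score > threshold else "not flagged"


# Made-up classifier scores for held-out human-written texts.
human_scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
threshold = threshold_for_fpr(human_scores, max_fpr=0.2)
flagged = sum(score > threshold for score in human_scores)
print(threshold, flagged / len(human_scores))
```

Lowering `max_fpr` pushes the threshold up, so fewer human-written texts are flagged, but fewer AI-written texts are caught as well.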

Impact on educators and call for input

We recognize that identifying AI-written text has been an important point of discussion among educators, and equally important is recognizing the limits and impacts of AI-generated text classifiers in the classroom. We have developed a preliminary resource on the use of ChatGPT for educators, which outlines some of the uses and associated limitations and considerations. While this resource is focused on educators, we expect our classifier and associated classifier tools to have an impact on journalists, mis/dis-information researchers, and other groups.

We’re engaging with educators in the US to learn what they’re seeing in their classrooms and to discuss ChatGPT’s capabilities and limitations, and we will continue to broaden our outreach as we learn. These are important conversations to have as part of our mission to deploy large language models safely, in direct contact with affected communities.

If you’re directly impacted by these issues (including but not limited to teachers, administrators, parents, students, and education service providers), please provide us with feedback using this form. Direct feedback on the preliminary resource is helpful, and we also welcome any resources that educators are developing or have found helpful (e.g., course guidelines, honor code and policy updates, interactive tools, AI literacy programs).
