How Do We Know if a Text Is AI-generated?
1. N-gram Analysis:
2. Perplexity:
3. Burstiness:
4. Stylometry:
5. Consistency and Coherence Analysis:
Final Thoughts

Different Statistical Approaches to Detecting AI-generated Text.

Photo by Andreas Fickl on Unsplash

Within the fascinating and rapidly advancing realm of artificial intelligence, one of the most exciting advances has been the development of AI text generation. AI models, like GPT-3, Bloom, BERT, AlexaTM, and other large language models, can produce remarkably human-like text. That is both exciting and concerning at the same time. Such technological advances allow us to be creative in ways we couldn't before. Still, they also open the door to deception. And the better these models get, the harder it will be to distinguish between a human-written text and an AI-generated text.

Since the release of ChatGPT, people all over the globe have been testing the boundaries of such AI models, using them both to gain knowledge and, in the case of some students, to solve homework and exams, which challenges the ethical implications of such technology. Even though these models have become sophisticated enough to mimic human writing styles and maintain context over multiple passages, they still slip up, even if their errors are minor.

That raises an important question, a question I get asked very often by my friends and family members (I have been asked it over and over since ChatGPT was released…),

How can we all know if a text is human-written or AI-generated?

This question is not new to the research world; detecting AI-generated text is what we call "deep fake text detection." Today, there are different tools you can use to detect whether a text is human-written or AI-generated, such as the GPT-2 Output Detector by OpenAI. But how do such tools work?

Different approaches are currently used to detect AI-generated text, and new techniques are being researched and implemented as the models used to generate these texts get more advanced.

This article will explore five different statistical approaches that can be used to detect AI-generated text.

Let’s get right to it…

1. N-gram Analysis:

An N-gram is a sequence of N words or tokens from a given text sample. The "N" in N-gram is how many words are in the N-gram. For example:

  1. New York (2-gram).
  2. The Three Musketeers (3-gram).
  3. The group met regularly (4-gram).

Analyzing the frequency of different N-grams in a text makes it possible to determine patterns. For example, among the three N-gram examples we just went through, the first is the most common and the third is the least common. By tracking the different N-grams, we can determine which are more or less common in AI-generated text than in human-written text. For instance, an AI might use specific phrases or word combinations more frequently than a human author. We can find the relation between the frequency of N-grams used by AI vs. humans by training our model on data generated by humans and AI.
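As a minimal sketch of the counting step, the helper below extracts N-gram frequencies using only Python's standard library (the function name and sample text are illustrative, not from any particular detection tool):

```python
from collections import Counter

def ngram_counts(text, n):
    """Count how often each N-gram (sequence of n words) appears in a text."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return Counter(ngrams)

# Toy illustration: repeated word combinations surface immediately.
sample = "the system will process the input and process the output"
print(ngram_counts(sample, 2).most_common(3))
```

A real detector would compare such frequency profiles, gathered from large human-written and AI-generated corpora, rather than from a single sentence.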

2. Perplexity:

If you look up the word perplexed in an English dictionary, it will be defined as surprised or shocked; but in the context of AI, and NLP specifically, perplexity measures how confidently a language model predicts a text. Estimating perplexity is done by quantifying how well a model predicts a new text, or in other words, how "surprised" the model is by the new text. For example, an AI-generated text will tend to lower the perplexity of a model: the better the model predicts the text, the lower the perplexity. Perplexity is fast to calculate, which gives it an advantage over other approaches.

3. Burstiness:

In NLP, Slava Katz defines burstiness as the phenomenon where certain words appear in "bursts" within a document or a set of documents. The idea is that when a word is used once in a document, it is likely to be used again in the same document. AI-generated texts exhibit different patterns of burstiness than texts written by humans, as the models don't have the cognitive processes needed to vary their word choices with synonyms.
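One common way to quantify this (a sketch using the Goh–Barabási burstiness coefficient, not Katz's original formulation) is to look at the gaps between successive occurrences of a word: B = (σ − μ) / (σ + μ), which is near +1 for bursty words and −1 for evenly spread ones:

```python
import statistics

def burstiness(text, word):
    """Burstiness B = (sigma - mu) / (sigma + mu) over the gaps between
    successive occurrences of `word`. B near +1: bursty (clustered);
    B near -1: evenly spread. Returns None with too few occurrences."""
    words = text.lower().split()
    positions = [i for i, w in enumerate(words) if w == word]
    if len(positions) < 3:
        return None  # need at least two gaps to estimate spread
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    return (sigma - mu) / (sigma + mu)

# A perfectly periodic word has identical gaps, so sigma = 0 and B = -1.
print(burstiness("x a x a x a x a", "x"))
```

A detector could compare the distribution of such scores across many words in a candidate text against the distributions typical of human and AI corpora.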

4. Stylometry:

Stylometry is the study of linguistic style, and it can be used to identify authors or, in this case, the source of a text (human vs. AI). Everyone uses language differently: some prefer short sentences, and some prefer long, connected ones. People use semicolons and em-dashes (and other distinctive punctuation) differently from one person to another. Furthermore, some people use the passive voice more than the active one, or use more complex vocabulary. An AI-generated text might exhibit different stylistic features, even when writing about the same topic more than once. And since an AI doesn't have a style of its own, these differences in style can be used to detect whether a text was written by an AI.
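As a minimal sketch, the function below extracts a few of the stylometric features just mentioned (the feature set and names are illustrative; real stylometry systems use many more):

```python
import re
from statistics import mean

def stylometric_features(text):
    """Extract a handful of simple stylometric features from a text."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        # sentence-length preference (short vs. long sentences)
        "avg_sentence_length": mean(len(s.split()) for s in sentences),
        # vocabulary complexity proxy
        "avg_word_length": mean(len(w) for w in words),
        # distinctive punctuation habits
        "semicolons_per_word": text.count(";") / len(words),
        # vocabulary richness (type-token ratio)
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }

print(stylometric_features("I run. I run fast; very fast."))
```

Feeding such feature vectors for known human and AI samples into an ordinary classifier is one straightforward way to turn these stylistic signals into a detector.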

5. Consistency and Coherence Analysis:

Following up on stylometry, since AI models don't have a style of their own, the text they generate can lack consistency and long-term coherence. For example, an AI might contradict itself or change topic and style abruptly in the middle of a text, leading to a flow of ideas that is harder to follow.
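One crude way to measure coherence (a sketch only; modern systems use sentence embeddings rather than raw word overlap) is to score how similar each sentence is to the one before it, so that abrupt topic changes pull the average down:

```python
import math
import re
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def coherence_score(text):
    """Average word-overlap similarity between consecutive sentences."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text.lower()) if s]
    bags = [Counter(re.findall(r"[a-z']+", s)) for s in sentences]
    sims = [cosine(x, y) for x, y in zip(bags, bags[1:])]
    return sum(sims) / len(sims) if sims else 0.0

# Sentences that stay on topic score higher than an abrupt topic change.
print(coherence_score("the cat sat. the cat slept."))
print(coherence_score("the cat sat. quantum physics rules."))
```

Low or erratic sentence-to-sentence similarity across a long passage is the kind of signal a coherence-based detector would look for.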
