
ChatGPT Generated Food Industry Reviews: Realism Assessment


Similarity Assessment

Next, I wanted to take a look at the similarities between each batch of generated reviews and the original reviews. To do that, we can use cosine similarity to calculate how similar the sentence vectors from each source are. First, we define a function that transforms our sentences into vectors using TfidfVectorizer() and then calculates the cosine similarity between the two resulting sentence vectors.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def calculate_cosine_similarity(sentence1, sentence2):
    """
    A function that accepts two sentences as input and outputs their cosine
    similarity.

    Inputs:
        sentence1 (str): A string of words
        sentence2 (str): A string of words

    Returns:
        cosine_sim: Cosine similarity score for the two input sentences
    """
    # Initialize the TfidfVectorizer
    vectorizer = TfidfVectorizer()

    # Create the TF-IDF matrix for the two sentences
    tfidf_matrix = vectorizer.fit_transform([sentence1, sentence2])

    # Calculate the cosine similarity between the two sentence vectors
    cosine_sim = cosine_similarity(tfidf_matrix[0], tfidf_matrix[1])

    return cosine_sim[0][0]
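As a quick sanity check, the function can be called on any two strings. The two reviews below are made-up examples, not drawn from the dataset:

# Quick sanity check with two hypothetical reviews (not from the dataset)
print(calculate_cosine_similarity(
    "The pasta was fresh and the service was friendly.",
    "Fresh pasta and friendly service, would come back!"
))
# Prints a score between 0 (no shared terms) and 1 (identical term weights)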

One problem I ran into was that the datasets were now so big that the calculations were taking too long (and sometimes I didn't have enough RAM on Google Colab to proceed). To combat this issue, I randomly sampled 200 reviews from each of the datasets for the similarity calculation.

from random import sample

# Randomly sample 200 reviews from each dataset
o_review = sample(reviews_dict['original review'], 200)
p_review = sample(reviews_dict['fake positive review'], 200)
n_review = sample(reviews_dict['fake negative review'], 200)

r_dict = {'original review': o_review,
          'fake positive review': p_review,
          'fake negative review': n_review}

Now that we have the randomly chosen samples, we can look at the cosine similarities between different combinations of the datasets.

import numpy as np
import pandas as pd

# Cosine similarity calculation
source = ['original review', 'fake negative review', 'fake positive review']
source_to_compare = ['original review', 'fake negative review', 'fake positive review']
avg_cos_sim_per_word = {}
for s in source:
    for s2 in source_to_compare:
        if s != s2:
            # Reset the scores for each pair so averages don't mix pairs
            count = []
            for sent in r_dict[s]:
                for sent2 in r_dict[s2]:
                    similarity = calculate_cosine_similarity(sent, sent2)
                    count.append(similarity)
            avg_cos_sim_per_word['{0} to {1}'.format(s, s2)] = np.mean(count)

results = pd.DataFrame(avg_cos_sim_per_word, index=[0]).T
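To make the output easier to read, the lone column can be labeled before displaying it. This is a small convenience addition, not part of the original code:

# Label the single column and show pairs sorted by average similarity
results.columns = ['avg_cosine_similarity']
print(results.sort_values('avg_cosine_similarity', ascending=False))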

Cosine Similarity Results (Image by Author)

For the original dataset, the negative reviews were more similar. My hypothesis is that this is a consequence of my using more prompts to create negative reviews than positive reviews. Unsurprisingly, the ChatGPT-generated reviews showed the highest similarity to each other.

Great, we have the cosine similarities, but is there another step we can take to assess the similarity of the reviews? There is! Let's visualize the sentences as vectors. To do that, we must embed the sentences (turn them into vectors of numbers) and then we can visualize them in 2D space. I used spaCy to embed the sentences and visualize the resulting vectors.

import spacy
import numpy as np

# Load a pre-trained spaCy model with GloVe-style word vectors
nlp = spacy.load('en_core_web_lg')

source_embeddings = {}

for source, source_sentences in reviews_dict.items():
    source_embeddings[source] = []
    for sentence in source_sentences:
        # Tokenize the sentence using spaCy
        doc = nlp(sentence)

        # Retrieve the word embedding for every token
        word_embeddings = np.array([token.vector for token in doc])

        # Save the word embeddings for the source
        source_embeddings[source].append(word_embeddings)
import matplotlib.pyplot as plt

def legend_without_duplicate_labels(figure):
    # Collapse repeated labels so each source appears once in the legend
    handles, labels = plt.gca().get_legend_handles_labels()
    by_label = dict(zip(labels, handles))
    figure.legend(by_label.values(), by_label.keys(), loc='lower right')

# Plot embeddings with colours based on source
fig, ax = plt.subplots()
colours = ['g', 'b', 'r']  # Colours for each source
i = 0
for source, embeddings in source_embeddings.items():
    for embedding in embeddings:
        ax.scatter(embedding[:, 0], embedding[:, 1], c=colours[i], label=source)
    i += 1
legend_without_duplicate_labels(fig)
plt.show()

Sentence Vectors (Image by Author)

The good news is we can clearly see that the embeddings and distributions of the sentence vectors closely align. Visual inspection shows more variability in the distribution of the original reviews, supporting the assertion that they are more diverse. Since ChatGPT generated both the positive and negative reviews, we might expect their distributions to be identical. Notice, however, that the fake negative reviews actually have a wider distribution and more variance than the fake positive reviews. Why might this be? It is probably due in part to the fact that I had to trick ChatGPT into creating the fake negative reviews (ChatGPT is designed to make positive statements), and I had to provide more prompts to get enough negative reviews compared to positive ones. This helps the dataset: with the extra diversity, we can train higher-performing machine learning models.
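One caveat about the scatter plot: it shows only the first two dimensions of spaCy's 300-dimensional word vectors, which can distort how close the sources really are. A minimal sketch of an alternative, not part of the original workflow, is to average each review's word vectors into a sentence vector and project those to 2D with PCA before plotting:

from sklearn.decomposition import PCA

# Average each review's word vectors into a single sentence vector
# (assumes every review contains at least one token)
sentence_vectors, labels = [], []
for source, embeddings in source_embeddings.items():
    for embedding in embeddings:
        sentence_vectors.append(embedding.mean(axis=0))
        labels.append(source)

# Project the 300-dimensional sentence vectors down to two PCA components
coords = PCA(n_components=2).fit_transform(np.array(sentence_vectors))

The resulting coords array can then be scattered with the same colour scheme as above.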

Sentence Vectors by Dataset (Image by Author)

Next, we can inspect the differences between the three distributions of reviews and see whether there are any distinguishing patterns.
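The code for this per-dataset view isn't shown above; a minimal sketch of one way to produce it, assuming the same source_embeddings dictionary, is to give each source its own subplot:

# Plot each source in its own subplot for side-by-side comparison
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharex=True, sharey=True)
colours = ['g', 'b', 'r']
for ax, colour, (source, embeddings) in zip(axes, colours, source_embeddings.items()):
    for embedding in embeddings:
        ax.scatter(embedding[:, 0], embedding[:, 1], c=colour, s=5)
    ax.set_title(source)
plt.show()

Sharing the x and y axes across subplots keeps the scales identical, so the spreads of the three distributions can be compared directly.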

What do we see? Visually, most of the reviews in each dataset are centered around the origin and span from -10 to 10. This is a positive sign and supports the use of fake reviews for training prediction models. The variances are roughly the same; however, the original reviews had a wider spread in their distribution, both laterally and longitudinally, a proxy for greater lexical diversity within those reviews. The ChatGPT-generated reviews definitely had similar distributions, but the positive reviews had more outliers. As stated, these distinctions could be a result of the way I prompted the system to generate reviews.
