Yet Another Fake News Detector

Fake news has been around for as long as news itself. People spread it for many reasons. It may be created as a harmless joke by teenagers, or by foreign entities intending to disrupt a nation’s stability.

In the digital age, fake news has become an increasingly prevalent issue, because it is easier than ever to spread information to anyone on the planet. Vulnerable people, such as those who are not tech-literate, fall victim to fake news on a regular basis. This makes it essential for us to be able to detect fake news effectively and protect ourselves from being misled.

In this article, we’ll build a proof-of-concept fake news detector using machine learning, specifically a Naive Bayes classifier.

We’ll use a publicly available fake news dataset from Kaggle. The dataset is split into two files: one containing fake news and one containing real news. The following image shows an example of what the dataset looks like.

Figure: Example of fake news in the dataset (a four-column table showing title, text, subject, and date, respectively).
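The code later in this article expects a single table named news with an is_fake column, so the two files have to be labeled and merged first. The following is a minimal sketch of that step using only the standard library. The in-memory CSV strings stand in for the two downloaded files, and the helper name load_labeled and the 1 = fake / 0 = real encoding are assumptions for illustration, not something the dataset prescribes.

```python
import csv
import io

# Tiny in-memory stand-ins for the two dataset files (hypothetical content).
FAKE_CSV = "title,text,subject,date\nFake headline,Some made-up story,news,2017-01-01\n"
REAL_CSV = "title,text,subject,date\nReal headline,Some factual story,news,2017-01-02\n"


def load_labeled(csv_file, is_fake):
    """Read rows from a CSV file object and attach an is_fake label.

    Assumed encoding: 1 = fake, 0 = real.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        row["is_fake"] = is_fake
        rows.append(row)
    return rows


# Label each source, then merge them into one list of rows.
news_rows = load_labeled(io.StringIO(FAKE_CSV), 1) + load_labeled(io.StringIO(REAL_CSV), 0)
for row in news_rows:
    print(row["is_fake"], row["title"])
```

With pandas, the same idea is to read each file into a DataFrame, add the is_fake column to each, and concatenate the two.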

Before we can do anything with the data, we need to preprocess it. This step removes unnecessary data that would add complexity to the model. To do this, we use spaCy to remove stop words such as ‘the’ and ‘is’. In addition, we convert every token (word) to its lemma, i.e. its dictionary form.

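import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from tqdm import tqdm

tqdm.pandas()  # registers progress_apply on pandas objects
nlp = spacy.load('en_core_web_sm')
stop_words = STOP_WORDS  # for reference only; token.is_stop is used below

def preprocess_text(text):
    # Drop stop words, then lowercase the lemma of each remaining token
    doc = nlp(text)
    lemmas = [token.lemma_.lower() for token in doc if not token.is_stop]
    return " ".join(lemmas)

news['text'] = news['text'].progress_apply(preprocess_text)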

Vectorization is the process of converting text data into a numerical representation. To do this, we use the bag-of-words method.

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
X_train = news['text']
y_train = news['is_fake']
X_train_vect = vectorizer.fit_transform(X_train)

# Note: the test set here is the same data as the training set
X_test = X_train
y_test = y_train
X_test_vect = vectorizer.transform(X_test)
print("Training data shape:", X_train_vect.shape)
print("Testing data shape:", X_test_vect.shape)
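To make the bag-of-words idea concrete, here is a small library-free sketch of what such a vectorizer computes: each document becomes a row of word counts, with one column per vocabulary word. It is a simplification, since it splits on whitespace only, whereas CountVectorizer applies its own tokenization. The function names build_vocabulary and vectorize and the toy documents are made up for illustration.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct word to a column index, in sorted order."""
    words = sorted({word for doc in docs for word in doc.split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(doc, vocabulary):
    """Return the word-count vector of one document; unknown words are ignored."""
    counts = Counter(doc.split())
    return [counts.get(word, 0) for word in vocabulary]


docs = ["president sign bill", "alien sign president alien"]
vocabulary = build_vocabulary(docs)
matrix = [vectorize(doc, vocabulary) for doc in docs]

print(vocabulary)  # {'alien': 0, 'bill': 1, 'president': 2, 'sign': 3}
print(matrix)      # [[0, 1, 1, 1], [2, 0, 1, 1]]
```

Word order is discarded entirely: only how often each word occurs in a document survives, which is exactly the input a multinomial Naive Bayes model works with.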

To train our model, we use MultinomialNB() from sklearn.

from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(X_train_vect, y_train)
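For readers curious about what a multinomial Naive Bayes model does under the hood, the following is a minimal from-scratch sketch, not scikit-learn's implementation. It estimates a log prior per class and add-one (Laplace) smoothed log word likelihoods, then picks the class with the highest total score. The function names, the smoothing value alpha=1.0, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class."""
    vocabulary = {word for doc in docs for word in doc.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Smoothed denominator: tokens seen in class c plus alpha per vocabulary word
        total = sum(word_counts[c].values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocabulary
        }
    return log_prior, log_likelihood


def predict_nb(doc, log_prior, log_likelihood):
    """Pick the class with the highest log posterior; unseen words are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc.split() if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy data made up for illustration: 1 = fake, 0 = real
docs = ["alien abduct president", "shocking alien secret",
        "president sign bill", "senate pass bill"]
labels = [1, 1, 0, 0]
log_prior, log_likelihood = train_nb(docs, labels)
print(predict_nb("alien secret", log_prior, log_likelihood))  # 1
print(predict_nb("senate bill", log_prior, log_likelihood))   # 0
```

Working in log space avoids multiplying many tiny probabilities together, and the smoothing term keeps a word that never appeared in one class from forcing that class's probability to zero.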

Our model reaches over 96% accuracy in detecting fake news on this dataset. Here is the classification report.

Figure: Classification report (showing an average score of 0.96).
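The code above reuses the training rows as the test rows, so the reported score describes data the model has already seen. A common alternative is to hold out part of the data before fitting and score only on that part. The sketch below shows the idea with a plain shuffled split and an accuracy helper. The function names, the 20% test fraction, and the fixed seed are illustrative assumptions.

```python
import random


def holdout_split(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly and split samples/labels into train and test parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(samples) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(samples, train_idx), pick(samples, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


texts = [f"article {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = holdout_split(texts, labels)
print(len(X_train), len(X_test))             # 8 2
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

In scikit-learn, train_test_split from sklearn.model_selection performs the same kind of split. The vectorizer would then be fitted on the training part only and merely applied to the held-out part.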

And here is the confusion matrix. The top-left and bottom-right cells show the true positives and true negatives, respectively. The top-right and bottom-left cells show the false positives and false negatives, respectively. High values on the top left and bottom right are therefore a good sign.
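The four cells can be counted by hand, which also shows where precision, recall, and F1 in a classification report come from. This is a minimal sketch on made-up labels, treating 1 (fake) as the positive class by assumption. It prints the cells in the layout described above, though the actual cell order in a plot depends on how it was produced: scikit-learn's confusion_matrix, for instance, orders rows and columns by sorted label value.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Made-up labels for illustration: 1 = fake, 0 = real
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # of the articles flagged as fake, how many were fake
recall = tp / (tp + fn)     # of the fake articles, how many were flagged
f1 = 2 * precision * recall / (precision + recall)

print([[tp, fp], [fn, tn]])   # [[3, 1], [1, 3]]
print(precision, recall, f1)  # 0.75 0.75 0.75
```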