Gaussian Naive Bayes, Explained: A Visual Guide with Code Examples for Beginners


CLASSIFICATION ALGORITHM

Bell-shaped assumptions for better predictions

⛳️ More CLASSIFICATION ALGORITHMS, explained:
· Dummy Classifier
· K Nearest Neighbor Classifier
· Bernoulli Naive Bayes
▶ Gaussian Naive Bayes
· Decision Tree Classifier
· Logistic Regression
· Support Vector Classifier
· Multilayer Perceptron (soon!)

Building on our previous article about Bernoulli Naive Bayes, which handles binary data, we now explore Gaussian Naive Bayes for continuous data. Unlike the binary approach, this algorithm assumes each feature follows a normal (Gaussian) distribution.

Here, we’ll see how Gaussian Naive Bayes handles continuous, bell-shaped data — ringing in accurate predictions — all without stepping into the intricate math of Bayes’ Theorem.

All visuals: created by the author using Canva Pro. Optimized for mobile; they may appear oversized on desktop.

Like other Naive Bayes variants, Gaussian Naive Bayes makes the “naive” assumption of feature independence. It assumes that the features are conditionally independent given the class label.

However, while Bernoulli Naive Bayes is suited to datasets with binary features, Gaussian Naive Bayes assumes that the features follow a continuous normal (Gaussian) distribution. Although this assumption may not always hold true in practice, it simplifies the calculations and often leads to surprisingly accurate results.

Bernoulli NB assumes binary data, Multinomial NB works with discrete counts, and Gaussian NB handles continuous data assuming a normal distribution.
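For reference, all three variants live in scikit-learn's `sklearn.naive_bayes` module; this quick snippet just shows where each one comes from and what kind of feature it expects:

```python
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# BernoulliNB   -> binary features (0/1)
# MultinomialNB -> discrete counts (e.g. word frequencies)
# GaussianNB    -> continuous features assumed to follow a normal distribution
```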

Throughout this article, we’ll use this artificial golf dataset (made by the author) as an example. This dataset predicts whether a person will play golf based on weather conditions.

Columns: ‘Rainfall’ (in mm), ‘Temperature’ (in Celsius), ‘Humidity’ (in %), ‘WindSpeed’ (in km/h) and ‘Play’ (Yes/No, target feature)
# IMPORTING DATASET #
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

dataset_dict = {
'Rainfall': [0.0, 2.0, 7.0, 18.0, 3.0, 3.0, 0.0, 1.0, 0.0, 25.0, 0.0, 18.0, 9.0, 5.0, 0.0, 1.0, 7.0, 0.0, 0.0, 7.0, 5.0, 3.0, 0.0, 2.0, 0.0, 8.0, 4.0, 4.0],
'Temperature': [29.4, 26.7, 28.3, 21.1, 20.0, 18.3, 17.8, 22.2, 20.6, 23.9, 23.9, 22.2, 27.2, 21.7, 27.2, 23.3, 24.4, 25.6, 27.8, 19.4, 29.4, 22.8, 31.1, 25.0, 26.1, 26.7, 18.9, 28.9],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'WindSpeed': [2.1, 21.2, 1.5, 3.3, 2.0, 17.4, 14.9, 6.9, 2.7, 1.6, 30.3, 10.9, 3.0, 7.5, 10.3, 3.0, 3.9, 21.9, 2.6, 17.3, 9.6, 1.9, 16.0, 4.6, 3.2, 8.3, 3.2, 2.2],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)

# Set feature matrix X and target vector y
X, y = df.drop(columns='Play'), df['Play']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)
print(pd.concat([X_train, y_train], axis=1), end='\n\n')
print(pd.concat([X_test, y_test], axis=1))

Gaussian Naive Bayes works with continuous data, assuming each feature follows a Gaussian (normal) distribution.

  1. Calculate the probability of each class in the training data.
  2. For each feature and class, estimate the mean and variance of the feature values within that class.
  3. For a new instance:
    a. For each class, calculate the probability density function (PDF) of each feature value under the Gaussian distribution of that feature within the class.
    b. Multiply the class probability by the product of the PDF values for all features.
  4. Predict the class with the highest resulting probability (see the sketch after this list).
Gaussian Naive Bayes uses the normal distribution to model the likelihood of different feature values for each class. It then combines these likelihoods to make a prediction.
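To make these steps concrete, here is a minimal from-scratch sketch of the whole procedure. This is not the article's implementation; `X_train`, `y_train`, and `x_new` are placeholder NumPy arrays used purely for illustration:

```python
import numpy as np

def gaussian_nb_predict(X_train, y_train, x_new):
    """Predict the class of a single instance with a bare-bones Gaussian Naive Bayes."""
    classes = np.unique(y_train)
    best_class, best_score = None, -np.inf
    for cls in classes:
        X_cls = X_train[y_train == cls]
        prior = len(X_cls) / len(X_train)                   # Step 1: class probability
        mean, var = X_cls.mean(axis=0), X_cls.var(axis=0)   # Step 2: per-feature mean and variance
        # Step 3a: Gaussian PDF of each feature value under this class
        pdfs = np.exp(-(x_new - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        score = prior * np.prod(pdfs)                       # Step 3b: class probability × product of PDFs
        if score > best_score:                              # Step 4: keep the class with the highest score
            best_class, best_score = cls, score
    return best_class
```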

Transforming non-Gaussian distributed data

Remember how this algorithm naively assumes that all the input features follow a Gaussian/normal distribution?

Since we are not really sure about the distribution of our data, especially for features that clearly don’t follow a Gaussian distribution, applying a power transformation (like Box-Cox) before using Gaussian Naive Bayes can be helpful. This approach makes the data more Gaussian-like, which aligns better with the assumptions of the algorithm.

All columns are scaled using a Power Transformation (Box-Cox Transformation) and then standardized.
from sklearn.preprocessing import PowerTransformer

# Initialize and fit the PowerTransformer
pt = PowerTransformer(standardize=True) # Standard Scaling already included
X_train_transformed = pt.fit_transform(X_train)
X_test_transformed = pt.transform(X_test)

Now we’re ready for the training.

1. Class Probability Calculation: For each class, calculate its probability: (Number of instances in this class) / (Total number of instances)

from fractions import Fraction

def calc_target_prob(attr):
    total_counts = attr.value_counts().sum()
    prob_series = attr.value_counts().apply(lambda x: Fraction(x, total_counts).limit_denominator())
    return prob_series

print(calc_target_prob(y_train))

2. Feature Probability Calculation: For each feature and each class, calculate the mean (μ) and standard deviation (σ) of the feature values within that class using the training data. Then, calculate the probability using the Gaussian Probability Density Function (PDF) formula.
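The Gaussian PDF in question is f(x) = 1 / (σ·√(2π)) · exp(−(x − μ)² / (2σ²)), which is exactly the k1·exp(−(x − μ)²/k2) form built into the equation strings below (k1 = 1/(σ√(2π)), k2 = 2σ²). As a quick standalone sanity check (the values here are illustrative, not taken from the golf data), the hand-written formula matches scipy.stats.norm.pdf:

```python
import numpy as np
from scipy.stats import norm

mu, sigma, x = 0.5, 1.2, 1.0  # illustrative values only

# Gaussian PDF written out by hand
pdf_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# Same value from scipy
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)

print(pdf_manual, pdf_scipy)  # the two values match
```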

For each weather condition, determine the mean and standard deviation for both “YES” and “NO” instances. Then calculate their PDF using the PDF formula for the normal/Gaussian distribution.
The same process is applied to all the other features.
def calculate_class_probabilities(X_train_transformed, y_train, feature_names):
    classes = y_train.unique()
    equations = pd.DataFrame(index=classes, columns=feature_names)

    for cls in classes:
        X_class = X_train_transformed[y_train == cls]
        mean = X_class.mean(axis=0)
        std = X_class.std(axis=0)
        k1 = 1 / (std * np.sqrt(2 * np.pi))
        k2 = 2 * (std ** 2)

        for i, column in enumerate(feature_names):
            equation = f"{k1[i]:.3f}·exp(-(x-({mean[i]:.2f}))²/{k2[i]:.3f})"
            equations.loc[cls, column] = equation

    return equations

# Use the function with the transformed training data
equation_table = calculate_class_probabilities(X_train_transformed, y_train, X.columns)

# Display the equation table
print(equation_table)

3. Smoothing: Gaussian Naive Bayes uses a unique smoothing approach. Unlike the Laplace smoothing used in other variants, it adds a tiny value (0.000000001 times the largest variance) to all variances. This prevents numerical instability from division by zero or by very small numbers.
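To make that concrete, here is a conceptual sketch of the idea (it mirrors the behaviour of scikit-learn's `var_smoothing` rather than reproducing its internals exactly; `smoothed_variances` is an illustrative helper, not a library function):

```python
import numpy as np

def smoothed_variances(X, var_smoothing=1e-9):
    """Per-feature variances with GaussianNB-style smoothing (conceptual sketch)."""
    variances = X.var(axis=0)
    # Add a tiny fraction of the largest variance to every variance,
    # so no variance is ever exactly zero when we later divide by it.
    epsilon = var_smoothing * variances.max()
    return variances + epsilon
```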

Given a new instance with continuous features:

1. Probability Collection:
For each possible class:
· Start with the probability of this class occurring (class probability).
· For each feature in the new instance, calculate the probability density function of that feature value within the class.

For ID 14, we calculate the PDF of each feature for both “YES” and “NO” instances.

2. Score Calculation & Prediction:
For each class:
· Multiply all of the collected PDF values together.
· The result is the score for this class.
· The class with the highest score is the prediction.

from scipy.stats import norm

def calculate_class_probability_products(X_train_transformed, y_train, X_new, feature_names, target_name):
    classes = y_train.unique()
    n_features = X_train_transformed.shape[1]

    # Create column names using actual feature names
    column_names = [target_name] + list(feature_names) + ['Product']

    probability_products = pd.DataFrame(index=classes, columns=column_names)

    for cls in classes:
        X_class = X_train_transformed[y_train == cls]
        mean = X_class.mean(axis=0)
        std = X_class.std(axis=0)

        prior_prob = np.mean(y_train == cls)
        probability_products.loc[cls, target_name] = prior_prob

        feature_probs = []
        for i, feature in enumerate(feature_names):
            prob = norm.pdf(X_new[0, i], mean[i], std[i])
            probability_products.loc[cls, feature] = prob
            feature_probs.append(prob)

        product = prior_prob * np.prod(feature_probs)
        probability_products.loc[cls, 'Product'] = product

    return probability_products

# Assuming X_new is your new sample reshaped to (1, n_features)
X_new = np.array([-1.28, 1.115, 0.84, 0.68]).reshape(1, -1)

# Calculate probability products
prob_products = calculate_class_probability_products(X_train_transformed, y_train, X_new, X.columns, y.name)

# Display the probability product table
print(prob_products)
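One practical caveat: multiplying many small PDF values can underflow to zero on larger problems, so library implementations score classes with sums of log-densities instead of raw products. A minimal sketch of the same score computed in log space (the helper name `log_score` is just for illustration; `x_new` is a 1-D array of transformed feature values):

```python
import numpy as np
from scipy.stats import norm

def log_score(x_new, prior_prob, mean, std):
    """Class score in log space: log(prior) + sum of log-PDFs, which avoids underflow."""
    return np.log(prior_prob) + norm.logpdf(x_new, mean, std).sum()
```

Because the logarithm is monotonic, the class with the highest log score is the same class that would win on the raw products.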

For this particular dataset, this accuracy is considered quite good.
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Initialize and train the Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train_transformed, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test_transformed)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print the accuracy
print(f"Accuracy: {accuracy:.4f}")

GaussianNB is known for its simplicity and effectiveness. The main things to remember about its parameters are:

  1. priors: This is the most notable parameter, similar to Bernoulli Naive Bayes. Usually, you don’t need to set it manually. By default, it’s calculated from your training data, which often works well.
  2. var_smoothing: This is a stability parameter that you rarely need to adjust. (The default is 0.000000001, i.e. 1e-9.)

The key takeaway is that this algorithm is designed to work well out of the box. In most situations, you can use it without worrying about parameter tuning.
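If you ever do want to override these defaults, both parameters can be passed directly to the constructor. The values below are purely illustrative (there is no reason to prefer them for the golf data):

```python
from sklearn.naive_bayes import GaussianNB

# Illustrative only: a hand-set 70/30 prior (one entry per class) and a larger smoothing term
gnb_custom = GaussianNB(priors=[0.7, 0.3], var_smoothing=1e-8)
gnb_custom.fit(X_train_transformed, y_train)
```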

Pros:

  1. Simplicity: Maintains the easy-to-implement, easy-to-understand character of the Naive Bayes family.
  2. Efficiency: Remains fast in training and prediction, making it suitable for large-scale applications with continuous features.
  3. Flexibility with Data: Handles both small and large datasets well, adapting to the scale of the problem at hand.
  4. Continuous Feature Handling: Thrives with continuous, real-valued features, making it a natural fit for data where features vary on a continuum.

Cons:

  1. Independence Assumption: Still assumes that features are conditionally independent given the class, which may not hold in all real-world scenarios.
  2. Gaussian Distribution Assumption: Works best when feature values truly follow a normal distribution. Non-normal distributions may lead to suboptimal performance (though this can often be fixed with the power transformation discussed above).
  3. Sensitivity to Outliers: Can be significantly affected by outliers in the training data, as they skew the mean and variance calculations.

Gaussian Naive Bayes stands as an efficient classifier for a wide range of applications involving continuous data. Its ability to handle real-valued features extends its use beyond binary classification tasks, making it a go-to choice for many applications.

While it makes some assumptions about the data (feature independence and normal distribution), when these conditions are met, it gives robust performance, making it a favorite among both beginners and seasoned data scientists for its balance of simplicity and power.

import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import PowerTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the dataset
dataset_dict = {
'Rainfall': [0.0, 2.0, 7.0, 18.0, 3.0, 3.0, 0.0, 1.0, 0.0, 25.0, 0.0, 18.0, 9.0, 5.0, 0.0, 1.0, 7.0, 0.0, 0.0, 7.0, 5.0, 3.0, 0.0, 2.0, 0.0, 8.0, 4.0, 4.0],
'Temperature': [29.4, 26.7, 28.3, 21.1, 20.0, 18.3, 17.8, 22.2, 20.6, 23.9, 23.9, 22.2, 27.2, 21.7, 27.2, 23.3, 24.4, 25.6, 27.8, 19.4, 29.4, 22.8, 31.1, 25.0, 26.1, 26.7, 18.9, 28.9],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'WindSpeed': [2.1, 21.2, 1.5, 3.3, 2.0, 17.4, 14.9, 6.9, 2.7, 1.6, 30.3, 10.9, 3.0, 7.5, 10.3, 3.0, 3.9, 21.9, 2.6, 17.3, 9.6, 1.9, 16.0, 4.6, 3.2, 8.3, 3.2, 2.2],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}

df = pd.DataFrame(dataset_dict)

# Prepare data for model
X, y = df.drop('Play', axis=1), (df['Play'] == 'Yes').astype(int)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, shuffle=False)

# Apply PowerTransformer
pt = PowerTransformer(standardize=True)
X_train_transformed = pt.fit_transform(X_train)
X_test_transformed = pt.transform(X_test)

# Train the model
nb_clf = GaussianNB()
nb_clf.fit(X_train_transformed, y_train)

# Make predictions
y_pred = nb_clf.predict(X_test_transformed)

# Check accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
