The Machine Learning “Advent Calendar” Day 5: GMM in Excel


In the previous article, we explored distance-based clustering with k-Means.

Then we went further: to improve how the distance is measured, we added variance, which brings us to the Mahalanobis distance.

So, if k-Means is the unsupervised version of the Nearest Centroid classifier, then the natural question is:

What’s the unsupervised version of QDA?

This means that, like QDA, each cluster is now described not only by its mean, but also by its variance (and we need covariance terms as soon as the number of features is greater than 1). But here everything is learned without labels.

So you see the idea, right?

And well, the name of this model is the Gaussian Mixture Model (GMM).

GMM and the names of these models…

As is often the case, model names exist for historical reasons. They are not always designed to highlight the connections between models, especially when those models were not developed together.

Different researchers, different periods, different use cases… and we end up with names that sometimes hide the true structure behind the ideas.

Here, the name “Gaussian Mixture Model” simply means that the data is represented as a mixture of several Gaussian distributions.

If we followed the same naming logic as k-Means, it would have been clearer to call it something like k-Gaussian Mixture.

Because, in practice, instead of only using the means, we add the variances. We could simply use the Mahalanobis distance, or another weighted distance built from both means and variances. But the Gaussian distribution gives us probabilities, which are easier to interpret.
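To make that last point concrete, here is the one-dimensional Gaussian density (a standard formula, not taken from the Excel sheet):

$$
p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

The exponent is just half of the squared Mahalanobis distance $(x-\mu)^2/\sigma^2$, so the “weighted distance” idea is still there; the Gaussian simply wraps it into a number we can read as a probability density.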

So we choose a number of Gaussian components.

And by the way, GMM isn’t the only one.

In fact, the whole machine learning framework is actually much more recent than most of the models it contains. Most of these techniques were originally developed in statistics, signal processing, econometrics, or pattern recognition.

Then, much later, the field we now call “machine learning” emerged and regrouped all these models under one umbrella. But the names didn’t change.

So today we use a combination of vocabularies coming from different eras, different communities, and different intentions.

This is why the relationships between models are not always obvious when you look only at the names.

If we had to rename everything in a modern, unified machine-learning style, the landscape would actually be much clearer:

  • GMM would become k-Gaussian Clustering
  • QDA would become Nearest Gaussian Classifier
  • LDA would become, well, Nearest Gaussian Classifier with the same variance across classes.

And suddenly, all of the links appear:

  • k-Means ↔ Nearest Centroid
  • GMM ↔ Nearest Gaussian (QDA)

This is why GMM is so natural after k-Means. If k-Means groups points by their closest centroid, then GMM groups them by their closest Gaussian shape.

Why spend a whole section discussing the names?

Well, the truth is that, since we already covered the k-Means algorithm, and we already made the transition from the Nearest Centroid Classifier to QDA, we already know everything about this algorithm, and the training algorithm won’t change…

And what’s the NAME of this training algorithm?

Oh, Lloyd’s algorithm.

Actually, before k-Means got its name, it was simply known as Lloyd’s algorithm, proposed by Stuart Lloyd in 1957. Only later did the machine learning community rename it “k-means”.

And that algorithm manipulated only the means, so we need another name, right?

You see where this is going: the Expectation-Maximization (EM) algorithm!

EM is simply the general form of Lloyd’s idea. Lloyd updates the means; EM updates everything: means, variances, weights, and probabilities.

So, you already know everything about GMM!

But since this article is called “GMM in Excel”, I cannot end it here…

GMM in 1 Dimension

Let us start with this simple dataset, the same one we used for k-Means: 1, 2, 3, 11, 12, 13

Hmm, the two Gaussians will end up with the same variances here. So feel free to play with other numbers in Excel!

And we naturally want 2 clusters.

Here are the different steps.

Initialization

We start with guesses for means, variances, and weights.

GMM in Excel – initialization step – image by author
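Outside Excel, the same initialization is just a handful of numbers. A minimal Python sketch; the starting values below are arbitrary guesses, not necessarily the ones in the screenshot:

```python
# Dataset from the article and initial guesses for two Gaussian components.
data = [1, 2, 3, 11, 12, 13]

means = [2.0, 10.0]        # one mean per component (guessed)
variances = [4.0, 4.0]     # one variance per component (guessed)
weights = [0.5, 0.5]       # mixing weights, must sum to 1
```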

Expectation step (E-step)

For each point, we compute how likely it is to belong to each Gaussian.

GMM in Excel – expectation step – image by author
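In code, the E-step is the 1D Gaussian density applied to every (point, component) pair, then normalised per point. A minimal sketch, reusing the data and guesses from the initialization above:

```python
import math

def gaussian_pdf(x, mean, var):
    """1D Gaussian density: the same formula typed into the Excel cells."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step(data, means, variances, weights):
    """For each point, return the probability (responsibility) of each component."""
    responsibilities = []
    for x in data:
        likelihoods = [w * gaussian_pdf(x, m, v)
                       for w, m, v in zip(weights, means, variances)]
        total = sum(likelihoods)
        responsibilities.append([lik / total for lik in likelihoods])
    return responsibilities

resp = e_step([1, 2, 3, 11, 12, 13], [2.0, 10.0], [4.0, 4.0], [0.5, 0.5])
print(resp[0])  # the point x=1 belongs almost entirely to the first component
```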

Maximization step (M-step)

Using these probabilities, we update the means, variances, and weights.

GMM in Excel – maximization step – image by author
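The M-step is nothing but weighted averages, where the weights are the responsibilities computed by the `e_step` helper sketched above. A minimal sketch:

```python
def m_step(data, responsibilities, k=2):
    """Update means, variances, and weights from the responsibilities."""
    n = len(data)
    means, variances, weights = [], [], []
    for j in range(k):
        resp_j = [r[j] for r in responsibilities]
        n_j = sum(resp_j)                      # "effective" number of points in component j
        weights.append(n_j / n)
        mean_j = sum(r * x for r, x in zip(resp_j, data)) / n_j
        means.append(mean_j)
        var_j = sum(r * (x - mean_j) ** 2 for r, x in zip(resp_j, data)) / n_j
        variances.append(var_j)
    return means, variances, weights
```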

Iteration

We repeat the E-step and the M-step until the parameters stabilise.

GMM in Excel – iterations – image by author
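Putting the two steps in a loop gives the whole algorithm. A minimal sketch reusing the `e_step` and `m_step` helpers above; using a fixed number of iterations instead of a convergence test is a simplification:

```python
data = [1, 2, 3, 11, 12, 13]
means, variances, weights = [2.0, 10.0], [4.0, 4.0], [0.5, 0.5]

for _ in range(20):  # in Excel, one block of rows per iteration plays the same role
    resp = e_step(data, means, variances, weights)
    means, variances, weights = m_step(data, resp, k=2)

print(means)      # settles near the two group centers, around 2 and 12
print(variances)  # settles near the spread of each group
print(weights)    # settles near 0.5 and 0.5
```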

Each step is incredibly simple once the formulas are visible.
You will see that EM is nothing more than updating averages, variances, and probabilities.

We can also do some visualization to see how the Gaussian curves move through the iterations.

At the beginning, the two Gaussian curves overlap heavily because the initial means and variances are only guesses.

The curves slowly separate, adjust their widths, and finally settle exactly on the two groups of points.

By plotting the Gaussian curves at each iteration, you can literally watch the model learn:

  • the means slide toward the centers of the data
  • the variances shrink to match the spread of each group
  • the overlap disappears
  • the final shapes match the structure of the dataset

This visual evolution is incredibly helpful for intuition. Once you see the curves move, EM is no longer an abstract algorithm. It becomes a dynamic process you can follow step by step.

GMM in Excel – image by author
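If you prefer code to chart cells, here is a minimal plotting sketch (matplotlib and scipy assumed available), reusing the `e_step` and `m_step` helpers above to draw the two weighted Gaussian curves over the first few iterations:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

data = [1, 2, 3, 11, 12, 13]
xs = np.linspace(-2, 16, 400)
means, variances, weights = [2.0, 10.0], [4.0, 4.0], [0.5, 0.5]

for it in range(4):
    for m, v, w in zip(means, variances, weights):
        # weighted density of one component at iteration `it` (darker = later)
        plt.plot(xs, w * norm.pdf(xs, loc=m, scale=np.sqrt(v)), alpha=0.3 + 0.2 * it)
    resp = e_step(data, means, variances, weights)
    means, variances, weights = m_step(data, resp, k=2)

plt.scatter(data, [0.0] * len(data), color="black", zorder=3)  # the data points
plt.show()
```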

GMM in 2 Dimensions

The logic is exactly the same as in 1D. Nothing new conceptually. We simply extend the formulas…

Instead of having one feature per point, we now have two.

Each Gaussian must now learn:

  • a mean for x1
  • a mean for x2
  • a variance for x1
  • a variance for x2
  • AND a covariance term between the two features.

Once you write the formulas in Excel, you will see that the process stays exactly the same:

Well, the truth is that if you look at the screenshot, you may think the formula is enormous. And this isn’t even all of it.

2D GMM in Excel – image by author

But don’t be fooled. The formula is long only because we write out the 2-dimensional Gaussian density explicitly:

  • one part for the distance in x1
  • one part for the distance in x2
  • the covariance term
  • the normalization constant

Nothing more.

It is simply the density formula expanded cell by cell.
Long to type, but perfectly understandable once you see the structure: a weighted distance, inside an exponential, divided by a normalization constant built from the determinant.
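For reference, here is the two-dimensional Gaussian density that the cells spell out, with $\boldsymbol{\mu}$ the pair of means and $\boldsymbol{\Sigma}$ the 2×2 matrix holding the two variances and the covariance (standard notation, not taken from the sheet):

$$
p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{2\pi\sqrt{\det \boldsymbol{\Sigma}}} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)
$$

The exponent is the weighted (Mahalanobis) distance, and the factor in front is the normalization constant involving the determinant.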

So yes, the formula looks big… but the idea behind it is incredibly simple.
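And if you want to sanity-check a cell against code, a minimal numpy sketch of that same density (the mean and covariance values below are made up for illustration):

```python
import numpy as np

def gaussian_density_2d(x, mean, cov):
    """2D Gaussian density: Mahalanobis distance inside an exponential,
    divided by the normalization constant built from the determinant."""
    diff = x - mean
    mahalanobis_sq = diff @ np.linalg.inv(cov) @ diff
    norm_const = 2 * np.pi * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * mahalanobis_sq) / norm_const

# Illustrative values only: a component centered at (2, 3) with correlated features.
mean = np.array([2.0, 3.0])
cov = np.array([[1.0, 0.4],
                [0.4, 2.0]])
print(gaussian_density_2d(np.array([2.5, 3.5]), mean, cov))
```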

Conclusion

K-Means gives hard boundaries.

GMM gives probabilities.

Once the EM formulas are written in Excel, the model becomes easy to follow: the means move, the variances adjust, and the Gaussians naturally settle around the data.

GMM is simply the next logical step after k-Means, offering a more flexible way to represent clusters and their shapes.


What are your thoughts on this topic?
Let us know in the comments below.
