with Decision Trees, both for Regression and Classification, we are going to continue using the principle of Decision Trees today.
And this time, we're in unsupervised learning, so there are no labels.
The algorithm is called Isolation Forest, and the idea is to build many decision trees to form a forest. The principle is to detect anomalies by isolating them.
To keep everything easy to understand, let's take a very simple example dataset that I created myself:
1, 2, 3, 9
(And since the chief editor of TDS reminded me about the legal details of citing data sources, let me state this properly: it's a four-point dataset that I handcrafted, and I'm happy to grant everyone the right to use it for educational purposes.)
The goal here is simple: find the anomaly, the intruder.
I know you already see which one it is.
As always, the idea is to turn this into an algorithm that can detect it automatically.
Anomaly Detection in the Classic ML Framework
Before going further, let us take a step back and see where anomaly detection sits in the bigger picture.
On the left, we have supervised learning, with labeled data and two main types:
- Regression when the target is numerical
- Classification when the target is categorical
That is where we have used Decision Trees so far.
On the right, we have unsupervised learning, with no labels.
We don't predict anything. We simply manipulate the observations (clustering and anomaly detection) or the features (dimensionality reduction, among other methods).
Dimensionality reduction manipulates the features. Even though it sits in the "unsupervised" category, its goal is quite different from the others. Because it reshapes the features themselves, it almost feels like feature engineering.
For observation-level methods, we have two possibilities:
- Clustering: group observations
- Anomaly detection: assign a score to each observation
In practice, some models can do both at the same time. For instance, k-means is capable of detecting anomalies.
Isolation Forest is only for anomaly detection, not clustering.
So, today, we’re exactly here:
Unsupervised learning → Clustering / Anomaly detection → Anomaly detection
The Painful Part: Constructing Trees in Excel
Now we start the implementation in Excel, and I have to be honest: this part is really painful…
It's painful because we need to build many small rules, and the formulas are not easy to drag. This is one of the limitations of Excel when the model is based on decisions. Excel is great when the formulas look the same for every row. But here, each node in the tree follows a different rule, so the formulas don't generalize easily.
For Decision Trees, we saw that with a single split, the formula worked. But I stopped there on purpose. Why? Because adding more splits in Excel becomes complicated. The structure of a decision tree isn't naturally "drag-friendly".
However, for Isolation Forest, we have no choice.
We need to build a full tree, all the way down, to see how each point is isolated.
If you, dear readers, have ideas to simplify this, please contact me.
Isolation Forest in 3 Steps
Even though the formulas are not easy, I tried my best to structure the approach. Here is the whole method in just three steps.

1. Isolation Tree Construction
We start by creating one isolation tree.
At each node, we pick a random split value between the minimum and maximum of the current group.
This split divides the observations into "left" (L) and "right" (R).
When an observation becomes isolated, I mark it as F for "Final", meaning it has reached a leaf.
By repeating this process, we obtain a full binary tree where anomalies tend to be isolated in fewer steps. For each observation, we can then count its depth, which is simply the number of splits needed to isolate it.
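If you prefer to see this logic in code rather than in a spreadsheet, here is a minimal Python sketch of how a single point gets isolated by purely random splits. The function name isolation_depth and the recursive structure are my own illustration, not the Excel workbook.

```python
import random

def isolation_depth(values, target, depth=0):
    """Number of random splits needed to isolate `target` within `values`,
    following only the branch that still contains the target point."""
    if len(values) <= 1:
        return depth                       # the point sits alone in a leaf: "F"
    lo, hi = min(values), max(values)
    if lo == hi:
        return depth                       # identical values cannot be separated further
    split = random.uniform(lo, hi)         # purely random cut, no impurity criterion
    side = [v for v in values if v <= split] if target <= split \
           else [v for v in values if v > split]
    return isolation_depth(side, target, depth + 1)

random.seed(0)
data = [1, 2, 3, 9]
print({x: isolation_depth(data, x) for x in data})   # 9 is usually isolated in very few splits
```

With only four points, you can run it a few times and see how often 9 is isolated after a single split.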

2. Average Depth Calculation
One tree isn’t enough. So we repeat the identical random process several times to construct multiple trees.
For every data point, we count what number of splits were needed to isolate it in each tree.
Then we compute the typical depth (or average path length) across all trees.
This offers a stable and meaningful measure of how easy it’s to isolate each point.
At this point, the typical depth already gives us a solid indicator:
the lower the depth, the more likely the purpose is an anomaly.
A brief depth means the purpose is isolated in a short time, which is a signature of an anomaly.
An extended depth means the purpose behaves like the remaining of the information, because they stay grouped together, and usually are not easy to separate.
In our example, the rating makes perfect sense.
- First, 9 is the anomaly, with the typical depth of 1. For all 5 trees, one split is sufficient to isolate it. (Although, this isn’t at all times the case, you possibly can test it yourself.)
- For the opposite three observations, the depth is comparable, and noticeably larger. And the best rating is attributed to 2, which sits in the midst of the group, and this is strictly what we expect.
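As a hedged illustration of this averaging step (not the spreadsheet itself), the sketch below grows several fully random trees on the same four points and averages the depths; depths_in_one_tree is a name I made up for this example.

```python
import random
from statistics import mean

def depths_in_one_tree(values, depth=0, out=None):
    """Recursively split a group at random and record, for every value,
    the number of splits needed before it ends up alone in a leaf."""
    if out is None:
        out = {}
    if len(values) == 1:
        out[values[0]] = depth             # isolated: reached a leaf ("F")
        return out
    lo, hi = min(values), max(values)
    split = random.uniform(lo, hi)
    left = [v for v in values if v <= split]
    right = [v for v in values if v > split]
    if left:
        depths_in_one_tree(left, depth + 1, out)
    if right:
        depths_in_one_tree(right, depth + 1, out)
    return out

random.seed(42)
data = [1, 2, 3, 9]
n_trees = 5
all_depths = [depths_in_one_tree(data) for _ in range(n_trees)]
avg_depth = {x: mean(t[x] for t in all_depths) for x in data}
print(avg_depth)   # 9 should come out with the smallest average depth
```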
If one day you have to explain this algorithm to someone else, feel free to use this dataset: easy to remember and intuitive as an example. And please, don't forget to mention my copyright on it!

3. Anomaly Score Calculation
The final step is to normalize the average depth to produce a standard anomaly score, between 0 and 1.
Saying that an observation has a given average depth doesn't mean much by itself.
This value depends on the total number of data points, so we cannot interpret it directly as "normal" or "anomalous".
The idea is to compare the average path length of each point to the typical value expected under pure randomness. This tells us how surprising (or not) the depth really is.
We will see the transformation later, but the goal is simple:
turn the raw depth into a relative score that makes sense without any context.
Short depths will naturally become scores close to 1 (anomalies),
and long depths will become scores close to 0 (normal observations).
And finally, some implementations adjust the score so that it has a different meaning: positive values indicate normal points, and negative values indicate anomalies. This is simply a transformation of the original anomaly score.
The underlying logic doesn't change at all: short paths still correspond to anomalies, and long paths correspond to normal observations.

Isolation Tree Building
So this is the painful part.
Quick Overview
I created a table to capture the different steps of the tree-building process.
It isn't regular, and it isn't perfectly structured, but I tried my best to make it readable.
And I'm not sure that all the formulas generalize well.

- Get the minimum and maximum values of the current group.
- Generate a random split value between this min and max.
- Split the observations into left (L) and right (R).
- Count how many observations fall into L and R.
- If a group contains only one observation, mark it as F (Final) and stop for that branch.
- Repeat the process for each non-final group until all observations are isolated.
That is the whole logic of building one isolation tree.
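To see those steps in one place, here is a small Python trace that mimics the structure of my Excel table (group, min, max, random split, left/right counts, and the F flag). It is only a sketch; the exact numbers depend on the random seed.

```python
import random

def build_isolation_tree(values, depth=0):
    """Print each node of one isolation tree the way the Excel table records it:
    group, min, max, random split, left/right counts, and 'F' when a single
    observation is isolated in a leaf."""
    if len(values) == 1:
        print(f"depth {depth}: {values} -> F (isolated)")
        return
    lo, hi = min(values), max(values)
    split = random.uniform(lo, hi)
    left = [v for v in values if v <= split]
    right = [v for v in values if v > split]
    print(f"depth {depth}: group={values}, min={lo}, max={hi}, "
          f"split={split:.2f}, |L|={len(left)}, |R|={len(right)}")
    build_isolation_tree(left, depth + 1)
    build_isolation_tree(right, depth + 1)

random.seed(7)
build_isolation_tree([1, 2, 3, 9])
```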
Detailed Explanation
We start with all the observations together.
The first step is to look at the minimum and maximum of this group. These two values define the interval where we can make a random cut.
Next, we generate a random split value somewhere between the min and max. Unlike decision trees, there is no optimization, no criterion, no impurity measure. The split is purely random.
We can use RAND in Excel, as you can see in the following screenshot.

Once we have the random split, we divide the data into two groups:
- Left (L): observations less than or equal to the split
- Right (R): observations greater than the split
This is simply done by comparing the split with the observations using an IF formula.

After the split, we count how many observations went to each side.
If one of these groups contains just one observation, that point is now isolated.
We mark it as F for "Final", meaning it sits in a leaf and no further splitting is required for that branch.
The VLOOKUP is used to retrieve the observations whose side has a count of 1, from the table of counts.

For all other groups that still contain more than one observation, we repeat the exact same process.
We stop only when every observation is isolated, meaning each appears in its own final leaf. The complete structure that emerges is a binary tree, and the number of splits needed to isolate each observation is its depth.
Here, we know that 3 splits are enough.
At the end, you get the final table of one fully grown isolation tree.
Anomaly Score Calculation
The part about averaging the depths just repeats the same process, and you can copy-paste it.
Now, I'll give more details about the anomaly score calculation.
Normalization factor
To compute the anomaly score, Isolation Forest first needs a normalization factor called c(n).
This value represents the expected path length of a random point in a random binary search tree with n observations.
Why do we need it?
Because we want to compare the depth of a point to the depth expected under randomness.
A point that is isolated much faster than expected is likely an anomaly.
The formula for c(n) uses harmonic numbers.
A harmonic number H(k) is roughly:
$$H(k) \approx \ln(k) + \gamma$$
where γ ≈ 0.5772156649 is the Euler–Mascheroni constant.
Using this approximation, the normalizing factor becomes:
$$c(n) = 2H(n-1) - \frac{2(n-1)}{n} \approx 2\bigl(\ln(n-1) + \gamma\bigr) - \frac{2(n-1)}{n}$$
Then we can calculate this number in Excel.
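If you want to sanity-check the spreadsheet value outside Excel, here is a minimal Python version of the same approximation; the helper names harmonic and c are mine.

```python
import math

EULER_GAMMA = 0.5772156649

def harmonic(k):
    """Approximate harmonic number: H(k) ≈ ln(k) + γ."""
    return math.log(k) + EULER_GAMMA

def c(n):
    """Normalization factor c(n) = 2·H(n−1) − 2(n−1)/n, for n ≥ 2."""
    return 2 * harmonic(n - 1) - 2 * (n - 1) / n

print(c(4))   # ≈ 1.85 for our four-point dataset
```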

Once we have c(n), the anomaly score is:
$$s(x) = 2^{-h(x)/c(n)}$$
where h(x) is the average depth needed to isolate the point across all trees.
If the score is close to 0, the point is normal.
If the score is close to 1, the point is an anomaly.
So we can transform the depths into scores.

Finally, for the adjusted score, we can use an offset, which is the average value of the anomaly scores, and translate the scores by it.
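Here is a hedged Python sketch of both transformations. The average depths below are hypothetical placeholders for our example (only the depth of 1 for the point 9 comes from the earlier section), and the offset convention follows the description above rather than any particular library.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n):
    """Normalization factor: c(n) ≈ 2·(ln(n−1) + γ) − 2(n−1)/n."""
    return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n

def anomaly_score(avg_depth, n):
    """s(x) = 2^(−h(x)/c(n)): short average paths give scores close to 1."""
    return 2 ** (-avg_depth / c(n))

# Hypothetical average depths for the dataset 1, 2, 3, 9 (only 9's depth of 1
# is taken from the earlier example; the others are made up for illustration).
avg_depths = {1: 2.4, 2: 2.8, 3: 2.4, 9: 1.0}
n = len(avg_depths)

scores = {x: round(anomaly_score(h, n), 3) for x, h in avg_depths.items()}
offset = sum(scores.values()) / len(scores)             # average score, used as the offset
adjusted = {x: round(offset - s, 3) for x, s in scores.items()}

print(scores)     # 9 gets the highest score (closest to 1)
print(adjusted)   # 9 becomes negative, the normal points stay positive
```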

Additional Elements in the Real Algorithm
In practice, Isolation Forest includes a few extra steps that make it more robust.
1. Select a subsample of the data
Instead of using the full dataset for each tree, the algorithm picks a small random subset.
This reduces computation and adds diversity between trees.
It also helps prevent the model from being overwhelmed by very large datasets.
So it seems that a name like "Random Isolation Forest" would be more suitable, right?
2. Pick a random feature first
When building each split, Isolation Forest doesn't always use the same feature.
It first selects a feature at random, then chooses a random split value within that feature's range.
This makes the trees much more diverse and helps the model work well on datasets with many variables.
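As a rough sketch of these two additions, and assuming a small made-up two-feature dataset, the code below subsamples the rows and then makes one split by first picking a random feature and then a random cut within its range. The helper random_split is purely illustrative.

```python
import random

def random_split(rows):
    """One Isolation Forest-style split on tabular data:
    pick a random feature, then a random cut point within that feature's range."""
    n_features = len(rows[0])
    feature = random.randrange(n_features)               # 1. random feature
    column = [row[feature] for row in rows]
    split = random.uniform(min(column), max(column))     # 2. random value in its range
    left = [r for r in rows if r[feature] <= split]
    right = [r for r in rows if r[feature] > split]
    return feature, split, left, right

random.seed(1)
data = [(1, 10.0), (2, 9.5), (3, 10.5), (9, 50.0)]       # two features per observation
subsample = random.sample(data, k=3)                     # each tree only sees a random subset
print(random_split(subsample))
```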
These simple additions make Isolation Forest surprisingly powerful for real-world applications.
That is again what a "Random Isolation Forest" would do; this name is definitely better!
Benefits of Isolation Forest
Compared with many distance-based models, Isolation Forest has several important advantages:
- Works with categorical features
Distance-based methods struggle with categories, but Isolation Forest can handle them more naturally.
- Handles many features easily
High-dimensional data isn't an issue. The algorithm doesn't rely on distance metrics that break down in high dimensions.
- No assumptions about distributions
There is no need for normality, no density estimation, no distances to compute.
- Scales well to high dimensions
Its performance doesn't collapse when the number of features grows.
- Very fast
Splitting is trivial: pick a feature, pick a random value, cut. No optimization step, no gradient, no impurity calculation.
Isolation Forest also has a very refreshing way of thinking:
Instead of asking "What should normal points look like?",
Isolation Forest asks, "How fast can I isolate this point?"
This simple change of perspective solves many of the difficulties of classical anomaly detection.
Conclusion
Isolation Forest is an algorithm that looks complicated from the outside, but once you break it down, the logic is actually quite simple.
The Excel implementation is painful, yes. But the idea isn't.
And once you understand the idea, everything else becomes much easier: how the trees work, why the depth matters, how the score is computed, and why the algorithm works so well in practice.
Isolation Forest doesn't try to model "normal" behavior. Instead, it asks a completely different question: how fast can I isolate this observation?
This small change of perspective solves many problems that distance-based or density-based models struggle with.
