AdaBoost Classifier, Explained: A Visual Guide with Code Examples


ENSEMBLE LEARNING

Putting the weight where weak learners need it most

Everyone makes mistakes, even the simplest decision trees in machine learning. Instead of ignoring them, the AdaBoost (Adaptive Boosting) algorithm does something different: it learns (or adapts) from these mistakes to improve.

Unlike Random Forest, which builds many trees at once, AdaBoost starts with a single, simple tree and identifies the instances it misclassifies. It then builds new trees to fix those errors, learning from its mistakes and improving with each step.

Here, we'll illustrate exactly how AdaBoost makes its predictions, building strength by combining targeted weak learners, just like a workout routine that turns focused exercises into full-body power.

All visuals: Author-created using Canva Pro. Optimized for mobile; may appear oversized on desktop.

AdaBoost is an ensemble machine learning model that creates a sequence of weighted decision trees, typically shallow trees (often just single-level "stumps"). Each tree is trained on the entire dataset, but with adaptive sample weights that give more importance to previously misclassified examples.

For classification tasks, AdaBoost combines the trees through a weighted voting system, where better-performing trees get more influence in the final decision.

The model's strength comes from its adaptive learning process: while each simple tree might be a "weak learner" that performs only slightly better than random guessing, the weighted combination of trees creates a "strong learner" that progressively focuses on and corrects mistakes.

AdaBoost is part of the boosting family of algorithms because it builds trees one at a time. Each new tree tries to fix the mistakes made by the previous trees. It then uses a weighted vote to combine their answers and make its final prediction.

Throughout this article, we'll use the classic golf dataset as an example for classification.

Columns: 'Outlook' (one-hot-encoded into 3 columns), 'Temperature' (in Fahrenheit), 'Humidity' (in %), 'Wind' (True/False) and 'Play' (Yes/No, target feature)
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

# Create and prepare dataset
dataset_dict = {
    'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
                'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy',
                'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast',
                'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
    'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0,
                    72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0,
                    88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
    'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0,
                 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0,
                 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
    'Wind': [False, True, False, False, False, True, True, False, False, False, True,
             True, False, True, True, False, False, True, False, True, True, False,
             True, False, False, True, False, False],
    'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes',
             'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes',
             'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}

# Prepare data
df = pd.DataFrame(dataset_dict)
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)

# Rearrange columns
column_order = ['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind', 'Play']
df = df[column_order]

# Prepare features and target
X, y = df.drop('Play', axis=1), df['Play']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

Main Mechanism

Here’s how AdaBoost works:

  1. Initialize Weights: Assign equal weight to every training example.
  2. Iterative Learning: In each step, a simple decision tree is trained and its performance is checked. Misclassified examples get more weight, making them a priority for the next tree. Correctly classified examples keep their weight, and all weights are normalized to add up to 1.
  3. Build Weak Learners: Each new, simple tree targets the mistakes of the previous ones, creating a sequence of specialized weak learners.
  4. Final Prediction: Combine all trees through weighted voting, where each tree's vote is based on its importance value, giving more influence to more accurate trees.

An AdaBoost Classifier makes predictions by using many simple decision trees (usually 50–100). Each tree, called a "stump," focuses on one important feature, like temperature or humidity. The final prediction is made by combining all of the trees' votes, each weighted by how important that tree is ("alpha").

Here, we'll follow the SAMME (Stagewise Additive Modeling using a Multi-class Exponential loss function) algorithm, the standard approach in scikit-learn that handles both binary and multi-class classification.

1.1. Decide on the weak learner to be used. A one-level decision tree (or "stump") is the default choice.
1.2. Decide how many weak learners (in this case, the number of trees) you want to build (the default is 50 trees).

We start with depth-1 decision trees (stumps) as our weak learners. Each stump makes only one split, and we'll train 50 of them sequentially, adjusting weights along the way.

1.3. Start by giving each training example equal weight:
· Each sample gets weight = 1/N (N is the total number of samples)
· All weights together sum to 1

All data points start with equal weights (0.0714), with the total weight adding up to 1. This ensures every example is equally important when training begins.
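As a quick illustration, here is a minimal sketch of that initialization with NumPy, using the X_train split created above (the variable names are mine):

import numpy as np

# Give every training sample the same starting weight, 1/N
n_samples = len(X_train)                 # 14 samples in our golf training split
sample_weights = np.full(n_samples, 1 / n_samples)

print(sample_weights[0])                 # 0.0714...
print(sample_weights.sum())              # the weights add up to 1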

For the First Tree

2.1. Build a decision stump while considering sample weights

Before making the first split, the algorithm examines all data points with their weights to find the best splitting point. These weights influence how important each example is in making the split decision.

a. Calculate the initial weighted Gini impurity for the root node

The algorithm calculates the Gini impurity score at the root node, but now considers the weights of all data points.

b. For each feature:
· Sort data by feature values (exactly like in the Decision Tree classifier)

For each feature, the algorithm sorts the data and identifies potential split points, exactly like the standard Decision Tree.

· For each possible split point:
·· Split samples into left and right groups
·· Calculate weighted Gini impurity for both groups
·· Calculate the weighted Gini impurity reduction for this split

The algorithm calculates the weighted Gini impurity for every potential split and compares it to the parent node. For the feature "sunny" with split point 0.5, this impurity reduction (0.066) shows how much the split improves the data separation.

c. Pick the split that gives the largest Gini impurity reduction

After checking all possible splits across features, the column 'overcast' (with split point 0.5) gives the highest impurity reduction of 0.102. This means it is the most effective way to separate the classes, making it the best choice for the first split.
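For readers who want to check the arithmetic, below is a minimal sketch of the weighted Gini reduction for one candidate split. The helper functions are mine, not scikit-learn's; with the uniform starting weights it reproduces the ≈0.102 reduction for the 'overcast' ≤ 0.5 split reported above.

import numpy as np

def weighted_gini(y, w):
    # Gini impurity where each sample counts by its weight
    total = w.sum()
    if total == 0:
        return 0.0
    p_yes = w[y == 1].sum() / total
    p_no = w[y == 0].sum() / total
    return 1.0 - p_yes**2 - p_no**2

def weighted_gini_reduction(feature, threshold, X, y, w):
    # How much the weighted impurity drops when splitting `feature` at `threshold`
    left = (X[feature] <= threshold).values
    parent = weighted_gini(y, w)
    child = (w[left].sum() * weighted_gini(y[left], w[left]) +
             w[~left].sum() * weighted_gini(y[~left], w[~left])) / w.sum()
    return parent - child

w = np.full(len(X_train), 1 / len(X_train))   # uniform starting weights
print(weighted_gini_reduction('overcast', 0.5, X_train, y_train.values, w))  # ≈ 0.102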

d. Create a simple one-split tree using this decision

Using the best split point found, the algorithm divides the data into two groups, each keeping its original weights. This simple decision tree is purposely kept small and imperfect, making it just slightly better than random guessing.

2.2. Evaluate how good this tree is
a. Use the tree to predict the labels of the training set.
b. Add up the weights of all misclassified samples to get the error rate

The first weak learner makes predictions on the training data, and we check where it made mistakes (marked with X). The error rate of 0.357 shows this simple tree gets some predictions wrong, which is expected and will help guide the next steps of training.

c. Calculate tree importance (α) using:
α = learning_rate × log((1 − error) / error)

Using the error rate, we calculate the tree's influence score (α = 0.5878). Higher scores mean more accurate trees, and this tree earned moderate importance for its decent performance.
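To make the formula concrete, here is a minimal sketch of the SAMME importance calculation (the extra log(K − 1) term is written out explicitly; it vanishes in the binary case):

import numpy as np

learning_rate = 1.0
error = 0.357                 # weighted error rate of the first stump (from above)
n_classes = 2                 # binary problem, so log(n_classes - 1) = 0

alpha = learning_rate * (np.log((1 - error) / error) + np.log(n_classes - 1))
print(alpha)                  # ≈ 0.588, close to the 0.5878 above (rounding of the error rate)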

2.3. Update sample weights
a. Keep the original weights for correctly classified samples
b. Multiply the weights of misclassified samples by e^(α).
c. Divide each weight by the sum of all weights. This normalization ensures all weights still sum to 1 while maintaining their relative proportions.

Cases where the tree made mistakes (marked with X) get higher weights for the next round. After increasing these weights, all weights are normalized to sum to 1, ensuring misclassified examples get more attention in the next tree.
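A minimal sketch of that update rule is shown below. The misclassified indices are hypothetical placeholders, chosen only so that 5 of the 14 samples are flagged, which matches the 0.357 error rate above.

import numpy as np

def update_weights(w, misclassified, alpha):
    w = w.copy()
    w[misclassified] *= np.exp(alpha)   # boost the weights of the mistakes
    return w / w.sum()                  # renormalize so all weights sum to 1 again

w = np.full(14, 1 / 14)
misclassified = np.zeros(14, dtype=bool)
misclassified[[0, 1, 5, 7, 13]] = True  # hypothetical positions of the X-marked samples
new_w = update_weights(w, misclassified, alpha=0.5878)

print(new_w[misclassified][0])          # a boosted weight (larger than 1/14)
print(new_w[~misclassified][0])         # a shrunken weight (smaller than 1/14)
print(new_w.sum())                      # still 1.0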

For the Second Tree

2.1. Build a new stump, but now using the updated weights
a. Calculate the new weighted Gini impurity for the root node:
· It will be different because misclassified samples now have greater weights
· Correctly classified samples now have smaller weights

Using the updated weights (where misclassified examples now have higher importance), the algorithm calculates the weighted Gini impurity at the root node. This begins the process of building the second decision tree.

b. For each feature:
· Same process as before, but the weights have changed
c. Pick the split with the best weighted Gini impurity reduction
· Often different from the first tree's split
· Focuses on samples the first tree got wrong

With updated weights, different split points show different effectiveness. Notice that 'overcast' is no longer the best split: the algorithm now finds that Temperature (with split point 84.0) gives the highest impurity reduction, showing how weight changes affect split selection.

d. Create the second stump

Using Temperature ≤ 84.0 as the split point, the algorithm assigns YES/NO to each leaf based on which class has more total weight in that group, not just by counting examples. This weighted voting helps correct the previous tree's mistakes.
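The tiny sketch below shows that weighted leaf vote in isolation: three heavily weighted 'No' samples outvote four lightly weighted 'Yes' samples. The numbers are hypothetical and only illustrate the mechanism.

import numpy as np

def leaf_class(y_leaf, w_leaf):
    # The leaf predicts the class with the larger total weight, not the larger count
    weight_yes = w_leaf[y_leaf == 1].sum()
    weight_no = w_leaf[y_leaf == 0].sum()
    return 1 if weight_yes > weight_no else 0

y_leaf = np.array([1, 1, 1, 1, 0, 0, 0])                      # 4 'Yes' vs 3 'No'
w_leaf = np.array([0.05, 0.05, 0.05, 0.05, 0.12, 0.12, 0.12]) # but 'No' carries more weight
print(leaf_class(y_leaf, w_leaf))                             # 0, i.e. 'No'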

2.2. Evaluate this new tree
a. Calculate the error rate with the current weights
b. Calculate its importance (α) using the same formula as before
2.3. Update weights again. Same process: increase the weights of the mistakes, then normalize.

The second tree achieves a lower error rate (0.222) and a higher importance score (α = 1.253) than the first tree. Like before, misclassified examples get higher weights for the next round.

For the Third Tree onwards

Repeat Step 2.1–2.3 for all remaining trees.

The algorithm builds 50 simple decision trees sequentially, each with its own importance score (α). Each tree learns from previous mistakes by focusing on different aspects of the data, creating a strong combined model. Notice how some trees (like Tree 2) get higher importance scores when they perform better.
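To tie steps 2.1–2.3 together, here is a compact from-scratch sketch of the boosting loop for the binary case. It follows the procedure described above, using scikit-learn stumps fitted with sample_weight; it is not scikit-learn's internal implementation and omits the edge-case handling (such as stopping early when a stump is perfect) that the real AdaBoostClassifier performs.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

n_trees, learning_rate = 50, 1.0
n = len(X_train)
w = np.full(n, 1 / n)                   # step 1: equal starting weights
trees, alphas = [], []

for _ in range(n_trees):
    # 2.1: fit a depth-1 stump on the full data, weighted by the current sample weights
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X_train, y_train, sample_weight=w)

    # 2.2: weighted error rate and importance score (alpha)
    miss = stump.predict(X_train) != y_train.values
    error = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)
    alpha = learning_rate * np.log((1 - error) / error)

    # 2.3: boost the weights of the misclassified samples, then renormalize
    w = w * np.exp(alpha * miss)
    w /= w.sum()

    trees.append(stump)
    alphas.append(alpha)

The trees and alphas collected here are exactly what the weighted vote in the Testing Step below needs.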

Step 3: Final Ensemble
3.1. Keep all trees and their importance scores

The 50 simple decision trees work together as a team, each with its own importance score (α). When making predictions, trees with higher α values (like Tree 2 with 1.253) have more influence on the final decision than trees with lower scores.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Train AdaBoost
np.random.seed(42)  # For reproducibility
clf = AdaBoostClassifier(algorithm='SAMME', n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

# Create visualizations for trees 1, 2, and 50
trees_to_show = [0, 1, 49]
feature_names = X_train.columns.tolist()
class_names = ['No', 'Yes']

# Set up the plot
fig, axes = plt.subplots(1, 3, figsize=(14, 4), dpi=300)
fig.suptitle('Decision Stumps from AdaBoost', fontsize=16)

# Plot each tree
for idx, tree_idx in enumerate(trees_to_show):
    plot_tree(clf.estimators_[tree_idx],
              feature_names=feature_names,
              class_names=class_names,
              filled=True,
              rounded=True,
              ax=axes[idx],
              fontsize=12)
    axes[idx].set_title(f'Tree {tree_idx + 1}', fontsize=12)

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()

Each node shows its 'value' parameter as [weight_NO, weight_YES], which represents the weighted proportion of each class at that node. These weights come from the sample weights we calculated during training.

Testing Step

For predicting:
a. Get each tree’s prediction
b. Multiply each by its importance rating (α)
c. Add them all up
d. The class with the higher total weight will be the final prediction

When predicting for new data, each tree makes its prediction and multiplies it by its importance score (α). The final decision comes from adding up all the weighted votes: here, the NO class gets a higher total score (23.315 vs 15.440), so the model predicts NO for this unseen example.
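Continuing the from-scratch sketch above, the weighted vote could look like this (trees and alphas are the lists built in that loop):

# Add up each stump's vote, scaled by its importance score, and pick the larger total
def ensemble_predict(x_row, trees, alphas):
    scores = {0: 0.0, 1: 0.0}              # total weighted votes for 'No' (0) and 'Yes' (1)
    for tree, alpha in zip(trees, alphas):
        vote = tree.predict(x_row)[0]      # each stump votes for one class
        scores[vote] += alpha              # the vote counts as much as the tree's alpha
    return max(scores, key=scores.get)     # the class with the larger total wins

print(ensemble_predict(X_test.iloc[[0]], trees, alphas))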

Evaluation Step

After building all the trees, we can evaluate the test set.

By iteratively training and weighting weak learners to focus on misclassified examples, AdaBoost creates a strong classifier that achieves high accuracy, typically better than single decision trees or simpler models!
# Get predictions
y_pred = clf.predict(X_test)

# Create DataFrame with actual and predicted values
results_df = pd.DataFrame({
    'Actual': y_test,
    'Predicted': y_pred
})
print(results_df)  # Display results DataFrame

# Calculate and display accuracy
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred)
print(f"\nModel Accuracy: {accuracy:.4f}")

Here are the key parameters for AdaBoost, particularly in scikit-learn:

estimator: This is the base model that AdaBoost uses to build its final solution. The three most common weak learners are:
a. Decision Tree with depth 1 (Decision Stump): This is the default and most popular choice. Because it only has one split, it is considered a very weak learner that is just a bit better than random guessing, exactly what is needed for the boosting process.
b. Logistic Regression: Logistic regression (especially with a high penalty) can also be used here even though it is not really a weak learner. It can be useful for data that has a linear relationship.
c. Decision Trees with small depth (e.g., depth 2 or 3): These are slightly more complex than decision stumps. They are still fairly simple, but can handle slightly more complex patterns than the decision stump. See the configuration sketch after this list.

AdaBoost's base models can be simple decision stumps (depth=1), small trees (depth 2–3), or penalized linear models. Each type is kept simple to avoid overfitting while offering different ways to capture patterns.
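For illustration, here are three plausible ways to set the base estimator in scikit-learn. The specific hyperparameter values are examples, not recommendations.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

# a. Decision stump (the default behaviour, written out explicitly)
stump_ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), algorithm='SAMME', random_state=42)

# b. Heavily penalized logistic regression (small C = strong regularization)
logreg_ada = AdaBoostClassifier(
    estimator=LogisticRegression(C=0.1), algorithm='SAMME', random_state=42)

# c. Slightly deeper, but still small, trees
small_tree_ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=2), algorithm='SAMME', random_state=42)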

n_estimators: The number of weak learners to combine, typically around 50–100. Using more than 100 rarely helps.

learning_rate: Controls how much each classifier affects the final result. Common starting values are 0.1, 0.5, or 1.0. A lower value (like 0.1) paired with a slightly higher n_estimators usually works better.
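A quick sketch of that trade-off, with illustrative values: lowering learning_rate while raising n_estimators.

from sklearn.ensemble import AdaBoostClassifier

ada_default = AdaBoostClassifier(n_estimators=50, learning_rate=1.0,
                                 algorithm='SAMME', random_state=42)
ada_slow = AdaBoostClassifier(n_estimators=100, learning_rate=0.1,
                              algorithm='SAMME', random_state=42)

for name, model in [('default', ada_default), ('slow', ada_slow)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))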

Key differences from Random Forest

As both Random Forest and AdaBoost work with multiple trees, it is easy to confuse the parameters involved. The key difference is that Random Forest combines many trees independently (bagging) while AdaBoost builds trees one after another to fix mistakes (boosting). Here are some other details about their differences:

  1. No bootstrap parameter, because AdaBoost uses all the data but with changing weights
  2. No oob_score, because AdaBoost doesn't use bootstrap sampling
  3. learning_rate becomes important (it is not present in Random Forest)
  4. Tree depth is typically kept very shallow (usually just stumps), unlike Random Forest's deeper trees
  5. The focus shifts from parallel independent trees to sequential dependent trees, making parameters like n_jobs less relevant

Pros:

  • Adaptive Learning: AdaBoost gets better by giving more weight to its mistakes. Each new tree pays more attention to the hard cases it got wrong.
  • Resists Overfitting: Even though it keeps adding trees one after the other, AdaBoost usually doesn't get too focused on the training data. That's because it uses weighted voting, so no single tree can dominate the final answer.
  • Built-in Feature Selection: AdaBoost naturally finds which features matter most. Each simple tree picks the most useful feature for that round, which means it automatically selects important features as it trains.

Cons:

  • Sensitive to Noise: Because it gives more weight to mistakes, AdaBoost can have trouble with messy or mislabeled data. If some training examples have wrong labels, it may focus too much on those bad examples, making the whole model worse.
  • Must Be Sequential: Unlike Random Forest, which can train many trees at once, AdaBoost must train one tree at a time, because each new tree needs to know how the previous trees did. This makes it slower to train.
  • Learning Rate Sensitivity: While it has fewer settings to tune than Random Forest, the learning rate really affects how well it works. If it is too high, it may fit the training data too exactly. If it is too low, it needs many more trees to work well.

AdaBoost is a key boosting algorithm that many newer methods have learned from. Its main idea of improving by focusing on mistakes has helped shape many modern machine learning tools. While other methods try to be perfect from the start, AdaBoost shows that sometimes the best way to solve a problem is to learn from your errors and keep improving.

AdaBoost works best for binary classification problems and when your data is clean. While Random Forest might be better for more general tasks (like predicting numbers) or messy data, AdaBoost can give really good results when used in the right way. The fact that people still use it after so many years shows just how well the core idea works!

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Create dataset
dataset_dict = {
    'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
                'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy',
                'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast',
                'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
    'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0,
                    72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0,
                    88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
    'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0,
                 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0,
                 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
    'Wind': [False, True, False, False, False, True, True, False, False, False, True,
             True, False, True, True, False, False, True, False, True, True, False,
             True, False, False, True, False, False],
    'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes',
             'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes',
             'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)

# Prepare data
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)

# Split features and target
X, y = df.drop('Play', axis=1), df['Play']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)

# Train AdaBoost
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # Base estimator: a decision stump
    n_estimators=50,       # Typically fewer trees than Random Forest
    learning_rate=1.0,     # Default learning rate
    algorithm='SAMME',     # The only remaining option; the parameter will be removed in future scikit-learn versions
    random_state=42
)
ada.fit(X_train, y_train)

# Predict and evaluate
y_pred = ada.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
