10 Confusing XGBoost Hyperparameters and How to Tune Them Like a Pro in 2023


Next, you have to choose the number of decision trees (called base learners in XGBoost) to grow during training using num_boost_round. The default is 100, but that is hardly enough for today's large datasets.

Increasing this parameter grows more trees, but it also significantly raises the risk of overfitting because the model becomes more complex.

One trick I learned from Kaggle is to set a high number like 100,000 for num_boost_round and make use of early stopping.

In each boosting round, XGBoost grows one more decision tree to improve the collective score of the previous ones. That's why it is called boosting. This process continues for num_boost_round rounds, regardless of whether each new round improves on the last or not (see the sketch below).
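To make that loop concrete, here is a minimal, hand-rolled sketch of the boosting idea using scikit-learn trees. The boost function and its defaults are illustrative only; XGBoost's real algorithm additionally uses gradient statistics and regularization:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, num_boost_round=100, learning_rate=0.1):
    # Start from a constant (zero) prediction
    prediction = np.zeros(len(y))
    trees = []
    for _ in range(num_boost_round):
        # Each new tree is fit on what the ensemble still gets wrong
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        # The new tree's (scaled) output is added to the collective prediction
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees

Note that the loop runs for exactly num_boost_round iterations, whether or not the later trees still help.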

But with early stopping, we can stop training, and thus stop growing unnecessary trees, as soon as the score hasn't improved for the last 5, 10, 50, or any arbitrary number of rounds.

With this trick, we can find the ideal number of decision trees without even tuning num_boost_round, saving both time and compute. Here is how it looks in code:

import xgboost as xgb

# Define the rest of the params
params = {...}

# Build the train/validation sets
dtrain_final = xgb.DMatrix(X_train, label=y_train)
dvalid_final = xgb.DMatrix(X_valid, label=y_valid)

bst_final = xgb.train(
    params,
    dtrain_final,
    num_boost_round=100000,  # Set a high number
    evals=[(dvalid_final, "validation")],
    early_stopping_rounds=50,  # Enable early stopping
    verbose_eval=False,
)

The above code allows XGBoost to grow up to 100k decision trees, but thanks to early stopping, training stops once the validation score hasn't improved for the last 50 rounds. Usually, the number of required trees ends up well below 5,000–10,000.
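Once training stops, the booster remembers which round scored best on the validation set. As a quick sketch (assuming XGBoost 1.4+ and a hypothetical dtest DMatrix built from your test features), you can inspect and reuse that round at prediction time:

# Hypothetical test set
dtest = xgb.DMatrix(X_test)

print(bst_final.best_iteration)  # index of the best boosting round
print(bst_final.best_score)      # validation score at that round

# Predict with only the trees up to and including the best round
preds = bst_final.predict(
    dtest, iteration_range=(0, bst_final.best_iteration + 1)
)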

Controlling num_boost_round is also one of the biggest factors in how long training takes, since more trees require more resources.
