XGBoost: Theory and Application


  • Velocity (release_speed): The speed of the pitch in miles per hour.
  • Extension (release_extension): How far, in feet, the ball's true release point is in front of the pitching rubber, i.e., how much closer to home plate than the nominal 60.5 ft from the rubber to home.
  • Spin rate (release_spin_rate): Rate of spin on the ball after it was released by the pitcher, measured in RPM.
  • Induced vertical break (pfx_z): Vertical movement of the ball in inches attributable to the spin of the pitch. For instance, the induced vertical break of a fastball is positive because its backspin counteracts gravity, effectively lifting the ball above its gravity-only path. Conversely, the induced vertical break of a curveball is negative because its over-the-top spin adds to the drop caused by gravity. Both movement patterns are explained by the Magnus effect.
  • Horizontal break (pfx_x): Horizontal movement of the ball in inches.
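These Statcast fields can be collected into a pandas data frame to form the model's feature matrix. A minimal sketch, with made-up sample values purely for illustration:

```python
import pandas as pd

# Hypothetical Statcast-style rows; the values below are illustrative only.
pitches = pd.DataFrame({
    "release_speed": [97.4, 84.1],       # mph
    "release_extension": [6.8, 6.2],     # feet in front of the rubber
    "release_spin_rate": [2350, 2780],   # rpm
    "pfx_z": [14.2, -9.1],               # induced vertical break (inches)
    "pfx_x": [-6.3, 8.0],                # horizontal break (inches)
})

# X holds the movement characteristics used as model inputs
X = pitches[["release_speed", "release_extension",
             "release_spin_rate", "pfx_z", "pfx_x"]]
```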
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, train_test_split

# X is a data frame of pitch movement characteristics
# y is a data frame of pitch results (1 = success, 0 = failure)

# split data into train/test sets
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size = 0.2)

# run algorithm without hyperparameter tuning
xgbr = xgb.XGBRegressor(objective = 'reg:squarederror')
xgbr.fit(xtrain, ytrain)

# tune parameters
params = {'max_depth': [2, 3, 4, 5, 6],
          'learning_rate': [0.01, 0.05, 0.1, 0.3],
          'colsample_bytree': [0.5, 0.6, 0.7, 0.8, 0.9],
          'n_estimators': [100, 150, 300, 1000]}

xgbr = xgb.XGBRegressor(seed = 20)

clf = GridSearchCV(estimator = xgbr,
                   param_grid = params,
                   scoring = 'neg_mean_squared_error',
                   verbose = 1)
clf.fit(X, y)

  • max_depth: the maximum depth of each tree
  • learning_rate: the learning rate (step-size shrinkage) of the model
  • colsample_bytree: the fraction of columns randomly sampled for each tree
  • n_estimators: the number of trees in the ensemble
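Once the grid search finishes, the winning combination is read back from the fitted search object's best_params_ attribute. A minimal, self-contained sketch of that pattern, using synthetic data and scikit-learn's GradientBoostingRegressor as a stand-in for XGBRegressor (so it runs even without xgboost installed; the attribute access is identical for any scikit-learn-compatible estimator):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# synthetic stand-in for the pitch features and outcomes
rng = np.random.default_rng(20)
X = rng.normal(size=(200, 5))
y = X[:, 0] * 0.5 + rng.normal(scale=0.1, size=200)

# a small grid over the same knobs tuned in the article
params = {"max_depth": [2, 3],
          "learning_rate": [0.05, 0.1],
          "n_estimators": [50, 100]}

clf = GridSearchCV(GradientBoostingRegressor(random_state=20),
                   param_grid=params,
                   scoring="neg_mean_squared_error")
clf.fit(X, y)

print(clf.best_params_)  # the tuned values to plug back into the final model
print(clf.best_score_)   # best negated MSE across the CV folds
```

The values printed by best_params_ are what get hard-coded into the final, retrained model below.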
# update algorithm with tuned hyperparameters
xgb1 = xgb.XGBRegressor(learning_rate = 0.05,
                        n_estimators = 150,
                        max_depth = 3,
                        colsample_bytree = 0.8,
                        objective = 'reg:squarederror',
                        seed = 20)
xgb1.fit(xtrain, ytrain)
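After refitting with the tuned hyperparameters, the model's accuracy can be sanity-checked on the held-out test split. A minimal, self-contained sketch with synthetic stand-in data, again using scikit-learn's GradientBoostingRegressor in place of XGBRegressor so it runs without xgboost; with the real pitch frames, the same two lines at the end apply to xgb1 directly:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# synthetic stand-in for the pitch features and outcomes
rng = np.random.default_rng(20)
X = rng.normal(size=(300, 5))
y = X[:, 0] * 0.5 + rng.normal(scale=0.1, size=300)
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2,
                                                random_state=20)

model = GradientBoostingRegressor(learning_rate=0.05, n_estimators=150,
                                  max_depth=3, random_state=20)
model.fit(xtrain, ytrain)

# compare predictions against the held-out outcomes
mse = mean_squared_error(ytest, model.predict(xtest))
print(f"test MSE: {mse:.4f}")
```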
from xgboost import plot_importance
plot_importance(xgb1)


  1. Felix Bautista
  2. Taj Bradley
  3. Ryan Helsley
  4. Eury Perez
  5. Peter Fairbanks
  6. Trevor Megill
  7. Justin Martinez
  8. Spencer Strider
  9. Beau Brieske
  10. Jhoan Duran


What are your thoughts on this topic?
Let us know in the comments below.

