Generate accurate forecasts while understanding how each prediction was made.
Machine learning for time series forecasting has achieved significant success over the past few years; machine learning methods dominated the leaderboard of the M5 Kaggle Walmart forecasting competition.
And when I say machine learning, I mean exactly that: machine learning, not deep learning. As someone who has not only watched deep learning forecasting systems built by less savvy data science teams blow up in production but has also had to fix them, I will never tire of saying that “deep learning is what you do not need”.
If you want to learn why deep learning is not the answer to time series forecasting, please read my Medium article “Deep Learning Is What You Do Not Need”. You can thank me later for not straying onto the wrong path and saving your organisation a great deal of time and money.
Many powerful machine learning methods can deliver superior forecasting performance (follow me on LinkedIn and Twitter, where I often highlight new SOTA developments in time series forecasting), with both ensembles and boosted trees frequently surpassing other methods. However, complex machine learning and ensemble methods lead to black box models, where it is hard to understand the path leading to individual predictions.
To address this issue, Blue Yonder research has published a novel “CyclicBoosting” machine learning algorithm. CyclicBoosting is a generic supervised machine learning model that performs accurate regression and classification tasks efficiently. At the same time, CyclicBoosting (CB) allows a detailed understanding of how each prediction was made.
Such understanding is valuable in situations where stakeholders want to know how individual predictions were made and which factors contributed to them. Understanding how individual predictions were made can also be a regulatory requirement in many industries, such as health, finance, and insurance, and is desirable in others, such as manufacturing or retail.
In addition, many machine learning algorithms struggle to learn rare events, as such events are not representative of most of the data. However, from a business perspective, accurately forecasting such events is very useful. Imagine a retailer selling goods whose sales spike during promotions and events such as Amazon Prime Day; underestimating the quantity of goods sold during such events can result in missed sales, customers turning elsewhere, and damaged brand loyalty.
CyclicBoosting is a supervised machine learning algorithm whose main idea is that each feature Xj contributes in a specific way to the prediction of the target Y. All such contributions can be computed at a granular level, and each prediction Y_hat for a given observation can be transparently interpreted by analysing how much each feature Xj contributed to it. To achieve the required granularity of forecasts, CyclicBoosting performs binning of continuous features.
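As a rough sketch of what this binning step looks like (the feature values and bin edges here are made up for illustration), continuous values can be mapped to bin indices k with numpy:

```python
import numpy as np

# Hypothetical continuous feature: temperature readings in degrees Celsius.
temperature = np.array([3.1, 8.4, 15.0, 21.7, 26.3, 31.9])

# CyclicBoosting assigns each observation to a bin k, so that every bin
# of every feature j can get its own factor f_{j,k}.
bin_edges = np.array([10.0, 20.0, 30.0])          # made-up edges -> 4 bins
bin_index = np.digitize(temperature, bin_edges)   # bin index per observation

print(bin_index)  # [0 0 1 2 2 3]
```

Categorical features need no such step; each category simply becomes its own bin.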
During training, CyclicBoosting considers each feature in turn and modifies the prediction accordingly by adjusting factors fj,k, where j is the index of the feature and k is the index of the bin. This process continues until a stopping criterion is met, e.g. the maximum number of iterations or no further improvement of an error metric such as the mean absolute deviation (MAD) or mean squared error (MSE).
The training proceeds as follows:
- Calculate the global average μ from all observed y across all bins k and features j.
- Initialise the factors fj,k ← 1.
- Cyclically iterate through the features j = 1, …, p and calculate, in turn for each bin k, the partial factors g and corresponding aggregated factors f, where the indices t (current iteration) and τ (current or preceding iteration) refer to iterations of full feature cycles as the training of the algorithm progresses.
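The steps above can be sketched in plain Python. This is a simplified multiplicative-regression version under my own assumptions (a fixed number of cycles, no smoothing of the factors, which the published algorithm adds), not the reference implementation:

```python
import numpy as np

def fit_cyclic_boosting(X_binned, y, n_cycles=5):
    """Sketch of the cyclic factor updates in multiplicative regression mode.
    X_binned is an (n, p) array of integer bin indices, y a positive target."""
    n, p = X_binned.shape
    mu = y.mean()                                        # global average
    factors = [np.ones(X_binned[:, j].max() + 1) for j in range(p)]

    def predict(X):
        y_hat = np.full(len(X), mu)
        for j in range(p):
            y_hat = y_hat * factors[j][X[:, j]]          # one factor per feature
        return y_hat

    for _ in range(n_cycles):                # iterate full feature cycles
        for j in range(p):                   # consider one feature at a time
            y_hat = predict(X_binned)
            for k in range(len(factors[j])):
                in_bin = X_binned[:, j] == k
                if in_bin.any():
                    # Re-estimate the bin's own factor: observed sum divided by
                    # the prediction sum with the current f_{j,k} divided out.
                    factors[j][k] = y[in_bin].sum() / (y_hat[in_bin] / factors[j][k]).sum()
    return mu, factors, predict

# Toy usage: one binned feature (Low=0, Medium=1, High=2)
X = np.array([[0], [0], [1], [1], [2], [2]])
y = np.array([10., 11., 12., 13., 14., 15.])
mu, factors, predict = fit_cyclic_boosting(X, y)
print(mu)          # 12.5
print(factors[0])  # [0.84 1.   1.16]
print(predict(X))  # [10.5 10.5 12.5 12.5 14.5 14.5]
```

With a single feature the factors converge to the ratio of each bin's mean to the global mean after one cycle; with several correlated features, the repeated cycles are what share the explanation between them.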
Confused by all the math formulas? Let's illustrate with a simple example what exactly is going on here. Imagine it is a hot day 🔥 and you would like to predict sales of ice cream given the temperature.
Suppose we have a dataset with one feature, Temperature, and a target variable, Ice Cream Sales. The Temperature feature has three possible values (Low, Medium, and High).
The first step in the Cyclic Boosting algorithm is to calculate the global mean of the target variable Ice Cream Sales. This is done by taking the average of all the observed values of Ice Cream Sales in the training dataset.
Next, the algorithm estimates the weights for each bin of the Temperature feature. This is done by dividing the data into bins based on the values of the feature and calculating the average target value for each bin. For example, suppose we have six observed sales values: 10, 12, 14, 11, 13, 15. The global mean of ice cream sales would be calculated as (10 + 12 + 14 + 11 + 13 + 15) / 6 = 12.5.
Next, the algorithm would estimate the weights for each bin of the Temperature feature. For example, it might estimate a weight of 0.8 for the bin Low, a weight of 1.0 for the bin Medium, and a weight of 1.2 for the bin High. These weights are calculated from how much the average target value in each bin differs from the global mean, which is essentially what the long math formula above expresses.
To predict a new data point with Temperature = High, the algorithm multiplies the global mean of Ice Cream Sales by the weight estimated for the bin High of the feature Temperature (remember, it is a hot 🔥 day), i.e. 1.2. The resulting prediction would be 12.5 * 1.2 = 15.
This is just a simple example to illustrate how Cyclic Boosting can be used to predict ice cream sales based on temperature alone. In practice, more features would be added to improve the accuracy of the predictions.
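The arithmetic of the toy example can be checked in a few lines (the bin weights are the illustrative values from above, not fitted ones):

```python
# Global mean from the six observed sales values
sales = [10, 12, 14, 11, 13, 15]
global_mean = sum(sales) / len(sales)
print(global_mean)  # 12.5

# Illustrative bin weights for the Temperature feature
weights = {"Low": 0.8, "Medium": 1.0, "High": 1.2}

# Predict a new observation with Temperature = High
prediction = global_mean * weights["High"]
print(prediction)  # 15.0
```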
After doing the usual preprocessing and creating features to turn the time series problem into a supervised machine learning problem (remember, CB is a generic supervised machine learning model, not a time series model as such), we can fit CB with the usual scikit-learn-style functionality.
# model training
import cbm

model = cbm.CBM()
model.fit(x_train_df, y_train)
You can then make predictions:
# train error
from sklearn.metrics import mean_squared_error

y_pred_train = model.predict(x_train_df).flatten()
print('RMSE', mean_squared_error(y_train, y_pred_train, squared=False))
Finally, and this is where the key benefit of Cyclic Boosting comes in, you can produce plots showing how each factor and its level contributed to the predictions.
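Because the forecast is just the global mean times one factor per feature, a per-prediction explanation falls out of the model's structure itself. Here is a hand-rolled illustration with made-up factor values (the linked implementations provide ready-made plotting utilities):

```python
# Hypothetical factors behind one forecast: in CyclicBoosting's multiplicative
# mode, the prediction is the global mean times one factor per feature, so the
# chain of factors is itself the explanation of the prediction.
global_mean = 12.5
factors = {"Temperature=High": 1.2, "Weekend=True": 1.1, "Promotion=False": 0.95}

prediction = global_mean
print(f"global mean: {global_mean}")
for name, f in factors.items():
    prediction *= f
    direction = "raises" if f > 1 else "lowers"
    print(f"{name}: factor {f} {direction} the forecast to {prediction:.3f}")
```

A factor of exactly 1.0 means the feature left the prediction unchanged, factors above 1.0 push it up, and factors below 1.0 pull it down, which is what makes each forecast auditable factor by factor.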
Additional materials:
- Cyclic Boosting โ an explainable supervised machine learning algorithm
- Cyclic Boosting implementation from BlueYonder
- Cyclic Boosting implementation from Microsoft
- Deep Learning Is What You Do Not Need