
Time Series Forecasting Using Cyclic Boosting


Generate accurate forecasts and understand how each prediction was made.

Image by pressfoto on Freepik

Machine learning for time series forecasting has achieved significant successes over the past few years; machine learning methods dominated the leaderboard in the M5 Kaggle Walmart forecasting competition.

And when I say machine learning, I mean exactly that: machine learning, not deep learning. As someone who has not only witnessed deep learning forecasting systems built by less savvy data science teams blow up in production but also had to fix them, I will never tire of saying that 'Deep Learning Is What You Do Not Need'.

If you want to learn why deep learning is not the answer to time series forecasting, please read my Medium article 'Deep Learning Is What You Do Not Need'. You can thank me later for keeping you off the wrong path and saving your organization a lot of effort and money.

Many powerful machine learning methods can deliver superior forecasting performance (follow me on LinkedIn and Twitter, where I often highlight new SOTA developments in time series forecasting), with both ensembles and boosted trees frequently surpassing other methods. However, using complex machine learning and ensemble methods leads to black-box models, where it is hard to understand the path leading to individual predictions.

To address this issue, Blue Yonder research has published a novel 'Cyclic Boosting' machine learning algorithm. Cyclic Boosting is a generic supervised machine learning model that performs accurate regression and classification efficiently. At the same time, Cyclic Boosting (CB) allows for a detailed understanding of how each prediction was made.

Such understanding is valuable in situations where stakeholders want to know how individual predictions were made and which factors contributed to them. It is also a regulatory requirement in many industries, such as health, finance, and insurance, and desirable in others, such as manufacturing or retail.

In addition, many machine learning algorithms struggle to learn rare events, as such events are not representative of most of the data. However, from a business perspective, accurately forecasting such events can be very valuable. Imagine a retailer selling goods whose sales spike during promotions and events such as Amazon Prime Day; underestimating the quantity of goods sold during such events can result in missed sales, customers turning elsewhere, and damaged brand loyalty.

Cyclic Boosting is a supervised machine learning algorithm. The main idea is that each feature X_j contributes in a specific way to the prediction of the target Y. All such contributions can be computed at a granular level, and each prediction Y_hat for a given observation can be transparently interpreted by analysing how much each feature X_j contributed to it. To achieve the required granularity of forecasts, Cyclic Boosting performs binning of continuous features.
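As a minimal illustration of such binning (the bin edges here are arbitrary, chosen only for this sketch):

import numpy as np

temperature = np.array([12.9, 18.4, 21.5, 27.1, 30.2])
# three bins: below 15 (Low), 15 to 25 (Medium), above 25 (High)
bin_index = np.digitize(temperature, bins=[15.0, 25.0])
print(bin_index)  # [0 1 1 2 2] -> 0 = Low, 1 = Medium, 2 = High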

During training, Cyclic Boosting considers each feature in turn and modifies the prediction appropriately by adjusting factors f_j,k, where j is the index of the feature and k is the index of the bin. This process continues until a stopping criterion is met, e.g. the maximum number of iterations or no further improvement of an error metric such as the mean absolute deviation (MAD) or mean squared error (MSE).

The predicted value of the target variable is calculated by multiplying the global mean μ by the factors for each feature and bin: ŷ = μ · f_1,k · f_2,k · … · f_p,k, with each feature j contributing the factor of the bin k its value falls into (the multiplicative regression mode).

The training proceeds as follows:

  1. Calculate the global average μ from all observed y across all bins k and features j.
  2. Initialise the factors f_j,k ← 1.
  3. Cyclically iterate through the features j = 1, …, p and calculate, in turn for each bin k, the partial factors g and the corresponding aggregated factors f, where the indices t (current iteration) and τ (current or preceding iteration) refer to iterations of full feature cycles as the training of the algorithm progresses (see the sketch below).
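A heavily simplified sketch of this multiplicative training loop, assuming features are already binned to integer indices (real implementations add smoothing, regularisation, and proper convergence checks; the function names here are hypothetical):

import numpy as np

def fit_cyclic_boosting(X, y, n_iter=10):
    # X: (n_samples, n_features) integer bin indices; y: non-negative targets
    n_samples, n_features = X.shape
    mu = y.mean()                                            # step 1: global mean
    factors = [np.ones(X[:, j].max() + 1) for j in range(n_features)]  # step 2
    for _ in range(n_iter):                                  # step 3: full feature cycles
        for j in range(n_features):
            pred = mu * np.prod([factors[i][X[:, i]] for i in range(n_features)], axis=0)
            pred_without_j = pred / factors[j][X[:, j]]      # remove feature j's own factor
            for k in range(len(factors[j])):
                mask = X[:, j] == k
                if mask.any():
                    # update so that bin k reproduces the observed totals
                    factors[j][k] = y[mask].sum() / pred_without_j[mask].sum()
    return mu, factors

def predict_cyclic_boosting(mu, factors, X):
    return mu * np.prod([factors[j][X[:, j]] for j in range(len(factors))], axis=0)

Each factor f_j,k ends up close to the ratio between what is observed in bin k and what the rest of the model already explains, which is exactly the interpretation used in the ice cream example below.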

Confused by all the math formulas? Let's illustrate with a simple example what exactly is going on here. Imagine it's a hot day 🔥 and you would like to predict sales of ice cream given the temperature.

Suppose we have a dataset with one feature, Temperature, and a target variable, Ice Cream Sales. The Temperature feature has three possible values (Low, Medium, and High).

The first step in the Cyclic Boosting algorithm is to calculate the global mean of the target variable Ice Cream Sales. This is done by taking the average of all the observed values of Ice Cream Sales in the training dataset.

Next, the algorithm estimates the weights for each bin of the Temperature feature. This is done by dividing the data into bins based on the values of the feature and calculating the average target value for each bin. For example, suppose we have the following data:

Temperature | Ice Cream Sales
Low         | 10
Medium      | 12
High        | 14
Low         | 11
Medium      | 13
High        | 15

The global mean of ice cream sales would be calculated as (10 + 12 + 14 + 11 + 13 + 15) / 6 = 12.5.

Concretely, suppose the algorithm estimates a weight of 0.8 for the bin Low, a weight of 1.0 for the bin Medium, and a weight of 1.2 for the bin High. These weights reflect how much the average target value in each bin differs from the global mean, which is essentially what the long math formula above expresses.

To predict a new data point with Temperature = High, the algorithm multiplies the global mean of Ice Cream Sales by the weight estimated for the bin High of the feature Temperature (remember, it's a hot 🔥 day): the resulting prediction is 12.5 * 1.2 = 15.

This is just a simple example to illustrate how Cyclic Boosting can be used to predict ice cream sales based on temperature alone. In practice, more features would be added to improve the accuracy of the predictions.
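Wiring the numbers of this example together in a few lines of Python (the bin weights are the illustrative values quoted above, not fitted from the data):

import numpy as np

sales = np.array([10, 12, 14, 11, 13, 15])
mu = sales.mean()                                   # global mean: 12.5

# illustrative per-bin weights for the Temperature feature
weights = {"Low": 0.8, "Medium": 1.0, "High": 1.2}

prediction = mu * weights["High"]                   # 12.5 * 1.2 = 15.0
print(prediction)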

Kaggle Competition: Predicting Sales of Items at Different Stores

After doing the usual preprocessing and creating features to turn the time series problem into a supervised machine learning problem (remember, CB is a generic supervised machine learning model, not a time series model as such), we can fit CB with the usual scikit-learn-style functionality.
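As a hedged sketch of what such feature engineering might look like (the column names and the toy frame below are assumptions for illustration):

import pandas as pd

# hypothetical frame: one row per (store, item, date) with observed sales
df = pd.DataFrame({
    "date": list(pd.date_range("2024-01-01", periods=60)) * 2,
    "store": [1] * 60 + [2] * 60,
    "item": 1,
    "sales": range(120),
})
df = df.sort_values(["store", "item", "date"])

# lag and rolling features turn the series into a supervised problem
grp = df.groupby(["store", "item"])["sales"]
df["lag_7"] = grp.shift(7)                                           # sales a week earlier
df["roll_mean_28"] = grp.transform(lambda s: s.shift(1).rolling(28).mean())

# calendar features are natural categorical inputs for CB's bins
df["dayofweek"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month

With features like these prepared (and continuous ones binned), model training is a one-liner: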

import cbm

# model training with the cbm implementation
model = cbm.CBM()
model.fit(x_train_df, y_train)

You can then make predictions and check the in-sample error:

from sklearn.metrics import mean_squared_error

# error on the training set
y_pred_train = model.predict(x_train_df).flatten()
print('RMSE', mean_squared_error(y_train, y_pred_train, squared=False))

Finally, and this is where the key benefit of using Cyclic Boosting comes in, you can produce plots showing how each feature and each of its levels contributed to the predictions.
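The exact plotting utilities depend on the implementation you use; as a generic illustration, the per-bin factors from the training-loop sketch above can be visualised with matplotlib:

import matplotlib.pyplot as plt
import numpy as np

# 'factors' as returned by fit_cyclic_boosting above;
# a factor above 1 pushes the prediction up, below 1 pulls it down
fig, axes = plt.subplots(1, len(factors), figsize=(4 * len(factors), 3))
for j, (ax, f) in enumerate(zip(np.atleast_1d(axes), factors)):
    ax.bar(range(len(f)), f)
    ax.axhline(1.0, color="grey", linestyle="--")
    ax.set_xlabel(f"feature {j} bin")
    ax.set_ylabel("factor")
plt.tight_layout()
plt.show()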

Additional materials:

  1. Cyclic Boosting - an explainable supervised machine learning algorithm
  2. Cyclic Boosting implementation from Blue Yonder
  3. Cyclic Boosting implementation from Microsoft
  4. Deep Learning Is What You Do Not Need
