

Conformal prediction for regression
The data
The workflow
Data processing
Training and calibration
Conformal prediction
Predictions quality estimation
Optimizing normalization sensitivity parameter beta
Optimizing error rate
“Easy” approach
Conclusion
References

A fancy attention-grabbing image produced by Midjourney, attempting to visualise conformal prediction.

After writing a blog post on how to use conformal prediction for classification in KNIME, I could not help but continue and describe the regression case. This use case is a little more complicated, but I did my best to explain it well. As in the previous post, two approaches will be described: “advanced” and “easy”. So in this post we go through the regression problem of predicting the price of used cars, considering two main parameters: the error rate and the normalization sensitivity parameter beta (both are described below). As always, everything will be wrapped into Integrated Deployment nodes so the workflows are ready for production immediately.

Today we’re going to work on predicting the worth for used cars (Kaggle) from such characteristics as producer, miles per gallon rate, 12 months of production, mileage, style of transmission and so forth. One can consider this as an application for individuals who would really like to sell their automotive and estimate how much money they could expect to earn, or for the businesses who’re dealing within the used cars market to assist with price adjustments.

Let’s first take a look at the data: the manufacturer distribution and the models-per-manufacturer distribution (see figure 1).

Figure 1. The distribution of manufacturers (top) and models per manufacturer (bottom).

Here we can see that in general we have plenty of records for most of the manufacturers, apart from Toyota and Škoda, whose low counts will be a good test for conformal prediction.

As in the previous post, I am going to focus on describing the “advanced” case, which includes training multiple models and getting calibration tables for them, and optimizing the error rate and the normalization sensitivity parameter beta.

Those who are impatient can jump directly to the KNIME Community Hub and download the Conformal prediction regression advanced (link) and Conformal prediction regression easy (link) workflows. The implementation of both cases is described below.

First of all, it is always good to check whether there are any correlated features, since this is the simplest, though not the only, way of finding redundant features. As expected, there is a correlation between year and mileage (-0.74) (see figure 2): the older the car, the more miles it has covered. This way we can eliminate the “year” feature, since “mileage” appears to be more granular and accurate.
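
Outside of KNIME, the same check could be sketched in a few lines of Python. The file name and column names below are assumptions about the Kaggle data set, not the exact workflow:

```python
# Illustrative sketch only (outside KNIME): checking feature correlation with pandas.
# "used_cars.csv" and the column names are assumed, not taken from the workflow.
import pandas as pd

cars = pd.read_csv("used_cars.csv")
print(cars[["year", "mileage", "price"]].corr())   # expect a strong negative year/mileage value

# drop "year" and keep the more granular "mileage"
cars = cars.drop(columns=["year"])
```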

Figure 2. Correlation matrix of features.

Another thing that is usually good to do is to normalize the target variable and the features. This applies to our use case, where we have features of different scales, which can be relevant for some predictive algorithms.

The data set is actually quite big, even excessive, so readers are free to control the size of the data sets for training, calibration and testing with the Row Sampling node. That is why I am also going to reduce the data set and take only 20% of the original set, using stratified sampling over the target variable, price. Finally, the data set is split into 2 parts: a training + calibration set and a test set.
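
Continuing the sketch above, the subsampling and splitting could look roughly as follows with scikit-learn. Stratifying on a continuous target is approximated here by binning the price into quantiles, which is an assumption rather than the exact behaviour of the KNIME Row Sampling and Partitioning nodes:

```python
# Take a 20% stratified sample, then split it into training + calibration and test sets.
import pandas as pd
from sklearn.model_selection import train_test_split

price_bins = pd.qcut(cars["price"], q=10, labels=False)            # quantile bins for stratification
cars_small, _ = train_test_split(cars, train_size=0.2,
                                 stratify=price_bins, random_state=42)

bins_small = pd.qcut(cars_small["price"], q=10, labels=False)
train_calib, test = train_test_split(cars_small, train_size=0.8,
                                     stratify=bins_small, random_state=42)
```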

The final part here is the component called Conformal prediction configuration, where users can select the main parameters for conformal prediction regression: the desired error rate, whether normalization will be used for the conformal predictions, and if so, the normalization sensitivity parameter beta, which has range (0; 1]. Since normalization must be defined at both the calibration and prediction steps, it is better to settle this at the very beginning, as it is more convenient to propagate these settings as flow variables. The part of the workflow dedicated to the processing step is shown in figure 3.

Since there are already plenty of combinations for the training process, I encourage readers to spend some time playing around with different combinations of the initial conditions: normalize or don’t normalize the data, normalize or don’t normalize the conformal prediction. The experiments for optimizing beta and the error rate are described later in this article.

Figure 3. Initial processing and configuration part of the workflow.

Then comes the same procedure as for conformal prediction classification (see figure 4): split training + calibration into a training set, used for training the regression models, and a calibration set, used for creating the calibration tables. In today’s example I again use a Random Forest model to fit the “price” column using all features apart from year. The loop pattern is the following: the training data set is used to train the model, then this model is used to predict values on the calibration data set; then, if normalization is going to be used, a measure of prediction difficulty (Sigma) should be introduced (note that normalization is an optional step). The most straightforward candidate for it is the absolute error, since it is very easy to calculate, and in this example we are going to use it as well. Sigma can easily be calculated with a Math Formula node by taking the absolute value of the difference between the predicted price and the real price.

Figure 4. Overview of the training and calibration step.

Once the predictions and Sigma are ready, they can be fed into the Conformal Calibrator (Regression) node (see figure 5). After that, the pairs of models and calibration tables are gathered and synchronized with the Conformal Prediction Loop End node.

Figure 5. The dialog of the Conformal Calibrator (Regression) node. Here the user should provide the columns for the target variable and the predicted values, so the node can estimate the conformity of the predicted values to the true ones and create a calibration table. There is an optional feature to use normalization; in that case the user should also provide a difficulty column and the normalization sensitivity parameter beta.
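
To make the calibration step more concrete, here is a minimal Python sketch continuing the data preparation above. It assumes Papadopoulos-style normalized nonconformity scores, which is my reading of what the conformal regression nodes compute; the helper name, the 70/30 split and the numeric-features-only simplification are my own:

```python
# A minimal sketch of the training + calibration step, not the KNIME node itself.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# keep it simple: use only numeric features (the real workflow handles categoricals too)
features = [c for c in train_calib.columns
            if c != "price" and pd.api.types.is_numeric_dtype(train_calib[c])]

train = train_calib.sample(frac=0.7, random_state=42)   # training set
calib = train_calib.drop(train.index)                   # calibration set

model = RandomForestRegressor(random_state=42).fit(train[features], train["price"])

def calibrate(model, X_calib, y_calib, beta=0.25, normalize=True):
    """Return sorted nonconformity scores computed on the calibration set."""
    abs_err = np.abs(y_calib - model.predict(X_calib))   # Sigma = absolute error, as in the post
    scores = abs_err / (abs_err + beta) if normalize else abs_err
    return np.sort(np.asarray(scores))

calib_scores = calibrate(model, calib[features], calib["price"], beta=0.25)
```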

This step starts with the Capture Workflow Start node, since this is the part we are going to take and use for deployment. It has 4 inputs: 3 tables and 1 flow variable:

  • input for the models obtained in the previous step;
  • input for the calibration tables obtained in the previous step;
  • the new data for prediction: an unlabeled or test data set;
  • the settings that were used during training. It is important to use the same normalization settings as in the training step, since the Conformal Predictor and Classifier (Regression) node expects the same parameters that were used in the Conformal Calibrator (Regression) node. Providing mixed settings will result in invalid predictions.

In this block the models and calibration tables are synchronized with the Conformal Prediction Loop Start node, so the loop iterates over each pair: the model is used to predict the values for the new data set, and the calibration table is used for obtaining the conformal prediction in the Conformal Predictor and Classifier (Regression) node. Please note that, similar to the previous step, there is a Math Formula node that calculates Sigma (absolute error), which may be used for the predictions depending on the settings supplied with flow variables. In this workflow the settings for the Conformal Predictor and Classifier (Regression) node are overwritten by flow variables for the user’s convenience.

Finally, all the predictions are aggregated with the median function in the Conformal Prediction Loop End node. After that comes the Capture Workflow End node, which wraps up the part of the workflow we would like to deploy. The workflow can then be deployed with the Workflow Writer node or propagated to the Workflow Executor node.
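
A rough sketch of what the prediction side does with the calibration scores is shown below: take the (1 - error rate) quantile of the sorted calibration scores and turn it into interval half-widths. Sigma for the new rows is again the absolute error, which only works here because the test set is labelled; for truly unlabeled data a separate difficulty estimate (for example, a model of the error) would be needed. The helper is hypothetical, not the node’s implementation:

```python
# Build prediction intervals from the calibration scores computed earlier.
import numpy as np

def predict_interval(model, X_new, calib_scores, error_rate=0.1, beta=0.25, sigma_new=None):
    pred = model.predict(X_new)
    n = len(calib_scores)
    k = int(np.ceil((1 - error_rate) * (n + 1))) - 1     # quantile index in the sorted scores
    alpha_s = calib_scores[min(max(k, 0), n - 1)]
    # normalized intervals are wider for difficult examples, constant-width otherwise
    half_width = alpha_s * (sigma_new + beta) if sigma_new is not None else alpha_s
    return pred - half_width, pred, pred + half_width

sigma_test = np.abs(test["price"].to_numpy() - model.predict(test[features]))
lower, pred, upper = predict_interval(model, test[features], calib_scores,
                                      error_rate=0.1, beta=0.25, sigma_new=sigma_test)
```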

Once the second loop is over, it is possible to evaluate the conformal prediction with the Conformal Scorer (Regression) node. The main metrics there are (a small sketch of both follows the list):

  • error rate: the experimental (real) one that we get in the predictions;
  • the mean interval size, which describes the band within which the predictions lie.
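
Both metrics are easy to reproduce for the bounds computed in the sketch above; the helper name is again hypothetical:

```python
# Empirical error rate and mean interval size for the intervals computed earlier.
import numpy as np

def score_conformal(y_true, lower, upper):
    outside = (y_true < lower) | (y_true > upper)
    return outside.mean(), (upper - lower).mean()   # empirical error rate, mean interval size

err, mean_int = score_conformal(test["price"].to_numpy(), lower, upper)
print(f"empirical error rate {err:.3f}, mean interval size {mean_int:.1f}")
```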

In general, the experimental error rate corresponds well to the expected one that we set up in the Conformal Predictor and Classifier (Regression) node. The interval size expectedly increases as the error rate decreases, so there is a trade-off between interval size and error rate. This may lead to two polar situations: either ridiculously large intervals between the lower and upper bound, which will almost certainly contain the true and predicted values, or a high percentage of errors that the user is ready to tolerate in exchange for a smaller and more reasonable interval size. This is discussed further in the error rate optimization section.

One thing that may potentially help with the trade-off described above is normalization of the conformal prediction. This technique reduces the interval size for easy examples and keeps it larger for difficult ones. As previously stated, to calculate normalized values a measure of difficulty should be introduced, and the most straightforward measure to obtain is the absolute error. In that case beta is the sensitivity parameter determining the influence of the normalization (you can learn more about it in these papers: PDF, PDF).
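
A toy numeric illustration of beta’s role, with made-up numbers: the half-width is alpha_s * (sigma + beta), so a small beta lets the difficulty estimate dominate (a wide spread between easy and hard examples), while a large beta evens the intervals out. In practice alpha_s itself also changes with beta, so this is only a qualitative picture:

```python
# Made-up numbers; qualitative effect of beta on easy vs hard examples.
alpha_s = 0.8
for beta in (0.25, 0.5, 1.0):
    easy = alpha_s * (0.1 + beta)   # small Sigma: easy example
    hard = alpha_s * (2.0 + beta)   # large Sigma: hard example
    print(f"beta={beta}: easy example ±{easy:.2f}, hard example ±{hard:.2f}")
```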

In order to find the best value for beta, there is a separate branch in the workflow called “Beta optimization”. Please note that it calls the workflow described previously, so in order to run this branch you can either run the main part, or deploy it and then read it with the Workflow Reader node. The idea of the optimization branch is to try out a range of values for beta, training and calculating calibration tables accordingly, and then evaluate the prediction with a scorer node and manually analyze the data with a dashboard. In this example I am going to use the interval [0.25; 1] with a step of 0.25 for the beta values.

Figure 6. The branch of the workflow for beta optimization.
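
In plain Python, the same idea amounts to a grid search over beta, reusing the hypothetical helpers from the sketches above; this is only a rough stand-in for the “Beta optimization” branch, not its exact implementation:

```python
# Grid search over beta with a fixed error rate of 0.1.
import numpy as np

for beta in np.arange(0.25, 1.01, 0.25):
    scores = calibrate(model, calib[features], calib["price"], beta=beta)
    lo_b, _, up_b = predict_interval(model, test[features], scores,
                                     error_rate=0.1, beta=beta, sigma_new=sigma_test)
    err, mean_int = score_conformal(test["price"].to_numpy(), lo_b, up_b)
    print(f"beta={beta:.2f}: error rate {err:.3f}, mean interval {mean_int:.1f}")
```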

Once the optimization loop is over and the predictions for all beta values are obtained, they should be properly stacked to make it possible to plot and compare them. To do this, a Group Loop Start node is used to iterate over the groups of manufacturer and beta values. Within the loop body, dynamic column names are used so the columns can eventually be appended in the Loop End (Column Append) node (this part can be seen in figure 6). Finally, in the component called Select manufacturer, the user can select which manufacturer’s prices, predictions and bounds to plot; in these components all the values are also denormalized, so that meaningful price ranges can be seen.

Figure 7. Examples of the conformal prediction for different beta values for the Ford Fiesta.

In figure 7 one can see the example of Ford Fiesta car prices, predictions and 4 pairs of upper and lower prediction bounds for all the samples. In general this plot is not so useful for choosing the beta coefficient, but rather for analyzing the prediction and its ranges; it is quite handy to click on the legend to include or exclude some lines. The good thing is that we can instead consult the data from the Conformal Scorer (Regression) node outputs (see figure 8).

Figure 8. The results of beta optimization with a fixed error rate = 0.1.

And here again one must define the best criterion for choosing the beta value. Let’s say you prefer to have the smallest minimal interval: then it is better to take beta=0.25, and the same holds for the median and mean values; however, if one would like to have the smallest maximum interval, then it is better to take beta=1. In the next section I am going to use beta=0.25, but again I encourage the readers to do their own investigation.

Another parameter that is interesting to optimize is, of course, the error rate. In this post we are going to do pretty much the same as in the classification case. The only difference is that now it is possible to go with or without normalization and have beta as another parameter. It has been optimized in the previous section, so in this experiment I am going to use normalization with beta=0.25.

The part for optimizing the error rate is pretty much the same as for optimizing beta. The range of values is [0; 0.25] with a step of 0.05. The deployed workflow is executed using the error rate values from the interval, and at the output of the loop end node we get the estimation from the scorer and the conformal predictions. In the same way as in the previous section, the predictions are aligned by manufacturer and error rate value. The manufacturer can be chosen in the Select manufacturer component and plotted with a Line Plot (Plotly) node downstream. So let’s have a look at the results in figure 9!
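
The same idea in plain Python, with beta fixed at 0.25 and the helpers from the earlier sketches; again only a rough stand-in for the deployed KNIME workflow:

```python
# Grid search over the target error rate with beta fixed at 0.25.
import numpy as np

beta = 0.25
scores = calibrate(model, calib[features], calib["price"], beta=beta)
for error_rate in np.arange(0.0, 0.26, 0.05):
    lo_e, _, up_e = predict_interval(model, test[features], scores,
                                     error_rate=error_rate, beta=beta, sigma_new=sigma_test)
    err, mean_int = score_conformal(test["price"].to_numpy(), lo_e, up_e)
    print(f"target error rate {error_rate:.2f}: empirical {err:.3f}, mean interval {mean_int:.1f}")
```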

Let’s take a look at Toyota, since it was one of the manufacturers with the fewest records. There one can see that the predictions are grouped by model, so there are different price levels. By clicking on the legend values it is possible to remove or add different lines, which makes it more convenient to compare the prediction bounds. Nevertheless, the pattern here is quite similar to what we had in the classification case:

  • decreasing the tolerable error rate leads to widening of the upper and lower bounds, but at the same time the user can be sure that the true values are more likely to be within this band;
  • vice versa, loosening the error rate leads to narrower prediction bounds, whereas more predictions might fall outside of these bounds.

Here one can also notice quite a ridiculous case for error rate=0, as the lower bound for the majority of samples becomes negative, which in terms of price does not make any sense at all. One more note is that including normalization affects the bounds: for easier cases the bound becomes narrower, while for the hard ones it becomes wider. It may be hard to see on this plot, but it is possible to calculate the bound length with a Math Formula node.

Figure 9. The plot of real price, predicted price and multiple prediction bounds for Toyota.

Nevertheless, instead of manual analysis of the predictions at the manufacturer or even car model level, it is better to look at the aggregated Conformal Scorer (Regression) outputs (see figure 10), which can be plotted with the Error rate optimization analysis component. In the top plot one can see that the theoretical error rate (the one we set up in the node settings) corresponds quite well to the real error rate (the one we estimate with the scorer). Another useful insight comes from the second plot, where one can see how the different aggregated intervals change depending on the error rate values that we used in the optimization loop.

So this plot can be useful for defining the desired error rate depending on the user’s requirements for the prediction: what the maximum upper or lower bounds should be, or perhaps whether it is better to find an optimal mean interval range. Unfortunately, conformal prediction does not provide an answer to all the questions; its power is in providing a measure of the confidence or uncertainty of the predictions for this particular data set. The user makes the final decision, as usual.

Figure 10. Visualization by the Error rate optimization analysis component showing a comparison of experimental vs theoretical error rate (top) and error rate vs aggregated intervals (bottom).

I also prepared the “easy” implementation of conformal prediction for regression. As in the previous post, the simplicity means going without loops for training multiple models and obtaining multiple calibration tables. There is also no part for parameter optimization, but it does deploy the part with results post-processing, so it is as SIMPLE as possible. This approach is good for a quick and dirty implementation, for a first try, or for studying purposes. The potential drawback is that the experimental error rate can be a bit higher than the theoretical one, due to multiple reasons related to the data set itself, the complexity of the target variable, the choice of model and its parameters, and the many other factors that we deal with in data science. Nevertheless, I encourage readers to give it a try and compare the two approaches.

In this blog post I described how to use conformal prediction for a regression case in KNIME, how to prepare the workflow for deployment with the help of Integrated Deployment nodes, and how to use the deployed workflow to optimize two main parameters: the normalization sensitivity beta and the error rate. We went through a use case with a fairly simple, easily interpretable data set and got some insights from it.

The aim of this blog post, as it was for classification, is a gentle introduction with a practical example that can easily be adapted to a new data set; that is why I avoided math and formulas.

I really hope the readers will find this series of blog posts useful. I also hope that these examples will encourage you to give conformal prediction a try in your own projects. I understand that there may be more exciting and complex use cases for both classification and regression, such as anomaly detection or time-series analysis, but perhaps I will describe them later.

  1. Advanced workflow
  2. Easy workflow
  3. First post explaining the theory
  4. Previous post for classification case
