We've all been in that moment, right? Staring at a chart as if it's some ancient script, wondering how we're supposed to make sense of it all. That's exactly how I felt when I was recently asked to explain the AUC of the ROC curve at work.
Though I had a solid understanding of the mathematics behind it, breaking it down into simple, digestible terms proved to be a challenge. I realized that if I was struggling with it, others probably were too. So I decided to write this article to share an intuitive way to understand the AUC-ROC curve through a practical example. No dry definitions here, just clear, straightforward explanations focused on the intuition.
Here's the code¹ used in this article.
Every data scientist goes through a phase of evaluating classification models. Among the array of evaluation metrics, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are indispensable tools for gauging a model's performance. In this article, we will discuss the basic concepts and see them in action using our good old Titanic dataset².
Section 1: ROC Curve
To fully grasp the ROC curve, let's walk through the underlying concepts:
- Sensitivity/Recall (True Positive Rate): Sensitivity quantifies how well a model identifies positive instances. In our Titanic example, it is the proportion of actual survivors (Survived = 1) that the model correctly labels as positive: Sensitivity = TP / (TP + FN).
- Specificity (True Negative Rate): Specificity measures how well a model identifies negative instances. For our dataset, it is the proportion of actual non-survivors (Survived = 0) that the model correctly identifies as negative: Specificity = TN / (TN + FP).

- False Positive Rate: FPR measures the proportion of negative instances that are incorrectly classified as positive by the model: FPR = FP / (FP + TN).

Notice that Specificity and FPR are complementary to one another. While specificity focuses on the correct classification of negative instances, FPR focuses on the incorrect classification of negative instances as positive. Thus:

FPR = 1 - Specificity
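To make these definitions concrete, here is a minimal sketch (the `y_true` and `y_pred` arrays are made-up placeholders, not the Titanic predictions) that computes all three metrics from a confusion matrix:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels and predictions just to illustrate the formulas.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # True Positive Rate / Recall
specificity = tn / (tn + fp)   # True Negative Rate
fpr = fp / (fp + tn)           # False Positive Rate

print(sensitivity, specificity, fpr)
print(np.isclose(fpr, 1 - specificity))   # complementary, as noted above -> True
```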
Now that we know the definitions, let's work with an example. For the Titanic dataset, I have built a simple logistic regression model that predicts whether a passenger survived the shipwreck, using a handful of passenger features. Note that the model predicts the probability of survival. The default threshold for logistic regression in sklearn is 0.5. However, this default threshold may not always make sense for the problem being solved, and we may want to play around with the probability threshold: if the predicted probability > threshold, the instance is predicted to be positive, else negative.
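As a rough sketch of what such a model could look like (the file name and feature columns here are placeholders, not necessarily the ones used in the linked notebook):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data path and feature set for illustration only.
df = pd.read_csv("titanic_train.csv")
X = pd.get_dummies(df[["Pclass", "Sex", "Age", "Fare"]], drop_first=True).fillna(0)
y = df["Survived"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba returns P(class 0) and P(class 1); we keep P(survival).
proba_test = model.predict_proba(X_test)[:, 1]

# model.predict() implicitly uses a 0.5 cutoff; here we choose our own.
threshold = 0.7
y_pred_strict = (proba_test > threshold).astype(int)
```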
Now, let's revisit the definitions of Sensitivity, Specificity and FPR above. Since our binary predictions depend on the probability threshold, these three metrics will change with the threshold we use. If we use a higher probability threshold, we will classify fewer cases as positive, so our true positives will be fewer, leading to lower Sensitivity/Recall. A higher probability threshold also means fewer false positives, so a lower FPR. As such, increasing sensitivity/recall can come at the cost of an increased FPR.
For our training data, we will use 10 different probability cutoffs, calculate Sensitivity/TPR and FPR at each, and plot them in the chart below. Note that the size of the circles in the scatterplot corresponds to the probability threshold used for classification.
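A minimal sketch of that sweep, reusing the placeholder `model`, `X_train`, and `y_train` from the previous snippet, might look like this:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

proba_train = model.predict_proba(X_train)[:, 1]

thresholds = np.arange(0.0, 1.0, 0.1)   # 10 cutoffs: 0.0, 0.1, ..., 0.9
tprs, fprs = [], []
for t in thresholds:
    y_hat = (proba_train >= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_train, y_hat).ravel()
    tprs.append(tp / (tp + fn))
    fprs.append(fp / (fp + tn))

# Circle size encodes the threshold used for that point.
plt.scatter(fprs, tprs, s=thresholds * 300 + 20)
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Sensitivity)")
plt.title("TPR vs FPR at 10 probability cutoffs")
plt.show()
```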

Well, that's it. The graph we created above, plotting Sensitivity (TPR) vs. FPR at various probability thresholds, IS the ROC curve!
In our experiment, we used 10 different probability cutoffs with an increment of 0.1, giving us 10 observations. If we use a smaller increment for the probability threshold, we will end up with more data points and the graph will look like our familiar ROC curve.
To verify our understanding, for the model we built to predict passenger survival, we will loop through various predicted probability thresholds and calculate TPR and FPR on the testing dataset, plot the results in a graph, and compare this graph with the ROC curve plotted using sklearn's roc_curve³.
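A sketch of that comparison, under the same placeholder names as before, could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

proba_test = model.predict_proba(X_test)[:, 1]

# Manual curve: sweep thresholds and compute TPR/FPR on the test set.
manual_tpr, manual_fpr = [], []
for t in np.arange(0.0, 1.0, 0.02):
    y_hat = (proba_test >= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_hat).ravel()
    manual_tpr.append(tp / (tp + fn))
    manual_fpr.append(fp / (fp + tn))

# sklearn curve: roc_curve picks the thresholds from the predicted probabilities.
fpr_sk, tpr_sk, _ = roc_curve(y_test, proba_test)
auc_sk = roc_auc_score(y_test, proba_test)

plt.plot(manual_fpr, manual_tpr, "o--", label="manual threshold sweep")
plt.plot(fpr_sk, tpr_sk, label=f"sklearn roc_curve (AUC={auc_sk:.2f})")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```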

As we can see, the two curves are almost identical. Note that the AUC=0.92 was calculated using the roc_auc_score⁴ function. We will discuss this AUC in the later part of this article.
To summarize, the ROC curve plots TPR against FPR for the model at various probability thresholds. Note that the actual probability thresholds are NOT displayed on the graph, but the observations on the lower left side of the curve correspond to higher probability thresholds (low TPR), and observations on the top right side correspond to lower probability thresholds (high TPR).
To visualize this, refer to the chart below, where I have annotated TPR and FPR at different probability cutoffs.

Section 2: AUC
Now that we have developed some intuition around what the ROC curve is, the next step is to understand the Area Under the Curve (AUC). But before delving into the specifics, let's think about what an ideal classifier looks like. In the ideal case, we want the model to achieve perfect separation between positive and negative observations. In other words, the model assigns low probabilities to negative observations and high probabilities to positive observations, with no overlap. Thus, there will exist some probability cutoff such that all observations with predicted probability < cutoff are negative, and all observations with predicted probability >= cutoff are positive. When this happens, the True Positive Rate is 1 and the False Positive Rate is 0. So the ideal state to achieve is TPR=1 and FPR=0.
Normally, as TPR increases with a lowering probability threshold, FPR also increases. We want TPR to be much higher than FPR. This is characterized by an ROC curve that bends towards the top left. The following ROC space chart shows the perfect classifier as a blue circle (TPR=1 and FPR=0). Models whose ROC curve lies closer to the blue circle are better. Among the ROC curves in the following chart, light blue is best, followed by green and orange. The dashed diagonal line represents random guessing (think of a coin flip).

Now that we understand that ROC curves skewed towards the top left are better, how do we quantify this? Mathematically, this is quantified by calculating the Area Under the Curve. Among the ROC curves above, the model corresponding to the light blue ROC curve is better than the green and orange ones because it has a higher AUC.
But how is AUC calculated? Computationally, it involves integrating under the ROC curve. For models generating discrete predictions, the AUC can be approximated using the trapezoidal rule⁶. In its simplest form, the trapezoidal rule approximates the region under the graph as a series of trapezoids and sums their areas. I'll probably discuss this in another article.
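As a rough illustration (reusing the `fpr_sk` and `tpr_sk` arrays from the earlier sketch), the trapezoidal approximation can be written out by hand and compared with sklearn's `auc` helper, which applies the same rule:

```python
import numpy as np
from sklearn.metrics import auc

# Manual trapezoidal rule: sum of trapezoid areas between consecutive ROC points.
auc_manual = np.sum(np.diff(fpr_sk) * (tpr_sk[1:] + tpr_sk[:-1]) / 2)

# sklearn.metrics.auc applies the trapezoidal rule to arbitrary (x, y) points.
print(auc_manual, auc(fpr_sk, tpr_sk))   # the two values should agree
```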
This brings us to the last and most awaited part: how do we intuitively make sense of AUC? Let's say you built a first version of a classification model with an AUC of 0.7, and you later fine-tune the model. The revised model has an AUC of 0.9. We understand that the model with the higher AUC is better. But what does that really mean? What does it imply about our improved predictive power? Why does it matter? Well, there is a lot of literature explaining AUC and its interpretation. Some of it is too technical, some incomplete, and some outright wrong! One interpretation that made the most sense to me is:
AUC is the probability that a randomly chosen positive instance has a higher predicted probability than a randomly chosen negative instance.
Let's confirm this interpretation. For the simple logistic regression we built, we will visualize the predicted probabilities of the positive and negative classes (i.e., survived the shipwreck or not).
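One simple way to produce such a visualization, assuming the same placeholder `proba_test` and `y_test` as before, is a pair of overlaid histograms:

```python
import numpy as np
import matplotlib.pyplot as plt

y_arr = np.asarray(y_test)

# Split predicted survival probabilities by the actual outcome.
p_pos = proba_test[y_arr == 1]   # passengers who survived
p_neg = proba_test[y_arr == 0]   # passengers who did not survive

plt.hist(p_neg, bins=20, alpha=0.6, label="Did not survive (y=0)")
plt.hist(p_pos, bins=20, alpha=0.6, label="Survived (y=1)")
plt.xlabel("Predicted probability of survival")
plt.ylabel("Count")
plt.legend()
plt.show()
```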

We can see that the model does fairly well in assigning higher probabilities to Survived cases than to those who did not survive. There is some overlap of probabilities in the middle section. The AUC calculated using the roc_auc_score function in sklearn for our model on the test dataset is 0.92 (see chart 2). So, based on the above interpretation of AUC, if we randomly select a positive instance and a negative instance, the probability that the positive instance will have a higher predicted probability than the negative instance should be ~92%.
For this purpose, we will create pools of predicted probabilities for the positive and negative outcomes. We then randomly select one observation from each pool and compare their predicted probabilities, repeating this 100K times. Finally, we calculate the percentage of times the predicted probability of the positive instance was greater than that of the negative instance. If our interpretation is correct, this should be equal to the AUC, i.e. ~0.92.
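A minimal sketch of that simulation, under the same placeholder names, might be:

```python
import numpy as np

rng = np.random.default_rng(42)
y_arr = np.asarray(y_test)

# Pools of predicted probabilities for actual positives and negatives.
pos_pool = proba_test[y_arr == 1]
neg_pool = proba_test[y_arr == 0]

n_trials = 100_000
pos_draws = rng.choice(pos_pool, size=n_trials, replace=True)
neg_draws = rng.choice(neg_pool, size=n_trials, replace=True)

# Fraction of random pairs where the positive outranks the negative.
estimate = np.mean(pos_draws > neg_draws)
print(estimate)   # should land close to the AUC from roc_auc_score (~0.92 here)
```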

We did indeed get 0.92! Hope this helps.
Let me know your thoughts in the comments, and feel free to connect with me on LinkedIn.
Note: this article is a revised version of the original article that I wrote on Medium in 2023.
1. https://github.com/Swpnilsp/ROC-AUC-Curve/blob/main/RoC_Curve_Analysis%20(2).ipynb
2. https://www.kaggle.com/competitions/titanic/data (License: CC0 Public Domain)
3. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve
4. https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
5. https://en.wikipedia.org/wiki/Receiver_operating_characteristic
6. https://en.wikipedia.org/wiki/Trapezoidal_rule