In my previous article I explained how YOLOv1 works and how to build its architecture from scratch with PyTorch. In today's article, I am going to deal with the loss function used to train the model. I highly recommend you read my previous YOLOv1 article before reading this one because it covers a number of fundamentals you need to know. Click the link at reference number [1] to get there.
What’s a Loss Function?
I believe we all already know that the loss function is an extremely important component in deep learning (and machine learning in general), where it is used to evaluate how good our model is at predicting the ground truth. Generally speaking, a loss function should be able to take two input values, namely the ground truth and the prediction made by the model. This function returns a large value whenever the prediction is far from the ground truth. Conversely, the loss value will be small whenever the model successfully gives a prediction close to the target.
Normally, a model is used for either classification or regression only. However, YOLOv1 is a bit special since it incorporates a classification task (to classify the detected objects), whereas the objects themselves are enclosed with bounding boxes whose coordinates and sizes are determined using continuous numbers, hence a regression task. We typically use cross entropy loss when dealing with a classification task, and for regression we can use something like MAE, MSE, SSE, or RMSE. But because the prediction made by YOLOv1 contains both classification and regression at once, we need to create a custom loss function that accommodates both tasks. And here's where things start to get interesting.
Breaking Down the Components
Now let's take a look at the loss function itself. Below is what it looks like according to the original YOLOv1 paper [2].
Yes, the above equation looks scary at a glance, and that's exactly what I felt when I first saw it. But don't worry, as you will find this equation straightforward once we get deeper into it. I will definitely try my best to explain everything in simple words.
Here you can see that the loss function mainly consists of five rows. Now let's get into each of them one by one.
Row #1: Midpoint Coordinate Loss

The first term of the loss function focuses on evaluating the object midpoint coordinate prediction. You can see in Figure 2 above that what it essentially does is simply compare the predicted midpoint (x̂, ŷ) with the corresponding target midpoint (x, y) by subtraction before summing the squared results of the x and the y parts. We do this iteratively for the two predicted bounding boxes (B) within all cells (S×S) and sum the error values from all of them. In other words, what we basically do here is compute the SSE (Sum of Squared Errors) of the coordinate predictions. Assuming that we use the default YOLOv1 configuration (i.e., S=7 and B=2), we will have the first and the second sigma iterate 49 and 2 times, respectively.
Moreover, the 𝟙ᵒᵇʲ variable you see here is a binary mask, in which the value will be 1 whenever there is an object midpoint inside the corresponding cell in the ground truth. But if there is no object midpoint contained inside, the value will be 0 instead, which cancels out all operations within that cell because there is indeed nothing to predict.
Row #2: Size Loss

The focus of the second row is to evaluate the correctness of the bounding box size. I believe the variables here are pretty straightforward: w denotes the width and h denotes the height, where those with a hat (ŵ, ĥ) are the predictions made by the model. If you take a closer look at this row, you'll notice that it is basically the same as the previous one, except that here we take the square root of the variables first before doing the remaining computation.
Using the square root is actually a very clever idea. Naturally, if we directly compute the w and h variables as they are (without the square root), the same inaccuracy on a small bounding box would be weighted the same as that on a large bounding box. This is definitely not a good thing, since the same deviation in the number of pixels on a small box will visually appear more misaligned from the ground truth than on a larger box. Take a look at Figure 4 below to better understand this concept. Here you can see that although the deviation in both cases is 60 pixels on the height axis, the error appears worse on the smaller bounding box. This is basically because in the case of the smaller box the deviation of 60 pixels is 75% of the actual object height, whereas on the larger box it only deviates 25% from the target height.

By taking the square root of w and h, we will have the inaccuracy in a smaller box penalized more than that in a larger one. Let's do a little bit of math to prove this. To make things simpler, I put the two examples in Figure 4 into Gemini and let it compute the height prediction error based on the equation in Figure 3. You can see in the result below that the error of the small bounding box prediction is larger than that of the large bounding box (8.349 vs 3.345).

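If you want to verify these numbers yourself, below is a quick PyTorch sketch that reproduces the computation. Keep in mind that the exact pixel values are my own reading of Figure 4 (the 60 px deviation being 75% of the small box's 80 px target height and 25% of the large box's 240 px one), so small differences from the 8.349 and 3.345 quoted above just come from rounding in the figure.
# Height error with and without the square root (heights assumed from Figure 4)
import torch

h_target_small, h_pred_small = torch.tensor(80.), torch.tensor(140.)   # 60 px off = 75% of 80 px
h_target_large, h_pred_large = torch.tensor(240.), torch.tensor(300.)  # 60 px off = 25% of 240 px

# Without the square root, both deviations are penalized equally.
print((h_pred_small - h_target_small)**2)  # tensor(3600.)
print((h_pred_large - h_target_large)**2)  # tensor(3600.)

# With the square root, the smaller box is penalized more.
print((torch.sqrt(h_pred_small) - torch.sqrt(h_target_small))**2)  # ~8.34
print((torch.sqrt(h_pred_large) - torch.sqrt(h_target_large))**2)  # ~3.34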
Row #3: Object Loss

Moving on to the third row, this part of the YOLOv1 loss function is used to measure how confident the model is in predicting whether there is an object inside a cell. Whenever an object is present in the ground truth, we need to set the target confidence C to the IoU of the predicted bounding box. Assuming that the predicted box perfectly matches the target box, we essentially want our model to produce a confidence Ĉ close to 1. But if the predicted box is not quite accurate, say it has an IoU of 0.8, then we expect our model to produce a confidence close to 0.8 as well. Just think of it like this: if the bounding box itself is inaccurate, then we should expect our model to know that the object is not perfectly contained inside that box. Meanwhile, whenever an object is indeed not present in the ground truth, the C variable should be exactly 0. Again, we then sum all the squared differences between C and Ĉ across all predictions made throughout the entire image to obtain the object loss of a single image.
It's worth noting that Ĉ is designed to reflect two things simultaneously: the probability of the object being there (a.k.a. Pr(Object)) and the accuracy of the bounding box (IoU). This is essentially the reason we define the ground truth confidence as the multiplication of Pr(Object) and the IoU, as mentioned in the paper. By doing so, we implicitly ask the model to produce Ĉ, whose value incorporates both components.
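To make this concrete with a quick worked number (an example of my own, reusing the 0.8 above): if an object midpoint is present, so Pr(Object) = 1, and the predicted box overlaps the target with IoU = 0.8, then the ground truth confidence becomes 1 × 0.8 = 0.8, which is exactly the value we want the model to output for that box.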

As a refresher, IoU is a metric we commonly use to measure how good our bounding box prediction is compared to the ground truth in terms of area coverage. The way to compute IoU is simply to take the ratio of the intersection of the target and predicted bounding boxes to their union, hence the name: Intersection over Union.
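As a quick worked example (with numbers of my own choosing): if the target and the predicted boxes each cover an area of 16 px² and they overlap on an 8 px² region, then IoU = 8 / (16 + 16 − 8) = 8/24 ≈ 0.33.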

Row #4: No Object Loss

The so-called no object loss is quite unique. Despite having a similar computation to the object loss in the third row, the 𝟙ⁿᵒᵒᵇʲ binary mask causes this part to work like the inverse of the object loss. This is because this binary mask's value will be 1 if there is no object midpoint present inside a cell in the ground truth. Otherwise, if an object midpoint is present, the binary mask will be 0, causing the remaining operations for that single cell to be canceled out. So in short, this row is going to return a non-zero number whenever a cell contains no object in the ground truth but is predicted as containing an object midpoint.
Row #5: Classification Loss

The last row in the YOLOv1 loss function is the classification loss. This part of the loss function is the most straightforward one, if I may say, because what we essentially do here is just compare the actual and the predicted class probabilities, which is similar to what we do in a standard multi-class classification task. However, what you need to keep in mind here is that we still use the same regression loss (i.e., SSE) to compute the error. It is mentioned in the paper that the authors decided to use this regression loss for both the regression and the classification parts for the sake of simplicity.
Adjustable Parameters
Notice that I haven't actually discussed the λ_coord and λ_noobj parameters yet. The former is used to give more weight to the bounding box prediction, which is why it's applied to the first and the second rows of the loss function. You can go back to Figure 1 to verify this. The λ_coord parameter is set to a large value by default (i.e., 5) because we want our model to focus on the correctness of the bounding box creation. So, any small inaccuracy in the prediction will be penalized 5 times more heavily than it otherwise would be.
Meanwhile, λ_noobj is used to control the no object loss, i.e., the one in the fourth row of the loss function. It is mentioned in the paper that the authors set a default value of 0.5 for this parameter, which basically causes the no object loss to not be weighted as much. This is because in object detection the number of objects is typically much lower than the total number of cells, causing the majority of the cells to contain no object. Thus, if we don't apply a small multiplier to this term, the no object loss will contribute very heavily to the total loss, even though it is in fact not that important. By setting λ_noobj to a small number, we can suppress the contribution of this loss.
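To put rough numbers on this (a back-of-the-envelope illustration of my own): with S=7 there are 49 cells in total, so an image containing a single object leaves 48 cells contributing only to the no object term. Scaling those 48 contributions by λ_noobj = 0.5 keeps them from drowning out the one cell that actually matters.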
Code Implementation
I do acknowledge that our previous discussion was very mathy. Don't worry if you haven't grasped the entire idea of the loss function just yet. I believe you will eventually understand once we get into the code implementation.
So now, let’s start the code by importing the required modules as shown in Codeblock 1 below.
# Codeblock 1
import torch
import torch.nn as nn
The IoU Function
Before we get into the YOLOv1 loss itself, we will first create a helper function to calculate IoU, which will be used inside the main YOLOv1 loss function. Take a look at Codeblock 2 below to see how I implement it.
# Codeblock 2
def intersection_over_union(boxes_targets, boxes_predictions):
    box2_x1 = boxes_targets[..., 0:1] - boxes_targets[..., 2:3] / 2
    box2_y1 = boxes_targets[..., 1:2] - boxes_targets[..., 3:4] / 2
    box2_x2 = boxes_targets[..., 0:1] + boxes_targets[..., 2:3] / 2
    box2_y2 = boxes_targets[..., 1:2] + boxes_targets[..., 3:4] / 2

    box1_x1 = boxes_predictions[..., 0:1] - boxes_predictions[..., 2:3] / 2
    box1_y1 = boxes_predictions[..., 1:2] - boxes_predictions[..., 3:4] / 2
    box1_x2 = boxes_predictions[..., 0:1] + boxes_predictions[..., 2:3] / 2
    box1_y2 = boxes_predictions[..., 1:2] + boxes_predictions[..., 3:4] / 2

    x1 = torch.max(box1_x1, box2_x1)
    y1 = torch.max(box1_y1, box2_y1)
    x2 = torch.min(box1_x2, box2_x2)
    y2 = torch.min(box1_y2, box2_y2)

    intersection = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)  #(1)
    box1_area = torch.abs((box1_x2 - box1_x1) * (box1_y2 - box1_y1))
    box2_area = torch.abs((box2_x2 - box2_x1) * (box2_y2 - box2_y1))
    union = box1_area + box2_area - intersection + 1e-6  #(2)

    iou = intersection / union  #(3)
    return iou
The intersection_over_union() function above takes two input parameters, namely the ground truth (boxes_targets) and the predicted bounding boxes (boxes_predictions). These two inputs are basically arrays of length 4, storing the x, y, w, and h values. Note that x and y are the coordinates of the box midpoint, not the top-left corner. The bounding box information is then extracted so that we can compute the intersection (#(1)) and the union (#(2)). We can finally obtain the IoU using the code at line #(3). Additionally, at line #(2) we need to add a very small value at the end of the operation (1e-6 = 0.000001). This number is useful to prevent a division-by-zero error in case the union area happens to be 0 for some reason.
Now let's run the intersection_over_union() function we just created on several test cases in order to check whether it works properly. The three examples in Figure 11 below show intersections with high, medium, and low IoU (from left to right, respectively).

All the boxes you see here have the size of 200×200 px, and what makes the three cases different is only the area of their intersections. If you take a closer look at Codeblock 3 below, you will see that the predicted boxes (pred_{0,1,2}) are shifted by 20, 100, and 180 pixels from their respective targets (target_{0,1,2}) along both the horizontal and vertical axes.
# Codeblock 3
target_0 = torch.tensor([[0., 0., 200., 200.]])
pred_0 = torch.tensor([[20., 20., 200., 200.]])
iou_0 = intersection_over_union(target_0, pred_0)
print('iou_0:', iou_0)
target_1 = torch.tensor([[0., 0., 200., 200.]])
pred_1 = torch.tensor([[100., 100., 200., 200.]])
iou_1 = intersection_over_union(target_1, pred_1)
print('iou_1:', iou_1)
target_2 = torch.tensor([[0., 0., 200., 200.]])
pred_2 = torch.tensor([[180., 180., 200., 200.]])
iou_2 = intersection_over_union(target_2, pred_2)
print('iou_2:', iou_2)
As the above code is run, you can see that our example on the left has the highest IoU of 0.6807, followed by the one in the middle and the one on the right with scores of 0.1429 and 0.0050, a trend that is exactly what we expected earlier. This essentially proves that our intersection_over_union() function works well.
# Codeblock 3 Output
iou_0: tensor([[0.6807]])
iou_1: tensor([[0.1429]])
iou_2: tensor([[0.0050]])
The YOLOv1 Loss Function
There is actually one other thing we need to do before creating the loss function, namely instantiating an nn.MSELoss instance which will help us compute the error values across all cells. As the name suggests, this function by default computes MSE (Mean Squared Error). Since we want the error values to be summed instead of averaged, we need to set the reduction parameter to "sum" as shown in Codeblock 4 below. Next, we initialize the lambda_coord, lambda_noobj, S, B, and C parameters, which in this case I set to the default values mentioned in the original paper. Here I also initialize the BATCH_SIZE parameter, which indicates the number of samples we are going to process in a single forward pass.
# Codeblock 4
sse = nn.MSELoss(reduction="sum")
lambda_coord = 5
lambda_noobj = 0.5
S = 7
B = 2
C = 20
BATCH_SIZE = 1
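If you are curious about what the reduction parameter actually changes, here is a tiny sanity check I added on top of the original codeblocks (the input tensors are arbitrary):
# Comparing reduction="sum" against the default reduction="mean"
a = torch.tensor([1., 2., 3.])
b = torch.tensor([1., 4., 6.])
print(nn.MSELoss(reduction="sum")(a, b))   # tensor(13.) -> 0 + 4 + 9
print(nn.MSELoss(reduction="mean")(a, b))  # tensor(4.3333) -> 13 / 3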
Alright, as all prerequisite variables have been initialized, now let's actually define the loss() function for the YOLOv1 model. This function is quite long, so I decided to break it down into several parts. Just make sure everything is placed within the same cell if you want to try running this code in your own notebook.
You can see in Codeblock 5a below that this function takes two input arguments: target and prediction (#(1)). Remember that originally the output of YOLOv1 (the prediction) is a long single-dimensional tensor of length 1470, whereas the length of the target tensor is 1225. What we need to do first inside the loss() function is to reshape them into 7×7×30 (#(3)) and 7×7×25 (#(2)), respectively (note that 7×7×30 = 1470 and 7×7×25 = 1225), so that we can process the information contained in both tensors easily.
# Codeblock 5a
def loss(target, prediction):  #(1)
    target = target.reshape(-1, S, S, C+5)  #(2)
    prediction = prediction.reshape(-1, S, S, C+B*5)  #(3)

    obj = target[..., 20].unsqueeze(3)  #(4)
    noobj = 1 - obj  #(5)
Next, the code at lines #(4) and #(5) is simply how we implement the 𝟙ᵒᵇʲ and 𝟙ⁿᵒᵒᵇʲ binary masks. At line #(4) we take the value at index 20 from the target tensor and store it in the obj variable. Index 20 itself corresponds to the bounding box confidence (see Figure 12): if there is an object midpoint inside the cell, the value at that index will be 1. Otherwise, if an object midpoint is not present, the value will be 0. Meanwhile, the noobj variable I initialize at line #(5) acts as the inverse of obj, whose value will be 1 if there is no object midpoint present in the grid cell.

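To make the masking behavior more concrete, here is a small standalone toy example I added; the 2×2 grid is purely for illustration and is not part of the actual implementation:
# Toy demonstration of the obj and noobj masks on a 2x2 grid
toy_target = torch.zeros(1, 2, 2, 25)
toy_target[0, 0, 0, 20] = 1.0           # only cell (0, 0) contains an object midpoint
obj = toy_target[..., 20].unsqueeze(3)  # shape: (1, 2, 2, 1)
noobj = 1 - obj
print(obj.squeeze())    # tensor([[1., 0.], [0., 0.]])
print(noobj.squeeze())  # tensor([[0., 1.], [1., 1.]])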
Now let's move on to Codeblock 5b, where we compute the bounding box error, which corresponds to the first and the second rows of the loss function. What we essentially do at the beginning is take the x, y, w, and h values from the target tensor (indices 21, 22, 23, and 24). This can be done with a simple array slicing technique as shown at line #(1). Next, we do the same thing to the prediction tensor. However, remember that since our model generates two bounding boxes for each cell, we need to store their values in two separate variables: pred_bbox0 and pred_bbox1 (#(2–3)).
In Figure 12, the sliced indices are those corresponding to the x, y, w, and h of the first and the second bounding box. Among the two bounding box predictions, we will only take the one that best approximates the target box. Hence, we need to compute the IoU between each predicted box and the target box using the code at lines #(4) and #(5). The predicted bounding box that produces the higher IoU is then selected using torch.max() at line #(6). The values of the best bounding box prediction will be stored in best_bbox, whereas the corresponding information of the box with the lower IoU will be discarded (#(8)). At lines #(7) and #(8) we multiply both the actual and the best predicted boxes with obj, which is how we apply the 𝟙ᵒᵇʲ mask.
At this point we already have our target and predicted box values ready to be processed with the sse function we initialized earlier. However, remember that we still need to apply the square root to w and h beforehand, which I do at lines #(9) and #(10) for the target and the best prediction tensors, respectively. One thing you need to keep in mind at line #(10) is that we should take the absolute value of the numbers before applying torch.sqrt(), just to prevent us from computing the square root of negative numbers. Not only that, it is also necessary to add a very small number (1e-6) to make sure we won't take the square root of exactly 0, whose gradient is infinite and thus causes numerical instability. Still on the same line, we then multiply the resulting tensor by its original sign, which we preserve using torch.sign().
Finally, as we've applied torch.sqrt() to the w and h components of target_bbox and best_bbox, we can now pass both tensors to the sse() function as shown at line #(11). Note that the loss value stored in bbox_loss already includes the error from both the first and the second rows of the YOLOv1 loss function.
    # Codeblock 5b
    target_bbox = target[..., 21:25]     #(1)
    pred_bbox0 = prediction[..., 21:25]  #(2)
    pred_bbox1 = prediction[..., 26:30]  #(3)

    iou_pred_bbox0 = intersection_over_union(pred_bbox0, target_bbox)  #(4)
    iou_pred_bbox1 = intersection_over_union(pred_bbox1, target_bbox)  #(5)
    iou_pred_bboxes = torch.cat([iou_pred_bbox0.unsqueeze(0),
                                 iou_pred_bbox1.unsqueeze(0)],
                                dim=0)
    best_iou, best_bbox_idx = torch.max(iou_pred_bboxes, dim=0)  #(6)

    target_bbox = obj * target_bbox  #(7)
    best_bbox = obj * (best_bbox_idx*pred_bbox1  #(8)
                       + (1-best_bbox_idx)*pred_bbox0)

    target_bbox[..., 2:4] = torch.sqrt(target_bbox[..., 2:4])  #(9)
    best_bbox[..., 2:4] = torch.sign(best_bbox[..., 2:4]) * torch.sqrt(torch.abs(best_bbox[..., 2:4]) + 1e-6)  #(10)

    bbox_loss = sse(  #(11)
        torch.flatten(target_bbox, end_dim=-2),
        torch.flatten(best_bbox, end_dim=-2)
    )
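As a side note, below is a minimal standalone illustration (with a made-up negative width, meant to be run outside the loss() function) of why the sign-preserving square root at line #(10) matters: a raw prediction can be negative early in training, and a plain torch.sqrt() would turn it into nan.
# Why we need torch.sign() and torch.abs() before the square root
w = torch.tensor([-0.5])  # a hypothetical negative width prediction
print(torch.sqrt(w))                                    # tensor([nan])
print(torch.sign(w) * torch.sqrt(torch.abs(w) + 1e-6))  # tensor([-0.7071])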
The next component we are going to implement is the object loss. Take a look at Codeblock 5c below to see how I do that.
    # Codeblock 5c
    target_bbox_confidence = goal[..., 20:21] if False else target[..., 20:21]  #(1)
    pred_bbox0_confidence = prediction[..., 20:21]  #(2)
    pred_bbox1_confidence = prediction[..., 25:26]  #(3)

    target_bbox_confidence = obj * target_bbox_confidence  #(4)
    best_bbox_confidence = obj * (best_bbox_idx*pred_bbox1_confidence  #(5)
                                  + (1-best_bbox_idx)*pred_bbox0_confidence)

    object_loss = sse(  #(6)
        torch.flatten(obj * target_bbox_confidence * best_iou),  #(7)
        torch.flatten(obj * best_bbox_confidence),
    )
What we initially do in the codeblock above is take the value at index 20 from the target tensor (#(1)). Meanwhile, for the prediction tensor we need to take the values at indices 20 and 25 (#(2–3)), which correspond to the confidence scores of each of the two boxes generated by the model. You can go back to Figure 12 to verify this.
Next, at line #(5) I take the confidence of the box prediction that has the higher IoU. The code at line #(4) is actually not necessary because obj and target_bbox_confidence are basically the same thing. You can verify this by checking the code at line #(4) in Codeblock 5a. I do this anyway for the sake of clarity, because we essentially have both the target and the predicted confidences multiplied by 𝟙ᵒᵇʲ in the original equation (see Figure 6).
Afterwards, we compute the SSE between the ground truth confidence (target_bbox_confidence) and the predicted confidence (best_bbox_confidence) (#(6)). It is important to notice at line #(7) that we need to multiply the ground truth confidence by the IoU of the best bounding box prediction (best_iou). This is because the paper mentions that whenever there is an object midpoint inside a cell, we want the predicted confidence to equal that IoU score. And this concludes our discussion of the object loss implementation.
Now Codeblock 5d below focuses on computing the no object loss. The code is quite simple since here we reuse the target_bbox_confidence and the pred_bbox{0,1}_confidence we initialized in the previous codeblock. These variables need to be multiplied with the noobj mask before the SSE computation is performed. Note that the errors made by the two predicted boxes need to be summed, which is the reason you see the addition operation at line #(1).
    # Codeblock 5d
    no_object_loss = sse(
        torch.flatten(noobj * target_bbox_confidence),
        torch.flatten(noobj * pred_bbox0_confidence),
    )
    no_object_loss += sse(  #(1)
        torch.flatten(noobj * target_bbox_confidence),
        torch.flatten(noobj * pred_bbox1_confidence),
    )
Lastly, we compute the classification loss using Codeblock 5e below, which corresponds to the fifth row in the original equation. Remember that the original YOLOv1 was trained on the 20-class PASCAL VOC dataset. This is basically the reason we take the first 20 indices from the target and prediction tensors (#(1–2)). Then, we can simply pass the two into the sse() function (#(3)).
    # Codeblock 5e
    target_class = target[..., :20]    #(1)
    pred_class = prediction[..., :20]  #(2)

    class_loss = sse(  #(3)
        torch.flatten(obj * target_class, end_dim=-2),
        torch.flatten(obj * pred_class, end_dim=-2),
    )
As we have completed the five components of the YOLOv1 loss function, what we need to do now is sum everything up using the following codeblock. Don't forget to weight bbox_loss and no_object_loss by multiplying them by their corresponding lambda parameters that we initialized earlier (#(1–2)).
    # Codeblock 5f
    total_loss = (
        lambda_coord * bbox_loss  #(1)
        + object_loss
        + lambda_noobj * no_object_loss  #(2)
        + class_loss
    )

    return bbox_loss, object_loss, no_object_loss, class_loss, total_loss
Test Cases
In this section I am going to demonstrate how to run the loss() function we just created on several test cases. Now pay attention to Figure 13 below, as I will build the following test cases based on this image.

Bounding Box Loss Example
The bbox_loss_test() function in Codeblock 6 below focuses on testing whether the bounding box loss works properly. At the lines marked with #(1) and #(2) I initialize two all-zero tensors, which I refer to as target and prediction. I set the sizes of these two tensors to 1×7×7×25 and 1×7×7×30, respectively, so that we can modify the elements intuitively. We treat the image in Figure 13 as the ground truth, hence we need to store the bounding box information at the corresponding indices of the target tensor.
The indexer [0] in the 0th axis indicates that we access the first (and the only) image in the batch (#(3)). Next, [3,3] in the 1st and 2nd axes denotes the location of the grid cell where the object midpoint is located. We slice the tensor with [21:25] because we want to update the values at these indices with [0.4, 0.5, 2.4, 3.2], which correspond to the x, y, w, and h values of the bounding box. The value at index 20, which is where the target bounding box confidence is stored, is set to 1 since the object midpoint is located within this cell (#(4)). Next, the index that corresponds to the object's class (index 7) also needs to be set to 1 (#(5)), just like how we create a one-hot encoded label in a typical classification task. You can refer back to Figure 12 to verify that this class is indeed at the 7th index.
# Codeblock 6
def bbox_loss_test():
    target = torch.zeros(BATCH_SIZE, S, S, (C+5))        #(1)
    prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))  #(2)

    target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])  #(3)
    target[0, 3, 3, 20] = 1.0  #(4)
    target[0, 3, 3, 7] = 1.0   #(5)

    prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])   #(6)
    #prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.8, 4.0])  #(7)
    #prediction[0, 3, 3, 21:25] = torch.tensor([0.3, 0.2, 3.2, 4.3])  #(8)

    target = target.reshape(BATCH_SIZE, S*S*(C+5))            #(9)
    prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))  #(10)

    bbox_loss = loss(target, prediction)[0]  #(11)
    return bbox_loss

bbox_loss_test()
You can see in the above codeblock that I prepared three test cases at lines #(6–8), in which the one at line #(6) is a condition where the predicted bounding box midpoint and the object size match the ground truth exactly. In that particular case, our bbox_loss will be 1.8474e-13, which is an extremely small number. Remember that it does not return exactly 0 because of the 1e-6 we added during the IoU and the square root calculations. Meanwhile, in the second test case, I assume that the midpoint prediction is correct but the box size is a bit too large. If you try to run this, our bbox_loss increases to 0.0600. Third, I further enlarge the bounding box prediction and also shift it away from the actual position. In such a case, our bbox_loss gets even larger: 0.2385.
By the way, it is important to remember that the loss function we defined earlier expects the target and prediction tensors to have the sizes 1×1225 and 1×1470, respectively. Hence, we need to reshape them accordingly (#(9–10)) before eventually computing the loss value (#(11)).
# Codeblock 6 Output
Case 1: tensor(1.8474e-13)
Case 2: tensor(0.0600)
Case 3: tensor(0.2385)
Object Loss Example
To check whether the object loss is correct, we need to focus on the value at index 20. What we do at the beginning of the object_loss_test() function below is similar to the previous one, namely creating the target and prediction tensors (#(1–2)) and initializing the ground truth vector for cell (3, 3) (#(3–5)). Here we assume that the bounding box prediction perfectly aligns with the actual bounding box (#(6)).
# Codeblock 7
def object_loss_test():
    target = torch.zeros(BATCH_SIZE, S, S, (C+5))        #(1)
    prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))  #(2)

    target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])  #(3)
    target[0, 3, 3, 20] = 1.0  #(4)
    target[0, 3, 3, 7] = 1.0   #(5)

    prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])  #(6)
    prediction[0, 3, 3, 20] = 1.0   #(7)
    #prediction[0, 3, 3, 20] = 0.9  #(8)
    #prediction[0, 3, 3, 20] = 0.6  #(9)

    target = target.reshape(BATCH_SIZE, S*S*(C+5))
    prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))

    object_loss = loss(target, prediction)[1]
    return object_loss

object_loss_test()
I have set up three test cases specifically for the object loss. The first one is the case where the model is perfectly confident that there is a box midpoint inside the cell, or in other words, a condition where the confidence is 1 (#(7)). If you try to run this, the resulting object loss will be 1.4211e-14, which is again a value very close to zero. You can also see in the resulting output below that the object loss increases to 0.0100 and 0.1600 as we decrease the predicted confidence to 0.9 and 0.6 (#(8–9)), i.e., (1 − 0.9)² and (1 − 0.6)², which is exactly what we expected.
# Codeblock 7 Output
Case 1: tensor(1.4211e-14)
Case 2: tensor(0.0100)
Case 3: tensor(0.1600)
Classification Loss Example
Talking about the classification loss, let's now see if our loss function can really penalize misclassifications. Just like the previous ones, in Codeblock 8 below I prepared three test cases, in which the first one is the condition where the model correctly gives perfect confidence to the true class (index 7) while leaving all other class probabilities at 0 (#(1)). If you try to run this, the resulting classification loss will be exactly 0. Next, if you decrease the confidence of the true class to 0.9 while slightly increasing the confidence for the class at index 8 to 0.1 as shown at line #(2), our classification loss increases to 0.0200 (that is, 0.1² + 0.1²). The loss value gets even larger, 1.2800, when I assume that the model misclassifies the object by assigning a very low confidence to the true class (0.2) and a high confidence to the wrong one (0.8) (#(3)), since 0.8² + 0.8² = 1.28. This essentially indicates that our loss function implementation is able to measure classification errors properly.
# Codeblock 8
def class_loss_test():
    target = torch.zeros(BATCH_SIZE, S, S, (C+5))
    prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))

    target[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])
    target[0, 3, 3, 20] = 1.0
    target[0, 3, 3, 7] = 1.0

    prediction[0, 3, 3, 21:25] = torch.tensor([0.4, 0.5, 2.4, 3.2])
    prediction[0, 3, 3, 7] = 1.0  #(1)
    #prediction[0, 3, 3, 7:9] = torch.tensor([0.9, 0.1])  #(2)
    #prediction[0, 3, 3, 7:9] = torch.tensor([0.2, 0.8])  #(3)

    target = target.reshape(BATCH_SIZE, S*S*(C+5))
    prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))

    class_loss = loss(target, prediction)[3]
    return class_loss

class_loss_test()
# Codeblock 8 Output
Case 1: tensor(0.)
Case 2: tensor(0.0200)
Case 3: tensor(1.2800)
No Object Loss Example
Now, in order to test our implementation of the no object loss part, we are going to examine a cell that does not contain any object midpoint, for which I pick the grid cell at coordinate (1, 1). Since the only object in the image is the one located at grid cell (3, 3), the target bounding box confidence for coordinate (1, 1) should be set to 0, as shown at line #(1) in Codeblock 9. In fact, this step is not really necessary because we already set the tensors to all-zero in the first place, but I do it anyway for clarity. Remember that this part will be activated only when the target bounding box confidence is 0 like this. Otherwise, whenever the target box confidence is 1 (i.e., there is an object midpoint inside the cell), the no object loss part will always return 0.
Here I prepared two test cases, in which the first one is when the values at indices 20 and 25 of the prediction tensor are both 0, as written at lines #(2) and #(3), namely when our YOLOv1 model correctly predicts that there is no bounding box midpoint inside the cell. The loss value increases when we use the code at lines #(4) and #(5) instead, which simulates the model somewhat thinking that there should be objects in there while there are actually none. You can see in the resulting output below that the loss value now increases to 0.1300, which is expected since 0.2² + 0.3² = 0.04 + 0.09 = 0.13.
# Codeblock 9
def no_object_loss_test():
    target = torch.zeros(BATCH_SIZE, S, S, (C+5))
    prediction = torch.zeros(BATCH_SIZE, S, S, (C+B*5))

    target[0, 1, 1, 20] = 0.0      #(1)
    prediction[0, 1, 1, 20] = 0.0  #(2)
    prediction[0, 1, 1, 25] = 0.0  #(3)
    #prediction[0, 1, 1, 20] = 0.2  #(4)
    #prediction[0, 1, 1, 25] = 0.3  #(5)

    target = target.reshape(BATCH_SIZE, S*S*(C+5))
    prediction = prediction.reshape(BATCH_SIZE, S*S*(C+B*5))

    no_object_loss = loss(target, prediction)[2]
    return no_object_loss

no_object_loss_test()
# Codeblock 9 Output
Case 1: tensor(0.)
Case 2: tensor(0.1300)
Ending
And well, I think that's pretty much everything about the loss function of the YOLOv1 model. We have thoroughly discussed the formal mathematical expression of the loss function, implemented it from scratch, and performed testing on each of its components. Thanks very much for reading, and I hope you learned something new from this article. Please let me know if you spot any mistakes in my explanation or in the code. See ya in my next article!
References
[1] Muhammad Ardi. YOLOv1 Paper Walkthrough: The Day YOLO First Saw the World. Towards Data Science. https://towardsdatascience.com/yolov1-paper-walkthrough-the-day-yolo-first-saw-the-world/ [Accessed December 18, 2025].
[2] Joseph Redmon et al. You Only Look Once: Unified, Real-Time Object Detection. arXiv. https://arxiv.org/pdf/1506.02640 [Accessed July 25, 2024].
[3] Image created originally by the author.
[4] MuhammadArdiPutra. Regression For All - YOLOv1 Loss Function. GitHub. https://github.com/MuhammadArdiPutra/medium_articles/blob/main/Regression%20For%20All%20-%20YOLOv1%20Loss%20Function.ipynb [Accessed July 25, 2024].
