Implementing math in deep learning papers into efficient PyTorch code: SimCLR Contrastive Loss


One of the best ways to deepen your understanding of the mathematics behind deep learning models and loss functions, and also a great way to improve your PyTorch skills, is to get used to implementing deep learning papers by yourself.

Books and blog posts can help you get started with coding and learning the fundamentals of ML / DL, but after studying a few of them and getting good at the routine tasks in the field, you’ll soon realize that you are on your own in the learning journey, and you’ll find most of the online resources boring and too shallow. However, I believe that if you can study new deep learning papers as they get published and understand the required pieces of math in them (not necessarily all the mathematical proofs behind the authors’ theories), and you are a capable coder who can implement them in efficient code, nothing can stop you from staying up to date in the field and learning new ideas.

Contrastive Loss implementation

I’ll introduce my routine and the steps I follow to implement math in deep learning papers using a non-trivial example: the contrastive loss in the SimCLR paper.

Here’s the mathematical formulation of the loss:

Contrastive (NT-Xent) loss from the SimCLR paper | from https://arxiv.org/pdf/2002.05709.pdf
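
In case the figure is not visible, the loss for a positive pair (i, j) as given in the SimCLR paper can be written as below. Here z denotes an embedding, N the number of original images, and τ the temperature hyperparameter from the paper; the plain-language walkthrough in this post does not discuss τ.

```latex
\ell_{i,j} = -\log
  \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}
       {\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)},
\qquad
\mathrm{sim}(u, v) = \frac{u^\top v}{\lVert u \rVert \, \lVert v \rVert}
```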

I agree that the mere look of the formula can be daunting! You might also be thinking that there must be lots of ready-made PyTorch implementations on GitHub, so let’s use them 🙂 And yes, you’re right. There are dozens of implementations online. However, I think this is a good example for practicing this skill and can serve as a good starting point.

Steps to implement math in code

My routine for implementing the math in papers as efficient PyTorch code is as follows:

  1. Understand the math and explain it in simple terms
  2. Implement an initial version using simple Python, with no fancy matrix multiplications for now
  3. Convert your code into efficient PyTorch code

OK, let’s get straight to step one.

Step 1: Understanding the mathematics and explaining it in simple terms

I’m assuming that you have basic knowledge of linear algebra and are familiar with mathematical notation. If you’re not, you can use this tool to find out what each of these symbols is and what it does in math, just by drawing the symbol. You can also check this awesome Wikipedia page where most of the notation is described. These are the opportunities where you learn new stuff: searching for and reading what you need at the time you need it. I think it’s a more efficient way of learning than starting with a math textbook from scratch and putting it away after a few days 🙂

Back to our business. As the paragraph above the formula explains, in the SimCLR learning strategy you start with N images and transform each of them twice to get augmented views of those images (2 * N images now). Then, you pass these 2 * N images through a model to get an embedding vector for each of them. Now, you want to make the embedding vectors of the two augmented views of the same image (a positive pair) closer in the embedding space (and do the same for all the other positive pairs). One way to measure how similar (close, pointing in the same direction) two vectors are is cosine similarity, which is defined as sim(u, v) (look up the definition in the image above).
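
To make sim(u, v) concrete, here is a minimal sketch that computes cosine similarity by hand and cross-checks it against PyTorch’s built-in F.cosine_similarity. The helper name cosine_sim and the example vectors are my own illustrative choices, not something taken from the paper.

```python
import torch
import torch.nn.functional as F


def cosine_sim(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """sim(u, v) = (u . v) / (||u|| * ||v||) for two 1-D vectors."""
    return torch.dot(u, v) / (u.norm() * v.norm())


u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([2.0, 4.0, 6.0])     # same direction as u, different length
w = torch.tensor([-1.0, -2.0, -3.0])  # opposite direction to u

sim_uv = cosine_sim(u, v)  # vectors pointing the same way: close to 1
sim_uw = cosine_sim(u, w)  # vectors pointing opposite ways: close to -1

# Cross-check the hand-written version against PyTorch's built-in function.
builtin_uv = F.cosine_similarity(u, v, dim=0)
print(sim_uv.item(), sim_uw.item(), builtin_uv.item())
```

Note that the length of the vectors does not matter here, only their direction, which is why v (twice as long as u) still gets a similarity of about 1.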

In simple terms, the formula says the following. For each item in our batch, which is the embedding of one of the augmented views of an image, we first find the embedding of the other augmented view of that image to make a positive pair. (Remember: the batch contains all the embeddings of the augmented views of different images, so if we start with N images, the batch has a size of 2 * N.) Then, we calculate the cosine similarity of these two embeddings and exponentiate it (the numerator of the formula). Next, we calculate the exponentiated cosine similarity of every other pair we can build with the embedding vector we started with (except the pair with itself, which is what 1[k!=i] means in the formula), and we sum them up to build the denominator. We can now divide the numerator by the denominator, take the natural log of the result, and flip the sign! Now we have the loss of the first item in our batch. We just need to repeat the same process for all the other items in the batch and then take the average, so that we can call PyTorch’s .backward() method to calculate the gradients.
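
The final averaging step can also be written down explicitly. Writing ℓ(i, j) for the per-item loss described above (anchor i, positive j), and assuming the ordering used in the SimCLR paper, where items 2k−1 and 2k are the two views of the same image, the batch loss is:

```latex
\mathcal{L} = \frac{1}{2N} \sum_{k=1}^{N} \left[ \ell(2k-1,\, 2k) + \ell(2k,\, 2k-1) \right]
```

Each positive pair contributes twice, once with each view acting as the anchor, which is why the sum is divided by 2N rather than N.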

Step 2: Implementing it using simple Python code, with naive “for” loops!

Simple Python implementation, using slow “for” loops

Let’s go over the code. Let’s say we have two images: A and B. The variable aug_views_1 holds the embeddings (each of size 3) of one augmented view of these two images (A1 and B1), and aug_views_2 holds the other (A2 and B2); so, the first item in both matrices is related to image A and the second item in both is related to image B. We concatenate the two matrices into the projections matrix (which has 4 vectors in it: A1, B1, A2, B2).

To keep track of how the vectors in the projections matrix relate to each other, we define the pos_pairs dictionary to store which two items are related in the concatenated matrix. (I’ll explain the F.normalize() thing soon!)

As you can see in the next lines of code, I go over the items in the projections matrix in a loop, find the related vector for each one using our dictionary, and then calculate the cosine similarity. You might wonder why I don’t divide by the norms of the vectors, as the cosine similarity formula suggests. The point is that before starting the loop, I use the F.normalize function to normalize all the vectors in our projections matrix to have a norm of 1. So, there’s no need to divide by the norms in the line where we calculate the cosine similarity.

After building the numerator, I find all the other indexes of vectors in the batch (except the index i itself) to calculate the cosine similarities that make up the denominator. Finally, I calculate the loss by dividing the numerator by the denominator, applying the log function, and flipping the sign. Make sure to play with the code to understand what is happening in each line.
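
A quick way to convince yourself of this is the small check below: after F.normalize, every row has norm 1, so a plain dot product between two rows equals their cosine similarity. The random values and the seed are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

projections = torch.rand(4, 3)           # 4 embeddings of size 3
unit = F.normalize(projections, dim=-1)  # rescale each row to L2 norm 1

row_norms = unit.norm(dim=-1)  # all entries should be (numerically) 1

# Dot product of two normalized rows ...
dot = torch.dot(unit[0], unit[2])
# ... versus cosine similarity of the original, unnormalized rows.
cos = F.cosine_similarity(projections[0], projections[2], dim=0)
print(row_norms, dot.item(), cos.item())
```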
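
The walkthrough above can be sketched roughly as follows. This is my reconstruction from the description, not necessarily the exact snippet being described: the function name nt_xent_loop is hypothetical, and the temperature argument is an addition taken from the paper’s formula (with the default of 1.0 it reduces to the steps explained in this post).

```python
import torch
import torch.nn.functional as F


def nt_xent_loop(aug_views_1, aug_views_2, temperature=1.0):
    """Naive NT-Xent loss with explicit loops (hypothetical helper name)."""
    n = aug_views_1.shape[0]
    # Rows are ordered [A1, B1, ..., A2, B2, ...].
    projections = torch.cat([aug_views_1, aug_views_2], dim=0)
    # Unit-norm rows, so a dot product equals cosine similarity.
    projections = F.normalize(projections, dim=-1)
    # Item i pairs with item i + n (and vice versa); for n = 2 this is
    # {0: 2, 1: 3, 2: 0, 3: 1}.
    pos_pairs = {i: (i + n) % (2 * n) for i in range(2 * n)}

    losses = []
    for i in range(2 * n):
        anchor = projections[i]
        positive = projections[pos_pairs[i]]
        numerator = torch.exp(torch.dot(anchor, positive) / temperature)

        # Sum over every other item in the batch, skipping the anchor itself.
        denominator = torch.zeros(())
        for k in range(2 * n):
            if k == i:
                continue
            sim = torch.dot(anchor, projections[k]) / temperature
            denominator = denominator + torch.exp(sim)

        losses.append(-torch.log(numerator / denominator))

    # Average over all 2N items in the batch.
    return torch.stack(losses).mean()


torch.manual_seed(0)             # arbitrary seed for reproducibility
aug_views_1 = torch.rand(2, 3)   # embeddings of A1, B1
aug_views_2 = torch.rand(2, 3)   # embeddings of A2, B2
loss = nt_xent_loop(aug_views_1, aug_views_2)
print(loss.item())
```

Because the positive pair’s term also appears in the denominator alongside other positive terms, the ratio is always below 1 and the loss is always positive.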

Step 3: Converting it into efficient matrix-friendly PyTorch code

The problem with the previous Python implementation is that it is too slow to be used in our training pipeline; we need to get rid of the slow “for” loops and convert them into matrix multiplications and array manipulations in order to leverage the power of parallelization.

PyTorch implementation

Let’s see what’s happening in this code snippet. This time, I’ve introduced the labels_1 and labels_2 tensors to encode the arbitrary classes to which these images belong, as we need a way to encode the relationship between the A1, A2 and B1, B2 images. It doesn’t matter if you choose labels 0 and 1 (as I did) or, say, 5 and 8.

After concatenating both the embeddings and the labels, we start by creating a sim_matrix containing the cosine similarity of all possible pairs.
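
One possible loop-free version is sketched below, under a few assumptions of mine: the function name nt_xent_vectorized is hypothetical, the temperature argument comes from the paper’s formula rather than from the walkthrough, and each label is assumed to appear exactly twice in the batch (one positive per item). It is a sketch of the idea, not necessarily the implementation this post describes.

```python
import torch
import torch.nn.functional as F


def nt_xent_vectorized(projections, labels, temperature=1.0):
    """Loop-free NT-Xent loss (hypothetical helper name).

    Assumes every label occurs exactly twice, i.e. one positive per item.
    """
    projections = F.normalize(projections, dim=-1)
    # (2N, 2N) matrix of cosine similarities between all pairs.
    sim_matrix = projections @ projections.T / temperature

    n_items = labels.shape[0]
    self_mask = torch.eye(n_items, device=labels.device).bool()
    # True where two different items share the same label (positive pairs).
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask

    exp_sim = torch.exp(sim_matrix)
    # Numerator: the single positive entry in each row.
    numerator = (exp_sim * pos_mask).sum(dim=1)
    # Denominator: every entry in the row except the diagonal (k != i).
    denominator = exp_sim.masked_fill(self_mask, 0.0).sum(dim=1)

    return (-torch.log(numerator / denominator)).mean()


torch.manual_seed(0)             # arbitrary seed for reproducibility
aug_views_1 = torch.rand(2, 3)   # embeddings of A1, B1
aug_views_2 = torch.rand(2, 3)   # embeddings of A2, B2
labels_1 = torch.tensor([0, 1])  # arbitrary class ids for images A and B
labels_2 = torch.tensor([0, 1])

projections = torch.cat([aug_views_1, aug_views_2], dim=0)
labels = torch.cat([labels_1, labels_2], dim=0)
loss = nt_xent_vectorized(projections, labels)
print(loss.item())
```

Using labels instead of a hard-coded index dictionary means the positive pairs are found by comparison, so the result does not depend on the order of items in the batch or on which label values you pick.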


