
Optimisation Algorithms: Neural Networks 101


How you can improve training beyond the “vanilla” gradient descent algorithm

Neural network icons created by andinur — Flaticon (https://www.flaticon.com/free-icons/neural-network).

In my last post, we discussed how you can improve the performance of neural networks through hyperparameter tuning:

This is a process whereby the best hyperparameters, such as the learning rate and the number of hidden layers, are “tuned” to find the most suitable values for our network and boost its performance.

Unfortunately, this tuning process is painstakingly slow for large deep neural networks (deep learning). One way to improve on this is to use faster optimisers than the traditional “vanilla” gradient descent method. In this post, we will dive into the most popular optimisers and variants of gradient descent that can speed up training and improve convergence, and compare them in PyTorch!

Before diving in, let’s quickly brush up on our knowledge of gradient descent and the theory behind it.

The goal of gradient descent is to update the parameters of the model by subtracting the gradient (partial derivative) of the loss function with respect to each parameter. A learning rate, α, serves to regulate this process, ensuring the parameters are updated on a reasonable scale so they neither overshoot nor undershoot the optimal value.
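
The update rule can be written as follows (a minimal LaTeX sketch of the vanilla rule, using the symbols defined in the list below):

```latex
\theta \leftarrow \theta - \alpha \, \nabla_{\theta} J(\theta)
```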

  • θ are the parameters of the model.
  • J(θ) is the loss function.
  • ∇J(θ) is the gradient of the loss function; ∇ is the gradient operator, also known as nabla.
  • α is the learning rate.
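
To make this concrete, below is a minimal sketch of a single vanilla gradient descent step in PyTorch. The model, data, and learning rate here are made up purely for illustration; plain torch.optim.SGD with no momentum performs exactly the update above.

```python
import torch
import torch.nn as nn

# A tiny made-up model and batch, just to illustrate one update step
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)  # plain SGD = "vanilla" gradient descent

X = torch.randn(32, 10)  # dummy inputs
y = torch.randn(32, 1)   # dummy targets

optimiser.zero_grad()        # clear old gradients
loss = loss_fn(model(X), y)  # J(θ): compute the loss
loss.backward()              # ∇J(θ): backpropagate the gradients
optimiser.step()             # θ ← θ − α∇J(θ): apply the update
```

Every optimiser we look at later plugs into this same zero_grad/backward/step loop; only the update applied inside step() changes.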

I wrote a previous article on gradient descent and how it works if you want to familiarise yourself with it a bit more:
