Automatic Differentiation (AutoDiff): A Brief Intro with Examples


An introduction to the mechanics of AutoDiff, exploring its mathematical principles, implementation strategies, and applications in today's most widely used frameworks


At the heart of machine learning lies the optimization of loss/objective functions. This optimization heavily relies on computing gradients of those functions with respect to model parameters. As Baydin et al. (2018) explain in their comprehensive survey [1], these gradients guide the iterative updates in optimization algorithms such as stochastic gradient descent (SGD):

θₜ₊₁ = θₜ − α ∇_θ L(θₜ)

Where:

  • θₜ represents the model parameters at step t
  • α is the learning rate
  • ∇_θ L(θₜ) denotes the gradient of the loss function L with respect to the parameters θ

This straightforward update rule belies the complexity of computing gradients in deep neural networks with millions or even billions of parameters.
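
To make the update rule concrete, here is a minimal sketch of a single SGD step in PyTorch, where autograd supplies the gradient ∇_θ L(θₜ). The quadratic toy loss, the starting value of θ, and the learning rate are illustrative assumptions, not something prescribed by the formula itself.

```python
import torch

# Model parameters θ_t (assumed toy values) and learning rate α
theta = torch.tensor([2.0, -3.0], requires_grad=True)
alpha = 0.1

# Toy loss L(θ) = ||θ||², chosen only for illustration; its minimum is at θ = 0
loss = (theta ** 2).sum()

# Reverse-mode AutoDiff computes ∇_θ L(θ_t) and stores it in theta.grad
loss.backward()

# Parameter update θ_{t+1} = θ_t − α ∇_θ L(θ_t), performed outside the graph
with torch.no_grad():
    theta -= alpha * theta.grad
    theta.grad.zero_()

print(theta)  # tensor([ 1.6000, -2.4000], requires_grad=True)
```

Here the gradient of the toy loss is 2θ = [4, −6], so one step moves θ from [2, −3] to [1.6, −2.4]; the same pattern scales unchanged to models with billions of parameters, which is precisely what makes AutoDiff so central.
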
