Optimizers are an essential part of everyone's work in machine learning.
As we all know, optimizers determine how a model converges on the loss function during gradient descent, so choosing the right optimizer can boost both the performance and the efficiency of model training.
Besides the classic papers, many books explain the principles behind optimizers in simple terms.
However, I recently noticed that the behavior of Keras 3 optimizers doesn't quite match the mathematical algorithms described in these books, which made me a bit anxious. I worried that I had misunderstood something, or that updates in the latest version of Keras had changed the optimizers.
So I reviewed the source code of several common optimizers in Keras 3 and revisited their use cases. Now I'd like to share what I found, to save you time and help you master Keras 3 optimizers more quickly.
If you're not familiar with the latest changes in Keras 3, here's a quick rundown: Keras 3 integrates TensorFlow, PyTorch, and JAX, letting us use these cutting-edge deep learning frameworks through a single set of Keras APIs.
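As a minimal sketch of what that means in practice, the snippet below selects a backend via the `KERAS_BACKEND` environment variable (it must be set before `import keras`; `"jax"` here is an arbitrary choice among `"tensorflow"`, `"torch"`, and `"jax"`, and the chosen framework must be installed). The model, optimizer, and training code stay identical regardless of which backend runs underneath.

```python
import os

# Choose the backend before importing Keras; any of
# "tensorflow", "torch", or "jax" works if installed.
os.environ["KERAS_BACKEND"] = "jax"

import keras
import numpy as np

# A tiny regression model; this code is backend-agnostic.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(1),
])

# keras.optimizers.Adam is the same class no matter
# which framework executes the updates underneath.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# Dummy data, just to show that a training step runs.
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```

Switching the environment variable to `"tensorflow"` or `"torch"` reruns the exact same script on a different framework, which is precisely why the optimizers' internals are worth a close look.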