Optimizers | Gradient Descent with Momentum | Nesterov Accelerated Gradient | Deep Learning basics

In this video, we’ll recap Gradient Descent and its drawbacks, then we’ll look at how Momentum and Nesterov Accelerated Gradient Descent (NAG) come to the rescue. 🌟

Gradient Descent: The Basics
Gradient Descent is a fundamental optimization technique used to minimize loss functions in machine learning. It works by iteratively adjusting the model parameters in the direction of the steepest descent of the loss function.
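
To make the update rule concrete, here is a minimal sketch of plain Gradient Descent in Python. The loss f(w) = (w - 3)^2, the learning rate, and the step count are illustrative assumptions chosen for this example; they are not from the video.

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Plain gradient descent: repeatedly step against the gradient."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)  # move in the direction of steepest descent
    return w


# Hypothetical example loss: f(w) = (w - 3)^2, so its gradient is 2 * (w - 3).
def grad_f(w):
    return 2.0 * (w - 3.0)


w_final = gradient_descent(grad_f, w0=0.0)
print(w_final)  # ends very close to the minimum at w = 3
```

Each step moves w against the gradient, scaled by the learning rate, so the size of every update depends only on the current slope.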

However, Gradient Descent has a few notable drawbacks:
Local Minima Trap 😱 – Gradient Descent can get stuck in local minima, failing to find the global minimum.
Unstable Gradient Movement ⚡ – Mini-batch updates can cause erratic movements, making convergence unstable.
Slow Convergence 🐢 – In flat regions of the loss function, Gradient Descent converges very slowly, prolonging the optimization process.

Momentum-Based Gradient Descent: A Solution with a Twist
To address these issues, we turn to Momentum-Based Gradient Descent.

Here’s how it helps:
History Accumulation 📊: It combines the current gradient with a fraction of the previous gradients, smoothing out the updates.
Faster Convergence 🚀: This historical accumulation helps the algorithm converge much faster, particularly in flat regions.
Local Minima Avoidance 🔄: It’s more likely to escape local minima due to its increased inertia.
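
The points above can be sketched in a few lines of Python. This is one common way to write the Momentum update (a velocity v that keeps a fraction beta of its previous value), not necessarily the exact form used in the video. The nearly flat loss f(w) = 0.005 * w^2 and all hyperparameters are assumptions picked to mimic a flat region.

```python
def plain_gd(grad, w0, lr, steps):
    """Baseline: plain gradient descent."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def momentum_gd(grad, w0, lr, beta, steps):
    """Gradient descent with momentum."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v + lr * grad(w)  # history: fraction of old velocity + current gradient
        w -= v                       # step using the accumulated velocity
    return w


# Hypothetical flat loss: f(w) = 0.005 * w^2, gradient 0.01 * w (tiny slope everywhere).
def flat_grad(w):
    return 0.01 * w


w_plain = plain_gd(flat_grad, w0=1.0, lr=0.1, steps=200)
w_mom = momentum_gd(flat_grad, w0=1.0, lr=0.1, beta=0.9, steps=200)
print(w_plain, w_mom)  # the momentum run ends much closer to the minimum at 0
```

With the same learning rate and number of steps, the accumulated velocity lets the Momentum run cover far more ground on the flat slope than plain Gradient Descent does.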

Drawback: Despite these advantages, the accumulated inertia can carry Momentum past the minimum, so the optimization path takes numerous U-turns (overshooting and swinging back) before it settles, which is inefficient.

Nesterov Accelerated Gradient (NAG): A Smarter Approach
Nesterov Accelerated Gradient Descent improves upon Momentum by introducing a look-ahead step:

Look-Ahead Step 👀: Before calculating the gradient, NAG takes a step according to the accumulated history, leading to a temporary update in model parameters.
Controlled Steps 👣: By looking at the gradient from this new point, NAG takes more informed and controlled steps towards the minimum.
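
The look-ahead idea can be sketched next to plain Momentum as follows. This is a minimal illustration using one common formulation of NAG (gradient evaluated at w - beta * v); the loss f(w) = 0.5 * w^2 and the hyperparameters are assumptions chosen so that overshooting is easy to see.

```python
def momentum_path(grad, w0, lr, beta, steps):
    """Momentum: gradient is taken at the current point."""
    w, v, path = w0, 0.0, [w0]
    for _ in range(steps):
        v = beta * v + lr * grad(w)
        w -= v
        path.append(w)
    return path


def nag_path(grad, w0, lr, beta, steps):
    """NAG: gradient is taken at a temporary look-ahead point."""
    w, v, path = w0, 0.0, [w0]
    for _ in range(steps):
        w_ahead = w - beta * v             # temporary update from accumulated history
        v = beta * v + lr * grad(w_ahead)  # gradient measured at the look-ahead point
        w -= v
        path.append(w)
    return path


# Hypothetical loss: f(w) = 0.5 * w^2, gradient w, minimum at w = 0.
def grad_f(w):
    return w


mom = momentum_path(grad_f, w0=1.0, lr=0.1, beta=0.9, steps=100)
nag = nag_path(grad_f, w0=1.0, lr=0.1, beta=0.9, steps=100)

# Starting from w = 1, any negative value means the path overshot the minimum.
print(min(mom), min(nag))  # NAG's deepest overshoot is smaller than Momentum's
```

Because the look-ahead gradient already points back towards the minimum once the velocity is about to carry the parameters past it, NAG brakes earlier and its U-turns are shallower in this example.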

In Summary:
Gradient Descent: Effective but slow and prone to instability.
Momentum: Adds speed and stability but can cause inefficiencies with U-turns.
NAG: Keeps Momentum's speed while its look-ahead gradient reduces overshooting, giving faster, more controlled convergence.

If you found this explanation helpful, don’t forget to like 👍, share 📤, and subscribe 🔔 for more tech insights! 💡 Your support helps us create more valuable content for you. Thanks for watching! 🙌