
What is Gradient Descent?

Gradient descent is an optimization algorithm that searches for a minimum of a function (not necessarily the global one) by repeatedly stepping in the direction of steepest descent, that is, opposite to the gradient. In machine learning, it is used to minimize a model's error, or loss, by adjusting the model's parameters step by step.

At each step, the algorithm computes the gradient (the multi-dimensional slope) of the loss function with respect to the model parameters. Because the gradient points in the direction of steepest increase, the algorithm updates each parameter by subtracting a small multiple of it, which moves the parameters toward lower loss.
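Written as a formula, one step is θ ← θ − η · ∇L(θ), where θ are the parameters, L is the loss, and η is the learning rate. The Python below is a minimal sketch of that loop under simplifying assumptions: it uses a hypothetical one-parameter toy loss L(θ) = (θ − 3)², chosen only because its gradient, 2(θ − 3), is easy to write by hand, and the function names are illustrative. Real models have many parameters and usually obtain gradients through automatic differentiation.

```python
def gradient_descent(grad, theta0, learning_rate=0.1, steps=100):
    """Repeatedly apply theta <- theta - learning_rate * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        # Step against the gradient, scaled by the learning rate.
        theta = theta - learning_rate * grad(theta)
    return theta


# Hypothetical toy loss L(theta) = (theta - 3)**2, minimized at theta = 3.
def loss_grad(theta):
    """Gradient of the toy loss: dL/dtheta = 2 * (theta - 3)."""
    return 2 * (theta - 3)


theta_min = gradient_descent(loss_grad, theta0=0.0)
print(round(theta_min, 4))  # → 3.0
```

Starting from θ = 0, each step here shrinks the distance to the minimum by a constant factor (1 − 2η = 0.8), so after 100 steps θ is within a tiny fraction of 3.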

A key hyperparameter is the learning rate, which controls the size of each step. Too large a rate can overshoot the minimum; too small a rate makes training slow.
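To make the trade-off concrete, the sketch below runs 20 steps on the same hypothetical toy loss L(θ) = (θ − 3)² with three learning rates. The specific values 0.001, 0.1, and 1.1 are arbitrary illustrative choices, not recommendations; for this particular loss, any rate above 1.0 makes each step overshoot by more than it corrects.

```python
def final_loss(learning_rate, steps=20, theta0=0.0):
    """Gradient descent on the toy loss L(theta) = (theta - 3)**2; returns the final loss."""
    theta = theta0
    for _ in range(steps):
        theta -= learning_rate * 2 * (theta - 3)  # 2 * (theta - 3) is the gradient
    return (theta - 3) ** 2


# Starting loss at theta0 = 0 is (0 - 3)**2 = 9.
for lr in (0.001, 0.1, 1.1):
    print(f"learning_rate={lr}: final loss {final_loss(lr):.4g}")
```

With these settings, the smallest rate barely moves the loss from its starting value of 9, the moderate rate drives it close to zero, and the largest rate makes θ oscillate around 3 with growing amplitude, so the loss increases instead of converging.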

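Variants such as stochastic gradient descent (one example per update) and mini-batch gradient descent (a small batch of examples per update) estimate the gradient from a subset of the data instead of the full dataset. Each update is therefore much cheaper, and the noise in these estimates can help the model escape poor local minima.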
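A minimal sketch of mini-batch gradient descent follows, fitting a line y ≈ w·x + b by minimizing mean squared error. It is written in plain Python for self-containment; the function name, the hyperparameter values, and the synthetic data (drawn from y = 2x + 1 plus small Gaussian noise) are all hypothetical choices for illustration. Machine-learning frameworks vectorize these loops instead.

```python
import random


def minibatch_sgd(xs, ys, batch_size=8, learning_rate=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b by minimizing mean squared error with mini-batch gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the batch's mean squared error with respect to w and b.
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2 * error * xs[i]
                grad_b += 2 * error
            grad_w /= len(batch)
            grad_b /= len(batch)
            # One update per mini-batch rather than per full pass over the data.
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Hypothetical synthetic data: y = 2x + 1 plus a little Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1, 1) for _ in range(64)]
ys = [2 * x + 1 + data_rng.gauss(0, 0.05) for x in xs]

w, b = minibatch_sgd(xs, ys)
print(f"w={w:.2f}, b={b:.2f}")
```

In this sketch, setting `batch_size=1` gives classic stochastic gradient descent, and setting `batch_size=len(xs)` recovers full-batch gradient descent, so the batch size is the knob that trades per-update cost against gradient noise.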

Example

Imagine walking down a foggy hill to reach the lowest point: at each step you feel the slope beneath your feet and take a small step downhill. After many such steps you arrive near the bottom, just as gradient descent iteratively reduces a model's loss.

Why it matters

Gradient descent (and its variants) is the core engine behind training virtually all modern neural networks and many other machine-learning models, enabling them to learn from data at scale.

Frequently asked questions

What happens if the learning rate is too large?

The updates may overshoot the minimum, causing the loss to increase or oscillate instead of converging.