What is Gradient Descent?
Gradient descent is an optimization algorithm that finds the minimum of a function by repeatedly moving in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.
The algorithm calculates the gradient (slope) of the loss function with respect to the model parameters. It then updates each parameter by subtracting a fraction of this gradient, moving the model closer to lower error.
A key hyperparameter is the learning rate, which controls the size of each step. Too large a rate can overshoot the minimum; too small a rate makes training slow.
Variants such as stochastic gradient descent and mini-batch gradient descent use subsets of the data to make updates faster and often help the model escape poor local minima.
Example
Imagine walking down a foggy hill to reach the lowest point: at each step you feel the slope beneath your feet and take a small step downhill. After many such steps you arrive near the bottom, just as gradient descent iteratively reduces a model's loss.
Why it matters
Gradient descent (and its variants) is the core engine behind training virtually all modern neural networks and many other machine-learning models, enabling them to learn from data at scale.
Frequently asked questions
The updates may overshoot the minimum, causing the loss to increase or oscillate instead of converging.
Related terms
A loss function quantifies how far a model's predictions are from the true values, serving as the objective that training tries to minimize.
The learning rate is a hyperparameter that controls the size of the steps an optimization algorithm takes when updating a model's parameters during training.
Backpropagation is an algorithm for training neural networks by calculating how much each weight contributed to the prediction error and adjusting those weights accordingly. It uses the chain rule to efficiently compute gradients of the loss function.
Adam (Adaptive Moment Estimation) is a popular optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients.
Classification is a supervised machine learning task that assigns input data to one of several predefined categories or classes based on patterns learned from labeled training examples.
Clustering is an unsupervised machine learning technique that automatically groups similar data points together into clusters based on their features, without using any labeled examples.