
What is Learning Rate?

The learning rate is a hyperparameter that controls the size of the steps an optimization algorithm takes when updating a model's parameters during training.

In algorithms like gradient descent, the learning rate scales the gradient to decide how far to move the model's weights in the direction that reduces the loss. A value that is too large can cause the model to overshoot the optimal point, while a value that is too small makes training slow and may trap the model in suboptimal solutions.
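As a minimal sketch of this update rule, the snippet below runs plain gradient descent on a toy one-dimensional loss, f(w) = (w - 3)^2, chosen here purely for illustration. The function name, the step count, and the three rates are assumptions made for the example, not recommended settings.

```python
def gradient_descent(lr, steps=50, w0=0.0):
    """Minimize the toy loss f(w) = (w - 3)^2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2 with respect to w
        w = w - lr * grad       # the learning rate scales the gradient step
    return w


# Three illustrative rates on the same problem (the minimum is at w = 3).
w_small = gradient_descent(lr=0.001)  # too small: still far from 3 after 50 steps
w_good = gradient_descent(lr=0.1)     # moderate: ends very close to 3
w_large = gradient_descent(lr=1.1)    # too large: every step overshoots further

print(w_small, w_good, w_large)
```

On this loss, each step multiplies the distance to the minimum by (1 - 2 * lr), so any rate above 1.0 makes that factor larger than 1 in magnitude and the iterates move away from the minimum instead of toward it.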

The learning rate is often held constant throughout training, but it can also be adjusted over time using a schedule. Decay gradually lowers the rate as training progresses, and warmup ramps it up from a small value over the first steps; both can help the model converge more reliably. Choosing an appropriate rate is essential because it directly influences both the speed and stability of the training process.
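One common way to combine warmup and decay is a linear ramp followed by cosine decay. The sketch below is one possible schedule among many; the function name `lr_at_step` and all default values (base rate, warmup length, total steps) are hypothetical choices for illustration.

```python
import math


def lr_at_step(step, base_lr=0.1, warmup_steps=100, total_steps=1000):
    """Linear warmup up to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Warmup: ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay: fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at a few points in training.
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step))
```

In a training loop, the value returned for the current step would replace the fixed rate in the weight update.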

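Modern optimizers build on the basic learning-rate idea in different ways. Adaptive optimizers such as Adam scale the step size for each parameter individually, based on running statistics of that parameter's gradients. SGD with momentum keeps a single global learning rate but accumulates past gradients into a velocity term that smooths and accelerates the updates.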
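The sketch below contrasts a single update step of SGD with momentum and of Adam on two parameters whose gradients differ greatly in scale. It is a simplified pure-Python illustration, not any library's implementation; the function names are hypothetical, and the hyperparameter defaults are illustrative, with Adam's betas and epsilon set to commonly used values.

```python
import math


def momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """One SGD-with-momentum step: a single global learning rate for all parameters."""
    new_velocity = [beta * v + g for v, g in zip(velocity, grad)]
    new_w = [wi - lr * v for wi, v in zip(w, new_velocity)]
    return new_w, new_velocity


def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: each parameter's step is rescaled by its own gradient history."""
    new_m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    new_v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction compensates for m and v starting at zero (t counts from 1).
    m_hat = [mi / (1 - beta1 ** t) for mi in new_m]
    v_hat = [vi / (1 - beta2 ** t) for vi in new_v]
    new_w = [
        wi - lr * mh / (math.sqrt(vh) + eps)
        for wi, mh, vh in zip(w, m_hat, v_hat)
    ]
    return new_w, new_m, new_v


w = [0.0, 0.0]
grad = [100.0, 0.01]  # one very large and one very small gradient

w_mom, _ = momentum_step(w, grad, velocity=[0.0, 0.0])
w_adam, _, _ = adam_step(w, grad, m=[0.0, 0.0], v=[0.0, 0.0], t=1)

print(w_mom)   # step sizes proportional to the raw gradients
print(w_adam)  # step sizes of roughly lr for both parameters
```

With momentum, the two parameters move by amounts that differ by the same factor as their gradients. With Adam, both move by roughly the learning rate on this first step, because each gradient is divided by an estimate of its own magnitude.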

Example

When training a neural network to classify handwritten digits, a learning rate of 0.1 might let the model quickly reduce its errors in the first few epochs, whereas a rate of 0.0001 would require many more epochs to reach similar accuracy.

Why it matters

The learning rate is one of the most influential settings in modern AI training; a well-chosen rate can dramatically speed up convergence and improve final model performance, while a poor choice can prevent learning altogether.

Frequently asked questions

What happens if the learning rate is too high?

The model may fail to converge, oscillating around the minimum or even diverging so that the loss increases instead of decreasing.