What is a Loss Function?
A loss function quantifies how far a model's predictions are from the true values, serving as the objective that training tries to minimize.
It acts as a numerical score of error for each prediction or batch of predictions. During training, the model adjusts its parameters to drive this score downward.
Common loss functions include mean squared error for regression and cross-entropy for classification. The choice depends on the task and output type.
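As a rough illustration of these two losses, here is a minimal sketch in plain Python. The function names, the sample values, and the `eps` clamp are assumptions made for this example and do not come from any particular library.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences; a common regression loss."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(true_class, probs, eps=1e-12):
    """Negative log of the probability the model assigned to the correct class.

    eps (an assumption for this sketch) guards against log(0).
    """
    return -math.log(max(probs[true_class], eps))

# Regression: two targets and two predictions (illustrative numbers).
regression_loss = mean_squared_error([3.0, 5.0], [2.0, 7.0])

# Classification: the model puts 0.9 on class 0 and 0.1 on class 1.
confident_right = cross_entropy(0, [0.9, 0.1])  # true class is 0
confident_wrong = cross_entropy(1, [0.9, 0.1])  # true class is 1
```

Cross-entropy penalizes a confident wrong prediction far more heavily than a confident correct one, which is one reason it suits probabilistic classification outputs.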
During training, the loss is computed on training data. Backpropagation computes the gradient of the loss with respect to the model weights, and an optimizer such as gradient descent uses that gradient to update them.
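The loop of computing the loss, finding its gradient, and updating the weights can be sketched for a one-parameter model. This is only an illustration under stated assumptions: the model `y = w * x`, the data, the learning rate, and the step count are all made up, and the gradient is derived by hand rather than by backpropagation because there is just one weight.

```python
def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, xs, ys):
    """Derivative of the MSE with respect to w: mean(2 * x * (w * x - y))."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

# Toy data generated by y = 2 * x, so the best weight is 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0               # initial guess
learning_rate = 0.05  # hypothetical step size chosen for this data
history = []          # loss recorded before each update

for step in range(100):
    history.append(mse(w, xs, ys))
    # Move the weight against the gradient to reduce the loss.
    w -= learning_rate * mse_grad(w, xs, ys)
```

With these settings `w` approaches 2 and the recorded loss shrinks toward zero. A learning rate that is too large for the data would make the same loop diverge.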
Example
In a model that predicts house prices, the loss might be the average of (predicted price minus actual price) squared across all houses in the training set.
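Worked out on three hypothetical houses (the prices are invented for illustration), the calculation might look like this:

```python
# Hypothetical predicted and actual sale prices for three houses.
predicted = [310_000.0, 455_000.0, 198_000.0]
actual = [300_000.0, 450_000.0, 200_000.0]

# Squared error per house: (predicted price minus actual price) squared.
squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]

# The loss is the average of those squared errors.
mse = sum(squared_errors) / len(squared_errors)

# The square root puts the error back in price units, which is easier to read.
rmse = mse ** 0.5
```

Here the errors are 10,000, 5,000, and -2,000, so the loss is 43,000,000 in squared price units. The root of that value, about 6,557, is an extra step not required by the definition but often easier to interpret.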
Why it matters
Loss functions are central to modern AI because they turn the abstract goal of 'learning' into a concrete optimization problem that can be solved at scale with gradient-based methods.
Frequently asked questions
Is loss the same as accuracy?
No. Loss measures error magnitude while accuracy measures the fraction of correct predictions; a model can have low loss yet imperfect accuracy.
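The difference can be seen with a small sketch. The two "models" below are just hypothetical lists of the probability each one assigned to the correct class on four binary examples, and the 0.5 decision threshold is an assumption.

```python
import math

def mean_cross_entropy(p_true):
    """Average negative log-probability assigned to the correct class."""
    return sum(-math.log(p) for p in p_true) / len(p_true)

def accuracy(p_true, threshold=0.5):
    """Fraction of examples where the correct class got more than the threshold."""
    return sum(p > threshold for p in p_true) / len(p_true)

# Model A is very confident and right three times, slightly wrong once.
model_a = [0.99, 0.99, 0.99, 0.45]
# Model B is right every time, but only barely.
model_b = [0.55, 0.55, 0.55, 0.55]

loss_a, acc_a = mean_cross_entropy(model_a), accuracy(model_a)
loss_b, acc_b = mean_cross_entropy(model_b), accuracy(model_b)
```

In this example Model A has the lower loss but only 75% accuracy, while Model B reaches 100% accuracy with a higher loss, because loss rewards confidence and accuracy only counts which side of the threshold each prediction falls on.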
Related terms
Gradient descent is an optimization algorithm that finds the minimum of a function by repeatedly moving in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.
Backpropagation is an algorithm for training neural networks by calculating how much each weight contributed to the prediction error and adjusting those weights accordingly. It uses the chain rule to efficiently compute gradients of the loss function.
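Adam (Adaptive Moment Estimation) is a popular optimization algorithm used to train machine learning models. It iteratively updates parameters using running estimates of the first and second moments of the gradients, which gives each parameter its own adaptive step size.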
Classification is a supervised machine learning task that assigns input data to one of several predefined categories or classes based on patterns learned from labeled training examples.
Clustering is an unsupervised machine learning technique that automatically groups similar data points together into clusters based on their features, without using any labeled examples.
A hyperparameter is a value or setting chosen by the user before training a machine learning model that controls the learning process itself.