Does high variance always mean the model is bad?

Not necessarily; a high-variance model may still perform well if enough data is available, but it usually signals a need for regularization or simpler models.

How can I reduce variance in my model?

Common approaches include collecting more training data, using regularization, simplifying the model architecture, or applying ensemble techniques like bagging.
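The sketch below shows bagging in a few lines. It assumes synthetic data (y = 2x plus Gaussian noise). It uses 1-nearest-neighbour regression as a stand-in for a deep decision tree, because both memorise individual training points. All function names are hypothetical. The script compares how much the prediction at x = 5 varies across independently drawn training sets for a single model and for an average of 50 models trained on bootstrap resamples.

```python
import random
import statistics

def sample_data(n, rng):
    # Hypothetical 1-D data: y = 2x + Gaussian noise (sd 3), x uniform on [0, 10].
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 3)) for x in xs]

def one_nn(train, x):
    # 1-nearest-neighbour regression: copy the label of the closest training point.
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def bagged_one_nn(train, x, n_models, rng):
    # Bagging: fit the base learner on bootstrap resamples, then average predictions.
    preds = []
    for _ in range(n_models):
        boot = [rng.choice(train) for _ in train]  # sample with replacement
        preds.append(one_nn(boot, x))
    return statistics.fmean(preds)

def prediction_variance(predict, n_trainsets=200, n=50, x=5.0, seed=1):
    # Variance of the prediction at x across independently drawn training sets.
    rng = random.Random(seed)
    preds = [predict(sample_data(n, rng), x, rng) for _ in range(n_trainsets)]
    return statistics.pvariance(preds)

single = prediction_variance(lambda train, x, rng: one_nn(train, x))
bagged = prediction_variance(lambda train, x, rng: bagged_one_nn(train, x, 50, rng))
print(f"single 1-NN variance: {single:.2f}")
print(f"bagged 1-NN variance: {bagged:.2f}")
```

Averaging over bootstrap models spreads each prediction across several nearby training points instead of one. This is why the bagged variance comes out noticeably lower.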

What is Variance?

In machine learning, variance refers to how much a model's predictions fluctuate when trained on different subsets of the same data. High variance indicates the model is overly sensitive to the specific training examples it sees.

Variance captures the variability of a model's output across different training sets drawn from the same distribution. A high-variance model essentially memorizes noise and idiosyncrasies in the training data rather than learning stable patterns.

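Variance is one of the two reducible terms in the classic bias-variance decomposition of prediction error. The other is (squared) bias, and a third term is irreducible noise. Bias measures systematic errors from overly simplistic assumptions. Variance measures errors that arise from excessive sensitivity to the particular training set, which typically grows with model flexibility.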
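For squared-error loss, the decomposition can be written out explicitly. Assume data generated as $y = f(x) + \varepsilon$, with zero-mean noise of variance $\sigma^2$. Write $\hat f_D$ for the model trained on training set $D$ (notation introduced here for illustration). The expected error at a point $x$ then splits into three terms:

```latex
\mathbb{E}_{D,\varepsilon}\big[(y - \hat f_D(x))^2\big]
  = \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The variance term measures how far individual trained models stray from their own average across training sets. It does not involve the true function $f$ at all.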

Techniques such as regularization and ensemble methods are commonly used to reduce variance without unduly increasing bias. Cross-validation helps detect high variance and choose an appropriate level of model complexity.

Example

Imagine training a decision tree on different random samples of housing data. A very deep tree might predict wildly different prices for the same house depending on which sample it was trained on, illustrating high variance.
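The housing example can be simulated. This sketch assumes a made-up linear relationship between size and price plus Gaussian noise, and every number in it is hypothetical. It uses a simple greedy regression tree written from scratch, not a library implementation. The script retrains that tree on 100 independent random samples and measures how much the predicted price of one fixed 60 m² house varies.

```python
import random
import statistics

def true_price(size):
    # Hypothetical ground truth: price (thousands) rises linearly with size (m^2).
    return 50.0 + 2.0 * size

def sample_housing(n, rng):
    # Draw n synthetic (size, price) pairs with Gaussian noise on the price.
    houses = []
    for _ in range(n):
        size = rng.uniform(50, 250)
        houses.append((size, true_price(size) + rng.gauss(0, 40)))
    return houses

def sse(points):
    # Sum of squared deviations of the prices from their mean.
    prices = [p for _, p in points]
    mean = statistics.fmean(prices)
    return sum((p - mean) ** 2 for p in prices)

def build_tree(points, max_depth):
    # Greedy 1-D regression tree. A leaf is a float (mean price);
    # an internal node is a tuple (threshold, left_subtree, right_subtree).
    if max_depth == 0 or len(points) < 2:
        return statistics.fmean([p for _, p in points])
    points = sorted(points)
    best_i, best_cost = None, float("inf")
    for i in range(1, len(points)):
        cost = sse(points[:i]) + sse(points[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2
    return (threshold,
            build_tree(points[:best_i], max_depth - 1),
            build_tree(points[best_i:], max_depth - 1))

def predict(tree, size):
    # Walk down the tree until reaching a leaf value.
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if size <= threshold else right
    return tree

def prediction_variance(max_depth, query_size=60.0, n_trainsets=100, n=30, seed=0):
    # Train one tree per random sample and record its price for the same house.
    rng = random.Random(seed)
    preds = [predict(build_tree(sample_housing(n, rng), max_depth), query_size)
             for _ in range(n_trainsets)]
    return statistics.pvariance(preds)

deep = prediction_variance(max_depth=30)    # grown until every leaf holds one house
shallow = prediction_variance(max_depth=1)  # a single split ("stump")
print(f"deep tree:    variance {deep:.0f}")
print(f"shallow tree: variance {shallow:.0f}")
```

The fully grown tree copies the price of a single training house, so its prediction inherits that house's noise. The stump averages many houses and varies much less, at the cost of a cruder, more biased fit.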

Why it matters

Managing variance is essential for building models that generalize well to unseen data rather than simply memorizing the training set. Modern deep-learning practice relies heavily on variance-reduction techniques such as dropout and data augmentation.

Frequently asked questions

How is variance different from bias?

Bias is the error from erroneous assumptions in the learning algorithm, while variance is the error from sensitivity to small fluctuations in the training set.

Related terms

Bias

In AI ethics, bias refers to systematic prejudices or errors in machine learning systems that produce unfair or discriminatory outcomes for particular groups of people.

Overfitting

Overfitting happens when a machine learning model learns the training data too closely, including its noise and quirks, so it fails to perform well on new, unseen data.
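Here is a minimal sketch of overfitting. It assumes synthetic data and uses 1-nearest-neighbour regression, a model that memorises its training set exactly. The names and numbers are hypothetical. The script compares error on the training data with error on fresh data from the same distribution.

```python
import random
import statistics

def sample_data(n, rng):
    # Hypothetical data: y = 2x + Gaussian noise (sd 3), x uniform on [0, 10].
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 3)) for x in xs]

def one_nn(train, x):
    # Predict the label of the closest training point (memorises the training set).
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def mse(train, data):
    # Mean squared error of the 1-NN model (fit on `train`) evaluated on `data`.
    return statistics.fmean([(one_nn(train, x) - y) ** 2 for x, y in data])

rng = random.Random(2)
train, test = sample_data(50, rng), sample_data(500, rng)
train_mse = mse(train, train)  # each training point is its own nearest neighbour
test_mse = mse(train, test)
print(f"train MSE: {train_mse:.2f}, test MSE: {test_mse:.2f}")
```

The training error is zero because every point recovers its own noisy label. The test error stays well above zero, which is the gap that signals overfitting.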

Bias-Variance Tradeoff

The bias-variance tradeoff describes the balance between two sources of error in a machine learning model: bias (error from overly simple assumptions) and variance (error from sensitivity to small fluctuations in the training data).

Regularization

Regularization is a set of techniques in machine learning that reduce overfitting by adding a penalty term to the model's loss function, discouraging overly complex models and large parameter values.
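As one concrete case, ridge (L2) regularization can be sketched for a one-parameter model $y \approx w x$ with no intercept. Minimising $\sum (y - w x)^2 + \lambda w^2$ gives the closed form $w = \sum xy / (\sum x^2 + \lambda)$. The data, the penalty strength, and the function names below are hypothetical. The script fits both the unpenalised slope ($\lambda = 0$) and the ridge slope on the same 2,000 small training sets and compares their average and spread.

```python
import random
import statistics

def fit_ridge_slope(data, lam):
    # Closed-form ridge estimate for y ~ w * x (no intercept):
    # minimises sum((y - w*x)^2) + lam * w^2  =>  w = sum(x*y) / (sum(x^2) + lam)
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)

def sample_data(n, rng, true_w=2.0, noise_sd=1.0):
    # Hypothetical data: y = true_w * x + Gaussian noise, x uniform on [-1, 1].
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_w * x + rng.gauss(0, noise_sd)) for x in xs]

rng = random.Random(3)
trainsets = [sample_data(10, rng) for _ in range(2000)]
ols = [fit_ridge_slope(d, lam=0.0) for d in trainsets]    # lam = 0: ordinary least squares
ridge = [fit_ridge_slope(d, lam=2.0) for d in trainsets]  # penalised fit
mean_ols, var_ols = statistics.fmean(ols), statistics.pvariance(ols)
mean_ridge, var_ridge = statistics.fmean(ridge), statistics.pvariance(ridge)
print(f"OLS:   mean slope {mean_ols:.2f}, variance {var_ols:.3f}")
print(f"ridge: mean slope {mean_ridge:.2f}, variance {var_ridge:.3f}")
```

The penalty pulls the estimated slope toward zero, which adds bias. In exchange, the estimate varies noticeably less from one training set to the next.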

Ensemble Learning

Ensemble learning is a machine learning approach that combines predictions from multiple models to achieve better accuracy and robustness than any individual model.

Active Learning

Active learning is a machine learning technique where the model itself selects the most informative unlabeled data points to be labeled by a human, rather than labeling data randomly or all at once.

