In machine learning, variance refers to how much a model's predictions fluctuate when trained on different subsets of the same data. High variance indicates the model is overly sensitive to the specific training examples it sees.
Variance captures the variability of a model's output across different training sets drawn from the same distribution. A high-variance model essentially memorizes noise and idiosyncrasies in the training data rather than learning stable patterns.
Variance is one of the components of the classic bias-variance decomposition of prediction error, alongside bias and irreducible noise. While bias measures systematic error from overly simplistic assumptions, variance measures error that arises from excessive model flexibility.
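For squared-error loss the decomposition can be written out explicitly. The form below is the standard textbook statement, given here as a sketch under assumptions the text above does not spell out: targets are generated as y = f(x) + ε with zero-mean noise of variance σ², and \hat f_D denotes the model fitted to a training set D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat f_D(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\big(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The middle term is the variance discussed here: the expected squared deviation of the fitted model's prediction from its own average over training sets.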
Regularization and ensemble methods are commonly used to reduce variance without unduly increasing bias, while cross-validation is used to detect high variance and to choose how strongly to apply such techniques.
Imagine training a decision tree on different random samples of housing data. A very deep tree might predict wildly different prices for the same house depending on which sample it was trained on, illustrating high variance.
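This thought experiment can be simulated on a small scale. The sketch below is illustrative only: it uses a hand-written one-feature regression tree rather than any particular library, and the data generator (a linear "price" signal plus Gaussian noise), the function names, and the query point are all assumptions made for the example.

```python
import random
import statistics

def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(points, max_depth):
    """Fit a one-feature regression tree to (x, y) pairs."""
    xs = sorted({x for x, _ in points})
    if max_depth == 0 or len(xs) < 2:
        return ("leaf", mean([y for _, y in points]))
    best_threshold, best_cost = None, float("inf")
    # Try every split "x <= threshold" that leaves both sides non-empty.
    for threshold in xs[:-1]:
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    left_pts = [p for p in points if p[0] <= best_threshold]
    right_pts = [p for p in points if p[0] > best_threshold]
    return ("split", best_threshold,
            fit_tree(left_pts, max_depth - 1),
            fit_tree(right_pts, max_depth - 1))

def predict(tree, x):
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]

def prediction_variance(max_depth, n_datasets=100, n_points=30,
                        query=0.1, seed=0):
    """Variance of the prediction at one fixed input across fresh samples."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_datasets):
        data = []
        for _ in range(n_points):
            x = rng.random()
            y = 2.0 * x + rng.gauss(0.0, 0.5)  # assumed signal plus noise
            data.append((x, y))
        preds.append(predict(fit_tree(data, max_depth), query))
    return statistics.pvariance(preds)

shallow = prediction_variance(max_depth=1)
deep = prediction_variance(max_depth=8)
print(f"prediction variance, depth 1: {shallow:.4f}")
print(f"prediction variance, depth 8: {deep:.4f}")
```

Under these assumptions the deep tree's prediction for the same input should vary noticeably more from sample to sample than the shallow tree's, because its leaves contain only a few noisy points each.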
Managing variance is essential for building models that generalize well to unseen data rather than simply memorizing the training set. Modern deep-learning practice relies heavily on variance-reduction techniques such as dropout and data augmentation.
Bias is the error from erroneous assumptions in the learning algorithm, while variance is the error from sensitivity to small fluctuations in the training set.
In AI ethics, bias refers to systematic prejudices or errors in machine learning systems that produce unfair or discriminatory outcomes for particular groups of people.
Overfitting happens when a machine learning model learns the training data too closely, including its noise and quirks, so it fails to perform well on new, unseen data.
The bias-variance tradeoff describes the balance between two sources of error in a machine learning model: bias (error from overly simple assumptions) and variance (error from sensitivity to small fluctuations in the training data).
Regularization is a set of techniques in machine learning that reduce overfitting by constraining model complexity, most commonly by adding a penalty term to the model's loss function that discourages overly complex or large parameter values.
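As a minimal sketch of the penalty idea, consider a linear model with a single weight w and no intercept, fitted with a squared (L2, or "ridge") penalty. Minimizing sum((y - w*x)^2) + penalty * w^2 has the closed-form solution w = sum(x*y) / (sum(x^2) + penalty). The data values and the function name below are made up for illustration.

```python
def ridge_slope(xs, ys, penalty):
    """Weight minimizing sum((y - w*x)^2) + penalty * w^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

# Illustrative data, roughly y = 2x with a little noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

penalties = (0.0, 1.0, 10.0, 100.0)
slopes = [ridge_slope(xs, ys, p) for p in penalties]
for p, w in zip(penalties, slopes):
    print(f"penalty={p:6.1f}  slope={w:.3f}")
```

A penalty of zero recovers ordinary least squares, and larger penalties shrink the weight toward zero, which trades a little bias for lower sensitivity to the particular training sample.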
Ensemble learning is a machine learning approach that combines predictions from multiple models, often achieving better accuracy and robustness than its individual members.
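One simple ensemble method is bagging: train the same kind of model on bootstrap resamples of the training set and average the predictions. The sketch below is an illustration under stated assumptions, not a reference implementation: the base model is a one-nearest-neighbor regressor (chosen because it has high variance), and the data generator, function names, and parameter values are hypothetical.

```python
import random
import statistics

def nearest_neighbor_predict(data, query):
    """Predict the y-value of the training point closest to query."""
    return min(data, key=lambda p: abs(p[0] - query))[1]

def bagged_predict(data, query, n_models, rng):
    """Average 1-NN predictions over bootstrap resamples of data."""
    preds = []
    for _ in range(n_models):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        preds.append(nearest_neighbor_predict(resample, query))
    return sum(preds) / len(preds)

def make_dataset(n_points, rng):
    data = []
    for _ in range(n_points):
        x = rng.random()
        data.append((x, 2.0 * x + rng.gauss(0.0, 0.5)))  # assumed signal + noise
    return data

def compare(n_datasets=200, n_points=30, query=0.5, seed=0):
    """Prediction variance at query for a single model vs. a bagged one."""
    rng = random.Random(seed)
    single, bagged = [], []
    for _ in range(n_datasets):
        data = make_dataset(n_points, rng)
        single.append(nearest_neighbor_predict(data, query))
        bagged.append(bagged_predict(data, query, n_models=25, rng=rng))
    return statistics.pvariance(single), statistics.pvariance(bagged)

single_var, bagged_var = compare()
print(f"single model variance: {single_var:.4f}")
print(f"bagged model variance: {bagged_var:.4f}")
```

Averaging over resamples spreads each prediction across several nearby training points instead of one, so the bagged prediction should fluctuate less across training sets than the single model's.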
Active learning is a machine learning technique where the model itself selects the most informative unlabeled data points to be labeled by a human, rather than labeling data randomly or all at once.
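A common selection rule is uncertainty sampling: ask for labels on the points the current model is least sure about. The sketch below assumes a binary classifier that outputs a probability, and stands in a plain sigmoid over a single feature for that classifier. The pool values and function names are hypothetical.

```python
import math

def sigmoid(z):
    """Stand-in for a trained binary classifier's predicted probability."""
    return 1.0 / (1.0 + math.exp(-z))

def most_uncertain(pool, predict_proba, batch_size=1):
    """Return the pool items whose predicted probability is closest to 0.5."""
    ranked = sorted(pool, key=lambda item: abs(predict_proba(item) - 0.5))
    return ranked[:batch_size]

# Unlabeled pool of one-feature examples (illustrative values).
pool = [-3.0, -0.2, 0.1, 2.5]
to_label = most_uncertain(pool, sigmoid, batch_size=2)
print(to_label)
```

Here the two points nearest the decision boundary, 0.1 and -0.2, are selected for labeling, while the confidently classified points at -3.0 and 2.5 are left in the pool.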