Does active learning always need a human?

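Usually, but not strictly. Active learning relies on an oracle to provide labels for the data points the model selects. The oracle is most often a human expert, though any trusted labeling source, such as a lab experiment or a simulation, can play that role.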

What kinds of problems benefit most from active learning?

Tasks where labeling is costly or requires specialists, such as medical diagnosis, legal document review, or rare-event detection, see the largest reductions in labeling effort.

What is Active Learning?

Active learning is a machine learning technique in which the model itself selects the most informative unlabeled data points and asks an oracle, usually a human, to label them, instead of having data labeled at random or all at once.

The process begins with a small set of labeled examples. The model then evaluates the unlabeled data and picks the samples it is most uncertain about, or that it expects to improve its performance the most.

These chosen samples are sent to an oracle (usually a human expert) for labeling. The new labels are added to the training set and the model is retrained, repeating the cycle until performance is sufficient.
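
The cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the data is a toy pool of one-dimensional points, the model is a simple threshold classifier, and the oracle is a function standing in for the human expert. The names `fit_threshold`, `active_learning_loop`, `true_label`, and `budget` are hypothetical and chosen only for this example.

```python
def fit_threshold(labeled):
    """Fit a 1-D threshold classifier that predicts 1 when x >= threshold.

    Assumes the labeled set contains at least one example of each class.
    """
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    # Place the boundary halfway between the largest known negative
    # and the smallest known positive.
    return (max(negatives) + min(positives)) / 2


def active_learning_loop(pool, oracle, seed_points, budget):
    """Run a pool-based active learning loop for up to `budget` queries."""
    # Step 1: start from a small labeled seed set.
    labeled = [(x, oracle(x)) for x in seed_points]
    unlabeled = [x for x in pool if x not in seed_points]
    threshold = fit_threshold(labeled)

    for _ in range(budget):
        if not unlabeled:
            break
        # Step 2: pick the point the model is least sure about,
        # here the unlabeled point closest to the decision boundary.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        # Step 3: ask the oracle for the label and add it to the training set.
        labeled.append((query, oracle(query)))
        # Step 4: retrain on the enlarged labeled set.
        threshold = fit_threshold(labeled)

    return threshold, labeled


def true_label(x):
    """Stand-in for the human expert (toy rule, assumed for illustration)."""
    return int(x >= 0.6)


pool = [i / 1000 for i in range(1000)]
threshold, labeled = active_learning_loop(
    pool, true_label, seed_points=[pool[0], pool[-1]], budget=10
)
print(f"estimated boundary: {threshold:.3f} using {len(labeled)} labels")
```

In this toy setting the loop labels only 12 of the 1,000 candidate points (two seeds plus ten queries) and ends with a boundary estimate close to the true value of 0.6. Real models and real data are noisier, so the savings are rarely this clean.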

Popular selection strategies include uncertainty sampling, where the model queries the points it is least confident about (typically those near its decision boundary), and query-by-committee, where several models each predict a label and the examples they disagree on most are sent for labeling.
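
As a rough sketch of how these two strategies score candidates, the snippet below uses least confidence as the uncertainty measure and vote entropy as the committee disagreement measure. Both are common choices but not the only ones. The function names, the example probabilities, and the example votes are made up for illustration.

```python
import math
from collections import Counter


def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def vote_entropy(votes):
    """Committee disagreement: entropy of the label votes for one example."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def pick_query(scores):
    """Return the index of the highest-scoring (most informative) example."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Uncertainty sampling: predicted class probabilities from a single model.
pool_probs = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
uncertain_idx = pick_query([least_confidence(p) for p in pool_probs])
print(uncertain_idx)  # index 1, the prediction closest to 50/50

# Query-by-committee: one predicted label per committee member.
committee_votes = [
    ["cat", "cat", "cat"],   # full agreement
    ["cat", "dog", "cat"],   # partial disagreement
    ["dog", "cat", "bird"],  # every member disagrees
]
disputed_idx = pick_query([vote_entropy(v) for v in committee_votes])
print(disputed_idx)  # index 2, the most contested example
```

Either score can serve as the selection step of an active learning loop: the highest-scoring example is the one sent to the oracle.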

Example

A medical imaging system starts with a few hundred labeled X-rays. It then identifies scans it is least confident about classifying as cancerous or benign and asks a radiologist to label only those, quickly reaching high accuracy with far fewer total labels.

Why it matters

Labeling data is often expensive and time-consuming, especially when experts are required. Active learning can cut labeling costs while still producing accurate models, which makes machine learning more practical in domains like healthcare, NLP, and autonomous driving.

Frequently asked questions

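How is active learning different from regular supervised learning?

Regular supervised learning requires a large labeled dataset upfront, usually sampled without regard to how informative each example is, while active learning iteratively chooses only the most useful samples to label, saving time and effort.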

Related terms

Semi-Supervised Learning

Semi-supervised learning is a machine learning approach that combines a small amount of labeled data with a large amount of unlabeled data to train models more effectively than using either alone.

Supervised Learning

Supervised learning is a machine learning method where a model is trained on data that already has correct answers attached, so it can learn to predict those answers for new data.

Transfer Learning

Transfer learning is a machine learning method that reuses a model trained on one task as the starting point for a different but related task.

Adam Optimizer

Adam (Adaptive Moment Estimation) is a popular optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients.

Anomaly Detection

Anomaly detection is a machine learning technique that identifies rare or unusual data points that differ significantly from the majority of the data, often called outliers.

Bias-Variance Tradeoff

The bias-variance tradeoff describes the balance between two sources of error in a machine learning model: bias (error from overly simple assumptions) and variance (error from sensitivity to small fluctuations in the training data).