What is Active Learning?

Active learning is a machine learning technique where the model itself selects the most informative unlabeled data points to be labeled by a human, rather than labeling data randomly or all at once.

The process begins with a small set of labeled examples. A model trained on them then scores the unlabeled data and picks the samples it is least certain about, or that it expects to improve its performance the most.

These chosen samples are sent to an oracle (usually a human expert) for labeling. The new labels are added to the training set and the model is retrained. The cycle repeats until performance is sufficient or the labeling budget is exhausted.
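The cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the "model" is a toy one-dimensional threshold classifier, the oracle is a hypothetical function standing in for the human expert, and the names `fit_threshold`, `active_learning_loop`, and `budget` are invented for this example.

```python
from typing import Callable, Dict, List, Tuple


def fit_threshold(labeled: Dict[float, int]) -> float:
    """Toy model: midpoint between the largest labeled 0 and the smallest labeled 1."""
    negatives = [x for x, y in labeled.items() if y == 0]
    positives = [x for x, y in labeled.items() if y == 1]
    return (max(negatives) + min(positives)) / 2


def active_learning_loop(
    pool: List[float],
    seed: Dict[float, int],
    oracle: Callable[[float], int],
    budget: int,
) -> Tuple[float, Dict[float, int]]:
    """Query up to `budget` labels, each time picking the most uncertain point."""
    labeled = dict(seed)  # small starting set; must hold one example of each class
    unlabeled = [x for x in pool if x not in labeled]
    for _ in range(budget):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)  # (re)train on everything labeled so far
        # For this toy model, "most uncertain" means closest to the decision boundary.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        labeled[query] = oracle(query)  # ask the oracle for the true label
        unlabeled.remove(query)
    return fit_threshold(labeled), labeled


# Hypothetical setup: 101 points in [0, 1]; the true boundary (0.63) is unknown to the model.
pool = [i / 100 for i in range(101)]


def oracle(x: float) -> int:
    """Stand-in for a human annotator."""
    return int(x >= 0.63)


threshold, labeled = active_learning_loop(pool, {0.0: 0, 1.0: 1}, oracle, budget=10)
print(f"learned threshold {threshold:.3f} from {len(labeled)} labels out of {len(pool)}")
```

In this toy setting the loop behaves like a binary search: with 12 labels in total it separates all 101 pool points correctly. Real problems are noisier, so savings of this size should not be assumed.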

Popular selection strategies include uncertainty sampling, where the model queries the points it is least confident about (typically those near its decision boundary), and query-by-committee, where several models each predict a label and the examples they disagree on most are queried.
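To make the two strategies concrete, here is a small sketch of common scoring functions. It assumes the model exposes class probabilities per sample and that each committee member casts one vote per sample. The function names and the example data are hypothetical, and least confidence, margin, and vote entropy are only some of the measures used in practice.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty sampling score: higher means the top prediction is less certain."""
    return 1.0 - max(probs)


def margin(probs: Sequence[float]) -> float:
    """Gap between the two most likely classes: smaller means closer to the boundary."""
    top, runner_up = sorted(probs, reverse=True)[:2]
    return top - runner_up


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Query-by-committee score: entropy of the committee's votes (0 when unanimous)."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def most_uncertain(predictions: Sequence[Sequence[float]]) -> int:
    """Index of the sample with the highest least-confidence score."""
    return max(range(len(predictions)), key=lambda i: least_confidence(predictions[i]))


def most_disputed(committee_votes: Sequence[Sequence[Hashable]]) -> int:
    """Index of the sample the committee disagrees on most."""
    return max(range(len(committee_votes)), key=lambda i: vote_entropy(committee_votes[i]))


# Hypothetical model outputs for three unlabeled samples.
predictions = [[0.95, 0.05], [0.55, 0.45], [0.80, 0.20]]
committee_votes = [
    ["benign", "benign", "benign"],
    ["benign", "cancerous", "benign"],
    ["cancerous", "cancerous", "cancerous"],
]

print(most_uncertain(predictions))     # 1: the 0.55 / 0.45 prediction
print(most_disputed(committee_votes))  # 1: the only sample with a split vote
```

For two-class problems, least confidence and margin rank samples identically. They only diverge when there are three or more classes.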

Example

A medical imaging system starts with a few hundred labeled X-rays. It then identifies the scans it is least confident about classifying as cancerous or benign and asks a radiologist to label only those. The goal is to reach high accuracy with far fewer labels than annotating every scan would require.

Why it matters

Labeling data is often expensive and time-consuming, especially when experts are required. Active learning can reduce labeling costs while still producing accurate models, which makes AI more practical in domains like healthcare, NLP, and autonomous driving.

Frequently asked questions

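How is active learning different from regular supervised learning?

Regular supervised learning typically relies on a large dataset that is labeled up front, without regard to which examples are most informative. Active learning instead iteratively chooses only the most useful samples to label, saving time and effort.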