What is Data Labeling?

Data labeling is the process of adding tags or annotations to raw data so that machine learning models can learn from it during training.

It turns unlabeled examples into training examples by attaching meaningful information, such as class names, bounding boxes, or text categories. This step is required for most supervised learning tasks.

Labeling can be performed by humans, automated tools, or a combination of both. The quality and consistency of the labels directly affect how well the trained model performs.
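One common way to check label consistency is to have several annotators label the same item and compare their answers. The sketch below is a minimal illustration, not a standard tool: the function name majority_label, the file names, and the votes are all hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label and the share of annotators who chose it."""
    counts = Counter(votes)
    # On a tie, this keeps whichever tied label Counter returns first.
    label, count = counts.most_common(1)[0]
    return label, count / len(votes)


# Hypothetical votes from three annotators per image.
annotations = {
    "img_001.jpg": ["cat", "cat", "cat"],
    "img_002.jpg": ["dog", "cat", "dog"],
}

for item, votes in annotations.items():
    label, agreement = majority_label(votes)
    print(f"{item}: {label} (agreement {agreement:.2f})")
```

Items with low agreement are natural candidates for a second look or for clearer labeling guidelines.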

Common label formats include image classification tags, object detection bounding boxes, sentiment scores on text, and transcribed speech segments.
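To make these formats concrete, here is a rough sketch of how each kind of label might be stored as a record. The field names, file names, and values are assumptions made for illustration; they do not follow any particular tool's schema.

```python
# Hypothetical record layouts; field names are illustrative, not a standard schema.

# Image classification: one tag per image.
classification_label = {"image": "photo_001.jpg", "label": "cat"}

# Object detection: one or more labeled boxes per image, in pixel coordinates.
detection_label = {
    "image": "photo_002.jpg",
    "boxes": [
        {"label": "dog", "x_min": 34, "y_min": 50, "x_max": 210, "y_max": 300},
    ],
}

# Sentiment: a score attached to a piece of text (the range here is assumed).
sentiment_label = {"text": "The battery lasts all day.", "score": 0.8}

# Speech: transcribed text attached to time-stamped audio segments.
transcript_label = {
    "audio": "clip_001.wav",
    "segments": [{"start_s": 0.0, "end_s": 2.5, "text": "hello there"}],
}


def is_valid_box(box):
    """Check that a bounding box has positive width and height."""
    return box["x_max"] > box["x_min"] and box["y_max"] > box["y_min"]


print(all(is_valid_box(box) for box in detection_label["boxes"]))
```

A simple check like is_valid_box is one way to catch malformed labels before they reach training.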

Example

A person looks at thousands of photos and clicks 'cat' or 'dog' on each one so an image classifier can later recognize new pictures correctly.

Why it matters

Most high-performing AI systems today are trained on large amounts of labeled data; without accurate labels, models cannot learn reliable patterns.

Frequently asked questions

Data labeling is often done by human annotators, sometimes assisted by software that suggests labels for review.
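As a rough sketch of software-assisted labeling, the example below takes suggested labels with confidence scores and puts the least certain ones at the front of a human review queue. The function name split_for_review, the 0.9 threshold, and the sample suggestions are assumptions for illustration; whether high-confidence suggestions still get a human check is a decision for each project.

```python
def split_for_review(suggestions, threshold=0.9):
    """Split suggestions into confident ones and ones needing closer review."""
    confident, needs_review = [], []
    for suggestion in suggestions:
        if suggestion["confidence"] >= threshold:
            confident.append(suggestion)
        else:
            needs_review.append(suggestion)
    # Show reviewers the least confident suggestions first.
    needs_review.sort(key=lambda s: s["confidence"])
    return confident, needs_review


# Hypothetical suggestions produced by a labeling tool.
suggestions = [
    {"item": "img_001.jpg", "label": "cat", "confidence": 0.97},
    {"item": "img_002.jpg", "label": "dog", "confidence": 0.55},
    {"item": "img_003.jpg", "label": "cat", "confidence": 0.72},
]

confident, needs_review = split_for_review(suggestions)
for suggestion in needs_review:
    print(suggestion["item"], suggestion["label"], suggestion["confidence"])
```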