Can unlabeled data still be useful?

Yes. Unsupervised and semi-supervised learning methods use unlabeled data to discover patterns without known answers.
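As a rough illustration of one semi-supervised idea, the sketch below assigns "pseudo-labels" to unlabeled points by copying the label of the nearest labeled point. The data, the names `labeled` and `unlabeled`, and the nearest-neighbor rule are all assumptions chosen for brevity; real semi-supervised methods are usually more involved.

```python
# Hypothetical toy data: two labeled points and three unlabeled ones.
labeled = [((1.0, 1.0), "cat"), ((9.0, 9.0), "dog")]
unlabeled = [(1.5, 0.5), (8.0, 9.5), (2.0, 2.0)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(point, labeled_examples):
    """Return the label of the labeled example closest to `point`."""
    closest = min(labeled_examples, key=lambda ex: squared_distance(point, ex[0]))
    return closest[1]


# Each unlabeled point borrows the label of its nearest labeled neighbor.
pseudo_labeled = [(point, nearest_label(point, labeled)) for point in unlabeled]
```

The pseudo-labeled points could then be added to the training set, although any mistakes in the borrowed labels are carried along with them.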

What is a Label?

In machine learning, a label is the known correct output or category assigned to a training data example that a model learns to predict.

Labels are the target values used in supervised learning. They tell the algorithm what the desired answer should be for each input during training.

Without labels, a supervised model cannot learn the mapping from features to outcomes. The quality and accuracy of the labels directly affect how well the model performs on new data.

Labels can be categories (e.g., 'cat' or 'dog') for classification or numeric values for regression tasks.
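The two kinds of labels can be sketched in a few lines of Python. The values and variable names below are made up for illustration. The integer encoding step reflects a common convention, since many training libraries expect categorical labels as integers, but it is not required by every tool.

```python
# Hypothetical toy labels; names and values are illustrative only.
class_labels = ["cat", "dog", "dog", "cat"]    # classification: categories
price_labels = [250000.0, 310500.0, 199900.0]  # regression: numeric values

# Map each distinct category to an integer index (sorted for a stable order).
label_to_index = {name: i for i, name in enumerate(sorted(set(class_labels)))}

# Replace each category name with its index.
encoded_labels = [label_to_index[name] for name in class_labels]
```

Numeric regression labels such as `price_labels` can usually be used as they are, while categorical labels need some encoding like the one above.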

Example

In a dataset of photos, each image might have a label such as 'cat' or 'dog' so the model can learn to recognize animals correctly.

Why it matters

Labels are essential for training the supervised models behind many practical AI applications, from image recognition to spam detection.

Frequently asked questions

What is the difference between a feature and a label?

Features are the input variables describing the data, while labels are the output the model tries to predict.
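The split between features and labels can be shown on a small table of records. This is a minimal sketch: the field names (`weight_kg`, `ear_length_cm`, `species`) and the values are hypothetical.

```python
# Hypothetical records: two measurements per animal plus the known answer.
records = [
    {"weight_kg": 4.2, "ear_length_cm": 6.5, "species": "cat"},
    {"weight_kg": 30.1, "ear_length_cm": 11.0, "species": "dog"},
    {"weight_kg": 3.8, "ear_length_cm": 6.1, "species": "cat"},
]

FEATURE_KEYS = ["weight_kg", "ear_length_cm"]  # inputs the model sees
LABEL_KEY = "species"                          # output the model predicts

# One feature vector per record, in the order given by FEATURE_KEYS.
features = [[record[key] for key in FEATURE_KEYS] for record in records]

# One label per record, aligned by position with `features`.
labels = [record[LABEL_KEY] for record in records]
```

Keeping `features` and `labels` aligned by position is what lets a training routine pair each input with its known answer.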

Related terms

Supervised Learning

Supervised learning is a machine learning method where a model is trained on data that already has correct answers attached, so it can learn to predict those answers for new data.

Feature

In AI and machine learning, a feature is an individual measurable piece of data that serves as an input variable for a model.

Training Data

Training data is the dataset of examples that a machine learning model learns from during the training process. It contains input features paired with known outputs so the model can discover patterns.

Classification

Classification is a supervised machine learning task that assigns input data to one of several predefined categories or classes based on patterns learned from labeled training examples.
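To make the idea concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid rule, trained on labeled examples. The technique is chosen here only because it fits in a few lines. The function names and the toy data are assumptions, not part of any particular library.

```python
from collections import defaultdict


def fit_centroids(features, labels):
    """Average the feature vectors of each class to get one centroid per label."""
    grouped = defaultdict(list)
    for vector, label in zip(features, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest to `vector`."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(vector, centroids[label]))
    return min(centroids, key=squared_distance)


# Hypothetical labeled training examples.
train_features = [[4.0, 6.0], [5.0, 7.0], [30.0, 11.0], [28.0, 12.0]]
train_labels = ["cat", "cat", "dog", "dog"]

centroids = fit_centroids(train_features, train_labels)
prediction = predict(centroids, [6.0, 6.5])  # a new, unlabeled input
```

The labels are used only during `fit_centroids`. At prediction time the model sees features alone and outputs one of the predefined classes.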

Batch Size

Batch size is the number of training examples processed together in a single forward and backward pass during model training.
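As a sketch of what this means in practice, the helper below splits a list of examples into batches. The name `iterate_batches` is hypothetical. Deep learning frameworks provide their own data loaders for this purpose.

```python
def iterate_batches(examples, batch_size):
    """Yield consecutive slices of `examples` with at most `batch_size` items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


# Ten stand-in training examples, processed four at a time.
examples = list(range(10))
batches = list(iterate_batches(examples, batch_size=4))
```

With ten examples and a batch size of four, the last batch holds only the two leftover examples. Some training setups drop such a partial batch instead.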

Data Augmentation

Data augmentation is a technique that artificially increases the size and diversity of a training dataset by creating modified versions of existing data samples.
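One common augmentation for images is a horizontal flip. The sketch below represents an image as a nested list of pixel values, which is a simplification for illustration. The function name and the data are hypothetical.

```python
def flip_horizontal(image):
    """Return a copy of `image` with every row reversed (mirrored left to right)."""
    return [row[::-1] for row in image]


# Hypothetical 2x3 grayscale image paired with its label.
image = [
    [0, 1, 2],
    [3, 4, 5],
]
dataset = [(image, "cat")]

# The flipped copy keeps the original label: a mirrored cat is still a cat.
augmented = dataset + [(flip_horizontal(img), label) for img, label in dataset]
```

The augmented copy reuses the original label, which is why augmentations are normally limited to changes that do not alter the correct answer.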