What is the difference between binary and multiclass classification?

Binary classification has only two possible labels; multiclass classification has three or more mutually exclusive labels.
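In practice the difference often appears in how a model's raw scores become a label. The sketch below assumes one common convention. A binary model outputs a single score that is passed through a sigmoid and thresholded at 0.5. A multiclass model outputs one score per class, normalizes the scores with softmax, and picks the most probable class. The function names, scores, and labels are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores: list[float]) -> list[float]:
    """Turn one score per class into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_binary(score: float, threshold: float = 0.5) -> int:
    """Two labels: one probability decides between class 1 and class 0."""
    return 1 if sigmoid(score) >= threshold else 0


def predict_multiclass(scores: list[float], labels: list[str]) -> str:
    """Three or more mutually exclusive labels: pick the most probable one."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))]


print(predict_binary(1.2))  # sigmoid(1.2) is about 0.77, so this prints 1
print(predict_multiclass([2.0, 0.5, -1.0], ["cat", "dog", "bird"]))  # prints cat
```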

How do we measure classification performance?

Common metrics include accuracy, precision, recall, and F1-score. The confusion matrix complements them by tabulating, for each true class, how often the model predicts each class, so both correct and incorrect predictions are visible at a glance.
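A minimal, dependency-free sketch of these metrics for a two-class problem. Precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean. The labels and predictions are made up for illustration, and the function names are hypothetical; libraries such as scikit-learn provide equivalent functions.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))  # missing pairs count as 0
    return [[counts[(t, p)] for p in labels] for t in labels]


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true labels and model predictions for six emails.
y_true = ["spam", "spam", "not spam", "not spam", "not spam", "spam"]
y_pred = ["spam", "not spam", "not spam", "spam", "not spam", "spam"]

print(confusion_matrix(y_true, y_pred, ["spam", "not spam"]))  # [[2, 1], [1, 2]]
print(binary_metrics(y_true, y_pred, positive="spam"))
```

With two true positives, one false positive, and one false negative, accuracy is 4/6, and precision, recall, and F1 all come out to 2/3.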

What is Classification?

Classification is a supervised machine learning task that assigns input data to one of several predefined categories or classes based on patterns learned from labeled training examples.

In classification, an algorithm is trained on data where each example already has a known label. The model learns decision boundaries that separate the different classes so it can predict the correct label for new, unseen inputs.

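Common approaches include logistic regression, decision trees, support vector machines, and neural networks. Training adjusts the model's parameters to minimize a loss function (for example, cross-entropy) that measures prediction errors on the training data, and the model is then evaluated on held-out test data it did not see during training.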
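To make the training and evaluation loop concrete, here is a sketch of logistic regression on a single numeric feature. It is fit by plain gradient descent on the mean cross-entropy (log) loss and then scored on a held-out test set. The toy data, learning rate, and epoch count are assumptions chosen so the example runs quickly; this is an illustration, not a production implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr=0.1, epochs=1000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) by gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean binary cross-entropy with respect to w and b.
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    return 1 if sigmoid(w * x + b) >= threshold else 0


def accuracy(w, b, xs, ys):
    return sum(predict(w, b, x) == y for x, y in zip(xs, ys)) / len(xs)


# Hypothetical labeled training data: class 1 for positive x, class 0 for negative x.
train_x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
# Held-out examples that the model never sees during training.
test_x = [-1.3, -0.8, 0.7, 1.6]
test_y = [0, 0, 1, 1]

w, b = train_logistic_regression(train_x, train_y)
print("train accuracy:", accuracy(w, b, train_x, train_y))
print("test accuracy:", accuracy(w, b, test_x, test_y))
```

Because the toy training data is symmetric around zero, the learned decision boundary -b/w sits at (or within floating-point noise of) 0, so every test point lands on the correct side.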

Key challenges include handling imbalanced classes, choosing appropriate features, and avoiding overfitting so the model generalizes well to new data.
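The accuracy trap with imbalanced classes is easy to reproduce. The sketch below uses a hypothetical dataset of 95 negatives and 5 positives and a "model" that always predicts the majority class. It also computes inverse-frequency class weights n / (k * count), one common remedy; this is the same heuristic scikit-learn uses for `class_weight="balanced"`. The function names are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of truly positive examples that the model labels positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


def balanced_class_weights(y):
    """Inverse-frequency weights n / (k * count): rare classes count for more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical data: 95 legitimate transactions (0) and 5 fraudulent ones (1).
y_true = [0] * 95 + [1] * 5
# A useless "model" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95, which looks good...
print(recall(y_true, y_pred))  # 0.0: every fraud case is missed
print(balanced_class_weights(y_true))  # class 1 gets weight 10.0
```

Weights like these are typically passed to the loss function or training routine so that mistakes on the rare class cost more.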

Example

A spam filter is a classic classification system: it is trained on thousands of emails already labeled 'spam' or 'not spam' and then predicts whether a new incoming message belongs to the spam class.
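The text does not say which algorithm such a filter uses. A classic choice is multinomial naive Bayes, which is sketched below as a swapped-in illustration rather than a description of any particular product. The six training messages are invented, words are split on whitespace, and add-one (Laplace) smoothing keeps a word that never appeared with a class from zeroing out that class's probability. The function names are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Multinomial naive Bayes: class priors plus per-class word counts."""
    classes = set(labels)
    priors = {c: labels.count(c) / len(labels) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for c in classes for w in word_counts[c]}
    return priors, word_counts, vocab


def classify(model, doc):
    """Return the class with the highest log-posterior score for `doc`."""
    priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for word in doc.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing over the shared vocabulary.
                score += math.log((word_counts[c][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Hypothetical labeled training emails.
train_docs = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting moved to monday",
    "lunch with the team tomorrow",
    "please review the attached report",
]
train_labels = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

model = train_naive_bayes(train_docs, train_labels)
print(classify(model, "free prize inside"))  # spam
print(classify(model, "team meeting tomorrow"))  # not spam
```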

Why it matters

Classification powers everyday AI applications such as image recognition, medical diagnosis, fraud detection, and content moderation, making it one of the most widely deployed machine-learning techniques today.

Frequently asked questions

How is classification different from regression?

Classification predicts discrete categories (e.g., cat or dog), while regression predicts continuous numeric values (e.g., house price).

Related terms

Supervised Learning

Supervised learning is a machine learning method where a model is trained on data that already has correct answers attached, so it can learn to predict those answers for new data.

Regression

Regression is a supervised machine learning method that predicts continuous numerical values from input features.

Clustering

Clustering is an unsupervised machine learning technique that automatically groups similar data points together into clusters based on their features, without using any labeled examples.
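k-means is one widely used clustering algorithm; the definition above does not name a specific one. This sketch alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points, using no labels at all. The points are made up, and initializing with the first k points is a simplifying assumption (k-means++ initialization is more common in practice).

```python
import math


def kmeans(points, k, iters=10):
    """Group points into k clusters; returns (assignment per point, centroids)."""
    # Simplifying assumption: use the first k points as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return assignment, centroids


# Hypothetical unlabeled 2-D points forming two obvious groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (0.5, 1.5), (9.0, 8.0)]
assignment, centroids = kmeans(points, k=2)
print(assignment)  # [0, 1, 0, 1, 0, 1]
print(centroids)
```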

Logistic Regression

Logistic regression is a supervised machine learning algorithm, used mainly for binary classification, that estimates the probability that an input belongs to a particular class.

Adam Optimizer

Adam (Adaptive Moment Estimation) is a popular optimization algorithm for training machine learning models. It iteratively updates parameters based on gradients, adapting each parameter's step size using running averages of the gradients and of their squares.
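A minimal sketch of the Adam update rule, minimizing the one-variable function f(x) = (x - 3)^2, whose gradient is 2(x - 3). The beta1, beta2, and eps values are the commonly used defaults. The learning rate of 0.1 (larger than the usual 0.001 default) and the step count are assumptions picked so this toy problem converges quickly; the function name is hypothetical.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g  # running average of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running average of squared gradients
        m_hat = m / (1 - beta1**t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
    return x


# f(x) = (x - 3)^2 has its minimum at x = 3.
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
print(x_min)
```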

Gradient Descent

Gradient descent is an optimization algorithm that finds the minimum of a function by repeatedly moving in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.
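A minimal sketch on a toy function, f(x) = (x - 3)^2 with gradient 2(x - 3): each step moves x against the gradient, scaled by a learning rate. The function name, learning rate, and step count are illustrative assumptions.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: x <- x - lr * grad(x)."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


# f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3.0), x0=0.0)
print(round(x_min, 6))  # 3.0
```

With a learning rate of 0.1, each step shrinks the distance to the minimum by a factor of 0.8, so 100 steps land well within 1e-9 of x = 3. Too large a learning rate (here, anything at or above 1.0) would make the iterates fail to converge instead.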