What is Dimensionality Reduction?

Dimensionality reduction is a machine learning technique that decreases the number of features (dimensions) in a dataset while preserving as much relevant information as possible.

High-dimensional data often suffers from the curse of dimensionality: as the number of features grows, the data becomes sparse, computation becomes more expensive, and models are more likely to overfit. Dimensionality reduction addresses this by transforming the data into, or selecting, a smaller set of features.
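One symptom of this sparsity is that distances between points become nearly indistinguishable as dimensions are added. The sketch below illustrates this under assumed conditions: the uniform random data, the point counts, and the helper name `relative_contrast` are illustrative choices, not part of any standard API.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(n_dims, n_points=500):
    """Return (farthest - nearest) / nearest distance from a random query point.

    Values near 0 mean every point is roughly equally far away.
    """
    points = rng.uniform(size=(n_points, n_dims))  # illustrative uniform data
    query = rng.uniform(size=n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return (distances.max() - distances.min()) / distances.min()

# The contrast between near and far neighbours shrinks as dimensions grow.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(relative_contrast(n_dims), 3))
```

When the contrast is small, "nearest neighbour" carries little meaning, which is one reason distance-based methods tend to struggle on raw high-dimensional data.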

Common approaches include linear methods like Principal Component Analysis (PCA) that project data onto fewer axes capturing maximum variance, and nonlinear methods like t-SNE or autoencoders that uncover complex structures in the data.
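As a rough illustration of the linear case, PCA can be sketched with a singular value decomposition of the centered data. This is a minimal sketch, not a production implementation: the function name `pca_project` and the toy 3D dataset are assumptions made for the example.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal axes.

    Returns the projected data, the axes, and the share of total
    variance captured by each kept axis.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    variance_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, components, variance_ratio[:n_components]

# Toy data (assumed): 200 points in 3D that vary mostly along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[2.0, 1.0, 0.5]]) + 0.05 * rng.normal(size=(200, 3))

Z, components, ratio = pca_project(X, n_components=1)
print(Z.shape)  # (200, 1)
print(ratio)    # share of variance kept by the single axis
```

Because the toy data lies almost on a line, a single axis retains nearly all of the variance; real datasets usually need more components.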

The goal is to simplify models, speed up training, reduce noise, and enable visualization of data in 2D or 3D while minimizing information loss.

Example

In a dataset of house prices with 50 features like size, location, and amenities, dimensionality reduction might combine correlated features into 5 key components that still allow accurate price predictions.
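A scenario like this can be sketched on synthetic data. Everything below is assumed for illustration: the 50 "features" are generated from 5 hidden factors, the "price" is a made-up linear function of those factors, and no real housing data is involved.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features, n_latent = 500, 50, 5

# Synthetic stand-in for correlated house features driven by 5 hidden factors.
latent = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.1 * rng.normal(size=(n_samples, n_features))
price = latent @ rng.normal(size=n_latent) + 0.1 * rng.normal(size=n_samples)

train, test = slice(0, 400), slice(400, None)

# Fit PCA on the training rows only, keeping 5 components.
mean = X[train].mean(axis=0)
_, _, Vt = np.linalg.svd(X[train] - mean, full_matrices=False)
components = Vt[:5]

def to_components(X_part):
    """Map 50 original features to 5 component scores."""
    return (X_part - mean) @ components.T

def add_intercept(Z):
    return np.column_stack([np.ones(len(Z)), Z])

# Ordinary least squares on the 5 components instead of the 50 features.
coef, *_ = np.linalg.lstsq(add_intercept(to_components(X[train])), price[train], rcond=None)
pred = add_intercept(to_components(X[test])) @ coef

ss_res = np.sum((price[test] - pred) ** 2)
ss_tot = np.sum((price[test] - price[test].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(to_components(X[test]).shape)  # (100, 5)
print(round(r2, 3))                  # held-out R^2 using only 5 components
```

The fit stays accurate here because the synthetic features really are driven by 5 factors; on real data the number of components worth keeping has to be checked, for example against held-out error.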

Why it matters

Modern AI datasets built from images, genomic data, and text are extremely high-dimensional. Dimensionality reduction makes their analysis computationally feasible, and it can also improve model performance and aid interpretability.

Frequently asked questions

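Is dimensionality reduction the same as feature selection?

Not quite. Feature selection keeps a subset of the existing features unchanged, while methods such as PCA create new features as combinations or projections of the originals. Feature selection is often treated as one way to reduce dimensionality, but the two terms are not interchangeable.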
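The difference between selecting features and building new ones can be shown side by side. In this sketch the data is synthetic, ranking columns by variance is just one simple selection criterion assumed for the example, and PCA stands in for projection-based methods in general.

```python
import numpy as np

# Synthetic data (assumed): 4 features, where columns 0-1 track one hidden
# factor and columns 2-3 track another.
rng = np.random.default_rng(1)
z1 = rng.normal(size=200)
z2 = rng.normal(size=200)
X = np.column_stack([z1, z1, z2, z2]) + 0.1 * rng.normal(size=(200, 4))

# Feature selection: keep 2 of the original columns, untouched.
variances = X.var(axis=0)
keep = np.sort(np.argsort(variances)[-2:])
X_selected = X[:, keep]

# Projection (PCA): build 2 new columns, each a weighted mix of all 4.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:2]            # weights of each original feature per new axis
X_projected = Xc @ loadings.T

print(keep)                  # indices of the original columns that were kept
print(np.round(loadings, 2)) # several non-zero weights per row
```

The selected columns are still directly readable as original features, whereas each projected column only has meaning through its loadings.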