
What is Principal Component Analysis?

Also known as: PCA

Principal Component Analysis (PCA) is a technique for reducing the number of dimensions in a dataset while keeping as much of the original information as possible. It does this by finding new axes, called principal components, that capture the largest amounts of variation in the data.

PCA works by finding the directions along which the data varies the most, with each new direction required to be perpendicular (orthogonal) to the ones already found. These directions become the principal components, ordered from the one that explains the most variance to the one that explains the least.
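The idea of "the direction of greatest spread" can be stated precisely. Assume X is the n-by-d data matrix (n points, d features) with every column already centered to zero mean, and C is its sample covariance matrix. Each principal component w_k is the unit vector that maximizes the variance of the projected data, subject to being orthogonal to the earlier components. The maximizers turn out to be eigenvectors of C. The variance along w_k equals the matching eigenvalue λ_k, which is why sorting eigenvalues from largest to smallest gives the component ordering.

```latex
C = \frac{1}{n-1} X^\top X,
\qquad
\mathbf{w}_1 = \arg\max_{\|\mathbf{w}\| = 1} \mathbf{w}^\top C \, \mathbf{w},
\qquad
\mathbf{w}_k = \arg\max_{\substack{\|\mathbf{w}\| = 1 \\ \mathbf{w} \perp \mathbf{w}_1, \dots, \mathbf{w}_{k-1}}} \mathbf{w}^\top C \, \mathbf{w}

C \mathbf{w}_k = \lambda_k \mathbf{w}_k,
\qquad
\operatorname{Var}(X \mathbf{w}_k) = \mathbf{w}_k^\top C \, \mathbf{w}_k = \lambda_k,
\qquad
\text{explained ratio}_k = \frac{\lambda_k}{\sum_{j=1}^{d} \lambda_j}
```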

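The method relies on linear algebra. It centers the data by subtracting each feature's mean, computes the covariance matrix, and finds that matrix's eigenvectors and eigenvalues. The eigenvectors are the new axes, and each eigenvalue measures how much variance lies along its axis. Data points are then projected onto the top few axes to create a lower-dimensional version.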
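The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation. The function name `pca` and the toy three-column dataset are hypothetical and chosen only for demonstration. Library implementations typically use a singular value decomposition instead and add options such as whitening.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components.

    Returns (scores, components, explained_ratio). This is a hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)
    # 1. Center each feature (column) at zero mean
    X_centered = X - X.mean(axis=0)
    # 2. Covariance matrix of the features (d x d); rowvar=False treats columns as variables
    cov = np.cov(X_centered, rowvar=False)
    # 3. Eigen-decomposition; eigh is meant for symmetric matrices and returns ascending eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the component explaining the most variance comes first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Project the centered data onto the top-k eigenvectors (the new axes)
    components = eigvecs[:, :k]
    scores = X_centered @ components
    # Share of total variance captured by each kept component
    explained_ratio = eigvals[:k] / eigvals.sum()
    return scores, components, explained_ratio

# Toy data: two strongly correlated columns plus one low-variance noise column
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), 0.1 * rng.normal(size=200)])
scores, comps, ratio = pca(X, 2)
print(scores.shape)  # (200, 2)
```

Almost all of the variance in this toy data lies along the shared direction of the first two columns, so the first component should carry nearly all of it. The sign of each eigenvector is arbitrary, so another implementation may return the same axes flipped.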

Because it is unsupervised, PCA does not use labels; it only looks at the structure of the input features themselves.

Example

Imagine a spreadsheet with 50 columns describing each customer (age, income, purchase history, etc.). PCA can combine these into just two or three new columns that still show the main differences between customers, making it easy to plot and explore clusters.
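A sketch of the customer example follows, using made-up data. The 300 customers, the two hidden "customer type" factors, and the per-column scales are all hypothetical. Two assumptions apply. First, the columns are standardized to unit variance before PCA, since otherwise a column measured in large units (such as income) would dominate one in small units (such as age). Second, this version computes the components with a singular value decomposition (SVD) of the data matrix rather than an eigendecomposition of the covariance matrix. For centered data the two give the same principal directions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_customers, n_features = 300, 50

# Hypothetical synthetic data: 50 columns driven by 2 hidden factors plus noise,
# with each column on its own scale (as age, income, and purchase counts would be)
latent = rng.normal(size=(n_customers, 2))
loadings = rng.normal(size=(2, n_features))
scales = rng.uniform(1, 1000, size=n_features)
X = (latent @ loadings + 0.3 * rng.normal(size=(n_customers, n_features))) * scales

# Standardize each column (zero mean, unit variance) so units do not decide the result
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized data: rows of Vt are the principal directions,
# and squared singular values are proportional to the variance along each one
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
X_2d = Z @ Vt[:2].T            # two new columns per customer, ready to scatter-plot
explained = S**2 / np.sum(S**2)

print(X_2d.shape)              # (300, 2)
```

Because the synthetic columns were generated from only two underlying factors, the first two components capture most of the variance here. Real customer data would rarely be this clean.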

Why it matters

PCA is a common preprocessing step in machine-learning pipelines. Fewer input dimensions can speed up training, discarding low-variance components can reduce noise, and projecting onto two or three components makes high-dimensional data possible to visualize. It remains widely used in computer vision, genomics, and recommendation systems.

Frequently asked questions

Why is PCA considered unsupervised?

PCA is unsupervised because it does not require labeled outcomes; it only examines relationships among the input features.