What is K-Means?

K-Means is an unsupervised machine learning algorithm that partitions data into a user-specified number (K) of clusters by grouping similar points together.

The algorithm starts by randomly placing K centroids in the feature space. Each data point is then assigned to the nearest centroid based on distance, typically Euclidean.

Centroids are updated to the mean position of all points assigned to them, and the assignment step repeats until the centroids stabilize or a maximum iteration limit is reached.
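The assign-and-update loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters, and the toy points are all hypothetical. It also initializes centroids by sampling K of the data points, which is one common way of placing them at random, and it leaves an empty cluster's centroid where it was.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Basic K-Means on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: initialise centroids by sampling k distinct data points.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # centroids have stabilised
            break
        centroids = new_centroids
    return centroids, labels


# Toy data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

Which cluster receives index 0 depends on the random initialization, so only the grouping of points is meaningful, not the label numbers themselves.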

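K-Means seeks to minimize within-cluster variance (the sum of squared distances from each point to its cluster's centroid) and is efficient on large datasets. It converges only to a local optimum, however, so results can vary with the initial centroid placement, and it assumes roughly spherical clusters.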
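The within-cluster variance mentioned above is usually written as a sum of squared distances. The symbols here are notation chosen for this sketch rather than taken from the text: C_k is the set of points assigned to cluster k, and mu_k is that cluster's centroid.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Neither the assignment step nor the update step can increase J, which is why the procedure settles, though not necessarily at the lowest possible value.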

Example

A retailer might use K-Means with K=4 on customer purchase data to automatically group shoppers into clusters such as 'budget buyers', 'frequent high-spenders', 'seasonal purchasers', and 'one-time visitors'.

Why it matters

K-Means remains a foundational tool for exploratory data analysis, customer segmentation, image compression, and anomaly detection in modern AI pipelines due to its simplicity and speed.

Frequently asked questions

Common methods for choosing K include the elbow plot of the within-cluster sum of squares and the silhouette score, which measures cluster cohesion and separation.
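As a rough illustration of the silhouette score, the sketch below computes the mean silhouette for a given labelling using only the standard library. The function name `mean_silhouette`, the toy points, and the two hand-written labellings are hypothetical. Scoring singleton clusters as 0 is a common convention that is assumed here.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: cohesion, the mean distance to the other members of p's cluster
        # (p's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: separation, the mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
good = mean_silhouette(points, [0, 0, 0, 1, 1, 1])  # matches the visible groups
bad = mean_silhouette(points, [0, 1, 0, 1, 0, 1])   # deliberately mixed up
print(good, bad)
```

Scores lie between -1 and 1, and higher is better. In practice one would run K-Means for several candidate values of K and prefer the value whose clustering gives the highest mean silhouette.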