What is K-Nearest Neighbors?

Also known as: KNN

K-Nearest Neighbors (KNN) is a simple supervised machine learning algorithm used for classification and regression. It predicts the label of a new data point by majority vote among its K closest training examples, or predicts a numeric value by averaging theirs.

KNN works by storing the entire training dataset and, for each new query point, calculating distances to all known points using a chosen metric such as Euclidean distance. It then selects the K nearest points and aggregates their labels (for classification) or values (for regression) to produce the prediction.
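The procedure above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`) are hypothetical, and the brute-force scan over every stored point is assumed for clarity rather than speed.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train_X, train_y, query, k):
    """Return the targets of the k stored points closest to the query."""
    # Measure the distance from the query to every training point.
    distances = [(euclidean(x, query), y) for x, y in zip(train_X, train_y)]
    # Sort by distance only, then keep the targets of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    return [y for _, y in distances[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest labels."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return Counter(neighbors).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean of the k nearest target values."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)
```

Both functions share the same neighbor search and differ only in how they aggregate the neighbors' targets, which mirrors the classification/regression split described above.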

The algorithm is instance-based and lazy, meaning it has no explicit training phase and defers all computation until prediction time. Key choices include the value of K, the distance metric, and optional weighting of neighbors by distance.
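To illustrate the optional distance weighting, here is a small sketch in which each neighbor's vote counts in proportion to the inverse of its distance. Inverse-distance weighting is only one common choice and is an assumption here; the function name `weighted_knn_classify` and the `eps` guard against division by zero are likewise hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_classify(train_X, train_y, query, k=3, eps=1e-9):
    """Classify by inverse-distance-weighted vote among the k nearest points."""
    # Sort all stored points by Euclidean distance to the query.
    distances = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for d, label in distances[:k]:
        # Closer neighbors contribute larger weights; eps avoids dividing by 0.
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)


# One very close "a" point versus two distant "b" points.
train_X = [(0.0,), (3.0,), (3.2,)]
train_y = ["a", "b", "b"]
prediction = weighted_knn_classify(train_X, train_y, (0.1,), k=3)
```

With these three points, an unweighted majority vote at K = 3 would pick "b" (two votes to one), while the weighted vote picks "a" because the single "a" neighbor is far closer to the query.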

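KNN is non-parametric: it makes few assumptions about the data distribution and can therefore model complex decision boundaries. However, it suffers from the curse of dimensionality, since distances become less informative as the number of features grows, and it requires careful feature scaling so that features with large numeric ranges do not dominate the distance.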
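The need for feature scaling can be shown with a small sketch. The data below (income in dollars, age in years) are made up for illustration, and z-score standardization is assumed as the scaling method; the helper names `standardize`, `scale_point`, and `nearest_index` are hypothetical.

```python
import math


def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    scaled = [scale_point(row, means, stds) for row in rows]
    return scaled, means, stds


def scale_point(point, means, stds):
    """Apply the training set's means and stds to a single point."""
    return tuple((v - m) / s for v, m, s in zip(point, means, stds))


def nearest_index(rows, query):
    """Index of the row closest to the query in Euclidean distance."""
    return min(range(len(rows)), key=lambda i: math.dist(rows[i], query))


# Hypothetical (income, age) records and a query point.
rows = [(30000, 25), (30500, 60), (60000, 26)]
query = (30400, 26)

raw_nearest = nearest_index(rows, query)

scaled_rows, means, stds = standardize(rows)
scaled_nearest = nearest_index(scaled_rows, scale_point(query, means, stds))
```

On the raw values the income column dominates, so the nearest record is the 60-year-old at index 1. After standardization both features contribute comparably, and the nearest record becomes index 0, which matches the query closely in both income and age.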

Example

Imagine classifying a new fruit as an apple or orange: measure its weight and diameter, find the three closest fruits in a labeled dataset, and assign the label that appears most often among those three neighbors.
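The fruit example can be worked through in code. The measurements below (weight in grams, diameter in centimeters) are invented for illustration, and the function name `classify_fruit` is hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled fruits: ((weight_g, diameter_cm), label).
fruits = [
    ((150, 7.0), "apple"),
    ((160, 7.2), "apple"),
    ((170, 7.5), "apple"),
    ((140, 6.8), "apple"),
    ((200, 8.0), "orange"),
    ((210, 8.3), "orange"),
    ((190, 7.9), "orange"),
    ((220, 8.5), "orange"),
]


def classify_fruit(query, data, k=3):
    """Return the majority label and the labels of the k closest fruits."""
    # Rank every labeled fruit by Euclidean distance to the new fruit.
    ranked = sorted(data, key=lambda item: math.dist(item[0], query))
    neighbor_labels = [label for _, label in ranked[:k]]
    majority = Counter(neighbor_labels).most_common(1)[0][0]
    return majority, neighbor_labels


# A new fruit weighing 185 g with a 7.8 cm diameter.
label, neighbor_labels = classify_fruit((185, 7.8), fruits, k=3)
```

For this made-up dataset the three closest fruits are two oranges and one apple, so the new fruit is labeled an orange. Note that weight dominates the distance here because its numeric range is much larger than that of diameter; in practice the features would usually be scaled first.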

Why it matters

KNN remains a foundational baseline in modern AI because it is easy to implement and its predictions are easy to interpret: each one can be traced back to specific training examples. Nearest-neighbor search also underpins many recommendation systems, anomaly detection pipelines, and prototype-based methods used in production today.

Frequently asked questions

What does K mean in KNN?

K is a user-chosen hyperparameter that specifies how many nearest neighbors are considered when making a prediction.