What is Overfitting?

Overfitting happens when a machine learning model learns the training data too closely, including its noise and quirks, so it fails to perform well on new, unseen data.

During training, the model adjusts its parameters to minimize errors on the training examples. If the model is too complex relative to the amount of data, it starts capturing random fluctuations instead of the true underlying patterns.

The result is excellent performance on the training set but poor generalization: the model makes inaccurate predictions on test or real-world data. The core problem is that the model memorizes its training examples rather than learning general rules.
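
The gap between memorizing and generalizing can be seen in a small experiment. The sketch below is illustrative only: it assumes a made-up true rule y = 2x with Gaussian noise, and compares a straight-line fit against a 1-nearest-neighbour "memorizer" that simply returns the label of the closest training point. All function and variable names are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Noisy samples of the assumed true rule y = 2x."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (the simple model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_memorizer(xs, ys):
    """1-nearest-neighbour: return the label of the closest training point."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(200)
test_x, test_y = make_data(200)  # fresh data the models never saw

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

line_train_mse = mse(line, train_x, train_y)
line_test_mse = mse(line, test_x, test_y)
memorizer_train_mse = mse(memorizer, train_x, train_y)
memorizer_test_mse = mse(memorizer, test_x, test_y)

print(f"line:      train={line_train_mse:.2f}  test={line_test_mse:.2f}")
print(f"memorizer: train={memorizer_train_mse:.2f}  test={memorizer_test_mse:.2f}")
```

The memorizer reproduces every training label exactly, so its training error is zero, yet it should do worse than the line on the fresh test set because it has copied the noise along with the signal.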

Regularization and more training data reduce overfitting by pushing the model toward broader patterns, while cross-validation helps detect it by estimating how the model performs on data it was not trained on.
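
As a rough illustration of cross-validation, the sketch below scores a k-nearest-neighbour regressor with 5-fold cross-validation and picks the neighbour count with the lowest held-out error. The synthetic data (y = 2x plus noise), the candidate values 1, 5 and 15, and all names are assumptions made for this example. The neighbour count is used here as a simple complexity control; it stands in for a regularization strength and is not regularization in the strict sense.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic data: assumed true rule y = 2x with Gaussian noise.
xs = [random.uniform(0, 10) for _ in range(200)]
data = [(x, 2 * x + random.gauss(0, 1)) for x in xs]

def knn_predict(train, x, k):
    """Average the labels of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cross_val_mse(data, k_neighbors, n_folds=5):
    """Mean squared error averaged over n_folds held-out folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    fold_errors = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors = [(knn_predict(train, x, k_neighbors) - y) ** 2
                  for x, y in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / n_folds

# k = 1 memorizes single points; larger k averages over more neighbours.
cv_scores = {k: cross_val_mse(data, k) for k in (1, 5, 15)}
best_k = min(cv_scores, key=cv_scores.get)

for k, score in cv_scores.items():
    print(f"k={k:>2}  cross-validated MSE={score:.2f}")
print("selected k:", best_k)
```

Because every score is computed on points the model did not see during fitting, the most flexible setting (k = 1) gets no credit for memorizing noise and should score worse than the smoother settings.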

Example

Imagine a student who memorizes every answer from last year's exam instead of understanding the concepts; they ace the practice test but struggle with new questions on the real exam.

Why it matters

Overfitting produces unreliable models that cannot be trusted in real applications like medical diagnosis or self-driving cars, leading to wasted resources and potential harm.

Frequently asked questions

How can I tell if my model is overfitting?

Compare training accuracy with validation or test accuracy; a large gap between them usually indicates overfitting.
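
The accuracy comparison can be written as a small helper. This is a minimal sketch: the function names are hypothetical, and the 0.10 threshold is an arbitrary assumption rather than a standard value, so it should be tuned to the task.

```python
def accuracy_gap(train_accuracy, validation_accuracy):
    """Difference between training and validation accuracy (both in 0..1)."""
    return train_accuracy - validation_accuracy

def looks_overfit(train_accuracy, validation_accuracy, max_gap=0.10):
    """Flag a model whose training accuracy exceeds validation accuracy
    by more than max_gap (an assumed, task-dependent threshold)."""
    return accuracy_gap(train_accuracy, validation_accuracy) > max_gap

# Hypothetical accuracy figures, for illustration only.
print(looks_overfit(0.99, 0.80))  # large gap
print(looks_overfit(0.91, 0.89))  # small gap
```

A flag from a check like this is a prompt to investigate, not proof: a small validation set or a mismatch between training and validation data can also produce a gap.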