What is a Validation Set?

A validation set is a separate portion of a dataset used during model training to evaluate performance and tune hyperparameters.

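It is held out from the training data so the model never learns from it directly, which gives a check on how well the model is generalizing at each stage of training. Because the validation set is consulted many times while tuning, its scores gradually become optimistic, so it cannot replace a separate test set.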

Practitioners use it to select hyperparameters such as learning rate or model depth and to decide when to stop training (early stopping) before the model starts overfitting.
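As a rough illustration of early stopping, the sketch below picks the epoch to keep from a list of per-epoch validation losses. The function name early_stopping_epoch, the patience parameter, and the loss values are all assumptions made for this example; real training frameworks provide their own early-stopping utilities with different interfaces.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch with the best validation loss,
    stopping once `patience` consecutive epochs fail to improve on it.

    val_losses: validation loss measured after each epoch (illustrative values).
    patience:   number of consecutive non-improving epochs to tolerate.
    """
    best_epoch = 0
    best_loss = float("inf")
    bad_epochs = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Validation loss improved: remember this epoch and reset the counter.
            best_loss = loss
            best_epoch = epoch
            bad_epochs = 0
        else:
            # No improvement: the model may be starting to overfit.
            bad_epochs += 1
            if bad_epochs >= patience:
                break

    return best_epoch


# Made-up validation losses, one per epoch (not real results).
example_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.58]
kept_epoch = early_stopping_epoch(example_losses, patience=2)
print(kept_epoch)
```

With a patience of 2, training stops after two non-improving epochs and the later, lower loss is never seen. A larger patience trades extra training time for a chance to find it.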

Once tuning is complete, the final model is evaluated on a completely untouched test set to report true generalization performance.

Example

When building a spam filter, you might split 100,000 emails into 70k for training, 15k for validation, and 15k for testing; the validation emails are used to try different thresholds and feature counts until the model performs well on them.
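The split and threshold search described above might look something like the following sketch. It uses only the Python standard library; the helper names split_indices, accuracy_at, and pick_threshold are hypothetical, and the scores and labels are made-up stand-ins for a real spam model's output on validation emails.

```python
import random


def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


def accuracy_at(threshold, scores, labels):
    """Fraction of examples classified correctly when score >= threshold means spam."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def pick_threshold(candidates, val_scores, val_labels):
    """Choose the candidate threshold with the highest validation accuracy."""
    return max(candidates, key=lambda t: accuracy_at(t, val_scores, val_labels))


train_idx, val_idx, test_idx = split_indices(100_000)
print(len(train_idx), len(val_idx), len(test_idx))

# Made-up model scores and labels (1 = spam) for four validation emails.
val_scores = [0.1, 0.4, 0.6, 0.9]
val_labels = [0, 0, 1, 1]
best_threshold = pick_threshold([0.3, 0.5, 0.7], val_scores, val_labels)
print(best_threshold)
```

Only the validation scores are passed to pick_threshold; the test indices are set aside and not consulted until tuning is finished.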

Why it matters

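It enables reliable hyperparameter selection without touching the test data. Because every tuning decision is made against the validation set, the test set stays unseen and its score remains an honest estimate of performance on new data, which is essential for building trustworthy AI systems.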

Frequently asked questions

What is the difference between a validation set and a test set?

The validation set is used repeatedly during development to tune the model, while the test set is used only once at the end for final evaluation.