A test set is a portion of data held out entirely from model training and tuning, used only at the end to measure how well the final model generalizes to new examples.
In machine learning, available data is typically split into three parts: a training set for learning patterns, a validation set for tuning hyperparameters, and a test set that remains untouched until the very end.
The test set provides an unbiased estimate of performance on new data drawn from the same distribution, because the model has never seen these examples during development. A large gap between validation and test scores is a sign of overfitting to the training or validation data.
Best practice keeps the test set fixed and uses it only once for final reporting, ensuring the performance numbers reflect how the model will behave on future unseen data.
A researcher splits 10,000 labeled photos into 7,000 for training, 2,000 for validation, and 1,000 for testing. After the model is fully trained and tuned, accuracy is measured only on the 1,000 test photos to report final results.
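The split in this example can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `split_indices` and the fixed seed are assumptions made for the sketch, not part of any particular library.

```python
import random

def split_indices(n_total, n_train, n_val, seed=0):
    """Shuffle indices 0..n_total-1 and cut them into train/val/test lists."""
    if n_train + n_val > n_total:
        raise ValueError("train and validation sizes exceed the dataset size")
    indices = list(range(n_total))
    # A fixed seed keeps the split reproducible, so the test set stays the same
    # across runs (seed value is an arbitrary choice for this sketch).
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test

train_idx, val_idx, test_idx = split_indices(10_000, 7_000, 2_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 7000 2000 1000
```

Shuffling before cutting avoids a test set that accidentally contains only the last-collected photos, and the three index lists never overlap.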
Without a separate test set, reported performance can be overly optimistic, and models that looked strong during development may fail in production. Evaluating on a held-out test set remains the standard way to obtain trustworthy generalization metrics in AI today.
Training data has already been seen by the model, so testing on it gives an overly optimistic score that does not reflect performance on new data.
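A deliberately extreme sketch makes the point concrete. The "model" below is a hypothetical lookup table that memorizes its training examples, and the labels are coin flips, so there is no real pattern to learn; both choices are assumptions made purely for illustration.

```python
import random

rng = random.Random(0)
# Hypothetical data: 1,000 distinct inputs with random 0/1 labels.
data = [(x, rng.randint(0, 1)) for x in range(1000)]
rng.shuffle(data)
train, test = data[:800], data[800:]

# "Training" here is pure memorization of the training pairs.
memory = dict(train)
# For inputs never seen before, fall back to the most common training label.
majority = 1 if 2 * sum(y for _, y in train) >= len(train) else 0

def predict(x):
    return memory.get(x, majority)

def accuracy(examples):
    return sum(predict(x) == y for x, y in examples) / len(examples)

train_acc = accuracy(train)  # perfect, because every training input is memorized
test_acc = accuracy(test)    # no better than guessing the majority label
print(train_acc, test_acc)
```

The training score is perfect while the test score should sit near chance, which is the gap a held-out test set is meant to expose.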
A validation set is a separate portion of a dataset used during model training to evaluate performance and tune hyperparameters.
Overfitting happens when a machine learning model learns the training data too closely, including its noise and quirks, so it fails to perform well on new, unseen data.
Batch size is the number of training examples processed together in a single forward and backward pass during model training.
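As a rough sketch of what this means in practice, the helper below slices a dataset into consecutive batches. The name `iter_batches` is hypothetical, and real training loops usually shuffle the data and hand each batch to a framework-specific update step, both of which are omitted here.

```python
def iter_batches(examples, batch_size):
    """Yield consecutive slices of `examples`, each with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]

batches = list(iter_batches(list(range(10)), batch_size=4))
print([len(b) for b in batches])  # [4, 4, 2]
```

The last batch is smaller whenever the dataset size is not a multiple of the batch size.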
Chunking is the process of breaking large datasets, documents, or files into smaller, fixed-size or semantically meaningful segments. It is a common data preprocessing step in AI/ML pipelines to manage memory and enable efficient processing.
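A minimal sketch of the fixed-size variant, splitting text on whitespace, might look like the following. The function name `chunk_words` and the optional `overlap` parameter are assumptions added for illustration; semantically meaningful chunking (for example by sentence or section) would need a different splitting rule.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split `text` into chunks of up to `chunk_size` words.

    `overlap` is the number of words repeated between neighbouring chunks
    (an illustrative extra, not required by the definition above).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final words are already covered by this chunk
    return chunks

print(chunk_words("a b c d e f g", 3))             # ['a b c', 'd e f', 'g']
print(chunk_words("a b c d e f g", 3, overlap=1))  # ['a b c', 'c d e', 'e f g']
```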
Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them, ignoring their magnitudes.
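The definition translates directly into code: divide the dot product by the product of the two vector lengths. The sketch below uses plain Python lists; raising an error for zero-length vectors is a choice made here, since the angle is undefined in that case.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors `a` and `b`."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: similarity is 1 (up to rounding).
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
# Perpendicular vectors: similarity is 0.
print(cosine_similarity([1, 0], [0, 1]))
```

Scaling either vector leaves the result unchanged, which is what "ignoring their magnitudes" means.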
Data augmentation is a technique that artificially increases the size and diversity of a training dataset by creating modified versions of existing data samples.
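One simple example for images is a horizontal flip. In the sketch below an image is a hypothetical list of pixel rows, and the helper names `horizontal_flip` and `augment` are made up for illustration. It also assumes that mirroring does not change the correct label, which holds for many photo categories but not, for instance, for text or digits.

```python
def horizontal_flip(image):
    """Return a mirrored copy of `image`, given as a list of pixel rows."""
    return [row[::-1] for row in image]

def augment(dataset):
    """Return the original (image, label) pairs plus a flipped copy of each."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        # Assumption: the label is unchanged by mirroring the image.
        augmented.append((horizontal_flip(image), label))
    return augmented

tiny_image = [[1, 2, 3],
              [4, 5, 6]]
dataset = [(tiny_image, "cat")]
bigger = augment(dataset)
print(len(bigger))   # 2
print(bigger[1][0])  # [[3, 2, 1], [6, 5, 4]]
```

Augmentation of this kind is normally applied to the training set only, so that validation and test scores are still measured on unmodified examples.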