What is Batch Size?
Batch size is the number of training examples processed together in a single forward and backward pass during model training.
In machine learning, especially neural network training, data is rarely fed one sample at a time or all at once. Instead, the dataset is split into smaller groups called batches. The model updates its weights after seeing each batch, which is the core idea behind mini-batch gradient descent.
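The batching loop described above can be sketched in a few lines of plain Python. This is a minimal illustration and not any particular library's implementation: the helper names iterate_minibatches and train_one_epoch, the one-parameter model y = w * x, and the toy data are all assumptions made for the example.

```python
import random

def iterate_minibatches(data, batch_size, seed=0):
    """Yield shuffled batches holding at most batch_size examples each."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

def train_one_epoch(w, data, batch_size, lr=0.01, seed=0):
    """Run one pass over data for the toy model y = w * x (mean squared error)."""
    updates = 0
    for batch in iterate_minibatches(data, batch_size, seed=seed):
        # Gradient of mean((w * x - y) ** 2) over the batch with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad  # one weight update per batch
        updates += 1
    return w, updates

# Hypothetical toy data: 100 points on the line y = 3x.
data = [(k / 10, 3.0 * k / 10) for k in range(1, 101)]
w, updates = train_one_epoch(0.0, data, batch_size=10)
print(updates, round(w, 3))
```

With 100 examples and a batch size of 10, the loop performs 10 weight updates in one pass, and w moves from 0 toward the true slope of 3.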
Choosing the batch size involves a trade-off. Smaller batches produce noisier gradient estimates, which can help the optimizer escape poor local minima, but they make less efficient use of parallel hardware such as GPUs. Larger batches give smoother gradients and better hardware utilization, but they require more memory and can sometimes lead to worse generalization.
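The noise side of this trade-off can be illustrated with a small simulation. The sketch below assumes a stand-in for real per-example gradients (a true gradient of 1.0 plus unit Gaussian noise, drawn independently); the function name batch_gradient_spread is hypothetical. Under that assumption, the spread of the batch-averaged gradient shrinks roughly in proportion to 1 / sqrt(batch size).

```python
import random
import statistics

def batch_gradient_spread(batch_size, trials=2000, seed=0):
    """Estimate the standard deviation of the batch-averaged gradient."""
    rng = random.Random(seed)
    batch_means = []
    for _ in range(trials):
        # Assumed per-example gradients: true value 1.0 plus unit Gaussian noise.
        grads = [rng.gauss(1.0, 1.0) for _ in range(batch_size)]
        batch_means.append(sum(grads) / batch_size)
    return statistics.pstdev(batch_means)

spread_small = batch_gradient_spread(4)    # noisier updates
spread_large = batch_gradient_spread(64)   # smoother updates
print(spread_small, spread_large)
```

Real gradients are not independent Gaussian draws, so this only shows the averaging effect; it says nothing about generalization or hardware throughput.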
Batch size is a hyperparameter. It is usually kept constant within a training run and tuned between runs; common values range from 16 to 256 for many deep learning tasks.
Example
When training an image classifier on 60,000 photos with a batch size of 32, the model processes 32 images at a time before updating its parameters, so one full pass over the data takes 60,000 / 32 = 1,875 updates.
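The arithmetic in this example generalizes to dataset sizes that do not divide evenly by the batch size. The helper below is a small sketch; the name updates_per_epoch and the drop_last flag (whether a final, smaller batch is discarded) are assumptions made for illustration.

```python
import math

def updates_per_epoch(num_examples, batch_size, drop_last=False):
    """Number of weight updates needed for one full pass over the data."""
    if drop_last:
        # Discard the final partial batch, if there is one.
        return num_examples // batch_size
    # Keep the final partial batch as one extra, smaller update.
    return math.ceil(num_examples / batch_size)

print(updates_per_epoch(60_000, 32))  # the example above: 1875
```

For a batch size that does not divide the dataset evenly, such as 64 with 60,000 photos, the two settings differ by one update (938 versus 937).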
Why it matters
Batch size directly controls training speed, memory usage, and how well the model converges, making it a key hyperparameter that practitioners tune to fit hardware constraints and achieve good performance.
Frequently asked questions
Very large batches reduce update noise and can speed up training on GPUs, but they may hurt generalization and can exceed the memory available on the device.
Related terms
Gradient descent is an optimization algorithm that finds the minimum of a function by repeatedly moving in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.
An epoch is one complete pass of a machine learning model through the entire training dataset during training.
The learning rate is a hyperparameter that controls the size of the steps an optimization algorithm takes when updating a model's parameters during training.
A dataset is a structured collection of data points used to train, validate, or test machine learning models.
Data augmentation is a technique that artificially increases the size and diversity of a training dataset by creating modified versions of existing data samples.
In AI and machine learning, a feature is an individual measurable piece of data that serves as an input variable for a model.