
What is Batch Size?

Batch size is the number of training examples processed together in a single forward and backward pass during model training.

In machine learning, especially neural network training, data is rarely fed one sample at a time or all at once. Instead, the dataset is split into smaller groups called batches. The model updates its weights after seeing each batch, which is the core idea behind mini-batch gradient descent.
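The loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the toy model (fitting y ≈ w·x with squared error), the helper names `iterate_batches` and `train_linear`, and the learning rate and epoch count are all assumptions chosen for the example.

```python
import random

def iterate_batches(n_samples, batch_size, rng):
    """Yield lists of shuffled sample indices, batch_size at a time."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        # The final batch may be smaller if batch_size does not divide n_samples.
        yield indices[start:start + batch_size]

def train_linear(xs, ys, batch_size, lr=0.5, epochs=50, seed=0):
    """Fit y ~ w * x with mini-batch gradient descent; return (w, n_updates)."""
    rng = random.Random(seed)
    w = 0.0
    n_updates = 0
    for _ in range(epochs):
        for batch in iterate_batches(len(xs), batch_size, rng):
            # Gradient of the mean squared error over this batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad  # one weight update per batch
            n_updates += 1
    return w, n_updates

# Toy data (hypothetical): ten points on the line y = 3x.
xs = [i / 10 for i in range(10)]
ys = [3.0 * x for x in xs]
w, n_updates = train_linear(xs, ys, batch_size=4)
print(w, n_updates)
```

With 10 samples and a batch size of 4, each epoch yields three batches (4, 4, and 2 samples), so the weight is updated three times per pass over the data.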

Choosing the batch size involves a trade-off. Smaller batches add more noise to each update, which can help the model escape poor local minima, but they make less efficient use of parallel hardware such as GPUs, so an epoch takes longer. Larger batches give smoother gradient estimates and better hardware utilization, but they require more memory and can sometimes lead to worse generalization.
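The noise side of this trade-off can be illustrated numerically. The sketch below assumes a hypothetical pool of per-example gradients drawn from a normal distribution (real gradients are not distributed this way) and measures how much the batch-averaged gradient varies from one random batch to the next. The function name `batch_gradient_std` and all constants are illustrative.

```python
import random
import statistics

def batch_gradient_std(per_example_grads, batch_size, n_trials=500, seed=0):
    """Estimate the spread of the batch-mean gradient across random batches."""
    rng = random.Random(seed)
    estimates = [
        statistics.fmean(rng.sample(per_example_grads, batch_size))
        for _ in range(n_trials)
    ]
    return statistics.stdev(estimates)

# Hypothetical per-example gradients: mean 1.0, standard deviation 2.0.
data_rng = random.Random(42)
grads = [data_rng.gauss(1.0, 2.0) for _ in range(10_000)]

# Spread of the gradient estimate for a few batch sizes.
spread = {b: batch_gradient_std(grads, b) for b in (1, 16, 256)}
for b, s in spread.items():
    print(f"batch_size={b:>3}  std of batch gradient ~ {s:.3f}")
```

Under these assumptions the spread shrinks roughly in proportion to the square root of the batch size, which is why larger batches produce smoother updates.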

Batch size is a hyperparameter that is usually held constant throughout training; values from 16 to 256 are common for many deep learning tasks.

Example

When training an image classifier on 60,000 photos with a batch size of 32, the model sees and learns from 32 images at a time before updating its parameters, requiring 1,875 such updates to finish one full pass over the data.
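The update count in this example is simply the dataset size divided by the batch size, rounded up when the division is not exact. A small sketch of that arithmetic follows; the function name `updates_per_epoch` and the `drop_last` flag are illustrative assumptions, not part of any particular library.

```python
import math

def updates_per_epoch(n_samples, batch_size, drop_last=False):
    """Number of weight updates needed for one full pass over the data."""
    if drop_last:
        # Discard the final, smaller batch if the sizes do not divide evenly.
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

print(updates_per_epoch(60_000, 32))   # 60,000 / 32 = 1,875
print(updates_per_epoch(60_000, 64))   # 937.5 rounds up to 938
```

Doubling the batch size roughly halves the number of updates per epoch, which is one reason batch size and learning rate are often tuned together.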

Why it matters

Batch size directly controls training speed, memory usage, and how well the model converges, making it a key hyperparameter that practitioners tune to fit hardware constraints and achieve good performance.

Frequently asked questions

Very large batches reduce update noise and can speed up training on GPUs, but they may hurt generalization and can exceed the available memory.