Is batch size the same as epoch?

No. An epoch is one complete pass through the entire training dataset, while batch size determines how many samples are seen before each parameter update within that epoch.

How do I pick a good batch size?

Start with common values like 32 or 64, then adjust based on validation performance, training time, and your hardware's memory limits.

What is Batch Size?

Batch size is the number of training examples processed together in a single forward and backward pass during model training.

In machine learning, especially neural network training, data is rarely fed one sample at a time or all at once. Instead, the dataset is split into smaller groups called batches. The model updates its weights after seeing each batch, which is the core idea behind mini-batch gradient descent.

Choosing the batch size involves a trade-off: smaller batches add more noise to updates (which can help escape poor local minima) but are slower to compute on modern hardware, while larger batches give smoother gradients and better hardware utilization but may require more memory and can sometimes lead to worse generalization.

Batch size is usually kept constant during training but can be adjusted as a hyperparameter; common values range from 16 to 256 for many deep learning tasks.

Example

When training an image classifier on 60,000 photos with a batch size of 32, the model sees and learns from 32 images at a time before updating its parameters, requiring 1,875 such updates to finish one full pass over the data.

Why it matters

Batch size directly controls training speed, memory usage, and how well the model converges, making it a key hyperparameter that practitioners tune to fit hardware constraints and achieve good performance.

Frequently asked questions

Very large batches reduce update noise and can speed up training on GPUs but may hurt generalization and require more memory than available.

Related terms

Gradient Descent

Gradient descent is an optimization algorithm that finds the minimum of a function by repeatedly moving in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.

Epoch

An epoch is one complete pass of a machine learning model through the entire training dataset during training.

Learning Rate

The learning rate is a hyperparameter that controls the size of the steps an optimization algorithm takes when updating a model's parameters during training.

Dataset

A dataset is a structured collection of data points used to train, validate, or test machine learning models.

Data Augmentation

Data augmentation is a technique that artificially increases the size and diversity of a training dataset by creating modified versions of existing data samples.

Feature

In AI and machine learning, a feature is an individual measurable piece of data that serves as an input variable for a model.