Skip to content
Sign in

What is Batch Normalization?

Batch Normalization is a technique in neural networks that normalizes the inputs to each layer by subtracting the batch mean and dividing by the batch standard deviation, then scaling and shifting with learnable parameters.

It works by computing the mean and variance of each feature across a mini-batch during training. The activations are then normalized to have zero mean and unit variance before being scaled by gamma and shifted by beta parameters that the network learns.

This reduces internal covariate shift, allowing higher learning rates and faster convergence while also providing a mild regularization effect that can reduce the need for dropout.

During inference, running averages of mean and variance collected during training are used instead of batch statistics to ensure consistent predictions.

Example

In a CNN for classifying images, batch normalization layers are often inserted after convolutional layers. This lets the network train effectively even when the distribution of pixel values varies across batches of photos.

Why it matters

Batch Normalization made training much deeper networks practical and reliable, becoming a standard component in modern architectures like ResNet and Transformer variants.

Frequently asked questions

It stabilizes training by keeping activations in a healthy range, allowing faster convergence and deeper models without vanishing or exploding gradients.