Should I apply dropout at test time?

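No. Dropout is applied only during training. At test time all neurons are active, and either the outputs are scaled by the keep probability or, as in most modern frameworks, the rescaling has already been done during training.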

What dropout rate should I start with?

A common starting value is 0.5 for hidden layers and 0.2 for input layers; the best rate is usually found by validation.

What is Dropout?

Dropout is a regularization technique used during neural network training that randomly sets a fraction of neurons to zero on each forward pass to reduce overfitting.

During training, dropout zeroes each neuron in a layer independently with a chosen probability (the dropout rate), drawing a new random mask on every forward pass. This prevents the network from relying too heavily on any single neuron and forces it to learn more robust features.

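At inference time, all neurons are used. In the original formulation their outputs are multiplied by the keep probability (1 minus the dropout rate) so that the expected value matches what the next layer saw during training. Most modern frameworks instead use inverted dropout, which divides the surviving activations by the keep probability during training and leaves inference unscaled. Either way, dropout improves generalization without changing the model architecture.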
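
A minimal sketch of the inverted variant, in which surviving activations are divided by the keep probability during training so that inference needs no rescaling. It uses plain Python lists; the function name `dropout` and its parameters are illustrative and not taken from any library.

```python
import random

def dropout(activations, rate, training, rng=random):
    """Inverted dropout on a list of floats (illustrative helper, not a library API)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # Inference: every neuron is kept and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - rate
    # Training: zero each activation with probability `rate` and divide the
    # survivors by keep_prob so the expected value of each output is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)  # seeded only to make the sketch repeatable
x = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(x, rate=0.5, training=True, rng=rng)
eval_out = dropout(x, rate=0.5, training=False, rng=rng)
print(train_out)  # each entry is either 0.0 or twice the input
print(eval_out)   # identical to x
```

With a rate of 0.5, each surviving activation is doubled during training, which compensates for the half that is zeroed on average.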

Dropout can be viewed as training an implicit ensemble of many thinned networks that share weights, which is why it often leads to better performance on unseen data.
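
The ensemble view can be checked by hand on a toy case. The sketch below assumes a single linear unit with three inputs and a dropout rate of 0.5 (the weights and inputs are made-up values), enumerates every possible mask, and compares the average output of the thinned units with the full unit scaled by the keep probability.

```python
from itertools import product

weights = [0.5, -1.0, 2.0]   # made-up values for illustration
inputs = [1.0, 2.0, 3.0]
keep_prob = 0.5              # with a rate of 0.5, every mask is equally likely

def unit_output(mask):
    # Output of one thinned linear unit: masked inputs contribute nothing.
    return sum(w * m * x for w, m, x in zip(weights, mask, inputs))

# All 2**3 = 8 thinned versions of the unit, sharing the same weights.
masks = list(product([0, 1], repeat=len(inputs)))
ensemble_mean = sum(unit_output(m) for m in masks) / len(masks)
scaled_full = keep_prob * unit_output((1, 1, 1))

print(ensemble_mean, scaled_full)  # 2.25 2.25
```

The match is exact only because the unit is linear. For networks with nonlinear activations, scaling the full network is an approximation to averaging the thinned networks.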

Example

In a fully-connected layer with 100 neurons and a dropout rate of 0.5, roughly 50 neurons are randomly ignored during each training step. The remaining neurons must still produce useful activations, making the model less sensitive to the removal of any particular neuron.
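
A small simulation of this example, assuming each neuron is dropped independently with probability 0.5. The count varies from step to step around 50 because a new mask is drawn each time.

```python
import random

rng = random.Random(42)  # seeded only to make the sketch repeatable
num_neurons, rate = 100, 0.5

counts = []
for step in range(3):
    # Draw a fresh mask each training step: True means the neuron is dropped.
    dropped = [rng.random() < rate for _ in range(num_neurons)]
    counts.append(sum(dropped))
    print(f"step {step}: {counts[-1]} of {num_neurons} neurons dropped")
```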

Why it matters

Dropout remains one of the simplest and most effective ways to combat overfitting in deep networks and is still widely used or adapted in modern architectures such as transformers and CNNs.

Frequently asked questions

Dropout can slightly increase the number of epochs needed to converge, but the extra robustness usually outweighs the modest extra compute.

Related terms

Overfitting

Overfitting happens when a machine learning model learns the training data too closely, including its noise and quirks, so it fails to perform well on new, unseen data.

Regularization

Regularization is a set of techniques in machine learning that reduce overfitting by constraining the model, most commonly by adding a penalty term to the loss function that discourages overly complex or large parameter values. Dropout is another example.

Neural Network

A neural network, or artificial neural network (ANN), is a computational model inspired by the human brain that learns to recognize patterns in data by passing information through layers of interconnected artificial neurons.

Training

Training is the process of feeding data into a machine learning model so it can learn patterns and adjust its internal parameters to make accurate predictions.

Ensemble Learning

Ensemble learning is a machine learning approach that combines predictions from multiple models to achieve better accuracy and robustness than any individual model.

Batch Normalization

Batch Normalization is a technique in neural networks that normalizes the inputs to each layer by subtracting the batch mean and dividing by the batch standard deviation, then scaling and shifting with learnable parameters.
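
As a rough sketch of that definition for a single feature, in plain Python: `gamma` and `beta` stand in for the learnable scale and shift (fixed here), and the small `eps` added for numerical stability is an assumption of this example, not part of the definition above.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale by gamma and shift by beta."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    # Subtract the batch mean, divide by the batch standard deviation,
    # then apply the scale and shift.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
print(out)  # values centered on 0 with roughly unit variance
```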