Why do basic RNNs have trouble with long sequences?

During training, gradients can shrink or grow exponentially over many time steps, making it hard for the network to learn dependencies that are far apart in the sequence.

What are common modern alternatives to RNNs?

Gated recurrent variants such as LSTMs and GRUs, and especially Transformers, are widely used today because they handle long-range dependencies more effectively than vanilla RNNs.

What is a Recurrent Neural Network?

Also known as: RNN

A Recurrent Neural Network (RNN) is a type of neural network built to handle sequential data by passing information from one step to the next through a hidden state that acts like a memory.

Unlike standard feedforward networks that process each input independently, an RNN reuses the same weights across time steps and feeds its hidden state from the previous step back into the current step. This loop lets the network maintain context from earlier inputs while processing new ones.

Training uses a technique called backpropagation through time (BPTT), which unrolls the network across the sequence and adjusts the shared weights based on errors accumulated over the time steps. Simple RNNs can struggle with long sequences because gradients vanish or explode as they are propagated back through many steps, which motivated gated variants like LSTMs and GRUs.
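The vanishing and exploding behavior can be illustrated with a minimal sketch. It assumes a hypothetical one-unit RNN with a single recurrent weight `w`, so the gradient of the last hidden state with respect to the first is just a product of per-step factors. The function name `gradient_through_time` and the `use_tanh` flag are illustrative, not part of any library.

```python
import math

def gradient_through_time(w, steps, use_tanh=False, h0=0.5):
    """Return d h_T / d h_0 for a scalar RNN with recurrent weight w.

    Linear case: h_t = w * h_{t-1}
    Tanh case:   h_t = tanh(w * h_{t-1})
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        if use_tanh:
            h = math.tanh(w * h)
            # Chain rule factor for tanh: w * (1 - h_t^2), never larger than |w|.
            local = w * (1.0 - h * h)
        else:
            h = w * h
            local = w
        grad *= local  # backpropagation through time multiplies one factor per step
    return grad

for w in (0.9, 1.1):
    print(f"w={w}: gradient after 50 steps = {gradient_through_time(w, 50):.3e}")
```

With a recurrent weight slightly below 1 the product shrinks toward zero, and slightly above 1 it grows rapidly. In this toy setup the tanh nonlinearity only adds factors of at most 1, so it can make the shrinking worse but cannot cause growth on its own.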

At each time step the network computes a new hidden state by combining the current input with the previous hidden state, then produces an output. This design makes RNNs naturally suited for tasks where order and context matter.
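The per-step computation described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the weight names (`W_xh`, `W_hh`, `W_hy`), the tanh activation, the zero initial hidden state, and the random initialization are all assumptions chosen for the example.

```python
import math
import random

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step: combine the current input with the previous hidden state."""
    from_input = matvec(W_xh, x_t)
    from_memory = matvec(W_hh, h_prev)
    return [math.tanh(a + b + c) for a, b, c in zip(from_input, from_memory, b_h)]

def rnn_forward(inputs, W_xh, W_hh, W_hy, b_h, b_y):
    """Apply the same weights at every step of the sequence."""
    h = [0.0] * len(b_h)  # initial hidden state, assumed to be all zeros
    outputs = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        # The output at this step is read from the updated hidden state.
        y_t = [a + b for a, b in zip(matvec(W_hy, h), b_y)]
        outputs.append(y_t)
    return outputs, h

def random_matrix(rows, cols, rng, scale=0.5):
    """Small random weights, purely for demonstration."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
input_size, hidden_size, output_size = 3, 4, 2
W_xh = random_matrix(hidden_size, input_size, rng)
W_hh = random_matrix(hidden_size, hidden_size, rng)
W_hy = random_matrix(output_size, hidden_size, rng)
b_h = [0.0] * hidden_size
b_y = [0.0] * output_size

# A toy sequence of 5 input vectors.
sequence = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(5)]
outputs, final_hidden = rnn_forward(sequence, W_xh, W_hh, W_hy, b_h, b_y)
print(len(outputs), len(outputs[0]), len(final_hidden))  # 5 2 4
```

The loop produces one output per input, and the only thing carried from one step to the next is the hidden state `h`.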

Example

When predicting the next word in a sentence, an RNN reads each word one by one while keeping a running memory of earlier words, allowing it to generate more coherent text than a network that sees every word in isolation.

Why it matters

RNNs were foundational for early advances in natural language processing, speech recognition, and time-series forecasting, establishing the core idea of processing sequences that later influenced modern architectures like Transformers.

Frequently asked questions

How is an RNN different from a regular neural network?

A regular neural network processes each input independently with no memory of previous inputs, while an RNN maintains a hidden state that carries information across time steps.
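A small sketch can make this difference concrete. It assumes two hypothetical one-unit models with hand-picked weights: `feedforward` has no state, while `recurrent` carries a hidden state between steps. Both sequences end with the same input, but only the stateless model gives the same final output for both.

```python
import math

def feedforward(x, w=0.8):
    """Stateless unit: the output depends only on the current input."""
    return math.tanh(w * x)

def recurrent(xs, w_x=0.8, w_h=0.5):
    """Stateful unit: each output also depends on the previous hidden state."""
    h, outputs = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        outputs.append(h)
    return outputs

# Two sequences that differ only in their first element.
seq_a = [1.0, 0.0, 1.0]
seq_b = [-1.0, 0.0, 1.0]

ff_a = [feedforward(x) for x in seq_a]
ff_b = [feedforward(x) for x in seq_b]
rnn_a = recurrent(seq_a)
rnn_b = recurrent(seq_b)

print(ff_a[-1] == ff_b[-1])    # True: the earlier inputs are forgotten
print(rnn_a[-1] == rnn_b[-1])  # False: the hidden state remembers the first input
```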

Related terms

Long Short-Term Memory

Long Short-Term Memory (LSTM) is a type of recurrent neural network architecture designed to learn and retain information over long sequences of data.

Activation Function

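An activation function is a mathematical function applied to the weighted input of a neuron in a neural network that determines the signal the neuron passes on, often described as deciding whether the neuron should 'fire'. It introduces the non-linearity that lets the network model complex patterns.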

Autoencoder

An autoencoder is a neural network that learns to compress input data into a smaller representation and then reconstruct the original data from that compressed form.

Backpropagation

Backpropagation is an algorithm for training neural networks by calculating how much each weight contributed to the prediction error and adjusting those weights accordingly. It uses the chain rule to efficiently compute gradients of the loss function.

Convolutional Neural Network

A Convolutional Neural Network (CNN) is a specialized type of deep neural network designed to process grid-like data such as images by automatically learning spatial patterns and features.

Decoder

In deep learning, a decoder is a neural network module that converts an encoded representation (like a context vector or latent features) into a final output such as text, images, or sequences.