What is Long Short-Term Memory?
Also known as: LSTM
Long Short-Term Memory (LSTM) is a type of recurrent neural network architecture designed to learn and retain information over long sequences of data.
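Standard recurrent networks struggle with long-term dependencies because gradients tend to vanish or explode as they are propagated back through many time steps during training. LSTMs mitigate the vanishing-gradient problem by maintaining a separate cell state that acts like a conveyor belt: it is updated mostly by addition, so information and gradients can flow across many time steps with minimal change. Exploding gradients are usually handled separately, for example with gradient clipping.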
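The effect of repeated multiplication can be sketched with a toy calculation. This is only an illustration under a simplifying assumption: each time step is taken to scale the gradient by a single scalar factor, whereas real networks multiply Jacobian matrices. The function name `backprop_scale` and the factors 0.9, 1.1, and 1.0 are hypothetical values chosen for the example.

```python
def backprop_scale(per_step_factor: float, steps: int) -> float:
    """Return the product of `steps` identical per-step gradient factors.

    Assumption: one scalar factor per time step stands in for the
    per-step Jacobian of a real recurrent network.
    """
    scale = 1.0
    for _ in range(steps):
        scale *= per_step_factor
    return scale


# Factor below 1: the gradient shrinks toward zero over 100 steps.
vanishing = backprop_scale(0.9, 100)

# Factor above 1: the gradient grows very large over 100 steps.
exploding = backprop_scale(1.1, 100)

# Factor of exactly 1: roughly what the direct cell-state path of an
# LSTM provides when its forget gate stays close to 1.
preserved = backprop_scale(1.0, 100)

print(f"vanishing: {vanishing:.3e}")
print(f"exploding: {exploding:.3e}")
print(f"preserved: {preserved:.3e}")
```

Along the direct cell-state path the per-step factor is the forget gate's value, so a gate that stays near 1 lets the gradient survive far longer than in the first case.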
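Three gates control what happens to the cell state: the forget gate decides what to discard, the input gate decides what new information to store, and the output gate decides what to expose as the hidden state. Each gate applies a sigmoid activation, producing values between 0 and 1 that scale how much information passes through. Tanh activations play a different role: they generate the candidate values to be stored and squash the cell state before it is exposed as the hidden state.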
This gated mechanism lets the network selectively remember or forget patterns, making it effective for sequential tasks where context from many steps earlier is important.
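The gating described above can be sketched as a single time step in NumPy. This is a minimal illustration of the common LSTM formulation, not a reference implementation: the names `lstm_step`, `W`, and `b`, the choice to stack all four weight blocks into one matrix, and the gate ordering (forget, input, candidate, output) are assumptions made for the example, and the sizes and random weights are arbitrary.

```python
import numpy as np


def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """Run one LSTM time step.

    x:      input vector, shape (input_size,)
    h_prev: previous hidden state, shape (hidden_size,)
    c_prev: previous cell state, shape (hidden_size,)
    W:      stacked weights, shape (4 * hidden_size, input_size + hidden_size)
    b:      stacked biases, shape (4 * hidden_size,)
    """
    n = h_prev.shape[0]
    # One matrix product computes the pre-activations of all four blocks.
    z = W @ np.concatenate([x, h_prev]) + b

    f = sigmoid(z[0 * n:1 * n])   # forget gate: how much old cell state to keep
    i = sigmoid(z[1 * n:2 * n])   # input gate: how much new information to store
    g = np.tanh(z[2 * n:3 * n])   # candidate values to store
    o = sigmoid(z[3 * n:4 * n])   # output gate: how much cell state to expose

    c = f * c_prev + i * g        # additive cell-state update
    h = o * np.tanh(c)            # hidden state exposed to the next step
    return h, c


# Usage: run five random inputs through the cell (arbitrary sizes).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x, h, c, W, b)
```

The line `c = f * c_prev + i * g` is the conveyor belt: when the forget gate is near 1 and the input gate is near 0, the cell state passes through the step almost unchanged.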
Example
When predicting the next word in a long sentence, an LSTM can remember the subject mentioned several words earlier and use that context to choose the correct verb form.
Why it matters
LSTMs enabled major advances in speech recognition, machine translation, and time-series forecasting before transformers became dominant, and they remain widely used in resource-constrained or streaming applications today.
Frequently asked questions
How is an LSTM different from a regular RNN?
LSTMs add gates and a cell state that let them keep relevant information across many time steps, while regular RNNs suffer from vanishing gradients and quickly forget earlier inputs.
Related terms
A Recurrent Neural Network (RNN) is a type of neural network built to handle sequential data by passing information from one step to the next through a hidden state that acts like a memory.
An activation function is a mathematical operation applied to the output of a neuron in a neural network that decides whether the neuron should 'fire' and pass on a signal.
An autoencoder is a neural network that learns to compress input data into a smaller representation and then reconstruct the original data from that compressed form.
Backpropagation is an algorithm for training neural networks by calculating how much each weight contributed to the prediction error and adjusting those weights accordingly. It uses the chain rule to efficiently compute gradients of the loss function.
A Convolutional Neural Network (CNN) is a specialized type of deep neural network designed to process grid-like data such as images by automatically learning spatial patterns and features.
In deep learning, a decoder is a neural network module that converts an encoded representation (like a context vector or latent features) into a final output such as text, images, or sequences.