What are the three gates in an LSTM?

The forget gate, input gate, and output gate regulate what information is discarded, added, or passed forward from the cell state.

Are LSTMs still used now that transformers exist?

Yes, LSTMs are still common for smaller models, on-device inference, and streaming data where the cost of self-attention, which grows quadratically with sequence length, is impractical.

What is Long Short-Term Memory?

Also known as: LSTM

Long Short-Term Memory (LSTM) is a type of recurrent neural network architecture designed to learn and retain information over long sequences of data.

Standard recurrent networks struggle with long-term dependencies because gradients tend to vanish or explode during training. LSTMs mitigate the vanishing-gradient problem by maintaining a separate cell state that acts like a conveyor belt, allowing information to flow across many time steps with minimal change. Exploding gradients are usually handled separately, for example with gradient clipping.
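As a rough illustration of why the cell state helps, the sketch below compares how a gradient signal shrinks over many time steps in two simplified scalar settings. The function names and all numeric values (recurrent weight, tanh slope, forget-gate value) are assumptions chosen for illustration, and the LSTM case follows only the direct cell-state path, ignoring the indirect paths through the gates and hidden state.

```python
# Scalar toy comparison; every number here is an illustrative assumption.

def rnn_gradient_factor(steps, recurrent_weight=0.9, tanh_slope=0.7):
    """Product of per-step derivatives in a scalar tanh RNN.

    Each step multiplies the gradient by (recurrent weight * tanh slope).
    """
    return (recurrent_weight * tanh_slope) ** steps


def cell_state_gradient_factor(steps, forget_gate=0.99):
    """Product of per-step derivatives along the LSTM cell-state path.

    With c_t = f_t * c_{t-1} + i_t * g_t, the direct derivative of c_t
    with respect to c_{t-1} is the forget gate value f_t.
    """
    return forget_gate ** steps


for steps in (10, 50, 100):
    print(steps, rnn_gradient_factor(steps), cell_state_gradient_factor(steps))
```

With these assumed values, the plain RNN factor collapses toward zero within a few dozen steps, while the cell-state factor decays slowly as long as the forget gate stays close to 1.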

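Three gates control what happens to the cell state: the forget gate decides what to discard, the input gate decides what new information to store, and the output gate decides what to expose as the hidden state. Each gate uses a sigmoid activation to produce values between 0 and 1 that scale how much information passes through, while tanh activations shape the candidate values written to the cell state and the cell state read out as the hidden state.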
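The gate logic can be sketched as a single time step in NumPy. This is a minimal illustration of the common LSTM formulation, not a reference implementation: the function names, the stacked weight layout (forget, input, output, candidate), and the toy sizes are all assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step (hypothetical helper for illustration).

    W has shape (4 * hidden, input + hidden) and b has shape (4 * hidden,).
    The four row blocks are assumed to be ordered: forget gate, input gate,
    output gate, candidate values.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b

    f = sigmoid(z[0:hidden])               # forget gate: what to discard
    i = sigmoid(z[hidden:2 * hidden])      # input gate: what to store
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: what to expose
    g = np.tanh(z[3 * hidden:])            # candidate values for the cell

    c = f * c_prev + i * g                 # updated cell state
    h = o * np.tanh(c)                     # new hidden state
    return h, c


# Run a short random sequence through the cell (toy sizes, random weights).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x, h, c, W, b)

print(h.shape, c.shape)
```

Note that the cell state update is additive: the previous cell state is scaled by the forget gate and then new candidate values are added, rather than being passed through a full nonlinear transformation at every step.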

This gated mechanism lets the network selectively remember or forget patterns, making it effective for sequential tasks where context from many steps earlier is important.

Example

When predicting the next word in a long sentence, an LSTM can remember the subject mentioned several words earlier and use that context to choose the correct verb form.

Why it matters

LSTMs enabled major advances in speech recognition, machine translation, and time-series forecasting before transformers became dominant, and they remain widely used in resource-constrained or streaming applications today.

Frequently asked questions

How is an LSTM different from a regular RNN?

LSTMs add gates and a cell state that let them keep relevant information across many time steps, while regular RNNs suffer from vanishing gradients and quickly forget earlier inputs.

Related terms

Recurrent Neural Network

A Recurrent Neural Network (RNN) is a type of neural network built to handle sequential data by passing information from one step to the next through a hidden state that acts like a memory.

Activation Function

An activation function is a mathematical operation applied to the output of a neuron in a neural network that decides whether the neuron should 'fire' and pass on a signal.

Autoencoder

An autoencoder is a neural network that learns to compress input data into a smaller representation and then reconstruct the original data from that compressed form.

Backpropagation

Backpropagation is an algorithm for training neural networks by calculating how much each weight contributed to the prediction error and adjusting those weights accordingly. It uses the chain rule to efficiently compute gradients of the loss function.

Convolutional Neural Network

A Convolutional Neural Network (CNN) is a specialized type of deep neural network designed to process grid-like data such as images by automatically learning spatial patterns and features.

Decoder

In deep learning, a decoder is a neural network module that converts an encoded representation (like a context vector or latent features) into a final output such as text, images, or sequences.