What is Long Short-Term Memory?

Also known as: LSTM

Long Short-Term Memory (LSTM) is a type of recurrent neural network architecture designed to learn and retain information over long sequences of data.

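Standard recurrent networks struggle with long-term dependencies because gradients tend to vanish or explode as they are propagated back through many time steps. LSTMs mitigate the vanishing-gradient problem by maintaining a separate cell state that acts like a conveyor belt: it is updated mainly by element-wise scaling and addition, so information can flow across many time steps with minimal change. Exploding gradients are usually handled separately, for example with gradient clipping.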
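The effect can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that the gradient is multiplied by the same factor at every time step; the function name gradient_scale and the factors 0.9, 1.1, and 0.999 are hypothetical choices, not values from a trained network.

```python
# Toy illustration: how a gradient signal scales after passing back
# through many time steps. All per-step factors are assumed values.

def gradient_scale(per_step_factor: float, steps: int) -> float:
    """Return the product of `steps` identical per-step gradient factors."""
    return per_step_factor ** steps

steps = 100
vanishing = gradient_scale(0.9, steps)    # factor below 1: signal shrinks toward zero
exploding = gradient_scale(1.1, steps)    # factor above 1: signal grows without bound
lstm_like = gradient_scale(0.999, steps)  # forget gate close to 1: signal mostly preserved

print(f"vanishing: {vanishing:.2e}")
print(f"exploding: {exploding:.2e}")
print(f"lstm_like: {lstm_like:.3f}")
```

When the forget gate stays close to 1, the cell state behaves like the third case: after 100 steps most of the signal is still there, which is the conveyor-belt behaviour described above.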

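Three gates control what happens to the cell state: the forget gate decides what to discard, the input gate decides what new information to store, and the output gate decides what to expose as the hidden state. Each gate is a sigmoid layer whose outputs lie between 0 and 1 and scale information element-wise. Tanh is used separately, to produce the candidate values that may be written to the cell state and to squash the cell state before the output gate is applied.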
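A single time step of the gate logic can be sketched in NumPy as follows. This is a minimal sketch of the commonly used LSTM formulation, not a reference implementation: the function name lstm_step, the stacked weight layout (forget, input, output, candidate), and the sizes and random weights are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """Run one LSTM time step (hypothetical helper).

    x:      input vector, shape (input_size,)
    h_prev: previous hidden state, shape (hidden_size,)
    c_prev: previous cell state, shape (hidden_size,)
    W:      stacked weights, shape (4 * hidden_size, input_size + hidden_size)
    b:      stacked biases, shape (4 * hidden_size,)
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b

    f = sigmoid(z[0 * n:1 * n])   # forget gate: what to discard from c_prev
    i = sigmoid(z[1 * n:2 * n])   # input gate: how much new information to store
    o = sigmoid(z[2 * n:3 * n])   # output gate: what to expose as hidden state
    g = np.tanh(z[3 * n:4 * n])   # candidate values for the cell state

    c = f * c_prev + i * g        # cell state update (the "conveyor belt")
    h = o * np.tanh(c)            # hidden state passed to the next step
    return h, c

# Usage: run five random inputs through the cell (assumed sizes).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size + hidden_size))
b = np.zeros(4 * hidden_size)
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)
```

The cell state is changed only by an element-wise multiplication and an addition, which is why a forget gate near 1 and an input gate near 0 carry it forward almost untouched.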

This gated mechanism lets the network selectively remember or forget patterns, making it effective for sequential tasks where context from many steps earlier is important.

Example

When predicting the next word in a long sentence, an LSTM can remember the subject mentioned several words earlier and use that context to choose the correct verb form.

Why it matters

LSTMs enabled major advances in speech recognition, machine translation, and time-series forecasting before transformers became dominant, and they remain widely used in resource-constrained or streaming applications today.

Frequently asked questions

How is an LSTM different from a regular RNN?

LSTMs add gates and a cell state that let them keep relevant information across many time steps, while regular RNNs suffer from vanishing gradients and quickly forget earlier inputs.