Why do we need a context vector?

It acts as a bottleneck that forces the model to capture the essential meaning of the input in a fixed-size form the decoder can use.

Is the Encoder-Decoder only used for text?

No, the same pattern appears in image captioning, speech recognition, and other sequence tasks.

What is Encoder-Decoder?

An Encoder-Decoder is a neural network architecture built from two components: an encoder that compresses the input data into a compact representation, and a decoder that generates the output from that representation.

The encoder processes the input sequence step by step, typically with recurrent layers, building hidden states that capture the meaning of the data. At the end it produces a fixed-size context vector that summarizes the entire input.

The decoder takes this context vector and generates the output sequence one element at a time, using its own recurrent layers and often feeding its previous predictions back as input.
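The encoder and decoder loops described above can be sketched in plain Python. This is a minimal, untrained illustration, not a reference implementation: the weights are random, the layer sizes are arbitrary, and every name here (`rnn_step`, `encode`, `decode`, the all-zero start input) is an assumption made for the example.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(W_in, W_h, x, h):
    """One simple recurrent step: new_h = tanh(W_in @ x + W_h @ h)."""
    a = matvec(W_in, x)
    b = matvec(W_h, h)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]

def rand_matrix(rows, cols, rng):
    """Random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def encode(inputs, W_in, W_h, hidden_size):
    """Read the input step by step; the final hidden state is the context vector."""
    h = [0.0] * hidden_size
    for x in inputs:
        h = rnn_step(W_in, W_h, x, h)
    return h

def decode(context, W_in, W_h, W_out, steps, out_size):
    """Generate one output index per step, feeding each prediction back as input."""
    h = context                      # decoder starts from the context vector
    prev = [0.0] * out_size          # hypothetical all-zero "start" input
    outputs = []
    for _ in range(steps):
        h = rnn_step(W_in, W_h, prev, h)
        scores = matvec(W_out, h)
        best = max(range(out_size), key=lambda i: scores[i])
        outputs.append(best)
        # One-hot encode the prediction and feed it back in on the next step.
        prev = [1.0 if i == best else 0.0 for i in range(out_size)]
    return outputs

rng = random.Random(0)
IN, HID, OUT = 3, 4, 5               # arbitrary sizes for the sketch
enc_W_in, enc_W_h = rand_matrix(HID, IN, rng), rand_matrix(HID, HID, rng)
dec_W_in, dec_W_h = rand_matrix(HID, OUT, rng), rand_matrix(HID, HID, rng)
dec_W_out = rand_matrix(OUT, HID, rng)

# Four one-hot input "tokens" in, three output indices out.
source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
context = encode(source, enc_W_in, enc_W_h, HID)
tokens = decode(context, dec_W_in, dec_W_h, dec_W_out, steps=3, out_size=OUT)
print(len(context), tokens)
```

Note that the context vector has the same length no matter how long `source` is, which is exactly the fixed-size bottleneck the text describes. Because the weights are untrained, the generated indices carry no meaning.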

Modern variants replace the single context vector with attention mechanisms so the decoder can focus on different parts of the input at each step.
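As a rough sketch of that idea, the snippet below scores every encoder hidden state against the current decoder state, turns the scores into weights with a softmax, and builds a context vector as the weighted sum. Dot-product scoring is only one of several scoring functions in use, and the function names and toy values are assumptions for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Dot-product attention over all encoder hidden states."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    size = len(encoder_states[0])
    # The context is a weighted average of encoder states, recomputed at every decoder step.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(size)]
    return weights, context

# Toy values: three encoder states and one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([2.0, 0.0], encoder_states)
print(weights, context)
```

Here the decoder state points in the same direction as the first encoder state, so that state receives the largest weight. A different decoder state at the next step would shift the weights, which is how the decoder "focuses" on different input positions over time.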

Example

In machine translation an English sentence is fed to the encoder; the decoder then produces the equivalent French sentence word by word using the encoded representation.

Why it matters

Encoder-Decoder models form the backbone of sequence-to-sequence tasks in NLP. The original Transformer is itself an encoder-decoder model that replaces recurrence with attention, and many of today’s large language models are built on decoder-only variants of it.

Frequently asked questions

What is the difference between the encoder and the decoder?

The encoder reads and compresses the input; the decoder generates the output from that compressed representation.

Related terms

Recurrent Neural Network

A Recurrent Neural Network (RNN) is a type of neural network built to handle sequential data by passing information from one step to the next through a hidden state that acts like a memory.

Attention Mechanism

The attention mechanism is a technique in neural networks that lets the model dynamically focus on the most relevant parts of the input when processing each element, rather than treating all inputs equally.

Transformer

A Transformer is a neural network architecture that processes sequential data like text using self-attention to weigh relationships between all parts of the input at once.

Autoencoder

An autoencoder is a neural network that learns to compress input data into a smaller representation and then reconstruct the original data from that compressed form.

Activation Function

An activation function is a mathematical function applied to a neuron's weighted input sum to produce the neuron's output. It controls how strongly the neuron 'fires' and introduces the non-linearity that lets a network model more than linear relationships.
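To make this concrete, here is a small sketch of two common activation functions applied to a single neuron. The `neuron` helper and the input values are hypothetical, chosen only to show how the same weighted sum yields different outputs under different activations.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through unchanged; output 0 for negative values."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

inputs, weights, bias = [1.0, 2.0], [0.5, -1.0], 0.0   # weighted sum is -1.5
print(neuron(inputs, weights, bias, relu))      # ReLU silences the neuron for a negative sum
print(neuron(inputs, weights, bias, sigmoid))   # sigmoid gives a small positive value
```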

Backpropagation

Backpropagation is an algorithm for training neural networks by calculating how much each weight contributed to the prediction error and adjusting those weights accordingly. It uses the chain rule to efficiently compute gradients of the loss function.
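The chain-rule step can be illustrated on a deliberately tiny network with one input, one hidden unit, and one output. The network shape, the squared-error loss, and the learning rate below are assumptions picked for the sketch; real networks apply the same idea across many weights at once.

```python
import math

def forward(w1, w2, x):
    """Tiny network: h = tanh(w1 * x), y = w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss(w1, w2, x, target):
    """Squared error between the prediction and the target."""
    _, y = forward(w1, w2, x)
    return (y - target) ** 2

def gradients(w1, w2, x, target):
    """Apply the chain rule backwards, from the loss to each weight."""
    h, y = forward(w1, w2, x)
    dL_dy = 2.0 * (y - target)        # derivative of the loss w.r.t. the output
    dL_dw2 = dL_dy * h                # y = w2 * h, so dy/dw2 = h
    dL_dh = dL_dy * w2                # and dy/dh = w2
    dL_dz = dL_dh * (1.0 - h * h)     # derivative of tanh(z) is 1 - tanh(z)^2
    dL_dw1 = dL_dz * x                # z = w1 * x, so dz/dw1 = x
    return dL_dw1, dL_dw2

w1, w2, x, target = 0.5, -0.3, 1.5, 1.0
g1, g2 = gradients(w1, w2, x, target)

# One gradient-descent update: move each weight against its gradient.
lr = 0.1
new_w1, new_w2 = w1 - lr * g1, w2 - lr * g2
print(loss(w1, w2, x, target), loss(new_w1, new_w2, x, target))
```

Each gradient reuses the value computed for the layer after it (`dL_dy`, then `dL_dh`, then `dL_dz`), which is what makes the backward pass efficient compared with differentiating the loss separately for every weight.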