
What is Positional Encoding?

Positional encoding adds information about token order to input embeddings in transformer models, which lack any built-in sense of sequence because they process tokens in parallel.

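Transformers rely on self-attention, which on its own scores each pair of tokens by their content alone. If the input tokens are shuffled, every token still gets the same output vector; only the order of the outputs changes. Without an extra signal, the model therefore cannot tell 'cat sat on mat' from 'mat on sat cat'.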
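To make this order-blindness concrete, here is a minimal pure-Python sketch of single-head scaled dot-product self-attention. It assumes identity query, key, and value projections and uses a made-up 2-dimensional embedding table (`emb` is hypothetical). Real models learn these projections, but without positional information the conclusion holds for any fixed projections: each word receives the same output vector whichever order the words arrive in.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Single-head self-attention with identity Q/K/V projections (a simplifying assumption)."""
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        # Scaled dot-product scores of this query against every key.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in vectors])
        # Output is the attention-weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return outputs

# Hypothetical 2-dimensional token embeddings, with no positional information added.
emb = {"cat": [1.0, 0.0], "sat": [0.0, 1.0], "on": [0.5, 0.5], "mat": [1.0, 1.0]}

original = ["cat", "sat", "on", "mat"]
shuffled = ["mat", "on", "sat", "cat"]

out_a = dict(zip(original, self_attention([emb[t] for t in original])))
out_b = dict(zip(shuffled, self_attention([emb[t] for t in shuffled])))

# Each word gets the same output in both orders (up to floating-point rounding).
for word in original:
    print(word, [round(x, 6) for x in out_a[word]], [round(x, 6) for x in out_b[word]])
```

Adding a position-dependent vector to each embedding before calling `self_attention` is what breaks this symmetry.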

A common solution adds a position-dependent vector, the same size as the token embedding, to each token embedding. Sinusoidal encodings are fixed: they use sine and cosine functions of different frequencies, a design intended to make relative positions easy for the model to learn. Learned encodings instead treat position as another trainable embedding, with one vector per position.
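The sinusoidal scheme from the original transformer paper ("Attention Is All You Need") sets PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A minimal pure-Python sketch of that formula, plus the element-wise addition to token embeddings, might look like this; the function names are illustrative and do not come from any library.

```python
import math

def sinusoidal_encoding(num_positions, d_model, base=10000.0):
    """Return a num_positions x d_model table of sinusoidal positional encodings.

    Even dimensions use sin, odd dimensions use cos, and each pair of
    dimensions (2i, 2i+1) shares the frequency 1 / base**(2i / d_model).
    """
    assert d_model % 2 == 0, "this sketch assumes an even embedding size"
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model // 2):
            angle = pos / (base ** (2 * i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table

def add_positions(token_embeddings, pe_table):
    """Element-wise add each position's encoding to the token embedding at that position."""
    return [[e + p for e, p in zip(emb, pe_table[pos])]
            for pos, emb in enumerate(token_embeddings)]

pe = sinusoidal_encoding(num_positions=4, d_model=4)
print([round(x, 4) for x in pe[0]])  # position 0: [0.0, 1.0, 0.0, 1.0]
```

A learned encoding would replace `sinusoidal_encoding` with a trainable table of the same shape, one row per position, updated by gradient descent like any other embedding.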

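Each position receives a distinct vector, and for sinusoidal encodings the vector at position pos + k can be computed from the vector at pos by a fixed linear transformation that depends only on the offset k. This gives attention heads a consistent way to learn patterns such as 'the word two positions after a verb'.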
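For sinusoidal encodings this property is concrete: within each (sin, cos) pair of frequency ω, moving from position pos to pos + k is a 2x2 rotation by angle ωk, which depends only on k and not on pos. The sketch below checks that identity numerically for one arbitrarily chosen frequency (d_model = 64 and i = 3 are only example values). Whether a trained attention head actually exploits this is an empirical question, and learned encodings offer no such guarantee.

```python
import math

def pe_pair(pos, omega):
    """One (sin, cos) pair of a sinusoidal encoding at frequency omega."""
    return (math.sin(omega * pos), math.cos(omega * pos))

def shift(pair, k, omega):
    """Apply the 2x2 rotation that depends only on the offset k, not on the position."""
    s, c = pair
    return (math.cos(omega * k) * s + math.sin(omega * k) * c,
            -math.sin(omega * k) * s + math.cos(omega * k) * c)

omega = 1 / 10000 ** (2 * 3 / 64)  # example frequency for d_model=64, pair index i=3
k = 2                               # "two positions after"

for pos in (0, 5, 17):
    predicted = shift(pe_pair(pos, omega), k, omega)
    actual = pe_pair(pos + k, omega)
    # The same rotation maps every position to the one k steps later.
    print(pos, all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(predicted, actual)))
```

The rotation follows from the angle-addition identities for sine and cosine, which is why the same matrix works at every starting position.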

Example

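In the sentence 'I saw a saw', both occurrences of 'saw' share the same token embedding, but the first sits at position 2 and the second at position 4, so each receives a different positional vector. That difference lets the model tell the two apart and, together with the surrounding context, learn that the first 'saw' is a verb and the second is a noun.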
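A toy illustration of the 'I saw a saw' case, assuming a made-up 4-dimensional embedding table (the `embedding` dictionary below is hypothetical) and the sinusoidal scheme: the two 'saw' tokens share one embedding, yet the vectors the transformer actually receives differ, because positions 1 and 3 (counting from 0) add different positional vectors.

```python
import math

def positional_vector(pos, d_model, base=10000.0):
    """Sinusoidal positional vector for one position (even dims use sin, odd dims use cos)."""
    vec = []
    for i in range(d_model // 2):
        angle = pos / (base ** (2 * i / d_model))
        vec.extend([math.sin(angle), math.cos(angle)])
    return vec

# Hypothetical toy embedding table: every occurrence of a word gets the same vector.
embedding = {
    "I": [0.1, 0.2, 0.3, 0.4],
    "saw": [0.5, -0.5, 0.25, 0.0],
    "a": [0.0, 0.1, 0.0, 0.1],
}

sentence = ["I", "saw", "a", "saw"]
# Model input = token embedding + positional vector, element by element.
inputs = [[e + p for e, p in zip(embedding[tok], positional_vector(pos, 4))]
          for pos, tok in enumerate(sentence)]

# The token embeddings of the two "saw"s are identical...
print(embedding[sentence[1]] == embedding[sentence[3]])  # True
# ...but the inputs at positions 1 and 3 differ.
print(inputs[1] == inputs[3])  # False
```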

Why it matters

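Positional encoding is what makes word order usable in an architecture that otherwise ignores it. Combined with the parallelism of self-attention, this is a key reason transformers replaced recurrent networks for nearly all sequence tasks and became the backbone of modern LLMs. The choice of encoding also strongly affects how well a model handles long contexts.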

Frequently asked questions

Why do transformers need positional encoding?

Because self-attention processes all tokens simultaneously and is insensitive to their order, the architecture itself has no notion of 'before' or 'after' unless explicit position information is supplied.