What is a Transformer?

A Transformer is a neural network architecture that processes sequential data like text using self-attention to weigh relationships between all parts of the input at once.

Unlike older recurrent models such as RNNs, which read tokens one at a time, it processes all tokens in parallel through attention, so the model can focus on relevant words regardless of their position in a sentence.
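To make the attention step concrete, here is a minimal sketch of single-head scaled dot-product self-attention in plain Python. It assumes identity projections (queries, keys, and values are all the raw input vectors), whereas a real Transformer learns separate projection matrices for each. The names `softmax`, `self_attention`, and `tokens` are illustrative, not taken from any library.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head self-attention with identity projections (Q = K = V = x).

    Assumption: a real model would first multiply x by learned matrices
    to obtain queries, keys, and values.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Scaled dot product between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# Three toy token vectors (made-up numbers, for illustration only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` sums to 1 and says how strongly one token attends to every token in the sequence, including itself.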

Key components include multi-head self-attention layers, feed-forward networks, positional encodings to retain order information, and, in the original design, an encoder-decoder structure (many modern LLMs use only the decoder).
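As one example of how order information can be injected, the sketch below computes the fixed sinusoidal positional encoding used in the original Transformer design. This is only one option; many models use learned or relative position schemes instead. The function name `positional_encoding` and the sizes chosen are assumptions for illustration.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k+1) shares one frequency.
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

# One encoding vector per position; these are added to the token embeddings.
pe = positional_encoding(seq_len=4, d_model=6)
```

Because every position gets a distinct vector, the model can tell apart two occurrences of the same word even though attention itself is order-agnostic.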

This design enables efficient training on massive datasets and scales well to large models used in modern language tasks.

Example

In a sentence like 'The cat sat on the mat because it was tired,' a Transformer can directly connect 'it' to 'cat' through attention scores, helping generate accurate translations or answers.
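The mechanics behind that link can be sketched with a few toy numbers. The vectors below are hypothetical and hand-picked so that the query for 'it' aligns with the key for 'cat'; in a trained model these vectors are learned, and nothing here reflects real model weights.

```python
import math

# Hypothetical, hand-picked 2-D vectors (not from any trained model).
query_it = [2.0, 0.0]
keys = {
    "cat": [2.0, 0.0],
    "sat": [0.0, 1.0],
    "mat": [1.0, 1.0],
    "tired": [0.5, 0.5],
}

d = len(query_it)

# Scaled dot-product score between the query for 'it' and each key.
scores = {
    word: sum(q * k for q, k in zip(query_it, vec)) / math.sqrt(d)
    for word, vec in keys.items()
}

# Softmax turns the scores into attention weights that sum to 1.
m = max(scores.values())
exps = {word: math.exp(s - m) for word, s in scores.items()}
total = sum(exps.values())
weights = {word: e / total for word, e in exps.items()}

most_attended = max(weights, key=weights.get)
print(most_attended, round(weights[most_attended], 3))
```

With these made-up vectors, 'cat' receives the largest weight by construction, which is the pattern a trained model would need to learn for this sentence.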

Why it matters

Transformers power nearly all state-of-the-art LLMs today, enabling breakthroughs in chatbots, translation, and content generation because they train efficiently in parallel and scale to large models and long contexts.

Frequently asked questions

How is a Transformer different from an RNN?

Transformers process all input tokens in parallel using attention, while RNNs handle them one at a time in sequence, which makes Transformers much faster to train on long sequences.
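The difference in data dependencies can be sketched with scalar toy models. In the recurrent function each step needs the previous hidden state, so the loop cannot be reordered; in the attention function each output depends only on the inputs, so positions can be computed in any order or at the same time. The weights `w_in` and `w_rec` and the input values are arbitrary assumptions for illustration.

```python
import math

def rnn_states(inputs, w_in=0.5, w_rec=0.5):
    """Toy scalar RNN: step t cannot start until step t-1 has finished."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)  # depends on the previous h
        states.append(h)
    return states

def attention_output(inputs, i):
    """Toy scalar attention for position i: reads only the inputs."""
    scores = [inputs[i] * x for x in inputs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * x for e, x in zip(exps, inputs))

inputs = [0.1, 0.4, -0.2, 0.3]

# Sequential: one pass, strictly in order.
states = rnn_states(inputs)

# Independent: each call could run concurrently on parallel hardware.
outputs = [attention_output(inputs, i) for i in range(len(inputs))]
```

On a GPU, the independent per-position computations are batched into matrix multiplications, which is where the training speedup comes from.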