What is a Context Window?

A context window is the maximum number of tokens an LLM can process together in one pass, including the user's input and any conversation history.
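As a rough illustration of that budget, the sketch below checks whether a request fits in the window. The function name `fits_in_window`, its parameters, and the optional allowance for the model's reply are assumptions made for illustration, not any particular vendor's API.

```python
def fits_in_window(input_tokens: int, history_tokens: int,
                   window: int, reserved_for_reply: int = 0) -> bool:
    """Return True if the request fits inside the context window.

    Hypothetical helper: everything the model processes together
    (new input, prior conversation, and optionally room for the
    reply) is counted against one shared token budget.
    """
    used = input_tokens + history_tokens + reserved_for_reply
    return used <= window


# 1,500 new tokens plus 2,000 tokens of history fit in a 4,000-token window.
print(fits_in_window(1_500, 2_000, window=4_000))  # True

# Reserving 1,000 tokens for the reply pushes the total to 4,500.
print(fits_in_window(1_500, 2_000, window=4_000, reserved_for_reply=1_000))  # False
```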

It defines the span of text the model can attend to when making predictions, and is bounded by the sequence lengths its positional encoding and attention layers were trained to handle.

When input exceeds the window, older tokens are typically truncated, so the model loses access to that information for the current generation step.

Recent models expand windows to 128k–1M tokens, but the compute cost of standard self-attention grows quadratically with sequence length, and memory use rises with it.
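To make the quadratic growth concrete, the sketch below counts the pairwise attention scores a single head would compute under standard (full) self-attention. The function name is hypothetical, and the count ignores the optimizations real systems apply, so treat it as an order-of-magnitude illustration only.

```python
def attention_score_entries(n_tokens: int) -> int:
    """Number of query-key scores for one head in full self-attention.

    Every token attends to every token, so the score matrix has
    n_tokens * n_tokens entries (an assumption of unoptimized attention).
    """
    return n_tokens * n_tokens


for n in (4_000, 128_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attention_score_entries(n):,} score entries")

# Doubling the window quadruples the number of scores.
ratio = attention_score_entries(8_000) / attention_score_entries(4_000)
print(ratio)  # 4.0
```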

Example

If a user pastes a 10,000-word document (well over 4,000 tokens) into a model whose window holds only 4,000 tokens, and the application truncates from the start, the earliest paragraphs are dropped and the model cannot reference them when answering questions.
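This scenario can be sketched in a few lines. The helper `truncate_to_window` is hypothetical, and treating each word as exactly one token is a deliberate simplification: real tokenizers usually produce more tokens than words, so even less of the document would survive.

```python
def truncate_to_window(tokens: list[str], window: int) -> list[str]:
    """Keep only the most recent `window` tokens, dropping the oldest."""
    if window <= 0:
        return []
    return tokens[-window:]


# Stand-in for a 10,000-word document (one "token" per word, by assumption).
document = [f"word{i}" for i in range(10_000)]

kept = truncate_to_window(document, 4_000)
print(len(kept))  # 4000
print(kept[0])    # word6000
```

Everything before `word6000` is gone from the model's view, which is why a question about the opening paragraphs cannot be answered from the truncated input.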

Why it matters

Context-window size sets hard limits on coherent long conversations, document analysis, and agent memory, which directly affects how useful LLMs are in practice.

Frequently asked questions

Context windows range from a few thousand tokens in older models to 128k–1M+ tokens in current frontier models.