What is Context Length?

Context length is the maximum number of tokens an LLM can process in a single pass, typically counting both the prompt and the generated output. It acts as the model's working memory window.

It determines how much preceding text the model can reference when generating a response, directly shaping coherence over long inputs.

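It is measured in tokens rather than words. The limit is constrained by the transformer's attention mechanism, whose cost in its standard form grows quadratically with sequence length, and by the memory and compute available during inference.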
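To illustrate why attention constrains context length, here is a minimal sketch. It assumes standard (dense) self-attention, where every token is scored against every other token, so one score matrix has n × n entries for n tokens. The function names and the 2-bytes-per-value default are illustrative assumptions, and optimized attention kernels may avoid materializing the full matrix.

```python
def attention_score_entries(num_tokens: int) -> int:
    """Entries in one dense self-attention score matrix (one head, one layer)."""
    return num_tokens * num_tokens


def score_matrix_bytes(num_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed if that matrix were stored in full.

    bytes_per_value=2 assumes 16-bit values; this is an illustrative default.
    """
    return attention_score_entries(num_tokens) * bytes_per_value


if __name__ == "__main__":
    # Compare a 4k-token and a 128k-token context (4 * 1024 and 128 * 1024).
    for n in (4_096, 131_072):
        print(n, attention_score_entries(n), score_matrix_bytes(n))
```

Going from 4,096 to 131,072 tokens multiplies the length by 32 but the number of score entries by 1,024, which is why long contexts are expensive under this assumption.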

Exceeding the limit forces truncation or summarization, while techniques like sliding windows or sparse attention aim to extend it.
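As a rough sketch of the two input-side options, the code below shows truncation that keeps only the most recent tokens, and a sliding window read here as splitting a long input into overlapping chunks that each fit the limit. This is one interpretation for illustration, not the attention-level sliding-window variant, and the function names, `window`, and `stride` are hypothetical. Tokens are represented as a plain list of integers.

```python
from typing import List


def truncate_to_context(tokens: List[int], context_length: int) -> List[int]:
    """Keep only the most recent `context_length` tokens; older ones are dropped."""
    if context_length <= 0:
        raise ValueError("context_length must be positive")
    return tokens[-context_length:]


def sliding_windows(tokens: List[int], window: int, stride: int) -> List[List[int]]:
    """Split tokens into overlapping chunks of at most `window` tokens.

    Consecutive chunks start `stride` tokens apart, so they overlap by
    window - stride tokens.
    """
    if window <= 0 or stride <= 0 or stride > window:
        raise ValueError("require 0 < stride <= window")
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last chunk reached the end of the input
        start += stride
    return chunks


if __name__ == "__main__":
    tokens = list(range(10))
    print(truncate_to_context(tokens, 4))
    print(sliding_windows(tokens, window=4, stride=2))
```

Truncation is simple but discards earlier information outright. Overlapping chunks keep every token visible to some pass, at the cost of processing the input several times.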

Example

A model with a 4k-token context length can read and answer questions about a short article, while a 128k-token model can take in a book-length text, such as a shorter novel, in a single prompt.

Why it matters

Larger context lengths let LLMs work with lengthy documents, long multi-turn conversations, and reasoning tasks that depend on information spread across a large input.

Frequently asked questions

What happens when the input exceeds the context length?

The model or the surrounding system typically truncates older or excess tokens, which can cause it to lose important earlier information.