Why do models use tokens instead of letters?

Tokens let the model learn meaningful patterns at a useful scale while keeping the vocabulary size manageable.

How do I count tokens in my prompt?

Use the model's official tokenizer library or the counting tool provided by the API you are using.
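
As a rough sketch of what counting looks like in code, the helper below accepts any tokenizer's encode function and returns the number of tokens it produces. The names count_tokens and toy_encode are hypothetical, and toy_encode is only a whitespace splitter standing in for a real tokenizer. An accurate count has to come from the tokenizer that matches your model.

```python
from typing import Callable, Sequence


def count_tokens(text: str, encode: Callable[[str], Sequence]) -> int:
    """Return how many tokens `encode` produces for `text`."""
    return len(encode(text))


def toy_encode(text: str) -> list[str]:
    """Stand-in tokenizer (NOT a real model tokenizer): splits on whitespace."""
    return text.split()


prompt = "Tokens are the units a model reads"
prompt_tokens = count_tokens(prompt, toy_encode)
print(prompt_tokens)  # 7 whitespace-separated pieces under this toy encoder
```

With a real tokenizer library, you would pass its encode function in place of toy_encode; the count will usually differ from a simple word count.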

What is a Token?

A token is the basic unit of text that an LLM reads and generates. It may be a whole word, part of a word, or punctuation, depending on the model's tokenizer.

Tokenization breaks raw text into these discrete pieces so the model can convert them into numbers it understands. Common methods like Byte-Pair Encoding (BPE) split words into frequent subword units, allowing the model to handle rare words and different languages efficiently.
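
To make the idea of merging frequent subword units concrete, here is a minimal toy sketch of BPE-style training. It assumes a tiny word list, works on characters rather than bytes, and skips the pre-tokenization and special markers that production tokenizers use. The function names are hypothetical.

```python
from collections import Counter


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Fuse every adjacent occurrence of `pair` into a single symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a list of words."""
    # Start with every word split into single characters.
    corpus = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite the corpus with that pair fused into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def apply_merges(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe_merges(["low", "low", "low", "lower", "lowest"], num_merges=3)
print(merges)                         # [('l', 'o'), ('lo', 'w'), ('low', 'e')]
print(apply_merges("lowly", merges))  # ['low', 'l', 'y']
```

The word "lowly" never appears in the training list, yet it still tokenizes into a known subword plus single characters, which is how subword vocabularies cope with rare words.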

Every model has a fixed vocabulary of tokens and a maximum context length measured in tokens. Inputs that exceed this length are truncated or rejected, and longer inputs within the limit cost more to process, so the context length directly limits how much information the model can consider at once.
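
A minimal sketch of truncation against a context limit follows. It assumes token IDs are already available as a list of integers and that the most recent tokens are the ones worth keeping; other policies (keeping the start, summarizing) are equally possible. The function name and the reserve_for_output parameter are hypothetical.

```python
def fit_to_context(token_ids: list[int], max_context: int,
                   reserve_for_output: int = 0) -> list[int]:
    """Keep the most recent tokens that fit within the input budget."""
    # Leave room for the tokens the model is expected to generate.
    budget = max_context - reserve_for_output
    if budget <= 0:
        raise ValueError("No room left for input tokens")
    if len(token_ids) <= budget:
        return list(token_ids)
    # Assumed policy: drop the oldest tokens, keep the tail.
    return token_ids[-budget:]


token_ids = list(range(10))  # pretend these are 10 token IDs
kept = fit_to_context(token_ids, max_context=8, reserve_for_output=2)
print(kept)  # [4, 5, 6, 7, 8, 9]
```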

APIs typically bill by token count, and generation time grows with the number of tokens produced, making tokenization a central factor in both model design and practical usage.
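
The arithmetic behind token-based billing and generation time can be sketched as follows. The prices, the per-million-token billing unit, the separate input and output rates, and the tokens-per-second figure are all made-up assumptions for illustration, not the rates of any real API.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Estimate request cost when input and output tokens are priced separately."""
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


def estimate_generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Estimate generation time assuming a constant decoding rate."""
    return output_tokens / tokens_per_second


# Hypothetical rates: 2.00 per million input tokens, 8.00 per million output tokens.
cost = estimate_cost(1500, 500, input_price_per_million=2.0,
                     output_price_per_million=8.0)
seconds = estimate_generation_seconds(500, tokens_per_second=50.0)
print(f"{cost:.4f}")  # 0.0070
print(seconds)        # 10.0
```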

Example

The sentence "ChatGPT is helpful" might become five tokens: ["Chat", "G", "PT", " is", " helpful"] with one tokenizer, or three tokens with another.
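
The split depends entirely on the vocabulary. The sketch below uses a greedy longest-match splitter over two invented vocabularies to show the same sentence producing different token counts. Both vocabularies are hypothetical, and greedy matching is a simplification that is not claimed to be how any particular model's tokenizer works.

```python
def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary entry that matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens


# Two invented vocabularies (assumptions for illustration only).
vocab_a = {"Chat", "G", "PT", " is", " helpful"}
vocab_b = {"ChatGPT", " is", " helpful"}

sentence = "ChatGPT is helpful"
tokens_a = greedy_tokenize(sentence, vocab_a)
tokens_b = greedy_tokenize(sentence, vocab_b)
print(len(tokens_a), tokens_a)  # 5 ['Chat', 'G', 'PT', ' is', ' helpful']
print(len(tokens_b), tokens_b)  # 3 ['ChatGPT', ' is', ' helpful']
```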

Why it matters

Context limits and inference cost are both measured in tokens, and the tokenizer shapes how well a model handles language, so tokens are fundamental to building and using modern LLMs.

Frequently asked questions

Is a token the same as a word?

No. Tokens can be whole words, word pieces, or even single characters, so one word may equal multiple tokens.

Related terms

Context Window

A context window is the maximum number of tokens an LLM can process together in one pass, including the user's input, any conversation history, and the output generated so far.

Embedding

An embedding (or vector embedding) is a way to represent words, sentences, or other data as dense numerical vectors in a high-dimensional space so that similar items end up close together.
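
One common way to measure how close two embeddings are is cosine similarity. The sketch below uses three hand-written 3-dimensional vectors; they are invented for illustration, since real embeddings come from a trained model and usually have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors, not the output of any real embedding model.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.95],
}

sim_related = cosine_similarity(embeddings["cat"], embeddings["kitten"])
sim_unrelated = cosine_similarity(embeddings["cat"], embeddings["car"])
print(sim_related > sim_unrelated)  # True
```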

Attention Mechanism

The attention mechanism is a technique in neural networks that lets the model dynamically focus on the most relevant parts of the input when processing each element, rather than treating all inputs equally.
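
As an illustration, here is a minimal sketch of scaled dot-product attention for a single query, which is one common formulation rather than the only one. The vectors are invented toy values, and real models apply learned projections and run many attention heads in parallel.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: list[float], keys: list[list[float]],
           values: list[list[float]]) -> tuple[list[float], list[float]]:
    """Return (attention weights, weighted sum of values) for one query."""
    scale = math.sqrt(len(query))
    # Score each key by its scaled dot product with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Blend the value vectors according to the weights.
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend(query, keys, values)
print(weights)  # the first key, most similar to the query, gets the largest weight
```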

Context Length

Context length is the maximum number of tokens an LLM can process in a single input at once, acting as its effective memory window.

Foundation Model

A foundation model is a large-scale AI model trained on massive, diverse datasets that can be adapted to perform many different tasks with minimal additional training.

Grounding

Grounding in LLMs connects a model's generated text to verifiable external facts or data sources so responses are accurate rather than invented.