What is a Token?
A token is the basic unit of text that an LLM reads and generates. It may be a whole word, part of a word, or punctuation, depending on the model's tokenizer.
Tokenization breaks raw text into these discrete pieces so the model can convert them into numbers it understands. Common methods like Byte-Pair Encoding (BPE) split words into frequent subword units, allowing the model to handle rare words and different languages efficiently.
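To make the idea of merging frequent subword units concrete, here is a minimal sketch of the training loop behind Byte-Pair Encoding. It is a simplified illustration, not any production tokenizer: the function names (`most_frequent_pair`, `merge_pair`, `learn_bpe`) and the tiny corpus are assumptions made for this example, and real tokenizers add byte-level handling, pre-tokenization rules, and special tokens.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most common adjacent symbol pair, or None if no pairs exist."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated toy corpus."""
    # Start with every word split into single characters.
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


merges, words = learn_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(words)
```

After two merges on this toy corpus, the frequent string "low" has become a single symbol, while the rarer "lower" and "lowest" are represented as "low" plus leftover characters. This is how a subword vocabulary can cover rare words without storing each one whole.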
Every model has a fixed vocabulary of tokens and a maximum context length measured in tokens. Inputs that exceed this length must be truncated or are rejected, and longer inputs cost more to process, so the limit directly affects how much information the model can consider at once.
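One common way to handle an input that exceeds the context length is to keep only the most recent tokens. The sketch below illustrates that idea under stated assumptions: `fit_to_context` and its parameters are hypothetical names, the tokens are plain list items rather than the output of a real tokenizer, and keeping the tail of the input is just one possible truncation policy.

```python
def fit_to_context(tokens, max_context, reserve_for_output=0):
    """Trim a token list so it fits a context limit.

    Keeps the most recent tokens and returns (kept_tokens, dropped_count).
    `reserve_for_output` sets aside room for the tokens the model will generate.
    """
    budget = max_context - reserve_for_output
    if budget <= 0:
        raise ValueError("reserve_for_output leaves no room for input tokens")
    if len(tokens) <= budget:
        return list(tokens), 0
    dropped = len(tokens) - budget
    # Drop the oldest tokens and keep the last `budget` tokens.
    return list(tokens[-budget:]), dropped


# Hypothetical input: whitespace-split words stand in for real tokens.
history = "the quick brown fox jumps over the lazy dog again".split()
kept, dropped = fit_to_context(history, max_context=8, reserve_for_output=2)
print(kept, dropped)
```

Here the limit of 8 minus 2 reserved tokens leaves a budget of 6, so the four oldest items are dropped. Applications that cannot afford to lose early content often summarize or retrieve selectively instead of truncating.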
Token count also determines API pricing and generation speed, making tokenization a central factor in both model design and practical usage.
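Because many APIs bill per token, a rough cost estimate is simple arithmetic. The sketch below assumes separate prices for input and output tokens, quoted per million tokens. The function name `estimate_cost` and the prices used are hypothetical placeholders, not the rates of any real provider.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Estimate the cost of one request from token counts and per-million prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Hypothetical prices: 2.00 per million input tokens, 8.00 per million output tokens.
cost = estimate_cost(
    input_tokens=1_000_000,
    output_tokens=500_000,
    input_price_per_million=2.0,
    output_price_per_million=8.0,
)
print(f"Estimated cost: {cost:.2f}")
```

With these placeholder prices, the input side contributes 2.00 and the output side 4.00, for a total of 6.00 in whatever currency the prices are quoted in.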
Example
The sentence "ChatGPT is helpful" might become five tokens, ["Chat", "G", "PT", " is", " helpful"], with one tokenizer and a different number of tokens with another.
Why it matters
Tokens set the limits on context size, control inference cost, and shape how well models handle language, so they are fundamental to building and using modern LLMs.
Frequently asked questions
Is a token the same as a word?
No. Tokens can be whole words, word pieces, or even single characters, so one word may equal multiple tokens.
Related terms
A context window is the maximum number of tokens an LLM can process together in one pass, including the user's input and any conversation history.
An embedding (or vector embedding) is a way to represent words, sentences, or other data as dense numerical vectors in a high-dimensional space so that similar items end up close together.
The attention mechanism is a technique in neural networks that lets the model dynamically focus on the most relevant parts of the input when processing each element, rather than treating all inputs equally.
Context length is the maximum number of tokens an LLM can process in a single input at once, acting as its effective memory window.
A foundation model is a large-scale AI model trained on massive, diverse datasets that can be adapted to perform many different tasks with minimal additional training.
Grounding in LLMs connects a model's generated text to verifiable external facts or data sources so responses are accurate rather than invented.