What is Top-p Sampling?

Also known as: Nucleus Sampling

Top-p sampling (nucleus sampling) is a text-generation technique that dynamically selects the smallest set of most likely next tokens whose combined probability reaches or exceeds a threshold p (e.g. 0.9), then samples from that set after renormalizing its probabilities.

Traditional fixed-size methods like top-k always keep the same number of candidates. Top-p instead looks at the model's probability distribution and keeps adding tokens in descending order of probability until their cumulative probability reaches p, forming a variable-sized 'nucleus'. All tokens outside the nucleus are excluded from sampling.
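
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular library: the function names top_p_nucleus and sample_top_p are hypothetical, and the input is assumed to be a plain list of next-token probabilities that already sums to 1.

```python
import random

def top_p_nucleus(probs, p):
    """Return the indices of the smallest set of most likely tokens
    whose cumulative probability reaches p (hypothetical helper)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # Token indices ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # threshold reached: stop adding tokens
            break
    return nucleus

def sample_top_p(probs, p, rng=random):
    """Sample one token index from the nucleus only."""
    nucleus = top_p_nucleus(probs, p)
    weights = [probs[i] for i in nucleus]
    # random.choices treats weights as relative, so this is equivalent
    # to renormalizing the nucleus probabilities before sampling.
    return rng.choices(nucleus, weights=weights, k=1)[0]

probs = [0.05, 0.6, 0.25, 0.07, 0.03]  # made-up distribution over 5 tokens
print(top_p_nucleus(probs, 0.8))       # indices kept in the nucleus
print(sample_top_p(probs, 0.8))        # one sampled index
```

Production decoders usually operate on logits in batched tensors rather than Python lists, but the sort, cumulative-sum, and cut-off steps are the same idea.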

This approach automatically adapts to the model's confidence: when the distribution is peaked, fewer tokens are kept; when it is flatter, more tokens are included, helping balance coherence and diversity.

The parameter p (typically 0.8–0.95) controls the trade-off: lower p yields more focused output, while higher p allows greater variety. At p = 1 nothing is filtered and sampling uses the full distribution.
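
Both effects, the adaptation to the shape of the distribution and the role of p, can be checked numerically. The sketch below counts how many tokens end up in the nucleus for two made-up distributions (one peaked, one flat) at several values of p. The helper name nucleus_size and all probabilities are assumptions chosen for illustration.

```python
def nucleus_size(probs, p):
    """Count how many of the most likely tokens are needed for their
    cumulative probability to reach p (hypothetical helper)."""
    cumulative = 0.0
    for size, prob in enumerate(sorted(probs, reverse=True), start=1):
        cumulative += prob
        if cumulative >= p:
            return size
    return len(probs)  # rounding kept the sum just below p: keep everything

# Illustrative distributions over five tokens, not real model output.
peaked = [0.86, 0.06, 0.04, 0.02, 0.02]  # the model is confident
flat = [0.24, 0.22, 0.20, 0.18, 0.16]    # the model is uncertain

for p in (0.5, 0.85, 0.95):
    print(f"p={p}: peaked keeps {nucleus_size(peaked, p)}, "
          f"flat keeps {nucleus_size(flat, p)}")
```

With these numbers, p = 0.85 keeps a single token for the peaked distribution but all five for the flat one, and raising p to 0.95 grows the peaked nucleus from one token to three.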

Example

When generating the next word after 'The cat sat on the', the model might assign high probability to 'mat' and 'sofa'. If those two tokens together account for at least 0.9 of the probability mass, then with p=0.9 the nucleus contains only them, so the model samples between 'mat' and 'sofa' rather than risking a low-probability word like 'airplane'.
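
To make the example concrete, the sketch below assigns hypothetical probabilities to five candidate words and builds the nucleus for p=0.9. The numbers are invented for illustration and do not come from any real model.

```python
# Hypothetical next-token probabilities after 'The cat sat on the'.
next_token_probs = {
    "mat": 0.62,
    "sofa": 0.30,
    "floor": 0.05,
    "roof": 0.02,
    "airplane": 0.01,
}
p = 0.9

# Add tokens from most to least likely until the cumulative mass reaches p.
nucleus = {}
cumulative = 0.0
for token, prob in sorted(next_token_probs.items(),
                          key=lambda kv: kv[1], reverse=True):
    nucleus[token] = prob
    cumulative += prob
    if cumulative >= p:
        break

# Renormalize so the probabilities inside the nucleus sum to 1.
total = sum(nucleus.values())
renormalized = {token: prob / total for token, prob in nucleus.items()}

for token, prob in renormalized.items():
    print(f"{token}: {prob:.3f}")
```

Here 'mat' and 'sofa' together hold 0.92 of the mass, so the nucleus stops after two tokens. After renormalization they are sampled with probabilities of roughly 0.674 and 0.326, and 'airplane' can never be chosen.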

Why it matters

Top-p sampling is widely used in modern LLMs because it tends to produce more varied text than greedy decoding while excluding the unlikely tokens that a fixed k can let through, which helps the quality of chatbots, story generators, and other creative applications.

Frequently asked questions

How is top-p different from top-k?

Top-k always keeps a fixed number k of tokens; top-p keeps a variable number of tokens whose probabilities sum to at least p, adapting to each prediction.