What is Temperature?

Temperature is a sampling parameter used when generating text with large language models. It controls how random the output is: lower values produce more focused and deterministic outputs, while higher values increase creativity and variability.

It works by dividing the model's raw output scores (logits) by the temperature before they are converted into probabilities with the softmax function. A temperature below 1 sharpens the distribution toward high-probability tokens. A value above 1 flattens it, giving lower-probability tokens a greater chance of being selected.
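The scaling step can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation. It assumes the logits arrive as a plain list of floats, and the function name softmax_with_temperature and the example logits are hypothetical. It computes p_i = exp(z_i / T) / sum_j exp(z_j / T).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing them by the temperature.

    Computes p_i = exp(z_i / T) / sum_j exp(z_j / T).
    Hypothetical helper for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; treat T = 0 as greedy decoding")
    scaled = [z / temperature for z in logits]
    # Subtract the largest value before exponentiating to avoid overflow;
    # this shift does not change the resulting probabilities.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the top token's probability shrinking as the temperature rises from 0.5 to 2.0. The ranking of the tokens never changes, because dividing by a positive number preserves their order. Only how sharply the probability mass concentrates on the leader changes.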

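At temperature 0 the model always picks the single most likely next token (greedy decoding). Because dividing by zero is undefined, implementations treat this value as a special case rather than applying the formula directly. At a temperature of 1, the default in many APIs, the logits are left unscaled and the model samples from its learned distribution as is. Values greater than 1 make unlikely tokens more probable, producing more diverse but sometimes less coherent text.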
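A minimal sketch of a next-token sampler that follows these three regimes is shown below. It assumes plain Python lists and the standard random module. The function name sample_next_token and the logits are hypothetical, and production decoders typically operate on tensors and often combine temperature with other settings.

```python
import math
import random


def sample_next_token(logits, temperature, rng=random):
    """Return the index of the next token (hypothetical helper).

    temperature == 0 is handled as greedy decoding (argmax), because
    dividing the logits by zero is undefined.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Greedy decoding: always choose the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax weights, shifted by the max for stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices normalizes the weights, so they need not sum to 1.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
rng = random.Random(0)    # seeded so runs are reproducible
greedy = [sample_next_token(logits, 0, rng) for _ in range(5)]
hot = [sample_next_token(logits, 1.5, rng) for _ in range(5)]
print(greedy)  # always [0, 0, 0, 0, 0]
print(hot)     # a mix of indices, usually dominated by 0
```

At temperature 0 the loop returns the same index every time. At 1.5 the sampler occasionally picks the lower-scoring tokens.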

The key idea is a simple trade-off between coherence and diversity that lets users tune generation style without retraining the model.

Example

When writing a product description, a temperature of 0.2 tends to yield safe, repetitive phrasing, while a temperature of 1.2 may introduce unexpected metaphors or unusual word choices.

Why it matters

Temperature gives users direct control over output style. By adjusting this one number, the same model can handle factual Q&A and code generation, which usually benefit from low values, as well as creative storytelling, which often benefits from higher ones.

Frequently asked questions

What happens when temperature is set to 0?

Setting temperature to 0 forces the model to always choose the single highest-probability token, producing the most deterministic and repeatable output.