Does higher temperature always mean better creativity?

Not always. Very high values can make text incoherent or nonsensical, so moderate increases are usually best for creative tasks.

Is temperature the same as randomness?

It directly controls how random the sampling is, but other parameters like top-p also influence randomness.

What is Temperature?

Temperature is a parameter in large language models that controls the randomness of generated text. Lower values produce more focused and deterministic outputs, while higher values increase creativity and variability.

It works by scaling the model's raw output scores (logits) before they are converted into probabilities via the softmax function. A temperature below 1 sharpens the distribution toward high-probability tokens, while a value above 1 flattens it to allow lower-probability tokens more chance of selection.
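The scaling step described above can be sketched in a few lines. This is a minimal illustration, not any particular library's implementation; the function name `softmax_with_temperature` and the three example logits are assumptions made for the demo.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by the temperature, then apply softmax."""
    if temperature <= 0:
        # Division by zero (or a negative scale) is not meaningful here.
        raise ValueError("temperature must be positive for softmax scaling")
    scaled = [x / temperature for x in logits]
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it should show the top token's probability shrinking as the temperature rises, while the least likely token's probability grows.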

At temperature 0 the model always picks the single most likely next token (greedy decoding). At 1, a common default, the model samples from its learned distribution unchanged. Values greater than 1 make unlikely tokens more probable, producing more diverse but sometimes less coherent text.
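One way to picture these three regimes is a small sampler that treats temperature 0 as a special case. This is a sketch under assumptions: `sample_token` is a hypothetical helper, and falling back to argmax at 0 is one common way to handle it, since dividing by zero is undefined.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=random):
    """Pick a token index; temperature 0 is treated as greedy argmax."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Scaling by 1/0 is undefined, so return the most likely token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices accepts unnormalized weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits; count how often each token is chosen.
logits = [2.0, 1.0, 0.1]
rng = random.Random(0)
for t in (0.0, 1.0, 2.0):
    picks = [sample_token(logits, t, rng) for _ in range(1000)]
    print(t, [picks.count(i) for i in range(len(logits))])
```

At 0.0 every pick is the first token; at higher temperatures the counts spread across all three.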

The key idea is a simple trade-off between coherence and diversity that lets users tune generation style without retraining the model.

Example

When writing a product description, temperature 0.2 yields safe, repetitive phrasing, while temperature 1.2 may introduce unexpected metaphors or unusual word choices.

Why it matters

Temperature gives users direct control over output style, enabling the same model to handle factual Q&A, creative storytelling, or code generation by simply adjusting one number.

Frequently asked questions

Setting temperature to 0 forces the model to always choose the single highest-probability token, producing the most deterministic and repeatable output.

Related terms

Top-p Sampling

Top-p sampling (nucleus sampling) is a text-generation technique that dynamically selects the smallest set of most likely next tokens whose combined probability exceeds a threshold p (e.g. 0.9), then samples from that set.
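The selection step can be sketched as follows. The helper names `top_p_filter` and `sample_top_p` are hypothetical, and this version stops as soon as the running total reaches p; real implementations differ on whether the boundary is inclusive.

```python
import random

def top_p_filter(probs, p=0.9):
    """Return {token_index: renormalized probability} for the nucleus."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for index, prob in ranked:
        nucleus.append((index, prob))
        cumulative += prob
        if cumulative >= p:  # assumption: stop once the threshold is reached
            break
    total = sum(prob for _, prob in nucleus)
    return {index: prob / total for index, prob in nucleus}

def sample_top_p(probs, p=0.9, rng=random):
    """Sample one token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]

# Hypothetical next-token distribution over four tokens.
probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_filter(probs, p=0.9))
print(sample_top_p(probs, p=0.9, rng=random.Random(0)))
```

With p = 0.9 the least likely token (index 3) is excluded, because the first three already cover 0.95 of the probability mass.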

Greedy Decoding

Greedy decoding is a text generation strategy in NLP where, at each step, the model selects the single token with the highest probability as the next output.
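A toy loop makes the step-by-step choice concrete. Everything here is illustrative: `greedy_decode`, `toy_model`, and the lookup table stand in for a real model, which would condition on the whole sequence rather than only the last token.

```python
def greedy_decode(next_token_probs, start, max_steps=10, stop="<eos>"):
    """Repeatedly append the highest-probability next token."""
    tokens = [start]
    for _ in range(max_steps):
        probs = next_token_probs(tokens)
        best = max(probs, key=probs.get)  # the greedy choice
        if best == stop:
            break
        tokens.append(best)
    return tokens

# Hypothetical toy "model": next-token probabilities keyed on the last token.
TOY_TABLE = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<eos>": 0.9, "down": 0.1},
}

def toy_model(tokens):
    # Unknown tokens end the sequence immediately.
    return TOY_TABLE.get(tokens[-1], {"<eos>": 1.0})

print(greedy_decode(toy_model, "the"))  # ['the', 'cat', 'sat']
```

Because there is no sampling, the same starting token always yields the same sequence.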

Hallucination

In LLMs, hallucination is when the model generates fluent, confident text that is factually incorrect, fabricated, or not supported by its training data or the input it was given.

Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a technique that improves large language models by retrieving relevant external information before generating a response.