
What is Greedy Decoding?

Greedy decoding is a text generation strategy in NLP in which, at each step, the single token the model assigns the highest probability is chosen as the next output.

It works by repeatedly choosing the argmax (most probable) token from the model's output distribution at every time step, feeding that choice back as input for the next prediction until an end token is reached or a length limit is hit.
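The loop described above can be sketched in a few lines. This is a minimal illustration, not any particular library's implementation: the model is assumed to be a hypothetical function `next_token_probs` that maps the tokens generated so far to a dictionary of next-token probabilities, and the toy bigram table uses invented numbers.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: takes the tokens so far and returns a
# probability distribution over the next token.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def greedy_decode(
    next_token_probs: NextTokenFn,
    prompt: List[str],
    end_token: str = "<eos>",
    max_new_tokens: int = 20,
) -> List[str]:
    """Extend `prompt` by repeatedly appending the most probable next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):  # length limit
        probs = next_token_probs(tokens)
        # Argmax over the distribution; on ties, max() keeps the first key.
        best = max(probs, key=probs.get)
        tokens.append(best)  # feed the choice back in on the next step
        if best == end_token:  # stop once the end token is produced
            break
    return tokens


# Toy bigram "model" with made-up probabilities: the next token depends
# only on the previous one.
BIGRAMS = {
    "the": {"cat": 0.5, "dog": 0.3, "mat": 0.2},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"<eos>": 0.6, "down": 0.4},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    return BIGRAMS[tokens[-1]]


print(greedy_decode(toy_model, ["the"]))  # ['the', 'cat', 'sat', '<eos>']
```

A real language model would return a distribution over its entire vocabulary, usually as logits; because softmax preserves ordering, taking the argmax of the logits selects the same token.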

Because it never revisits earlier decisions or explores lower-probability alternatives, the approach is computationally cheap and deterministic. It can, however, produce suboptimal sequences: a token that is the most probable at one step may lead to continuations whose combined probability is lower than that of a sequence starting with a less likely token.
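To make the failure mode concrete, here is a toy two-step model with invented probabilities, chosen so that the locally best first token does not begin the most probable sequence. Exhaustive enumeration of every sequence stands in for "a better search" here; it is only feasible because the example is tiny, and practical systems use approximations such as beam search instead.

```python
# Toy two-step model with made-up probabilities.
FIRST = {"A": 0.6, "B": 0.4}
SECOND = {
    "A": {"x": 0.5, "y": 0.5},  # A is likely, but its continuations are flat
    "B": {"z": 0.9, "w": 0.1},  # B is less likely, but has one strong continuation
}


def sequence_prob(seq):
    """Joint probability P(first) * P(second | first)."""
    first, second = seq
    return FIRST[first] * SECOND[first][second]


def greedy():
    """Pick the argmax at each step independently."""
    first = max(FIRST, key=FIRST.get)
    second = max(SECOND[first], key=SECOND[first].get)
    return (first, second)


def exhaustive_best():
    """Score every complete sequence and return the most probable one."""
    candidates = [(f, s) for f in FIRST for s in SECOND[f]]
    return max(candidates, key=sequence_prob)


g, b = greedy(), exhaustive_best()
print(f"greedy: {g} p={sequence_prob(g):.2f}")  # greedy: ('A', 'x') p=0.30
print(f"best:   {b} p={sequence_prob(b):.2f}")  # best:   ('B', 'z') p=0.36
```

Greedy commits to "A" because 0.6 > 0.4, and can never recover the sequence ("B", "z"), whose joint probability 0.4 × 0.9 = 0.36 beats the 0.6 × 0.5 = 0.30 that greedy ends up with.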

The method is widely used as a baseline decoding strategy for autoregressive models such as GPT-style language models and neural machine translation systems.

Example

When asked to continue "The cat sat on the", a model using greedy decoding might pick "mat" (highest probability) rather than exploring "chair" or "windowsill," resulting in the sentence "The cat sat on the mat."
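The single decision in this example can be written out directly. The probabilities below are invented for illustration; a real model would assign a probability to every token in its vocabulary, and the mass not listed here is assumed to be spread over many other tokens, each less likely than "mat".

```python
# Hypothetical next-token probabilities after "The cat sat on the".
# The remaining probability mass (0.40) is assumed to be spread thinly
# over the rest of the vocabulary.
next_token_probs = {
    "mat": 0.40,
    "chair": 0.15,
    "windowsill": 0.05,
}

# Greedy decoding keeps only the argmax; "chair" and "windowsill" are
# never explored, and the same input always yields the same choice.
choice = max(next_token_probs, key=next_token_probs.get)
print("The cat sat on the " + choice)  # The cat sat on the mat
```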

Why it matters

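Greedy decoding is among the cheapest decoding strategies for large language models: each step needs only one forward pass and an argmax, with no alternative hypotheses to score or keep in memory. This makes it practical for real-time applications, even though more sophisticated search methods often yield higher-quality text.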

Frequently asked questions

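Does greedy decoding always produce the best sequence?

No. Because it commits to the locally most probable token at each step, it can get stuck in locally optimal choices and miss sequences with higher overall probability.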