How is greedy decoding different from beam search?

Greedy decoding keeps only the single best token at each step, while beam search maintains multiple candidate sequences in parallel.
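To make the contrast concrete, here is a minimal beam search sketch over a hypothetical toy model. The table TOY_MODEL, its probabilities, and the function name beam_search are illustrative assumptions, not taken from any real library. With a beam width of 1 the procedure reduces to greedy decoding; with a width of 2 it keeps a second candidate alive and, in this toy case, ends up with a more probable sequence.

```python
import math

# Hypothetical toy model: next-token probabilities keyed by the previous token.
TOY_MODEL = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"x": 0.55, "y": 0.45},
    "B": {"x": 0.9, "y": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
}


def beam_search(beam_width, start="<s>", eos="</s>", max_len=5):
    """Return the highest-scoring sequence found with the given beam width."""
    beams = [([start], 0.0)]  # (tokens so far, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos:
                # Finished sequences are carried over unchanged.
                candidates.append((tokens, score))
                continue
            for token, prob in TOY_MODEL[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the top `beam_width` candidates by log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(tokens[-1] == eos for tokens, _ in beams):
            break
    return beams[0][0]


greedy_like = beam_search(beam_width=1)  # equivalent to greedy decoding
wider = beam_search(beam_width=2)        # keeps two candidates per step
```

In this toy setup the width-1 search commits to "A" because it is the most probable first token, while the width-2 search also keeps "B" and discovers that "B" followed by "x" has a higher overall probability.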

When should I use greedy decoding?

Use it when speed and determinism matter more than maximum output quality, such as in quick prototyping or latency-sensitive services.

What is Greedy Decoding?

Greedy decoding is a text generation strategy in NLP where, at each step, the model selects the single token with the highest probability as the next output.

It works by repeatedly choosing the argmax (most probable) token from the model's output distribution at each time step and feeding that choice back as input for the next prediction, until an end-of-sequence token is generated or a length limit is reached.
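The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: TOY_MODEL and next_token_probs stand in for a real language model, and the token names and probabilities are made up for the example.

```python
from typing import Callable, Dict, List

# Hypothetical toy "model": maps the last token to a next-token distribution.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a": {"cat": 0.4, "dog": 0.6},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.4, "</s>": 0.6},
    "sat": {"</s>": 1.0},
}


def next_token_probs(tokens: List[str]) -> Dict[str, float]:
    """Stand-in for a model forward pass (assumption: depends on last token only)."""
    return TOY_MODEL[tokens[-1]]


def greedy_decode(
    probs_fn: Callable[[List[str]], Dict[str, float]],
    start: str = "<s>",
    eos: str = "</s>",
    max_len: int = 10,
) -> List[str]:
    """Pick the argmax token at every step until EOS or the length limit."""
    tokens = [start]
    for _ in range(max_len):
        probs = probs_fn(tokens)
        best = max(probs, key=probs.get)  # argmax over the distribution
        tokens.append(best)               # feed the choice back as input
        if best == eos:
            break
    return tokens


result = greedy_decode(next_token_probs)
```

Each iteration makes exactly one call to the model and keeps exactly one sequence, which is where the low cost and the determinism come from.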

Because it never revisits earlier decisions or explores lower-probability alternatives, the approach is computationally cheap and deterministic, but it can produce suboptimal sequences when a locally high-probability choice leads to poorer overall output.
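A small worked example shows how a locally best choice can lose overall. The two-step distribution below is invented purely for illustration (the words and numbers are assumptions), and the exhaustive search is only feasible because the toy vocabulary is tiny.

```python
from itertools import product

# Hypothetical probabilities for the first word.
FIRST = {"nice": 0.6, "dog": 0.4}

# Hypothetical probabilities for the second word, given the first.
SECOND = {
    "nice": {"day": 0.55, "house": 0.45},
    "dog": {"day": 0.1, "house": 0.9},
}


def sequence_prob(first: str, second: str) -> float:
    """Joint probability of a two-word sequence under the toy model."""
    return FIRST[first] * SECOND[first][second]


# Greedy: take the most probable word at each step, never looking ahead.
g1 = max(FIRST, key=FIRST.get)
g2 = max(SECOND[g1], key=SECOND[g1].get)
greedy_seq = (g1, g2)

# Exhaustive: score every possible two-word sequence and keep the best.
all_sequences = product(FIRST, ["day", "house"])
best_seq = max(all_sequences, key=lambda seq: sequence_prob(*seq))

print(greedy_seq, round(sequence_prob(*greedy_seq), 2))
print(best_seq, round(sequence_prob(*best_seq), 2))
```

Greedy commits to "nice" because 0.6 beats 0.4, but the best continuation of "nice" is only moderately likely, so the sequence starting with the less probable first word ends up with the higher joint probability.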

The method is widely used as a baseline in autoregressive models such as GPT-style language models and neural machine translation systems.

Example

When asked to continue "The cat sat on the", a model using greedy decoding might pick "mat" (highest probability) rather than exploring "chair" or "windowsill," resulting in the sentence "The cat sat on the mat."

Why it matters

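Greedy decoding is one of the cheapest ways to generate text from a large language model: it needs one forward pass per generated token and tracks no alternative candidates, whereas beam search with k beams scores roughly k times as many continuations per step. That makes it practical for real-time applications, even though more sophisticated search methods often yield higher-quality text.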

Frequently asked questions

Does greedy decoding always produce the best output?

No, it can get stuck in locally optimal choices and miss globally better sequences.

Related terms

Embedding

An embedding (or vector embedding) is a way to represent words, sentences, or other data as dense numerical vectors in a high-dimensional space so that similar items end up close together.

Named Entity Recognition

Named Entity Recognition (NER) is a natural language processing task that automatically finds and classifies specific names and terms in text into categories like people, organizations, locations, or dates.