What is Greedy Decoding?
Greedy decoding is a text generation strategy in NLP where, at each step, the model selects the single token with the highest probability as the next output.
It works by repeatedly choosing the argmax (most probable) token from the model's output distribution at every time step, feeding that choice back as input for the next prediction until an end token is reached or a length limit is hit.
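The loop described above can be sketched in a few lines. This is a minimal sketch, not a production implementation. It assumes a hypothetical `toy_next_token_probs` function that stands in for a real model and returns a next-token distribution as a dict. The probabilities in `TOY_MODEL` are made up for illustration, and the toy conditions only on the last token, whereas a real language model conditions on the whole prefix.

```python
from typing import Callable, Dict, List

# Hypothetical toy "model": next-token probabilities keyed by the last token.
# The numbers are invented for illustration only.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "the": {"mat": 0.5, "chair": 0.3, "windowsill": 0.2},
    "mat": {".": 0.9, "and": 0.1},
    ".": {"<eos>": 1.0},
}


def toy_next_token_probs(tokens: List[str]) -> Dict[str, float]:
    # Unknown contexts simply end the sequence in this toy.
    return TOY_MODEL.get(tokens[-1], {"<eos>": 1.0})


def greedy_decode(
    prefix: List[str],
    next_token_probs: Callable[[List[str]], Dict[str, float]],
    max_new_tokens: int = 10,
    eos: str = "<eos>",
) -> List[str]:
    tokens = list(prefix)
    for _ in range(max_new_tokens):  # length limit
        probs = next_token_probs(tokens)
        # Argmax: keep only the single most probable token.
        # On ties, max() returns the first key in dict order, so output is deterministic.
        best = max(probs, key=probs.get)
        if best == eos:  # stop at the end token
            break
        tokens.append(best)  # feed the choice back in for the next step
    return tokens


print(greedy_decode(["The", "cat", "sat", "on", "the"], toy_next_token_probs))
```

With this toy table the call yields `["The", "cat", "sat", "on", "the", "mat", "."]`. "chair" and "windowsill" are never considered once "mat" wins the argmax.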
Because it never revisits earlier decisions or explores lower-probability alternatives, the approach is computationally cheap and deterministic. However, it can produce suboptimal sequences when a locally high-probability choice leads to a sequence with lower overall probability or quality than another path would have produced.
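A two-step toy case makes the failure mode concrete. The sketch below assumes a hypothetical distribution with tokens `A`, `B`, `x`, `y` and invented probabilities. It compares greedy choice against exhaustive search over all sequences, where a sequence's probability is the product of its step probabilities. Exhaustive search is only feasible for tiny toys like this. Practical systems approximate it with methods such as beam search.

```python
from typing import Dict, Tuple

# Hypothetical two-step distribution (numbers invented for illustration).
FIRST: Dict[str, float] = {"A": 0.6, "B": 0.4}
SECOND: Dict[str, Dict[str, float]] = {
    "A": {"x": 0.55, "y": 0.45},
    "B": {"x": 0.9, "y": 0.1},
}


def greedy_two_step() -> Tuple[Tuple[str, str], float]:
    # Pick the argmax at each step without looking ahead.
    t1 = max(FIRST, key=FIRST.get)
    t2 = max(SECOND[t1], key=SECOND[t1].get)
    return (t1, t2), FIRST[t1] * SECOND[t1][t2]


def exhaustive_two_step() -> Tuple[Tuple[str, str], float]:
    # Score every full sequence by the product of its step probabilities.
    best_seq, best_p = ("", ""), -1.0
    for t1, p1 in FIRST.items():
        for t2, p2 in SECOND[t1].items():
            if p1 * p2 > best_p:
                best_seq, best_p = (t1, t2), p1 * p2
    return best_seq, best_p


print(greedy_two_step())      # greedy takes "A" first: joint probability about 0.33
print(exhaustive_two_step())  # the best sequence starts with "B": about 0.36
```

Greedy commits to `A` because 0.6 beats 0.4 at the first step. It never discovers that `B` followed by `x` has the higher joint probability.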
The method is widely used as a baseline decoding strategy for autoregressive models such as GPT-style language models and neural machine translation systems.
Example
When asked to continue "The cat sat on the", a model using greedy decoding might pick "mat" (highest probability) rather than exploring "chair" or "windowsill," resulting in the sentence "The cat sat on the mat."
Why it matters
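Greedy decoding is one of the cheapest decoding strategies for large language models. It needs only one model call and one argmax per generated token and tracks no alternative hypotheses. This makes it practical for real-time applications, even though more sophisticated search methods such as beam search often yield higher-quality text.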
Frequently asked questions
Does greedy decoding always find the most probable sequence?
No. It can get stuck in locally optimal choices and miss globally better sequences.
Related terms
An embedding (or vector embedding) is a way to represent words, sentences, or other data as dense numerical vectors in a high-dimensional space so that similar items end up close together.
Named Entity Recognition (NER) is a natural language processing task that automatically finds and classifies specific names and terms in text into categories like people, organizations, locations, or dates.