How does beam search differ from greedy search?

Greedy search commits to the single highest-probability token at every step, while beam search keeps several candidate sequences in parallel so that it can find a sequence with a higher overall probability.

Is beam search guaranteed to find the best output?

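No. Because it discards all but the top candidates at each step, beam search can prune a prefix that would have led to the most probable sequence. It is a heuristic that may miss the absolute best output but usually finds a high-quality one efficiently.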
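To see why beam search is only a heuristic, here is a minimal sketch on a hypothetical two-step toy model. The tokens `A`, `B`, `C`, `x`, `y`, `z` and all probabilities are invented for illustration. With a beam width of 1 (which is the same as greedy search) or 2, the prefix `C` is pruned after the first step even though it leads to the most probable complete sequence.

```python
# Hypothetical toy "model": next-token probabilities conditioned on the prefix.
# Every sequence is exactly two tokens long (no end-of-sequence token).
FIRST = {"A": 0.40, "B": 0.35, "C": 0.25}
SECOND = {
    "A": {"x": 0.50, "y": 0.30, "z": 0.20},
    "B": {"x": 0.50, "y": 0.30, "z": 0.20},
    "C": {"x": 0.96, "y": 0.02, "z": 0.02},
}


def seq_prob(seq):
    """Probability of a two-token sequence: P(first) * P(second | first)."""
    first, second = seq
    return FIRST[first] * SECOND[first][second]


def beam_search(width):
    # Step 1: keep only the `width` most probable first tokens.
    beam = sorted(FIRST, key=FIRST.get, reverse=True)[:width]
    # Step 2: expand every kept prefix and return the best full sequence.
    candidates = [(f, s) for f in beam for s in SECOND[f]]
    return max(candidates, key=seq_prob)


def exhaustive_search():
    # Score every possible two-token sequence.
    candidates = [(f, s) for f in FIRST for s in SECOND[f]]
    return max(candidates, key=seq_prob)


for w in (1, 2, 3):
    best = beam_search(w)
    print(f"beam width {w}: {best} p={seq_prob(best):.2f}")
best = exhaustive_search()
print(f"exhaustive:   {best} p={seq_prob(best):.2f}")
```

Widths 1 and 2 both settle on `("A", "x")` with probability 0.20, while exhaustive search finds `("C", "x")` with probability 0.24. Only a beam wide enough to keep `C` (here, width 3, which covers every first token) recovers it.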

What is Beam Search?

Beam search is a decoding algorithm used in NLP to generate sequences such as sentences by exploring several high-probability paths instead of just one.

It maintains a fixed number (the beam width) of the most promising partial sequences at each generation step, scoring each one by the model's probability for the sequence so far, typically computed as the sum of its per-token log-probabilities.

At every time step, each sequence in the beam is extended with each candidate next token; from all of these extensions, the algorithm keeps only the beam-width highest-scoring sequences and discards the rest.
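The expand-and-prune loop described above can be sketched in Python as follows. This is a minimal, unbatched sketch, not a production implementation: the model is abstracted as a hypothetical function `next_token_log_probs(prefix)` that returns a dict of log-probabilities, the end-of-sequence token `"</s>"` is an assumed convention, and sequences that emit it simply stop being expanded. Scores are summed log-probabilities, which rank sequences exactly as multiplying probabilities would but avoid numerical underflow on long sequences.

```python
import math
from typing import Callable, Dict, List, Tuple

# A hypothesis is (token sequence, cumulative log-probability).
Hypothesis = Tuple[Tuple[str, ...], float]


def beam_search(
    next_token_log_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Hypothesis]:
    """Return hypotheses sorted from highest to lowest cumulative log-probability."""
    beam: List[Hypothesis] = [((), 0.0)]  # start from the empty prefix
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Expand: extend every hypothesis in the beam with each candidate token.
        candidates: List[Hypothesis] = []
        for prefix, score in beam:
            for token, log_p in next_token_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + log_p))
        # Prune: keep only the `beam_width` best extensions, discard the rest.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == eos:
                finished.append((seq, score))  # complete: stop expanding it
            else:
                beam.append((seq, score))
        if not beam:  # every surviving hypothesis has ended
            break
    # If max_len is reached, unfinished hypotheses are returned as well.
    return sorted(finished + beam, key=lambda h: h[1], reverse=True)


# Hypothetical toy model: probabilities depend only on the previous token.
TOY_TABLE = {
    None: {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.4, "</s>": 0.1},
    "a": {"cat": 0.9, "</s>": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    last = prefix[-1] if prefix else None
    return {tok: math.log(p) for tok, p in TOY_TABLE[last].items()}


results = beam_search(toy_model, beam_width=2, max_len=5)
for seq, score in results:
    print(" ".join(seq), round(math.exp(score), 2))
```

With `beam_width=2`, the search ranks `a cat </s>` (probability 0.36) ahead of `the cat </s>` (0.30). With `beam_width=1` it behaves like greedy decoding: it commits to `the` because 0.6 beats 0.4 and ends at 0.30.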

This approach balances quality and efficiency. It avoids the shortsightedness of greedy decoding, which commits to the single best token at each step, and the cost of exhaustive search, where the number of sequences to score grows exponentially with output length.
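The cost gap can be made concrete. Take a vocabulary of $|V|$ tokens, an output length of $T$ tokens and a beam width of $k$. The specific numbers below ($|V| = 32{,}000$, $T = 20$, $k = 4$) are illustrative assumptions, not figures from any particular system. Exhaustive search must score every possible sequence, while beam search scores at most $k \cdot |V|$ one-token extensions per step:

```latex
\begin{aligned}
\text{exhaustive search:}\quad & |V|^{T} = 32{,}000^{20} \approx 1.27 \times 10^{90} \text{ sequences} \\
\text{beam search:}\quad & T \cdot k \cdot |V| = 20 \cdot 4 \cdot 32{,}000 = 2{,}560{,}000 \text{ extensions (at most)}
\end{aligned}
```

Greedy decoding is the special case $k = 1$. Beam search's cost grows linearly in $k$, which is the trade-off the beam width controls.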

Example

When translating 'Hello world' into French, a beam width of 3 keeps the three highest-scoring partial translations at each step and finally selects the complete sentence with the highest overall probability, rather than committing to whichever first word looked best.
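The final selection step of this example can be sketched as follows. The per-word probabilities are invented for illustration, and each word is treated as a single token for simplicity. The candidate that starts with the best-looking first word ("Salut", 0.50) still loses overall, because a sequence's score is the product of all its step probabilities (computed here as a sum of log-probabilities).

```python
import math

# Hypothetical per-token probabilities for two complete candidate translations
# of 'Hello world'. The numbers are invented for illustration only.
candidates = {
    "Salut le monde": [0.50, 0.40, 0.50],  # best-looking first word
    "Bonjour le monde": [0.45, 0.80, 0.90],
}


def sequence_log_prob(token_probs):
    """Overall score: sum of per-token log-probabilities (log of their product)."""
    return sum(math.log(p) for p in token_probs)


best = max(candidates, key=lambda s: sequence_log_prob(candidates[s]))
for sentence, probs in candidates.items():
    print(f"{sentence}: {math.exp(sequence_log_prob(probs)):.3f}")
print("selected:", best)
```

Under these assumed numbers, "Salut le monde" scores 0.100 and "Bonjour le monde" scores 0.324, so the latter is selected even though its first word was less probable.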

Why it matters

Beam search is widely used in production NLP systems for machine translation, summarization, and chatbots because it often produces more fluent and accurate text than simpler decoding methods such as greedy decoding, while remaining practical to run.

Frequently asked questions

What is beam width?

Beam width (or beam size) is the number of candidate sequences kept at each step; larger widths explore more options but increase computation.

Related terms

Greedy Decoding

Greedy decoding is a text generation strategy in NLP where, at each step, the model selects the single token with the highest probability as the next output.

Transformer

A Transformer is a neural network architecture that processes sequential data like text using self-attention to weigh relationships between all parts of the input at once.

Embedding

An embedding (or vector embedding) is a way to represent words, sentences, or other data as dense numerical vectors in a high-dimensional space so that similar items end up close together.

Named Entity Recognition

Named Entity Recognition (NER) is a natural language processing task that automatically finds and classifies specific names and terms in text into categories like people, organizations, locations, or dates.

Natural Language Generation

Natural Language Generation (NLG) is the AI process of automatically turning structured data, facts, or meanings into fluent, human-readable text. It is a core subfield of natural language processing focused on producing natural-sounding language output.

Natural Language Processing

Natural Language Processing (NLP) is a branch of artificial intelligence that enables computers to understand, interpret, and generate human language in useful ways.