What is an Embedding?
Also known as: Vector Embedding
An embedding (or vector embedding) is a way to represent words, sentences, or other data as dense numerical vectors in a high-dimensional space so that similar items end up close together.
Embeddings are learned by neural networks that map discrete items (tokens, words, documents) into continuous vectors. During training, the model adjusts the values in each vector so that contextually or semantically related items receive similar vectors.
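As a rough illustration of the mapping from discrete items to vectors, an embedding layer can be pictured as a lookup table with one row per token. The sketch below is a minimal, untrained version: the vocabulary, the dimension of 4, and the helper names init_embedding_table and embed are assumptions made for this example, not part of any particular library.

```python
import random

# Hypothetical toy vocabulary; real models use tens of thousands of tokens.
VOCAB = {"cat": 0, "dog": 1, "car": 2}
EMBEDDING_DIM = 4  # illustrative only; real embeddings are usually much wider


def init_embedding_table(vocab_size, dim, seed=0):
    """Create a vocab_size x dim table of small random values (the untrained state)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(tokens, vocab, table):
    """Map each token to its row in the embedding table."""
    return [table[vocab[token]] for token in tokens]


table = init_embedding_table(len(VOCAB), EMBEDDING_DIM)
vectors = embed(["cat", "dog"], VOCAB, table)
print(len(vectors), len(vectors[0]))  # 2 4
```

Training would then nudge the rows of this table so that related tokens end up with similar rows; that step is omitted here.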
The key property is geometry: distance or angle between vectors reflects meaning. Operations like vector addition can capture analogies (e.g., king – man + woman ≈ queen).
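The geometric idea can be sketched with cosine similarity and a simple analogy lookup. The three-dimensional vectors below are hand-picked for illustration (they are assumptions, not learned embeddings), chosen so that the arithmetic works out cleanly; with real embeddings the result is only approximate.

```python
import math

# Hand-picked toy vectors (assumed for illustration, not learned).
# Axes, loosely: royalty, maleness, food.
VECTORS = {
    "king":  [0.9,  0.8, 0.0],
    "queen": [0.9, -0.8, 0.0],
    "man":   [0.1,  0.8, 0.0],
    "woman": [0.1, -0.8, 0.0],
    "apple": [0.0,  0.0, 1.0],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 means same direction, 0 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


print(analogy("king", "man", "woman", VECTORS))  # queen
```

Excluding the three input words from the candidates is a common convention, since the nearest vector to the target is otherwise often one of the inputs.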
Because the vectors are dense and have far fewer dimensions than one-hot encodings, which need one dimension per vocabulary entry, downstream models can measure similarity efficiently and generalize to inputs they did not see during training.
Example
In a movie-recommendation system the words “action” and “adventure” receive embeddings that lie close to each other, while “action” and “romance” are farther apart, allowing the system to suggest similar films.
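A minimal sketch of this recommendation idea follows. The two-dimensional genre vectors, the film titles, and the similar_films helper are all hypothetical, chosen so that "action" and "adventure" point in nearly the same direction while "romance" does not.

```python
import math

# Hypothetical 2-D genre embeddings; real values would be learned from data.
GENRE_VECTORS = {
    "action":    [0.9, 0.1],
    "adventure": [0.8, 0.2],
    "romance":   [0.1, 0.9],
}

# Hypothetical catalogue: film title -> genre tag.
FILMS = {
    "Film A": "action",
    "Film B": "adventure",
    "Film C": "romance",
}


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_films(title, k=1):
    """Rank the other films by cosine similarity of their genre vectors to the given film's."""
    query = GENRE_VECTORS[FILMS[title]]
    scored = [
        (cosine_similarity(query, GENRE_VECTORS[genre]), other)
        for other, genre in FILMS.items()
        if other != title
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


print(similar_films("Film A"))  # ['Film B']
```

This version compares the query against every film, which is fine for a toy catalogue; large systems typically rely on an index built for approximate nearest-neighbor search instead.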
Why it matters
Embeddings turn raw text into numbers that capture meaning, powering modern NLP applications such as semantic search, chatbots, recommendation engines, and retrieval-augmented generation.
Frequently asked questions
How is an embedding different from a one-hot vector?
A one-hot vector is sparse and treats every word as completely unrelated; an embedding is dense and places related words near each other in vector space.
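The contrast can be checked numerically. In the sketch below, the three-word vocabulary and the dense vectors are assumptions chosen by hand: any two distinct one-hot vectors have a cosine similarity of exactly zero, whereas the dense vectors give "cat" and "dog" a much higher similarity than "cat" and "car".

```python
import math

WORDS = ["cat", "dog", "car"]  # hypothetical three-word vocabulary


def one_hot(word, vocabulary):
    """Vector with a 1 at the word's position and 0 everywhere else."""
    return [1.0 if w == word else 0.0 for w in vocabulary]


# Hand-picked dense vectors (assumed for illustration, not learned).
DENSE = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# One-hot: distinct words never share a non-zero position, so similarity is 0.
print(cosine_similarity(one_hot("cat", WORDS), one_hot("dog", WORDS)))  # 0.0

# Dense: related words score higher than unrelated ones.
print(round(cosine_similarity(DENSE["cat"], DENSE["dog"]), 2))  # 0.99
print(round(cosine_similarity(DENSE["cat"], DENSE["car"]), 2))  # 0.3
```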
Related terms
A vector database is a specialized database designed to store and query high-dimensional vector embeddings, enabling fast similarity searches instead of traditional exact-match queries.
Greedy decoding is a text generation strategy in NLP where, at each step, the model selects the single token with the highest probability as the next output.
Named Entity Recognition (NER) is a natural language processing task that automatically finds and classifies specific names and terms in text into categories like people, organizations, locations, or dates.