
What is an Embedding?

Also known as: Vector Embedding

An embedding (or vector embedding) is a way to represent words, sentences, or other data as dense numerical vectors in a high-dimensional space so that similar items end up close together.

Embeddings are learned by neural networks that map discrete items (tokens, words, documents) into continuous vectors. During training, the model adjusts the vector values so that contextually or semantically related items receive similar vectors.
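The mapping described above can be pictured as a lookup table with one row per token. The sketch below is only an illustration: the vocabulary, the dimension, and the `nudge_together` update are all assumptions made for this example, and the update is a toy rule rather than a real training objective. It shows only the general idea that training moves related items closer together.

```python
import math
import random

# Hypothetical toy vocabulary; real vocabularies are far larger.
vocab = {"cat": 0, "dog": 1, "car": 2}
dim = 4  # illustrative embedding size

rng = random.Random(0)  # fixed seed so the run is repeatable
# The embedding table: one row of `dim` floats per token id.
table = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]

def embed(token):
    """Look up the vector for a token (a plain table read)."""
    return table[vocab[token]]

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nudge_together(tok_a, tok_b, lr=0.1):
    """Toy update (an assumption, not a real loss): pull two related
    tokens' vectors a small step toward each other, in place."""
    a, b = embed(tok_a), embed(tok_b)
    for i in range(dim):
        diff = b[i] - a[i]
        a[i] += lr * diff
        b[i] -= lr * diff

before = distance(embed("cat"), embed("dog"))
for _ in range(10):
    nudge_together("cat", "dog")
after = distance(embed("cat"), embed("dog"))
print(f"cat-dog distance: {before:.3f} -> {after:.3f}")
```

Real models derive these updates from a loss computed over large amounts of text, and they also push unrelated items apart, which this sketch leaves out.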

The key property is geometry: the distance or angle between two vectors reflects how closely their meanings are related. Vector addition and subtraction can capture analogies (e.g., king – man + woman ≈ queen).
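As a rough illustration of this geometry, the following sketch computes cosine similarity and solves the king/queen analogy. The three-dimensional vectors are hand-picked assumptions chosen to make the arithmetic work out; learned embeddings have many more dimensions and no cleanly labeled axes. The `analogy` helper is a hypothetical name used only here.

```python
import math

# Hand-picked toy vectors; the axes loosely mean (royalty, male, female).
vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}

def cosine(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "man", "woman")
print(result)
```

With real embeddings the match is approximate rather than exact, which is why the analogy is written with ≈.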

Because the vectors are dense and have far fewer dimensions than one-hot encodings (typically hundreds of dimensions rather than one per vocabulary entry), downstream models can measure similarity efficiently and generalize to unseen data.
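The contrast with one-hot encodings can be sketched in a few lines. The vocabulary and the dense values below are assumptions chosen for illustration, not output from any trained model.

```python
import math

vocab = ["cat", "dog", "car"]  # hypothetical three-word vocabulary

def one_hot(word):
    """Sparse encoding: one slot per vocabulary word, a single 1.0."""
    return [1.0 if w == word else 0.0 for w in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Illustrative dense vectors; the length is independent of vocabulary size.
dense = {
    "cat": [0.9, 0.8],
    "dog": [0.8, 0.9],
    "car": [-0.7, 0.2],
}

# Distinct one-hot vectors never overlap, so every pair looks unrelated.
print(dot(one_hot("cat"), one_hot("dog")))
# Dense vectors can express that "cat" is nearer to "dog" than to "car".
print(cosine(dense["cat"], dense["dog"]) > cosine(dense["cat"], dense["car"]))
```

The one-hot vector's length grows with every word added to the vocabulary, while the dense vector's length stays fixed.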

Example

In a movie-recommendation system, the words “action” and “adventure” receive embeddings that lie close to each other, while “action” and “romance” lie farther apart. This lets the system suggest films similar to ones a viewer already likes.
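One way such a system might use these embeddings is sketched below. Everything in it is an assumption for illustration: the genre vectors are made up, the film titles are placeholders, and representing a film as the average of its genre vectors is just one simple choice among many.

```python
import math

# Hypothetical genre embeddings (values invented for this example).
genre_vec = {
    "action":    [0.9, 0.8, 0.1],
    "adventure": [0.8, 0.9, 0.2],
    "romance":   [0.1, 0.2, 0.9],
}

# Placeholder catalogue: each film is tagged with one or more genres.
films = {
    "Film A": ["action"],
    "Film B": ["adventure"],
    "Film C": ["romance"],
    "Film D": ["action", "adventure"],
}

def film_vector(title):
    """Represent a film as the element-wise mean of its genre vectors."""
    vecs = [genre_vec[g] for g in films[title]]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recommend(title, k=2):
    """Return the k other films whose vectors are most similar to `title`."""
    query = film_vector(title)
    others = [t for t in films if t != title]
    others.sort(key=lambda t: cosine(query, film_vector(t)), reverse=True)
    return others[:k]

suggestions = recommend("Film A")
print(suggestions)
```

Here the action film's nearest neighbours are the action/adventure and adventure titles, and the romance title ranks last.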

Why it matters

Embeddings turn raw text into numbers that capture meaning, powering modern NLP applications such as semantic search, chatbots, recommendation engines, and retrieval-augmented generation.

Frequently asked questions

How does an embedding differ from a one-hot vector?

A one-hot vector is sparse and treats every word as completely unrelated to every other word; an embedding is dense and places related words near each other in vector space.