What is Variational Autoencoder?
Also known as: VAE
A Variational Autoencoder (VAE) is a neural network that learns a compressed probabilistic representation of data and can generate new similar examples by sampling from that space. It combines autoencoders with variational inference to enable both reconstruction and generation.
A VAE consists of an encoder that maps input data to parameters of a probability distribution (usually Gaussian) in a lower-dimensional latent space, and a decoder that reconstructs the data from samples drawn from this distribution.
Unlike standard autoencoders, VAEs add a regularization term (KL divergence) that forces the latent distribution to stay close to a standard normal, creating a smooth, continuous latent space from which new data points can be generated.
Training uses a reparameterization trick to allow backpropagation through the random sampling step, balancing reconstruction accuracy with the ability to sample realistic new outputs.
Example
Trained on handwritten digits, a VAE can compress each image into a few numbers representing a distribution; sampling nearby points in that space and decoding them produces new, plausible digit images that were never in the training set.
Why it matters
VAEs are foundational generative models that introduced scalable probabilistic generation with neural networks and remain widely used for tasks like image synthesis, anomaly detection, and as building blocks in modern generative systems.
Frequently asked questions
A regular autoencoder maps inputs to fixed points in latent space, while a VAE maps to probability distributions, enabling generation of new samples by drawing from those distributions.
Related terms
An autoencoder is a neural network that learns to compress input data into a smaller representation and then reconstruct the original data from that compressed form.
A Generative Adversarial Network (GAN) is a machine learning model made of two neural networks that compete against each other to generate realistic new data, such as images or text.
A diffusion model is a generative AI technique that creates new data like images by learning to reverse a gradual noising process applied to training examples.
A multimodal model is a generative AI system that can process and create content across multiple data types, such as text, images, audio, or video, within a single model.