What is Self-Supervised Learning?
Self-supervised learning is a machine learning method where a model creates its own training labels directly from the input data, without needing human annotations.
In self-supervised learning, the system automatically generates pseudo-labels by hiding or transforming parts of the data and then training the model to predict or reconstruct the missing information. This turns large amounts of unlabeled data into a supervised-style training signal.
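As a minimal sketch of how labels can come from the data itself, the Python below hides the next word in a sequence and uses it as the prediction target, which is the idea behind next-token prediction. It assumes the text is already split on whitespace. The function name make_next_token_pairs is hypothetical and not from any library.

```python
from typing import List, Tuple


def make_next_token_pairs(tokens: List[str], context_size: int) -> List[Tuple[List[str], str]]:
    """Turn an unlabeled token sequence into (context, target) training pairs.

    Each target is simply the token that follows its context window, so the
    "label" comes from the data itself rather than from a human annotator.
    """
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tokens[i - context_size:i]  # visible input
        target = tokens[i]                    # hidden token the model must predict
        pairs.append((context, target))
    return pairs


sentence = "the cat sat on the mat".split()
for context, target in make_next_token_pairs(sentence, context_size=2):
    print(context, "->", target)
# ['the', 'cat'] -> sat
# ['cat', 'sat'] -> on
# ['sat', 'on'] -> the
# ['on', 'the'] -> mat
```

A single unlabeled sentence yields several supervised-style examples, which is why this kind of objective scales to very large corpora.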
The approach typically involves two stages: first, pre-training on large unlabeled datasets to learn general-purpose representations, then fine-tuning on smaller labeled datasets for specific tasks. Common pretext tasks include masking words in text and having the model predict them, or predicting which rotation or augmentation was applied to an image.
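A rotation-prediction pretext task can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a reference implementation: images are represented as nested lists of pixel values, and the helper names rotate90 and make_rotation_example are hypothetical. The number of quarter turns applied to the image becomes its label.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # a grayscale image as rows of pixel values


def rotate90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def make_rotation_example(img: Image, rng: random.Random) -> Tuple[Image, int]:
    """Rotate the image by a random multiple of 90 degrees.

    The number of quarter turns (0 to 3) is the pseudo-label the model
    would be trained to predict from the rotated image.
    """
    k = rng.randrange(4)
    rotated = img
    for _ in range(k):
        rotated = rotate90(rotated)
    return rotated, k


rng = random.Random(0)
image = [[1, 2], [3, 4]]
rotated, label = make_rotation_example(image, rng)
print(label, rotated)
```

Multiples of 90 degrees are used so that no pixel interpolation is needed. The intuition is that a model can only tell the rotations apart by recognizing which way up the content is, so solving the task encourages it to learn useful visual features.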
It sits between supervised learning (which uses human labels) and unsupervised learning (which finds patterns without labels), combining the scalability of the latter with the clear training objectives of the former.
Example
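BERT is pre-trained by randomly masking about 15% of the tokens in each input and learning to predict the masked tokens from the surrounding context. This lets it learn general-purpose language representations from large unlabeled text corpora (BooksCorpus and English Wikipedia, roughly 3.3 billion words) before being fine-tuned for tasks such as question answering.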
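The masking step can be sketched as follows. This is a simplified illustration, not BERT's actual preprocessing code: each token is selected independently with probability 0.15 (the original implementation instead picks a fixed number of positions per sequence), and a selected token is replaced with [MASK] 80% of the time, with a random token 10% of the time, and left unchanged 10% of the time, as described for BERT. Tokens here are whole words rather than BERT's WordPiece subwords, and the function name bert_style_mask is hypothetical.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"


def bert_style_mask(
    tokens: List[str],
    vocab: List[str],
    rng: random.Random,
    mask_prob: float = 0.15,
) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt a token sequence BERT-style and return (inputs, labels).

    labels[i] holds the original token at positions selected for prediction
    and None elsewhere, so a loss would be computed only on selected positions.
    """
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected: no prediction target here
        labels[i] = tok  # the original word is the self-generated label
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK               # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


rng = random.Random(42)
sentence = "the cat sat on the mat because it was tired".split()
inputs, labels = bert_style_mask(sentence, vocab=sorted(set(sentence)), rng=rng)
print(inputs)
print(labels)
```

The 80/10/10 split exists because [MASK] never appears during fine-tuning; occasionally showing the model real or random tokens at selected positions reduces that mismatch between pre-training and fine-tuning.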
Why it matters
Self-supervised learning enables models to leverage enormous amounts of raw data that would be too expensive to label manually, powering today's large foundation models in language and vision.
Frequently asked questions
What is the difference between self-supervised and unsupervised learning?
Unsupervised learning discovers patterns without any explicit training signal, while self-supervised learning creates its own labels from the data to form a clear prediction task.
Related terms
Supervised learning is a machine learning method where a model is trained on data that already has correct answers attached, so it can learn to predict those answers for new data.
Unsupervised learning is a machine learning method that trains models on unlabeled data to find hidden patterns, structures, or relationships without any guidance on correct outputs.
Transfer learning is a machine learning method that reuses a model trained on one task as the starting point for a different but related task.
Adam (Adaptive Moment Estimation) is a popular optimization algorithm for training machine learning models; it updates parameters iteratively using running averages of the gradients and of their squares, which adapts the step size for each parameter.
Classification is a supervised machine learning task that assigns input data to one of several predefined categories or classes based on patterns learned from labeled training examples.
Clustering is an unsupervised machine learning technique that automatically groups similar data points together into clusters based on their features, without using any labeled examples.