What is Gradient Boosting?
Gradient Boosting is an ensemble machine learning technique that builds a strong predictive model by sequentially adding weak learners, typically decision trees, that correct the errors of the previous models.
It works by starting with an initial prediction and then iteratively training new models on the residuals, or errors, of the current ensemble. Each new model is trained to minimize a loss function using gradient descent, effectively moving the overall predictions in the direction that reduces error.
The process combines many simple models additively, with each one focusing on the hardest examples that previous models got wrong. Regularization techniques like learning rate and tree depth limits help prevent overfitting.
Popular variants include XGBoost, LightGBM, and CatBoost, which optimize speed and performance on large datasets.
Example
To predict house prices, gradient boosting might start with the average price as a baseline, then add a tree that corrects for underpriced homes in certain neighborhoods, followed by another tree that fixes remaining errors in square footage predictions.
Why it matters
Gradient boosting often achieves state-of-the-art results on tabular data and structured datasets, powering many production systems in finance, healthcare, and recommendation engines.
Frequently asked questions
Random forest builds trees independently in parallel, while gradient boosting builds them sequentially, with each tree correcting the previous ones.
Related terms
A decision tree is a supervised machine learning model that predicts outcomes by recursively splitting data into branches based on feature values, forming a tree-like structure with decisions at internal nodes and final predictions at the leaves.
A Random Forest is an ensemble machine learning algorithm that builds many decision trees during training and combines their outputs to produce a more accurate and stable prediction.
Ensemble learning is a machine learning approach that combines predictions from multiple models to achieve better accuracy and robustness than any individual model.
Active learning is a machine learning technique where the model itself selects the most informative unlabeled data points to be labeled by a human, rather than labeling data randomly or all at once.
Adam (Adaptive Moment Estimation) is a popular optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients.
Anomaly detection is a machine learning technique that identifies rare or unusual data points that differ significantly from the majority of the data, often called outliers.