Skip to content
Sign in

What is Gradient Boosting?

Gradient Boosting is an ensemble machine learning technique that builds a strong predictive model by sequentially adding weak learners, typically decision trees, that correct the errors of the previous models.

It works by starting with an initial prediction and then iteratively training new models on the residuals, or errors, of the current ensemble. Each new model is trained to minimize a loss function using gradient descent, effectively moving the overall predictions in the direction that reduces error.

The process combines many simple models additively, with each one focusing on the hardest examples that previous models got wrong. Regularization techniques like learning rate and tree depth limits help prevent overfitting.

Popular variants include XGBoost, LightGBM, and CatBoost, which optimize speed and performance on large datasets.

Example

To predict house prices, gradient boosting might start with the average price as a baseline, then add a tree that corrects for underpriced homes in certain neighborhoods, followed by another tree that fixes remaining errors in square footage predictions.

Why it matters

Gradient boosting often achieves state-of-the-art results on tabular data and structured datasets, powering many production systems in finance, healthcare, and recommendation engines.

Frequently asked questions

Random forest builds trees independently in parallel, while gradient boosting builds them sequentially, with each tree correcting the previous ones.