What is Random Forest?

A Random Forest is an ensemble machine learning algorithm that builds many decision trees during training and combines their outputs to produce a more accurate and stable prediction.

It works by training a large number of decision trees, each on a bootstrap sample of the training data (rows drawn at random with replacement) and with only a random subset of features considered at each split. This randomness reduces the correlation between trees, which is what makes combining them worthwhile.
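The two sources of randomness can be sketched in a few lines of plain Python. This is only an illustration, not how any particular library implements it: the helper names `bootstrap_sample` and `random_feature_subset` are hypothetical, the rows are stand-in integers, and the choice of 3 out of 9 features follows the common square-root rule of thumb rather than anything stated above.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random WITH replacement (a bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices to consider at a single split."""
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)          # fixed seed so the sketch is repeatable
rows = list(range(10))          # stand-in for 10 training rows

# Each tree gets its own bootstrap sample of the rows.
sample = bootstrap_sample(rows, rng)

# Each split inside a tree looks at only a few of the features.
features = random_feature_subset(n_features=9, k=3, rng=rng)

print(sample)    # same length as rows; repeated rows are expected
print(features)  # 3 distinct indices out of 0..8
```

Because sampling is done with replacement, some rows appear more than once in a given sample and others are left out, so each tree sees a slightly different view of the data.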

For classification, the forest outputs the class chosen by the majority of trees; for regression, it averages the trees' predictions. The approach is known as bagging (bootstrap aggregating) combined with random feature selection.
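The aggregation step can be illustrated as follows. This is a minimal sketch that assumes the per-tree predictions are already available as plain lists; the function names and the sample values are made up for illustration.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the class predicted by the most trees (majority vote).

    On a tie, Counter.most_common keeps the class that was seen first.
    """
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the average of the trees' numeric predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five classification trees for one input.
votes = ["default", "repay", "default", "default", "repay"]
print(aggregate_classification(votes))   # default

# Hypothetical outputs of four regression trees for one input.
estimates = [10.0, 12.0, 11.0, 13.0]
print(aggregate_regression(estimates))   # 11.5
```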

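Because the errors of individual trees tend to cancel out when their predictions are combined, the overall model is less prone to overfitting than a single decision tree. A forest is harder to read than one tree, but feature-importance measures still give some insight into what drives its predictions.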
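The cancelling-out effect can be seen in a small simulation. The sketch below makes a strong simplifying assumption: each "tree" is modeled as the true value plus independent Gaussian noise, which real trees are not, since they are trained on overlapping data and stay partly correlated. The constants and names are hypothetical.

```python
import random
from statistics import mean, pvariance

rng = random.Random(42)   # fixed seed so the sketch is repeatable
TRUE_VALUE = 5.0          # hypothetical quantity every "tree" tries to predict
N_TREES = 50
N_TRIALS = 1000


def noisy_prediction():
    """Stand-in for one tree: the truth plus independent noise (std. dev. 1)."""
    return TRUE_VALUE + rng.gauss(0.0, 1.0)


# Spread of predictions when relying on a single tree.
single = [noisy_prediction() for _ in range(N_TRIALS)]

# Spread of predictions when averaging N_TREES trees each time.
averaged = [
    mean(noisy_prediction() for _ in range(N_TREES))
    for _ in range(N_TRIALS)
]

var_single = pvariance(single)
var_forest = pvariance(averaged)
print(var_single, var_forest)  # the averaged variance is far smaller
```

With fully independent errors the variance of the average shrinks roughly in proportion to the number of trees. In a real forest the reduction is smaller because the trees are correlated, which is why the random row and feature sampling matters.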

Example

To predict whether a loan applicant will default, a Random Forest builds hundreds of trees on different random samples of past loan data. The final decision is the majority vote across all trees, and the fraction of trees voting for default can be read as a risk score. The result is typically more reliable than the prediction of any single tree.

Why it matters

Random Forests remain one of the most widely used, robust, and easy-to-tune algorithms for tabular data in both research and production, often serving as a strong baseline before trying deep learning.

Frequently asked questions

How does a Random Forest differ from a single decision tree?

A single decision tree can easily overfit the training data, while a Random Forest averages many diverse trees to reduce variance and improve generalization.