What is Linear Regression?
Linear regression is a supervised machine learning technique that predicts a continuous target value by fitting a straight line (or hyperplane) to the relationship between input features and the output.
It works by learning coefficients for each input feature so that the equation y = b0 + b1*x1 + b2*x2 + ... produces the smallest possible prediction errors on the training data.
The most common approach, ordinary least squares, finds the line that minimizes the sum of squared differences between actual and predicted values; gradient descent can also be used for larger datasets.
Key ideas include the assumptions of linearity, independence of errors, and homoscedasticity, plus extensions like multiple regression for many features and regularization to prevent overfitting.
Example
A real-estate app might use linear regression to predict a house's sale price from its square footage, number of bedrooms, and age by learning a simple equation from past sales data.
Why it matters
Linear regression remains a foundational building block for understanding more complex models and is still widely used for interpretable forecasting, trend analysis, and as a baseline in modern AI pipelines.
Frequently asked questions
Simple linear regression uses only one input feature, while multiple linear regression uses two or more features to predict the target.
Related terms
Logistic Regression is a supervised machine learning algorithm used for binary classification that estimates the probability an input belongs to a particular class.
Gradient descent is an optimization algorithm that finds the minimum of a function by repeatedly moving in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.
Supervised learning is a machine learning method where a model is trained on data that already has correct answers attached, so it can learn to predict those answers for new data.
Active learning is a machine learning technique where the model itself selects the most informative unlabeled data points to be labeled by a human, rather than labeling data randomly or all at once.
Adam (Adaptive Moment Estimation) is a popular optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients.
Anomaly detection is a machine learning technique that identifies rare or unusual data points that differ significantly from the majority of the data, often called outliers.