How does linear regression find the best line?

It minimizes the sum of squared errors between the actual values and the values the line predicts. This ordinary least squares problem is usually solved either in closed form or iteratively with gradient descent.

Can linear regression be used for classification?

Not well. Linear regression predicts unbounded continuous values rather than class labels or probabilities, so logistic regression or another classifier is more appropriate for classification tasks.

What is Linear Regression?

Linear regression is a supervised machine learning technique that predicts a continuous target value by fitting a straight line (or hyperplane) to the relationship between input features and the output.

It works by learning an intercept b0 and one coefficient for each input feature so that the equation y = b0 + b1*x1 + b2*x2 + ... produces the smallest total squared prediction error on the training data.
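The equation and the error it tries to minimize can be sketched in a few lines of Python. The function names and the coefficient values here are illustrative assumptions, not part of any library.

```python
def predict(intercept, coefficients, features):
    """Evaluate y = b0 + b1*x1 + b2*x2 + ... for one example."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def sum_squared_error(intercept, coefficients, rows, targets):
    """Total squared difference between actual and predicted values."""
    return sum(
        (y - predict(intercept, coefficients, row)) ** 2
        for row, y in zip(rows, targets)
    )


# Hypothetical coefficients and data, chosen only for illustration.
b0, b = 1.0, [2.0, 3.0]
rows = [[1.0, 1.0], [2.0, 0.0]]
targets = [6.0, 4.0]

prediction = predict(b0, b, rows[0])             # 1 + 2*1 + 3*1
error = sum_squared_error(b0, b, rows, targets)  # first row fits exactly, second is off by 1
```

Training amounts to searching for the intercept and coefficients that make this error as small as possible.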

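The most common approach, ordinary least squares, finds the line that minimizes the sum of squared differences between actual and predicted values and has a closed-form solution; gradient descent is an iterative alternative that is often preferred when the dataset is too large for the closed-form computation to be practical.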
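For the one-feature case, the least squares solution has a short closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below assumes a single feature and a hypothetical function name, fit_simple_ols; the data points are made up so that they lie exactly on y = 1 + 2x.

```python
def fit_simple_ols(xs, ys):
    """Closed-form ordinary least squares for one input feature."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of cross-deviations and squared deviations around the means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Illustrative data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_ols(xs, ys)
```

With more than one feature the same idea generalizes to solving a small system of linear equations, which libraries typically handle with a numerically stable solver.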

Key ideas include the assumptions of linearity, independence of errors, and homoscedasticity, plus extensions like multiple regression for many features and regularization to prevent overfitting.
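As a rough illustration of regularization, the sketch below uses ridge (L2) regularization, which is one common choice and an assumption here since the text does not name a specific method. For a single feature with an unpenalized intercept, adding a penalty to the squared slope simply enlarges the denominator of the least squares slope, shrinking it toward zero. The function name fit_ridge_1d and the data are hypothetical.

```python
def fit_ridge_1d(xs, ys, penalty):
    """One-feature ridge regression; the intercept is not penalized.

    Minimizes sum((y - b0 - b1*x)**2) + penalty * b1**2.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)  # penalty = 0 recovers ordinary least squares
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Illustrative data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

_, ols_slope = fit_ridge_1d(xs, ys, penalty=0.0)
_, ridge_slope = fit_ridge_1d(xs, ys, penalty=5.0)
```

A larger penalty gives a flatter line, trading some fit on the training data for less sensitivity to noise.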

Example

A real-estate app might use linear regression to predict a house's sale price from its square footage, number of bedrooms, and age by learning a simple equation from past sales data.
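A minimal sketch of that scenario follows. The sales figures are synthetic, generated from a known equation (price = 50000 + 100*sqft + 10000*bedrooms - 500*age) so the fit can be checked; they are not real market data. The helper names are hypothetical, and the normal-equations approach with a hand-written solver is used only to keep the example dependency-free. A production system would more likely rely on a numerical library.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, targets):
    """Least squares via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 models the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic past sales: (square footage, bedrooms, age in years) and price.
houses = [(1500, 3, 20), (2000, 4, 10), (1200, 2, 35),
          (2500, 4, 5), (1800, 3, 15), (3000, 5, 2)]
prices = [220000, 285000, 172500, 337500, 252500, 399000]

coefficients = fit_ols(houses, prices)  # [intercept, per sqft, per bedroom, per year of age]

# Predict the price of a new, hypothetical 2000 sq ft, 3-bedroom, 10-year-old house.
new_house = (2000, 3, 10)
predicted_price = coefficients[0] + sum(
    c * x for c, x in zip(coefficients[1:], new_house)
)
```

Because each coefficient is tied to one feature, the learned equation can be read directly, for example as an estimated price change per additional square foot.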

Why it matters

Linear regression remains a foundational building block for understanding more complex models and is still widely used for interpretable forecasting, trend analysis, and as a baseline in modern AI pipelines.

Frequently asked questions

What is the difference between simple and multiple linear regression?

Simple linear regression uses only one input feature, while multiple linear regression uses two or more features to predict the target.

Related terms

Logistic Regression

Logistic regression is a supervised machine learning algorithm used for binary classification that estimates the probability an input belongs to a particular class.
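The probability estimate can be sketched as a linear score passed through the sigmoid function, which squeezes any real number into the range 0 to 1. The function name, coefficient values, and the 0.5 decision threshold below are assumptions for illustration.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Sigmoid of the linear score b0 + b1*x1 + ..., giving a value in (0, 1)."""
    score = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical model: the score is -1 + 0.5*2 = 0, the point of maximum uncertainty.
probability = predict_probability(-1.0, [0.5], [2.0])
label = 1 if probability >= 0.5 else 0  # common, but adjustable, threshold
```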

Gradient Descent

Gradient descent is an optimization algorithm that finds the minimum of a function by repeatedly moving in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.
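As a sketch of the idea applied to linear regression, the loop below repeatedly nudges an intercept and slope against the gradient of the mean squared error. The learning rate, step count, function name, and data are illustrative assumptions; suitable values depend on the problem.

```python
def gradient_descent_fit(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = b0 + b1*x by gradient descent on the mean squared error."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to b0 and b1.
        grad_b0 = 2.0 * sum(errors) / n
        grad_b1 = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        # Step downhill, scaled by the learning rate.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Illustrative data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
b0, b1 = gradient_descent_fit(xs, ys)
```

If the learning rate is too large the updates can overshoot and diverge; if it is too small, convergence takes many more steps.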

Supervised Learning

Supervised learning is a machine learning method where a model is trained on data that already has correct answers attached, so it can learn to predict those answers for new data.

Active Learning

Active learning is a machine learning technique where the model itself selects the most informative unlabeled data points to be labeled by a human, rather than labeling data randomly or all at once.

Adam Optimizer

Adam (Adaptive Moment Estimation) is a popular optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients.

Anomaly Detection

Anomaly detection is a machine learning technique that identifies rare or unusual data points that differ significantly from the majority of the data, often called outliers.