What is a Hyperparameter?
A hyperparameter is a value or setting chosen by the user before training a machine learning model that controls the learning process itself.
Unlike model parameters, which are learned automatically from the data during training, hyperparameters are set in advance and are not learned by the training algorithm, although some, such as the learning rate, may follow a preset schedule.
They influence how quickly or effectively the model learns, how complex the model can become, and how it avoids problems like overfitting.
Common ways to select good hyperparameters include manual tuning, grid search, random search, or automated methods like Bayesian optimization.
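As a minimal sketch of two of these methods, the Python below compares grid search and random search. The validation_loss function is a hypothetical stand-in for "train a model with these settings, then score it on held-out data"; its shape, the search ranges, and the names grid_search and random_search are assumptions chosen for illustration, not part of any library.

```python
import itertools
import random


def validation_loss(learning_rate, num_layers):
    # Hypothetical stand-in for training a model and scoring it on held-out
    # data. By construction the minimum is at learning_rate=0.01, num_layers=3.
    return (learning_rate - 0.01) ** 2 * 1e4 + (num_layers - 3) ** 2


def grid_search(grid):
    # Evaluate every combination of the listed candidate values.
    names = list(grid)
    best_config, best_loss = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        loss = validation_loss(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def random_search(num_trials, seed=0):
    # Sample configurations at random instead of enumerating a fixed grid.
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "num_layers": rng.randint(1, 6),
        }
        loss = validation_loss(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


grid = {"learning_rate": [0.001, 0.01, 0.1], "num_layers": [1, 3, 5]}
best_grid_config, best_grid_loss = grid_search(grid)
best_random_config, best_random_loss = random_search(num_trials=20)
```

Grid search costs one training run per combination (nine here), so its cost multiplies with every hyperparameter added. Random search spends a fixed budget of trials and can land on values between the grid points.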
Example
When training a neural network, you might set the learning rate to 0.01 and the number of hidden layers to 3; these choices are hyperparameters that stay fixed while the model weights are learned from data.
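The same split can be seen in a much smaller setting. The sketch below swaps the neural network for a one-weight linear model so that it fits in a few lines; the toy data, the step count, and the train function are assumptions made for illustration. The learning rate of 0.01 is passed in and never modified, while the weight w is what training changes.

```python
# Hyperparameters: chosen before training and never changed by the update rule.
LEARNING_RATE = 0.01
NUM_STEPS = 500

# Toy data generated from y = 2x, so the ideal weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def train(learning_rate, num_steps):
    w = 0.0  # model parameter: starts arbitrary and is learned from the data
    for _ in range(num_steps):
        # Gradient of the mean squared error of the prediction w * x.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Gradient descent step: only w changes, learning_rate stays fixed.
        w -= learning_rate * grad
    return w


learned_w = train(LEARNING_RATE, NUM_STEPS)
```

After training, learned_w is close to 2.0, while LEARNING_RATE is still exactly the value it was given.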
Why it matters
The same model trained on the same data can underfit, overfit, or fail to converge depending on its hyperparameters, so choosing them well directly affects accuracy and generalization.
Frequently asked questions
What is the difference between a hyperparameter and a model parameter?
Hyperparameters are set before training and control the learning process, while model parameters are learned from the data during training.
Related terms
Training is the process of feeding data into a machine learning model so it can learn patterns and adjust its internal parameters to make accurate predictions.
Overfitting happens when a machine learning model learns the training data too closely, including its noise and quirks, so it fails to perform well on new, unseen data.
Regularization is a set of techniques in machine learning that reduce overfitting, most commonly by adding a penalty term to the model's loss function that discourages overly complex models or large parameter values.
Adam (Adaptive Moment Estimation) is a popular optimization algorithm used to train machine learning models; it iteratively updates each parameter with a step size adapted from running estimates of the first and second moments of its gradients.
Classification is a supervised machine learning task that assigns input data to one of several predefined categories or classes based on patterns learned from labeled training examples.
Clustering is an unsupervised machine learning technique that automatically groups similar data points together into clusters based on their features, without using any labeled examples.