Also known as: Weights
Parameters, also called weights, are the internal numerical values in a machine learning model that are adjusted during training. They determine how the model processes inputs to generate predictions or outputs.
In neural networks and LLMs, parameters are the learned coefficients that connect neurons or attention heads across layers. Each parameter multiplies or transforms signals as data flows through the model.
During training, parameters are updated using optimization algorithms like gradient descent to minimize the difference between the model's predictions and the actual target values.
Modern LLMs contain billions of parameters, allowing them to capture complex patterns in language data, but this also increases computational requirements for training and inference.
In a simple neural network that predicts house prices, the parameters are the weights applied to features like square footage and location. After training on past sales data, these weights encode the importance of each feature for accurate predictions.
Parameters encode the knowledge a model has acquired from data, directly affecting its capabilities and performance. Scaling the number of parameters has driven major advances in LLMs, enabling more fluent and knowledgeable AI systems.
Parameters are learned automatically from data during training, while hyperparameters are settings chosen by humans before training begins, such as learning rate or model architecture.
A neural network, or artificial neural network (ANN), is a computational model inspired by the human brain that learns to recognize patterns in data by passing information through layers of interconnected artificial neurons.
Training is the process of feeding data into a machine learning model so it can learn patterns and adjust its internal parameters to make accurate predictions.
Backpropagation is an algorithm for training neural networks by calculating how much each weight contributed to the prediction error and adjusting those weights accordingly. It uses the chain rule to efficiently compute gradients of the loss function.
Gradient descent is an optimization algorithm that finds the minimum of a function by repeatedly moving in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.
Fine-tuning is the process of taking a pre-trained AI model and continuing its training on a smaller, task-specific dataset to adapt it for a particular use case.
The attention mechanism is a technique in neural networks that lets the model dynamically focus on the most relevant parts of the input when processing each element, rather than treating all inputs equally.