AI Glossary
Clear, beginner-friendly definitions of 65 AI & machine-learning terms — from LLMs and transformers to RAG, fine-tuning and agents. Kept current by our agents.
A
An activation function is a mathematical function applied to the weighted sum of a neuron's inputs in a neural network. It determines how strongly the neuron 'fires' and passes on a signal, and it introduces the nonlinearity that lets the network model complex patterns.
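As a minimal sketch, here are two widely used activation functions, ReLU and sigmoid, applied to a few example pre-activation values. The input values are made up for illustration.

```python
import math

def relu(z: float) -> float:
    # Passes positive values through unchanged and clips negatives to zero.
    return max(0.0, z)

def sigmoid(z: float) -> float:
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-activation values (weighted sums) for three neurons.
pre_activations = [-2.0, 0.0, 3.0]

print([relu(z) for z in pre_activations])
print([round(sigmoid(z), 3) for z in pre_activations])
```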
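Adam (Adaptive Moment Estimation) is a popular optimization algorithm used to train machine learning models. It iteratively updates parameters based on gradients, adapting the step size for each parameter using running averages of past gradients and their squares.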
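The update rule can be sketched for a single parameter as follows. This is a simplified illustration rather than a production optimizer; the toy objective f(x) = (x - 3)^2, the step count, and the starting point are assumptions chosen for the example, while the beta and epsilon defaults are the commonly cited ones.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-dimensional function given its gradient, using Adam."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective (an assumption for illustration): f(x) = (x - 3)^2, gradient 2(x - 3).
result = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(result)  # should land close to 3
```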
Agent memory is the component in AI agents that stores and retrieves information from past interactions, enabling recall of context, facts, or experiences to inform future actions.
An autoencoder is a neural network that learns to compress input data into a smaller representation and then reconstruct the original data from that compressed form.
B
Backpropagation is an algorithm for training neural networks by calculating how much each weight contributed to the prediction error and adjusting those weights accordingly. It uses the chain rule to efficiently compute gradients of the loss function.
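To make the chain rule concrete, the sketch below backpropagates through a single sigmoid neuron with a squared-error loss and compares the result with a numerical estimate. The neuron, the loss, and all numeric values are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, target):
    # Squared error of a one-input sigmoid neuron.
    return (sigmoid(w * x + b) - target) ** 2

def backprop(w, b, x, target):
    # Forward pass: keep the intermediate values needed for the backward pass.
    z = w * x + b
    y = sigmoid(z)
    # Backward pass: apply the chain rule one factor at a time.
    dL_dy = 2.0 * (y - target)   # derivative of the squared error
    dy_dz = y * (1.0 - y)        # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz
    return dL_dz * x, dL_dz      # dL/dw, dL/db

w, b, x, target = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backprop(w, b, x, target)

# Sanity check against a central finite difference.
h = 1e-6
numeric_w = (loss(w + h, b, x, target) - loss(w - h, b, x, target)) / (2 * h)
print(grad_w, numeric_w)
```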
Batch size is the number of training examples processed together in a single forward and backward pass during model training.
In AI ethics, bias refers to systematic prejudices or errors in machine learning systems that produce unfair or discriminatory outcomes for particular groups of people.
C
Chain-of-Thought (CoT) is a prompting technique that asks an AI model to generate intermediate reasoning steps before giving a final answer, helping it solve complex problems more reliably.
Classification is a supervised machine learning task that assigns input data to one of several predefined categories or classes based on patterns learned from labeled training examples.
Clustering is an unsupervised machine learning technique that automatically groups similar data points together into clusters based on their features, without using any labeled examples.
A Convolutional Neural Network (CNN) is a specialized type of deep neural network designed to process grid-like data such as images by automatically learning spatial patterns and features.
D
Data augmentation is a technique that artificially increases the size and diversity of a training dataset by creating modified versions of existing data samples.
A dataset is a structured collection of data points used to train, validate, or test machine learning models.
In deep learning, a decoder is a neural network module that converts an encoded representation (like a context vector or latent features) into a final output such as text, images, or sequences.
A diffusion model is a generative AI technique that creates new data like images by learning to reverse a gradual noising process applied to training examples.
E
An embedding (or vector embedding) is a way to represent words, sentences, or other data as dense numerical vectors in a high-dimensional space so that similar items end up close together.
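"Close together" is often measured with cosine similarity. The sketch below uses tiny hand-written three-dimensional vectors as stand-ins; real embeddings come from a trained model and usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for illustration only.
embeddings = {
    "cat":    [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.75, 0.20],
    "car":    [0.10, 0.20, 0.95],
}

sim_cat_kitten = cosine_similarity(embeddings["cat"], embeddings["kitten"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(round(sim_cat_kitten, 3), round(sim_cat_car, 3))
```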
An epoch is one complete pass of a machine learning model through the entire training dataset during training.
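The relationship between epochs and batch size can be sketched with a plain loop. The ten-item dataset and the batch size of 4 are assumptions for illustration, and the actual training step is left out.

```python
import math

def iterate_minibatches(dataset, batch_size):
    # Yield consecutive slices; the last batch may be smaller than batch_size.
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]

dataset = list(range(10))   # stand-in for 10 training examples
batch_size = 4
num_epochs = 2

for epoch in range(num_epochs):
    # One epoch = every example seen exactly once.
    batches = list(iterate_minibatches(dataset, batch_size))
    print(f"epoch {epoch}: {len(batches)} batches, sizes {[len(b) for b in batches]}")
```

With these numbers, each epoch consists of ceil(10 / 4) = 3 parameter updates.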
F
In AI and machine learning, a feature is an individual measurable piece of data that serves as an input variable for a model.
Feature engineering is the process of transforming raw data into meaningful input variables (features) that help machine learning models learn patterns more effectively.
Few-shot learning (in prompting) is a technique where a language model is given a handful of input-output examples directly in the prompt to guide it on a new task.
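A few-shot prompt can be assembled with simple string formatting. The sentiment task, the example reviews, and the "Review:/Sentiment:" layout below are assumptions for illustration; no particular model or API is implied.

```python
def build_few_shot_prompt(examples, query):
    """Place labeled examples before the new input so the model can imitate them."""
    blocks = [f"Review: {text}\nSentiment: {label}\n" for text, label in examples]
    # The final block is left unanswered for the model to complete.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n".join(blocks)

# Hypothetical examples invented for this sketch.
examples = [
    ("I loved this film.", "positive"),
    ("The plot was dull and slow.", "negative"),
]
prompt = build_few_shot_prompt(examples, "A delightful surprise from start to finish.")
print(prompt)
```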
Fine-tuning is the process of taking a pre-trained AI model and continuing its training on a smaller, task-specific dataset to adapt it for a particular use case.
G
A Generative Adversarial Network (GAN) is a machine learning model made of two neural networks that compete against each other to generate realistic new data, such as images or text.
Gradient descent is an optimization algorithm that seeks a minimum of a function by repeatedly taking small steps in the direction of the steepest downward slope. In machine learning it is used to minimize a model's error by adjusting parameters step by step.
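A minimal sketch of the idea on a one-variable function follows. The objective f(x) = (x - 3)^2, the learning rate, and the number of steps are assumptions chosen so the example converges quickly.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Toy objective: f(x) = (x - 3)^2, whose gradient is 2(x - 3) and minimum is at x = 3.
minimum = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(minimum)
```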
Greedy decoding is a text generation strategy in NLP where, at each step, the model selects the single token with the highest probability as the next output.
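The sketch below runs greedy decoding over a hand-written lookup table that stands in for a language model. The table, its probabilities, and the `<start>`/`<end>` markers are hypothetical; a real model would compute next-token probabilities from the whole preceding text.

```python
# Hypothetical next-token probabilities, keyed by the current token only.
NEXT_TOKEN_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "<end>": 0.2},
    "a": {"dog": 1.0},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"<end>": 1.0},
    "sat": {"<end>": 0.9, "the": 0.1},
}

def greedy_decode(table, start="<start>", max_steps=10):
    tokens, current = [], start
    for _ in range(max_steps):
        probs = table[current]
        # Greedy choice: always take the single most probable next token.
        current = max(probs, key=probs.get)
        if current == "<end>":
            break
        tokens.append(current)
    return tokens

print(greedy_decode(NEXT_TOKEN_PROBS))  # ['the', 'cat', 'sat']
```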
Guardrails are rules, filters, and constraints added to AI systems to keep their outputs safe, ethical, and within acceptable boundaries.
H
In LLMs, hallucination is when the model generates fluent, confident text that is factually incorrect, fabricated, or not supported by its training data or the context it was given.
A hyperparameter is a value or setting chosen by the user before training a machine learning model that controls the learning process itself.
I
L
In machine learning, a label is the known correct output or category assigned to a training data example that a model learns to predict.
The learning rate is a hyperparameter that controls the size of the steps an optimization algorithm takes when updating a model's parameters during training.
Logistic regression is a supervised machine learning algorithm, most commonly used for binary classification, that estimates the probability an input belongs to a particular class.
Long Short-Term Memory (LSTM) is a type of recurrent neural network architecture designed to learn and retain information over long sequences of data.
A loss function quantifies how far a model's predictions are from the true values, serving as the objective that training tries to minimize.
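Two common loss functions, sketched in plain Python: mean squared error for regression and binary cross-entropy for classification. The example predictions and targets are made up for illustration.

```python
import math

def mean_squared_error(predictions, targets):
    # Average of squared differences; larger errors are penalized more heavily.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def binary_cross_entropy(probabilities, labels):
    # Penalizes confident wrong probabilities; labels are 0 or 1,
    # probabilities must lie strictly between 0 and 1.
    total = 0.0
    for p, y in zip(probabilities, labels):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

print(mean_squared_error([2.5, 0.0], [3.0, -0.5]))   # 0.25
print(binary_cross_entropy([0.9, 0.1], [1, 0]))      # small: predictions match labels
print(binary_cross_entropy([0.1, 0.9], [1, 0]))      # large: predictions contradict labels
```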
M
N
O
Optical Character Recognition (OCR) is a technology that converts images of printed or handwritten text, such as scanned documents or photos, into machine-readable and editable digital text.
Overfitting happens when a machine learning model learns the training data too closely, including its noise and quirks, so it fails to perform well on new, unseen data.
P
PEFT (Parameter-Efficient Fine-Tuning) is a family of techniques that adapt large pre-trained models to new tasks by updating or adding only a tiny fraction of parameters instead of retraining the entire model.
Pretraining is the first stage of training an AI model on a very large, general dataset so it learns broad patterns and representations before being adapted to specific tasks.
A prompt is the input text, question, or instruction given to an AI model (especially a large language model) to guide what it should generate or how it should respond.
Prompt engineering is the practice of designing and refining text inputs (prompts) to guide AI models like large language models toward producing accurate, relevant, or creative outputs.
Q
R
A Recurrent Neural Network (RNN) is a type of neural network built to handle sequential data by passing information from one step to the next through a hidden state that acts like a memory.
Regression is a supervised machine learning method that predicts continuous numerical values from input features.
Regularization is a set of techniques in machine learning that reduce overfitting, most commonly by adding a penalty term to the model's loss function that discourages overly complex models or large parameter values.
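One widely used form is an L2 penalty (sometimes called weight decay). The sketch below shows how such a penalty is added to a loss value; the weights, the data loss, and the strength `lam` are assumptions for illustration.

```python
def l2_penalty(weights, lam):
    # lam controls how strongly large weights are discouraged.
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam):
    # Total objective = fit to the data + penalty on parameter size.
    return data_loss + l2_penalty(weights, lam)

weights = [1.0, -2.0]
print(regularized_loss(0.5, weights, lam=0.0))   # no regularization: just the data loss
print(regularized_loss(0.5, weights, lam=0.1))   # penalty of 0.1 * (1 + 4) is added
```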
Reinforcement Learning (RL) is a machine learning method where an agent learns to make sequential decisions by interacting with an environment, receiving rewards or penalties, and aiming to maximize its long-term reward.
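Reinforcement Learning from Human Feedback (RLHF) is a training technique that improves AI models by using human preferences to guide the learning process instead of relying only on fixed rewards. Typically, a reward model is trained on human comparisons of model outputs, and the AI model is then optimized against that reward model with reinforcement learning.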
Retrieval-Augmented Generation (RAG) is a technique that improves large language models by retrieving relevant external information before generating a response.
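The retrieve-then-generate flow can be sketched with a toy retriever that ranks documents by word overlap. Everything here is an assumption for illustration: the documents, the overlap scoring (real systems usually use embeddings and a vector database), and the prompt layout. The final call to a language model is deliberately left out.

```python
def tokenize(text):
    # Lowercase and strip basic punctuation so words can be compared.
    return set(text.lower().replace("?", "").replace(".", "").split())

def retrieve(query, documents, k=1):
    # Rank documents by how many words they share with the query.
    query_words = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    # Put the retrieved passages in front of the question.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}\nAnswer:"

# Hypothetical knowledge base invented for this sketch.
documents = [
    "The Eiffel Tower is in Paris.",
    "Python is a programming language.",
    "Gradient descent minimizes a loss function.",
]
query = "Where is the Eiffel Tower?"
passages = retrieve(query, documents)
print(build_prompt(query, passages))  # this prompt would then be sent to a model
```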
S
Self-supervised learning is a machine learning method where a model creates its own training labels directly from the input data, without needing human annotations.
Semi-supervised learning is a machine learning approach that combines a small amount of labeled data with a large amount of unlabeled data to train models more effectively than using either alone.
Supervised learning is a machine learning method where a model is trained on data that already has correct answers attached, so it can learn to predict those answers for new data.
Synthetic data is artificially generated information designed to mimic the statistical properties of real-world data, created by algorithms rather than collected from actual events or observations.
A system prompt is the initial set of instructions given to an AI model that defines its overall behavior, role, rules, and tone for the conversation.
T
Temperature is a parameter in large language models that controls the randomness of generated text. Lower values produce more focused and deterministic outputs, while higher values increase creativity and variability.
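A common way to apply temperature is to divide the model's raw scores (logits) by it before the softmax. The sketch below shows the effect on made-up logits.

```python
import math

def softmax_with_temperature(logits, temperature):
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    highest = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]   # hypothetical scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.5)   # sharper: mass concentrates on the top token
hot = softmax_with_temperature(logits, 2.0)    # flatter: probabilities move closer together
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```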
Throughput measures how much work an AI system completes in a given time, such as the number of model inferences or training examples processed per second.
Tool use (also known as function calling) lets AI agents call external tools, APIs, or functions by outputting structured requests instead of just text.
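The sketch below shows the application side of this pattern: parsing a structured request and dispatching it to a registered function. The JSON shape, the `get_weather` stub, and the `TOOLS` registry are hypothetical and do not follow any particular vendor's format.

```python
import json

def get_weather(city):
    # Hypothetical stub; a real tool would call an external service.
    return f"Sunny in {city}"

# Registry mapping tool names the model may request to real functions.
TOOLS = {"get_weather": get_weather}

def dispatch(raw_request):
    """Parse the model's structured request and run the matching tool."""
    request = json.loads(raw_request)
    name = request["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**request["arguments"])

# Example of what a model might emit instead of plain text (made up for this sketch).
model_output = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(model_output))  # Sunny in Paris
```

In a full agent loop, the tool's result would be passed back to the model so it can continue its response.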
Top-p sampling (nucleus sampling) is a text-generation technique that dynamically selects the smallest set of most likely next tokens whose combined probability reaches or exceeds a threshold p (e.g. 0.9), then samples the next token from that set.
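A minimal sketch of the selection and sampling steps follows. The token probabilities are invented for illustration, and renormalizing the kept probabilities before sampling is a common convention assumed here.

```python
import random

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        cumulative += prob
        if cumulative >= p:   # stop as soon as the threshold is reached
            break
    total = sum(prob for _, prob in nucleus)
    # Renormalize so the kept probabilities sum to 1.
    return {token: prob / total for token, prob in nucleus}

def sample_top_p(probs, p, rng=random):
    nucleus = top_p_filter(probs, p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Hypothetical next-token distribution.
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(top_p_filter(probs, 0.9))                    # 'rock' is excluded
print(sample_top_p(probs, 0.9, random.Random(0)))  # one of cat, dog, fish
```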
Training is the process of feeding data into a machine learning model so it can learn patterns and adjust its internal parameters to make accurate predictions.
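Training data is the dataset of examples that a machine learning model learns from during the training process. In supervised learning, it contains input features paired with known outputs so the model can discover patterns; in unsupervised and self-supervised learning, it consists of inputs without human-provided labels.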
Transfer learning is a machine learning method that reuses a model trained on one task as the starting point for a different but related task.
U
Underfitting happens when a machine learning model is too simple to capture the patterns in the training data, leading to poor performance on both training and unseen data.
Unsupervised learning is a machine learning method that trains models on unlabeled data to find hidden patterns, structures, or relationships without any guidance on correct outputs.
V
A Variational Autoencoder (VAE) is a neural network that learns a compressed probabilistic representation of data and can generate new similar examples by sampling from that space. It combines autoencoders with variational inference to enable both reconstruction and generation.
A vector database is a specialized database designed to store and query high-dimensional vector embeddings, enabling fast similarity searches instead of traditional exact-match queries.