What is a GPU?
A GPU (Graphics Processing Unit) is a specialized processor with thousands of small cores optimized for parallel computations, widely used to speed up AI and machine learning workloads.
Unlike a CPU, which runs a small number of complex tasks on a few powerful cores, a GPU excels at performing many simple calculations at the same time. This architecture makes it especially efficient for the matrix multiplications and tensor operations common in neural network training and inference, where each output element can be computed independently of the others.
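The independence that makes matrix multiplication a good fit for parallel hardware can be sketched in plain Python. The function names below (dot_row, parallel_matmul) are illustrative, not part of any library, and the thread pool only stands in for the idea of many workers: Python threads will not give a real speedup here, and a GPU would typically go further by assigning each output element to its own thread.

```python
from concurrent.futures import ThreadPoolExecutor


def dot_row(row, b_cols):
    # One output row depends only on one row of A and the columns of B,
    # so it can be computed without waiting for any other row.
    return [sum(x * y for x, y in zip(row, col)) for col in b_cols]


def parallel_matmul(a, b, workers=4):
    # Transpose B once so its columns are easy to iterate over.
    b_cols = list(zip(*b))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each row is submitted as an independent task; results come back
        # in the same order as the input rows.
        return list(pool.map(lambda row: dot_row(row, b_cols), a))


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result = parallel_matmul(a, b)
print(result)  # [[19, 22], [43, 50]]
```

Because no task reads another task's output, the work can be split across as many workers as there are rows (or elements), which is the property GPU hardware exploits.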
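Modern AI frameworks such as PyTorch and TensorFlow can offload heavy computations to a GPU when one is available. TensorFlow places supported operations on a GPU by default, while PyTorch expects tensors and models to be moved to the device explicitly. Both rely on platforms such as CUDA or ROCm to launch GPU kernels and to manage data transfer between CPU memory and GPU memory.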
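As a rough illustration of device placement, the sketch below assumes PyTorch and falls back to the CPU when PyTorch or a GPU is missing. The helper names pick_device and matmul_on_device are hypothetical. Note that PyTorch uses the device string "cuda" for ROCm builds as well.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed, so there is nothing to offload to
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


def matmul_on_device(n: int = 256):
    """Multiply two random n x n matrices on the selected device (requires PyTorch)."""
    import torch
    device = pick_device()
    a = torch.randn(n, n).to(device)  # copy from CPU memory to the device
    b = torch.randn(n, n).to(device)
    c = a @ b                         # the multiplication runs where the tensors live
    return c.cpu()                    # copy the result back to CPU memory


print(pick_device())
```

The explicit .to(device) and .cpu() calls mark the two memory transfers; in real training code it is usually worth keeping data on the GPU across many operations so that these copies do not dominate the runtime.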
GPUs can be found in data-center servers, cloud instances, and consumer graphics cards, allowing both researchers and practitioners to iterate on models far faster than with CPUs alone.
Example
A researcher training a ResNet image classifier on millions of photos can finish an epoch in minutes on a single GPU instead of hours on a CPU, dramatically shortening the overall experiment cycle.
Why it matters
GPUs have become the default compute engine for deep learning, enabling the rapid scaling of model size and dataset size that drives today's AI progress.
Frequently asked questions
GPU stands for Graphics Processing Unit. It was originally built for rendering images but is now essential for AI computations.
Related terms
A TPU (Tensor Processing Unit) is a custom chip designed by Google to accelerate machine learning workloads, especially matrix multiplications used in neural networks.
CUDA is NVIDIA's platform and programming model that lets developers run general-purpose computations on NVIDIA GPUs instead of just CPUs.
An API (Application Programming Interface) is a standardized set of rules that lets software applications request services or data from each other. In AI infrastructure, it typically means exposing machine learning models as callable endpoints for inference or training.
Knowledge distillation is a technique that transfers knowledge from a large, complex 'teacher' model to a smaller 'student' model so the student can achieve similar performance with far less compute and memory.
Edge AI runs AI models directly on local devices such as phones, cameras, or sensors instead of sending data to remote cloud servers.
In AI/ML infrastructure, an endpoint is a deployed URL or network address that exposes a trained model so applications can send data and receive predictions via API calls.