How are FLOPs different from parameters?

Parameters are the learned weights stored in the model; FLOPs count the arithmetic operations executed when the model runs.
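
To make the distinction concrete, here is a minimal sketch for a single convolutional layer. The helper names and layer sizes are hypothetical, and the count assumes 2 FLOPs per multiply-add and ignores the bias add. The parameter count stays fixed while the FLOP count grows with the size of the output feature map.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    # Learned weights plus one bias per output channel.
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def conv2d_flops(in_channels, out_channels, kernel_size, out_height, out_width):
    # One multiply-add per weight per output position, counted as 2 FLOPs.
    macs = (kernel_size * kernel_size * in_channels
            * out_channels * out_height * out_width)
    return 2 * macs


# Hypothetical 3x3 layer with 64 input and 64 output channels.
params = conv2d_params(64, 64, 3)
flops_small = conv2d_flops(64, 64, 3, 56, 56)    # 56x56 output map
flops_large = conv2d_flops(64, 64, 3, 112, 112)  # 112x112 output map

print(params, flops_small, flops_large)
```

Doubling the output height and width quadruples the FLOPs, yet the layer stores exactly the same weights.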

Do fewer FLOPs always mean faster inference?

Not always. Memory bandwidth, batch size, and hardware optimizations can affect actual speed more than raw FLOPs.

What are FLOPs?

FLOPs stands for floating-point operations and counts the total number of arithmetic calculations (additions, multiplications) a neural network performs during a forward or backward pass.

Each layer in a model performs many floating-point math operations on tensors. Summing these operations across the entire network gives the model's total FLOPs, a hardware-independent measure of computational work.
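
The per-layer summation can be sketched as follows. This is an illustrative count rather than a definitive method: the helper functions and the tiny three-layer network are hypothetical, and each multiply-add is counted as 2 FLOPs (some published figures count it as 1).

```python
def dense_flops(in_features, out_features):
    # Each output needs in_features multiplies and in_features adds
    # (the last add is the bias).
    return 2 * in_features * out_features


def conv2d_flops(in_channels, out_channels, kernel_size, out_height, out_width):
    # Each output element is a dot product over a
    # kernel_size x kernel_size x in_channels window (bias add ignored).
    macs_per_output = kernel_size * kernel_size * in_channels
    num_outputs = out_channels * out_height * out_width
    return 2 * macs_per_output * num_outputs


# Hypothetical network: two 3x3 convolutions followed by a classifier layer.
layer_flops = [
    conv2d_flops(3, 16, 3, 32, 32),
    conv2d_flops(16, 32, 3, 16, 16),
    dense_flops(32 * 16 * 16, 10),
]

# The model's total is the sum over all layers.
total_flops = sum(layer_flops)
print(layer_flops, total_flops)
```

Because the count depends only on layer shapes, it comes out the same on any hardware.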

FLOPs differ from FLOPS (operations per second), which measures hardware speed. In AI, practitioners usually report total FLOPs to compare model complexity rather than runtime speed.
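
The two quantities are related by a simple division: a workload's FLOPs divided by a device's sustained FLOPS gives a lower bound on compute time. A minimal sketch, assuming the roughly 4 billion FLOP figure for ResNet-50 and a hypothetical device that sustains 10 trillion operations per second:

```python
def min_compute_time_s(total_flops, device_flops_per_second):
    # Lower bound only: ignores memory traffic, launch overhead, and idle units.
    return total_flops / device_flops_per_second


model_flops = 4e9    # work per image (FLOPs), a property of the model
device_flops = 10e12  # hypothetical sustained rate (FLOPS), a property of the hardware

best_case_s = min_compute_time_s(model_flops, device_flops)
print(f"{best_case_s * 1e3:.2f} ms per image at best")
```

Actual latency is usually higher, since real devices rarely sustain their peak rate.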

Lower FLOPs generally imply faster inference and lower energy use, but real-world speed also depends on memory access, parallelism, and hardware optimizations.
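
One way to see why FLOPs alone can mislead is a roofline-style estimate, which takes the larger of the compute time and the memory-transfer time. The sketch below is a simplification under stated assumptions: both models and all device numbers are hypothetical.

```python
def estimated_time_s(flops, bytes_moved, peak_flops_per_s, bandwidth_bytes_per_s):
    # Roofline-style estimate: the slower of compute and memory sets the pace.
    compute_time = flops / peak_flops_per_s
    memory_time = bytes_moved / bandwidth_bytes_per_s
    return max(compute_time, memory_time)


PEAK_FLOPS = 10e12  # hypothetical device: 10 trillion operations per second
BANDWIDTH = 100e9   # hypothetical device: 100 GB/s memory bandwidth

# Model A does half the arithmetic of model B but moves four times the data.
time_a = estimated_time_s(2e9, 400e6, PEAK_FLOPS, BANDWIDTH)
time_b = estimated_time_s(4e9, 100e6, PEAK_FLOPS, BANDWIDTH)

print(f"A: {time_a * 1e3:.1f} ms, B: {time_b * 1e3:.1f} ms")
```

Under these assumptions both models are limited by memory traffic, so the model with fewer FLOPs is the slower one.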

Example

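ResNet-50 needs roughly 4 billion FLOPs to classify one 224×224 image, while the lighter MobileNetV3 (Large variant) needs only about 200 million FLOPs for broadly similar accuracy on the same task. Both figures follow the common convention of counting each multiply-add as a single operation.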

Why it matters

FLOPs guide model selection for deployment: edge devices favor low-FLOP models, while cloud training budgets are planned around total FLOPs across large datasets.
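
A rough training budget can be sketched by scaling the per-example cost across the dataset. The example below relies on assumptions not stated above: the backward pass is taken to cost about twice the forward pass (a common rule of thumb), and the dataset size and epoch count are hypothetical.

```python
def training_flops(forward_flops_per_example, num_examples, epochs,
                   backward_to_forward_ratio=2):
    # Forward pass plus a backward pass assumed to cost about 2x the forward pass.
    per_example = forward_flops_per_example * (1 + backward_to_forward_ratio)
    return per_example * num_examples * epochs


# Hypothetical run: a 4-billion-FLOP model, 1 million examples, 10 epochs.
budget = training_flops(4_000_000_000, 1_000_000, 10)
print(f"{budget:.1e} total training FLOPs")
```

Dividing such a budget by a cluster's sustained FLOPS gives a first estimate of training time.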

Frequently asked questions

FLOPs stands for floating-point operations: the basic decimal-number calculations performed by a model.

Related terms

Parameters

Parameters, also called weights, are the internal numerical values in a machine learning model that are adjusted during training. They determine how the model processes inputs to generate predictions or outputs.

Inference

Inference is the stage where a trained machine learning model is used to generate predictions or outputs on new, unseen data. In infrastructure contexts, it focuses on efficiently deploying and serving models in production.

Latency

Latency is the time delay between sending input to an AI system and receiving its output. In infrastructure, it measures how quickly a model processes a request and returns results.

Throughput

Throughput measures how much work an AI system completes in a given time, such as the number of model inferences or training examples processed per second.

GPU

A GPU (Graphics Processing Unit) is a specialized processor with thousands of small cores optimized for parallel computations, widely used to speed up AI and machine learning workloads.

API

An API (Application Programming Interface) is a standardized set of rules that lets software applications request services or data from each other. In AI infrastructure, it typically means exposing machine learning models as callable endpoints for inference or training.