Infrastructure

Serving, hardware and MLOps.


I

Inference

Inference is the stage in which a trained machine learning model generates predictions or outputs on new, unseen data. In infrastructure contexts, the concern is deploying and serving models efficiently in production.

Q

Quantization

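Quantization is a model optimization technique that lowers the numerical precision of weights and activations, typically by converting 32-bit floating-point values to 8-bit integers or other lower-bit formats. This reduces memory use and can speed up inference, usually at some cost in accuracy.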
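
As a rough illustration of the idea, the sketch below applies symmetric per-tensor quantization to a small list of float weights and then maps them back. The function names (`quantize_int8`, `dequantize`) and the sample weights are hypothetical, chosen only for this example; production frameworks use more elaborate schemes (per-channel scales, zero points, calibration).

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127].

    Illustrative only; one shared scale is used for the whole list.
    """
    max_abs = max(abs(v) for v in values)
    # The scale converts between float units and integer steps.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights, standing in for a real tensor.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("int8 codes:", q)
print("scale:", scale)
print("restored:", restored)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded away for the smaller storage format.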

T

Throughput

Throughput measures how much work an AI system completes per unit of time, such as the number of model inferences or training examples processed per second.
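
One simple way to estimate throughput is to time a batch of calls and divide the number of items by the elapsed time. The sketch below assumes a hypothetical `fake_model` function standing in for a real model call; `measure_throughput` is likewise a name invented for this example.

```python
import time


def measure_throughput(fn, inputs):
    """Return items processed per second for fn over inputs."""
    start = time.perf_counter()
    for x in inputs:
        fn(x)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return len(inputs) / elapsed if elapsed > 0 else float("inf")


def fake_model(x):
    """Hypothetical stand-in for a real inference call."""
    return x * 2


rate = measure_throughput(fake_model, list(range(10_000)))
print(f"{rate:.0f} inferences/second")
```

The reported number depends on the machine and its current load, so real measurements usually include warm-up runs and are averaged over several repetitions.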