What is Precision?
Precision is an evaluation metric for classification models that measures the proportion of true positive predictions among all positive predictions made.
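It is calculated as true positives (TP) divided by the sum of true positives and false positives (FP): precision = TP / (TP + FP). Because the denominator counts only the cases the model labeled as positive, the metric ignores everything the model labeled negative.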
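A minimal Python sketch of this formula computes precision directly from lists of true and predicted labels. The function name `precision`, the sample data, and the choice to return 0.0 when the model makes no positive predictions are illustrative assumptions. Libraries handle that zero-division case in different ways.

```python
from typing import Sequence

def precision(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return TP / (TP + FP) for the given positive label.

    Assumption: returns 0.0 when there are no positive predictions (TP + FP == 0).
    """
    # True positive: predicted positive and actually positive.
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False positive: predicted positive but actually negative.
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0

# Hypothetical labels; the missed positive at index 3 (a false negative) does not affect the result.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(precision(y_true, y_pred))  # 2 TP, 1 FP -> 0.6666666666666666
```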
A high precision score means the model makes few false positive errors when predicting the positive class. It does not consider how many actual positives were missed (false negatives).
Precision is commonly paired with recall because improving one often reduces the other. The F1 score, the harmonic mean of the two, summarizes that balance in a single number.
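The trade-off usually appears when a model outputs a score and a decision threshold turns it into a label. The sketch below sweeps a threshold over a small made-up set of scores (the data and the helper name `precision_recall_f1` are hypothetical) and computes precision, recall, and F1 as their harmonic mean.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Classify score >= threshold as positive; return (precision, recall, F1)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels and model scores, made up for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.85, 0.60, 0.40, 0.70, 0.45, 0.30, 0.10]

for threshold in (0.9, 0.5, 0.2):
    p, r, f = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

With these invented scores, a strict threshold of 0.9 gives precision 1.00 but recall 0.25. Lowering it to 0.2 raises recall to 1.00 while precision falls to 0.57. The middle threshold of 0.5 gives the best F1 of the three (0.75).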
Example
In a spam filter, if the model flags 100 emails as spam and 90 of them are actually spam, precision is 90 / (90 + 10) = 0.90. In other words, 90% of its spam predictions were correct.
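The arithmetic in this spam-filter example can be written out directly. The variable names are illustrative. Note that the number of spam emails the filter missed never enters the calculation.

```python
# Counts taken from the spam-filter example.
flagged_as_spam = 100  # emails the model labeled spam (TP + FP)
actually_spam = 90     # flagged emails that really are spam (TP)

true_positives = actually_spam
false_positives = flagged_as_spam - actually_spam  # 10 legitimate emails wrongly flagged

precision = true_positives / (true_positives + false_positives)
print(f"precision = {precision:.2f}")  # precision = 0.90
```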
Why it matters
Precision is critical in applications where false alarms are costly or disruptive, such as fraud detection or medical screening. A high-precision model lets practitioners trust that most of its positive alerts are real.
Frequently asked questions
What is the difference between precision and recall?
Precision measures correctness among predicted positives, whereas recall measures how many actual positives were found.
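A small sketch makes the difference concrete. It uses hypothetical data in which a cautious model flags only one of five actual positives, and flags it correctly. The helper name `precision_and_recall` is illustrative.

```python
def precision_and_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correctness of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # coverage of actual positives
    return precision, recall

# Hypothetical data: 5 actual positives; a cautious model flags only one, correctly.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0]
print(precision_and_recall(y_true, cautious))  # (1.0, 0.2)
```

The model achieves perfect precision while missing four of the five positives, which only recall reveals.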
Related terms
Recall is an evaluation metric that measures the proportion of actual positive cases a model correctly identifies, calculated as TP / (TP + FN), where FN counts false negatives. It shows how well the model finds all relevant instances in the data.
The F1 Score is the harmonic mean of precision and recall, a single metric that balances the two when evaluating a classification model. It is especially useful when classes are imbalanced.
Accuracy measures the proportion of correct predictions made by a machine learning model out of all predictions. It is calculated as the number of correct predictions divided by the total number of predictions.
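Accuracy and precision can tell very different stories on imbalanced data. The sketch below uses a made-up dataset with 5 positives out of 100 and a hypothetical model's predictions. The numbers are illustrative only.

```python
# Hypothetical imbalanced dataset: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5
# Hypothetical predictions: 7 negatives wrongly flagged, 3 of the 5 positives caught.
y_pred = [0] * 88 + [1] * 7 + [1] * 3 + [0] * 2

# Accuracy: correct predictions over all predictions.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# Precision: true positives over all positive predictions.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
precision = tp / (tp + fp)

print(f"accuracy={accuracy:.2f} precision={precision:.2f}")  # accuracy=0.91 precision=0.30
```

Most of the 91% accuracy comes from the easy negatives, while only 3 of the model's 10 positive alerts are correct.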
A confusion matrix is a table that shows how well a classification model performs by comparing its predictions to the actual labels.
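For a binary classifier, the confusion matrix is a 2x2 table of counts, and precision can be read straight from it. The sketch below uses a hand-rolled, hypothetical `confusion_matrix` helper (not a library function) and made-up labels. It assumes the layout with rows as actual labels and columns as predicted labels, giving [[TN, FP], [FN, TP]], which is the same order scikit-learn documents for labels 0 and 1.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = negative, 1 = positive)."""
    counts = Counter(zip(y_true, y_pred))  # missing (actual, predicted) pairs count as 0
    return [
        [counts[(0, 0)], counts[(0, 1)]],  # actual negative row: TN, FP
        [counts[(1, 0)], counts[(1, 1)]],  # actual positive row: FN, TP
    ]

# Hypothetical labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

(tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3
print(f"precision={tp / (tp + fp):.2f}")  # precision=0.75
```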
A benchmark is a standardized dataset and task used to measure and compare how well different AI models perform.
BLEU Score is an automatic metric that evaluates machine-generated text quality, mainly for machine translation, by measuring overlap with human-written reference translations.