What is Recall?
Recall is an evaluation metric that measures the proportion of actual positive cases a model correctly identifies. It shows how well the model finds all relevant instances in the data.
Recall is calculated as true positives divided by the sum of true positives and false negatives. It focuses on minimizing missed positives rather than avoiding false alarms.
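The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `recall`, the `positive` parameter, and the sample labels are all made up for this example, and returning 0.0 when there are no actual positives is an assumed convention (the metric is undefined in that case).

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the label treated as positive."""
    pairs = list(zip(y_true, y_pred))
    # True positives: actual positives the model also predicted as positive.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: actual positives the model missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        # No actual positives: recall is undefined. Returning 0.0 is an
        # assumption made for this sketch, not a universal rule.
        return 0.0
    return tp / (tp + fn)


# Hypothetical labels: four actual positives, one of which is missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
print(recall(y_true, y_pred))  # 0.75
```

Note that the false alarm at the fifth position does not change the result: recall only looks at the cases that are actually positive.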
In binary classification, recall is the same quantity as the true positive rate, also called sensitivity. High recall means few actual positives are overlooked, which matters when missing a case carries a high cost.
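Recall often trades off against precision: tuning a model to catch more positives usually produces more false alarms, and reducing false alarms usually means missing more positives. The two are commonly summarized together with the F1 score, which is their harmonic mean.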
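One way to see the trade-off is to vary the decision threshold applied to a model's scores. The sketch below assumes a classifier that outputs a score per example and predicts positive when the score is at or above a threshold. The helper `precision_recall_f1`, the scores, and the thresholds are hypothetical values chosen for illustration.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Compute precision, recall, and F1 at a given score threshold."""
    # Predict positive (1) when the score reaches the threshold.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero; 0.0 is an assumed fallback.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: four positives, four negatives, with model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1]

for threshold in (0.35, 0.55, 0.75):
    p, r, f1 = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, raising the threshold from 0.35 to 0.75 lifts precision from about 0.67 to 1.00 while recall falls from 1.00 to 0.50. F1 is 0.80, 0.75, and about 0.67 at the three thresholds.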
Example
In a medical test for cancer, recall is the fraction of patients who actually have cancer that the model correctly flagged. A recall of 0.95 means 95 percent of real cancer cases were detected and 5 percent were missed.
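As a worked instance of that figure, suppose (purely hypothetically) that 200 patients in the test set actually have cancer, and the model flags 190 of them while missing 10. These counts are invented for illustration and do not come from any real study.

```latex
\[
\text{Recall} = \frac{TP}{TP + FN} = \frac{190}{190 + 10} = \frac{190}{200} = 0.95
\]
```

Here TP is the number of true positives and FN the number of false negatives. Healthy patients who were wrongly flagged do not appear in the calculation at all.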
Why it matters
Recall is critical in high-stakes domains such as disease screening and fraud detection where failing to catch a positive instance can have serious consequences.
Frequently asked questions
How does recall differ from precision?
Recall measures how many of the actual positives were found; precision measures how many of the predicted positives were actually correct.
Related terms
Precision is an evaluation metric for classification models that measures the proportion of true positive predictions among all positive predictions made.
The F1 Score is a single metric, the harmonic mean of precision and recall, used to evaluate how well a classification model performs, especially when classes are imbalanced.
Accuracy measures the proportion of correct predictions made by a machine learning model out of all predictions. It is calculated as the number of correct predictions divided by the total number of predictions.
A confusion matrix is a table that shows how well a classification model performs by comparing its predictions to the actual labels.
A benchmark is a standardized dataset and task used to measure and compare how well different AI models perform.
BLEU Score is an automatic metric that evaluates machine-generated text quality, mainly for machine translation, by measuring overlap with human-written reference translations.