When should accuracy not be used?

Avoid relying on accuracy alone with highly imbalanced classes, such as fraud detection where 99% of transactions are legitimate.
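The fraud case can be made concrete with a small sketch. The data below is hypothetical (1,000 transactions, of which 10 are fraudulent), and the "model" simply predicts "legitimate" every time; neither comes from a real system.

```python
# Hypothetical data: 1,000 transactions, 10 fraudulent (label 1), 990 legitimate (label 0).
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts the majority class (legitimate).
y_pred = [0] * len(y_true)

# Accuracy: matching predictions over all predictions.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Fraction of actual fraud cases that the model caught.
fraud_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fraud_recall = fraud_caught / sum(y_true)

print(f"accuracy: {accuracy:.2f}")          # accuracy: 0.99
print(f"fraud recall: {fraud_recall:.2f}")  # fraud recall: 0.00
```

The model scores 99% accuracy while catching no fraud at all, which is why accuracy alone is not enough here.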

How is accuracy different from precision?

Accuracy counts all correct predictions, while precision focuses only on how many positive predictions were actually correct.
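A minimal sketch of the difference, assuming a binary task summarized by four counts: true positives (tp), true negatives (tn), false positives (fp), and false negatives (fn). The function names and the counts are illustrative, not taken from any library.

```python
def accuracy(tp, tn, fp, fn):
    """Correct predictions of either class over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """Correct positive predictions over all positive predictions."""
    return tp / (tp + fp)


# Hypothetical counts for 100 predictions.
tp, tn, fp, fn = 8, 80, 8, 4

print(accuracy(tp, tn, fp, fn))  # 0.88
print(precision(tp, fp))         # 0.5
```

With these counts the model is right 88% of the time overall, yet only half of its positive predictions are correct. Note that precision is undefined when the model makes no positive predictions; this sketch does not handle that case.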

What is Accuracy?

Accuracy measures the proportion of correct predictions made by a machine learning model out of all predictions. It is calculated as the number of correct predictions divided by the total number of predictions.

Accuracy is a simple evaluation metric for classification tasks. In the binary case it adds the true positives and true negatives, then divides by the total number of instances. The result is the proportion of correct outcomes, usually reported as a percentage.
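The calculation can be sketched in a few lines of Python. The helper name simple_accuracy and the sample labels are assumptions made for illustration; libraries such as scikit-learn provide their own implementations.

```python
def simple_accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 4 of the 5 predictions match.
y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]

print(simple_accuracy(y_true, y_pred))  # 0.8
```

Because it only compares labels for equality, the same helper works for multi-class problems as well as binary ones.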

It assumes equal importance for all errors and works best on balanced datasets. On imbalanced data it can be misleading because a model can achieve high accuracy by always predicting the majority class.

Accuracy is often reported alongside other metrics such as precision, recall, and F1 score to give a fuller picture of model performance.

Example

A model that classifies 95 out of 100 images correctly as cat or dog has 95% accuracy. If the dataset contains 90 dogs and only 10 cats, the same score could hide poor performance on cats.
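To see how the overall score can hide a weak class, the sketch below breaks the same 95% down per class. It assumes, purely for illustration, that all five mistakes are cats labelled as dogs; the example above does not say how the errors are distributed.

```python
from collections import Counter

# Hypothetical split of the 5 errors: every mistake is a cat labelled as a dog.
y_true = ["dog"] * 90 + ["cat"] * 10
y_pred = ["dog"] * 90 + ["cat"] * 5 + ["dog"] * 5

# Count examples per class, and correct predictions per class.
total = Counter(y_true)
correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

overall = sum(correct.values()) / len(y_true)
per_class = {label: correct[label] / total[label] for label in total}

print(overall)    # 0.95
print(per_class)  # {'dog': 1.0, 'cat': 0.5}
```

Under this assumption the model is perfect on dogs but right on only half of the cats, even though overall accuracy is 95%.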

Why it matters

Accuracy remains one of the most widely reported baseline metrics for comparing models and communicating results to non-experts. It is easy to understand, but it must be interpreted carefully in real-world applications with class imbalance.

Frequently asked questions

What is a good accuracy value?

It depends on the problem. 90% or higher is often strong on balanced data, but a score only means something relative to a baseline: random guessing on a balanced binary task already gives about 50%.
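One way to judge an accuracy value is to compare it with trivial baselines. The sketch below computes two of them from the label distribution: always predicting the most common class, and guessing uniformly at random among the classes. The function name and datasets are hypothetical.

```python
from collections import Counter


def baseline_accuracies(y_true):
    """Return (majority-class accuracy, expected accuracy of uniform random guessing)."""
    counts = Counter(y_true)
    # Always predicting the most common class is right for that class's share of the data.
    majority = max(counts.values()) / len(y_true)
    # A uniform guess among k classes is right with probability 1/k on any example.
    uniform_random = 1 / len(counts)
    return majority, uniform_random


balanced = [0] * 50 + [1] * 50
imbalanced = [0] * 90 + [1] * 10

print(baseline_accuracies(balanced))    # (0.5, 0.5)
print(baseline_accuracies(imbalanced))  # (0.9, 0.5)
```

On the imbalanced set, a model needs to beat 90% before its accuracy says anything beyond "it learned the majority class".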

Related terms

Precision

Precision is an evaluation metric for classification models that measures the proportion of true positive predictions among all positive predictions made.

Recall

Recall is an evaluation metric that measures the proportion of actual positive cases a model correctly identifies. It shows how well the model finds all relevant instances in the data.
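A minimal sketch of recall, assuming binary labels where 1 marks the positive class. The function and data are illustrative only.

```python
def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model predicted as positive."""
    actual_positives = sum(t == positive for t in y_true)
    if actual_positives == 0:
        raise ValueError("no positive examples in y_true")
    true_positives = sum(
        t == positive and p == positive for t, p in zip(y_true, y_pred)
    )
    return true_positives / actual_positives


# Hypothetical labels: 4 actual positives, 3 of them found.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print(recall(y_true, y_pred))  # 0.75
```

The false positive in the last position does not affect recall; it would lower precision instead.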

F1 Score

The F1 Score is a single metric, the harmonic mean of precision and recall, that evaluates how well a classification model performs, especially when classes are imbalanced.
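F1 is conventionally computed as the harmonic mean of precision and recall. The sketch below assumes both values are already known; returning 0.0 when both are zero is a convention chosen here for illustration, not a universal rule.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the undefined 0/0 case
    return 2 * precision * recall / (precision + recall)


print(f"{f1_score(0.5, 0.5):.2f}")  # 0.50
print(f"{f1_score(0.9, 0.1):.2f}")  # 0.18
print(f"{f1_score(1.0, 0.0):.2f}")  # 0.00
```

The second case shows why the harmonic mean is used: a plain average of 0.9 and 0.1 would be 0.5, while F1 stays low unless both precision and recall are reasonably high.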

Confusion Matrix

A confusion matrix is a table that shows how well a classification model performs by comparing its predictions to the actual labels.
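As a rough sketch, a confusion matrix can be built by counting (actual, predicted) pairs. The layout used here, rows for actual labels and columns for predicted labels, is one common convention and is an assumption of this example, as are the function name and data.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [
        [counts[(actual, predicted)] for predicted in labels]
        for actual in labels
    ]


# Hypothetical labels for 7 images.
y_true = ["cat", "cat", "cat", "dog", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "dog", "cat"]
labels = ["cat", "dog"]

matrix = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, matrix):
    print(label, row)
# cat [2, 1]
# dog [1, 3]

# Accuracy is the diagonal (correct predictions) divided by the total.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)
```

Accuracy uses only the diagonal of this table, while precision and recall each read along one column or one row, which is why the matrix gives a fuller picture than any single number.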

Benchmark

A benchmark is a standardized dataset and task used to measure and compare how well different AI models perform.

BLEU Score

BLEU Score is an automatic metric that evaluates machine-generated text quality, mainly for machine translation, by measuring overlap with human-written reference translations.
