Base Rate Fallacy¶

Summary¶

The base rate fallacy is the cognitive bias of ignoring base rate information (the prior probability of an event) when judging how likely that event is, and focusing instead on case-specific evidence or stereotypes.

Classic Examples¶

Steve the Librarian/Farmer¶

Given a description of Steve as "meek and tidy," most people say he's more likely to be a librarian — ignoring the 20:1 farmer-to-librarian ratio in the population.

Correct reasoning: Even if a librarian is 4× more likely to fit the description, the base rate means a person fitting it is only 16.7% likely to be a librarian.
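The arithmetic behind that figure can be sketched with Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio. This is a minimal illustration, not a canonical implementation; the function name `posterior_from_odds` is hypothetical, and the inputs are the 20:1 ratio and the 4× likelihood ratio quoted above.

```python
from fractions import Fraction


def posterior_from_odds(prior_odds: Fraction, likelihood_ratio: Fraction) -> Fraction:
    """Return P(hypothesis | evidence) given prior odds and a likelihood ratio."""
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability: odds / (1 + odds).
    return posterior_odds / (1 + posterior_odds)


# Prior odds of librarian vs. farmer are 1:20.
# The description is assumed to be 4x as likely for a librarian.
p_librarian = posterior_from_odds(Fraction(1, 20), Fraction(4))
print(p_librarian)                  # 1/6
print(f"{float(p_librarian):.1%}")  # 16.7%
```

Exact fractions are used so that the result reads directly as "1 in 6", which is the same as 4 out of 24.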

Breast Cancer Screening¶

Base rate: 1% of women have breast cancer
Test accuracy: 90% sensitivity (detection rate), 3% false positive rate
After positive test: about 23% chance of cancer (not 90%)
About 77% of positive results are false positives

The base rate (1%) is crucial — a rare condition means most positive tests will be false positives even with an accurate test.
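To check the screening numbers, the probability of cancer given a positive test can be computed directly from Bayes' theorem. The sketch below assumes the three figures listed above (1% prevalence, 90% detection, 3% false positives); the function name `positive_predictive_value` is illustrative only.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              false_positive_rate: float) -> float:
    """P(condition | positive test) via Bayes' theorem."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


ppv = positive_predictive_value(prevalence=0.01,
                                sensitivity=0.90,
                                false_positive_rate=0.03)
print(f"P(cancer | positive test) = {ppv:.1%}")      # 23.3%
print(f"Share of false positives  = {1 - ppv:.1%}")  # 76.7%
```

With these inputs the result is a little under one in four. Raising the prevalence argument while leaving the test unchanged shows how strongly the answer depends on the base rate.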

Why People Fall for It¶

Representativeness heuristic — People judge likelihood by how well something matches a stereotype
Specific information feels more relevant — A vivid description seems more diagnostic than dry statistics
Base rates feel abstract — "1 in 100" is harder to grasp than "Steve is shy"

How to Avoid It¶

Picture a representative sample: reframing the question as "Out of 100 people like this..." reportedly drops errors from 85% to 0% (Kahneman & Tversky)
Think in frequencies, not percentages — "4 out of 24" is more intuitive than "16.7%"
Always ask: "How common is this in the general population?"
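The frequency framing can be made mechanical by converting probabilities into head counts for an imagined population. The following is a rough sketch under assumed inputs: the helper `natural_frequencies` and the population size of 10,000 are hypothetical choices, and the rates are those from the screening example.

```python
def natural_frequencies(prevalence: float,
                        sensitivity: float,
                        false_positive_rate: float,
                        population: int = 10_000) -> tuple[int, int]:
    """Return (true positives, false positives) as whole-person counts."""
    affected = round(population * prevalence)
    unaffected = population - affected
    true_positives = round(affected * sensitivity)
    false_positives = round(unaffected * false_positive_rate)
    return true_positives, false_positives


tp, fp = natural_frequencies(prevalence=0.01,
                             sensitivity=0.90,
                             false_positive_rate=0.03)
print(f"Out of {tp + fp} women who test positive, {tp} have cancer.")
# Out of 387 women who test positive, 90 have cancer.
```

Counts like "90 out of 387" carry the base rate with them, which is why the frequency format tends to be easier to reason about than a bare percentage.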

In AI and Machine Learning¶

Model evaluation: A disease detector with 99% sensitivity and 99% specificity is still a poor screen when prevalence is 0.01%, because only about 1% of its positive results are true cases
Spam filtering: Even a good spam filter produces false positives for rare types of legitimate email
Security screening: When threats are rare, false alarms vastly outnumber real detections, even with an accurate screen
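The model evaluation point can be illustrated with a small confusion-matrix calculation. This sketch assumes that "99% accurate" means 99% sensitivity and 99% specificity, which the text does not state explicitly, and it uses a hypothetical population of one million. It also compares the detector with a trivial baseline that always predicts "no disease".

```python
population = 1_000_000
sick = population // 10_000   # 0.01% prevalence -> 100 people
healthy = population - sick

# Assumption: 99% sensitivity and 99% specificity.
true_pos = sick * 99 // 100        # sick people correctly flagged
false_neg = sick - true_pos        # sick people missed
true_neg = healthy * 99 // 100     # healthy people correctly cleared
false_pos = healthy - true_neg     # healthy people wrongly flagged

precision = true_pos / (true_pos + false_pos)
detector_accuracy = (true_pos + true_neg) / population
baseline_accuracy = healthy / population  # always predict "no disease"

print(f"False positives:    {false_pos}")               # 9999
print(f"Precision:          {precision:.2%}")           # 0.98%
print(f"Detector accuracy:  {detector_accuracy:.2%}")   # 99.00%
print(f"Always-negative:    {baseline_accuracy:.2%}")   # 99.99%
```

Under these assumptions the do-nothing baseline scores higher accuracy than the detector, which is why precision and recall are usually more informative than accuracy for rare classes.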