“Confirmed cases” are skyrocketing … how many are false positives?

The answer is likely to surprise you!
=============

In my strategic business analytics course, I used to teach something called Bayesian Inference … a way to calculate probabilities by combining contextual information (called “base rates” or “priors”) with case-specific observations (think: testing or witnessing).

Today, we’ll apply Bayesian Inference to the COVID testing situation…

================

WARNING: The math may be unsettling to some readers. If it is, either (1) just trust me, gloss over the math and accept my conclusions, or (2) hit delete and have a nice day … none the wiser

To get started, we need a couple pieces of data…

The first is the accuracy of the COVID tests being administered.

Admiral Brett Giroir, in charge of testing for the Feds, says the current tests have a 90% accuracy rate, with errors evenly balanced between false positives and false negatives. Source

Let’s take Adm. Giroir at his word and assume that the tests are 90% accurate.

If the accuracy rate is 90%, what is the likelihood that, if you test positive, you really are infected by the virus?

Most folks confidently answer: 90%.

Good, but not great … and, unfortunately, probably very wrong.

=============

Here’s where Bayesian Inference comes into the picture.

We need another piece of data called a base rate.

That is, given where you live (i.e. the prevalence of COVID in your locale), your behavior (e.g. sheltering-in-place, out working with patient or customer contact, socializing) and your symptomology (i.e. no, mild or severe COVID-like symptoms), what are the chances that somebody with your profile (i.e. a member of your “reference group”) is currently infected with the virus?

This is a tough number to estimate, but we can take a stab at it and then test the sensitivity by trying other values to see what difference they make.

In a prior post, we squeezed the NY antibody test results and estimated that in NY — the hottest of the hot spots — about 4% or 5% of the population was probably infected at any point in recent times.

We’ll use NY’s 5% as our base rate (of virus prevalence among our reference group).

And, we’ll assume that Adm. Giroir’s 90% accuracy claim is correct.

=============

Here are the resulting numbers. Don’t let the table scare you, we’ll walk through the steps.


=============

Set-up

First step: we plug the known (or assumed) numbers into a Bayes’ matrix.

Row 1 displays positive test results.
Row 1, Line 1 displays the accuracy rates, labeled “R,W%” (short for Right or Wrong %).
For Row 1, Line 1, we plugged in the accuracy rates: 90% of the time an infected person will test positive; 10% of the time they test negative.
Those rates are reversed in Row 2, Line 1 (i.e. 90% of the time a person who is not infected will get an accurate negative test result).
We plugged in the 5% base rate assumption (the yellow box in Column 3).
We arbitrarily assumed that 1,000 people would be in the sample to be tested (the orange box in Column 5).
Technical note: The sample size is completely arbitrary and doesn’t change the answer; it just lets us do some of the work in ‘numbers of people’ instead of percentages.

=============

Calculations

Now we can start cranking some numbers …

Start with Row 3 — the total number of people being tested. Since our base rate is 5%, we expect that 50 of the 1,000 people are likely to be infected (based on the prevalence of the virus in our reference group: our locale, our behavior and our symptomology). Conversely, 950 of the 1,000 are likely to be uninfected.
Next, let’s work through Column 3 and distribute the 50 infected people into Column 3’s result boxes. Since the test is 90% accurate, 45 infected people get accurate positive results (Column 3, Row 1); the other 5 people (10% of 50) get false negatives (Column 3, Row 2).
Then, we do the same for Column 4. Since the test is 90% accurate, 855 uninfected people get accurate negative results (Column 4, Row 2); the other 95 uninfected people (10% of 950) get false positive results (Column 4, Row 1).
Summing across Columns 3 and 4 in each row, we get 140 people (45 + 95, or 14% of our 1,000 sample) testing positive and 860 people (5 + 855, or 86%) testing negative.
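
The steps above can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet behind the table: the function name bayes_matrix and its parameter names are made up here, and the sketch assumes, as the post does, that a single 90% accuracy rate applies to infected and uninfected people alike.

```python
def bayes_matrix(base_rate, accuracy, sample_size):
    """Distribute a tested sample across the four outcome boxes.

    Assumption (taken from the post): one accuracy rate applies to both
    infected and uninfected people.
    """
    infected = base_rate * sample_size      # Row 3, Column 3
    uninfected = sample_size - infected     # Row 3, Column 4
    return {
        "true_positives": accuracy * infected,           # Column 3, Row 1
        "false_negatives": (1 - accuracy) * infected,    # Column 3, Row 2
        "false_positives": (1 - accuracy) * uninfected,  # Column 4, Row 1
        "true_negatives": accuracy * uninfected,         # Column 4, Row 2
    }


counts = bayes_matrix(base_rate=0.05, accuracy=0.90, sample_size=1000)
positives = counts["true_positives"] + counts["false_positives"]
negatives = counts["false_negatives"] + counts["true_negatives"]

# Print the matrix with counts rounded to whole people.
print(f"{'':<10}{'Infected':>10}{'Uninfected':>12}{'Total':>8}")
print(f"{'Positive':<10}{counts['true_positives']:>10.0f}"
      f"{counts['false_positives']:>12.0f}{positives:>8.0f}")
print(f"{'Negative':<10}{counts['false_negatives']:>10.0f}"
      f"{counts['true_negatives']:>12.0f}{negatives:>8.0f}")
```

Changing sample_size scales every box by the same factor, so the share of positive results that are true positives stays put. That is why the 1,000 figure is arbitrary.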

==============

Analysis

Look at row 2: the people getting negative test results…

The good news is that very few infected people (only 5, or 0.5% of the total sample) are slipping through the cracks as false negatives, and practically all of the negative results (855 of 860, or 99.4%) are accurate.

That’s pretty good, but…

Now, laser in on row 1: the people getting positive test results…

Again, as shown in row 1, 140 people (14%) get positive test results (45 + 95 = 140).

If the test had been 100% accurate, only 50 people would have tested positive (our base rate 5% x 1,000)

Where did the extra positive results come from?

Note in Column 3, Row 1 that only 32.1% of the 140 people getting positive test results are true positives … infected people getting positive test results.

But, as displayed in Column 4, Row 1, 67.9% of them are false positives – 95 uninfected people who get positive test results.

Think about that for a moment.

Given a test that the gov’t claims to be about 90% accurate and an assumed viral prevalence rate of 5% (based on a reference group’s locale, behavior and symptomology), roughly two-thirds of the positive test results are false positives.
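
For readers who prefer a formula to a matrix, the same figure drops out of Bayes' rule. This is a sketch under the post's own assumptions (90% accuracy for infected and uninfected people alike, 5% base rate); the event labels are chosen here just for readability.

```latex
P(\text{infected} \mid \text{positive})
  = \frac{P(\text{positive} \mid \text{infected}) \, P(\text{infected})}
         {P(\text{positive} \mid \text{infected}) \, P(\text{infected})
          + P(\text{positive} \mid \text{uninfected}) \, P(\text{uninfected})}
  = \frac{0.90 \times 0.05}{0.90 \times 0.05 + 0.10 \times 0.95}
  = \frac{0.045}{0.140}
  \approx 0.321
```

The numerator is the 45 true positives and the denominator is the 140 total positives, each divided by the 1,000-person sample. The remaining 67.9% are the false positives.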

============

Takeaway: If our assumptions are correct (5% prevalence, 90% test accuracy) … and if the Bayesian logic is correct … then the number of “confirmed cases” being reported is probably way high.
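
Since the takeaway hinges on the assumed base rate, it is worth trying other values to see what difference they make. The Python sketch below does that. The function name is made up for illustration, the base rates other than 5% are arbitrary trial values (not estimates from any data), and the single-accuracy-rate assumption is carried over from above.

```python
def share_of_positives_that_are_true(base_rate, accuracy):
    """Chance that a positive result is a true positive (Bayes' rule).

    Assumption (taken from the post): the same accuracy rate applies to
    infected and uninfected people.
    """
    true_pos = accuracy * base_rate
    false_pos = (1 - accuracy) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Trial base rates: only 5% comes from the post; the rest are illustrative.
for base_rate in (0.01, 0.05, 0.10, 0.25, 0.50):
    true_share = share_of_positives_that_are_true(base_rate, accuracy=0.90)
    print(f"base rate {base_rate:4.0%}: "
          f"{true_share:5.1%} true positives, "
          f"{1 - true_share:5.1%} false positives")
```

At the 5% base rate this reproduces the 32.1% / 67.9% split from the analysis. With 90% accuracy, true positives only match false positives once the base rate reaches 10%, and the intuitive "90%" answer only holds at a 50% base rate. So the conclusion depends heavily on who is being tested.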

That would be another partial explanation for the apparent disconnect between the number of cases and resulting number of deaths.

=============
Thanks to Prof. Robin Dillon-Merrill for nudging me to dig into this topic and for linking me to an inspiring source article: The Importance of Prior Probabilities in Coronavirus Testing
=============
For background on testing procedures and accuracy, see: Latest in Coronavirus Testing Methods and COVID-19 tests are far from perfect

This entry was posted on July 28, 2020 at 9:35 am.

The Homa Files