“Confirmed cases” skyrocketed … how many were false positives?

One answer to why there wasn’t a commensurately high spike in deaths.
=============

COVID tests yielding “false positive” results have been hitting the news again.

A couple of weeks ago, Ohio governor Mike DeWine tested positive, missed an event with President Trump and was subsequently re-tested (twice) and found to be negative.

See Ohio Gov. DeWine tests negative … after testing positive.

This week, it was reported that several nursing homes have experienced numerous cases of false positives.

False-positive test results are a particularly significant risk in nursing homes, because a resident wrongly believed to have Covid-19 could be placed in an area dedicated to infected patients, potentially exposing an uninfected person to the coronavirus.

And, there is a growing number of reports that re-opened schools are being shut down when a single student or faculty member tests positive. Locally, I know of 3 such instances.

Bottom line: false positives are likely to be common, and they have significant consequences for patients and institutions.

The IHME estimates that less than 1% of Americans are currently infected.

Given the low prevalence of COVID (i.e. percentage currently infected) … and low but non-trivial testing error rates … the likelihood of false positives is very high!

Here’s my logic…

In my strategic business analytics course, I used to teach something called Bayesian Inference … a way to calculate probabilities by combining contextual information (called “base rates” or “priors”) with case-specific observations (think: testing or witnessing).

Today, we’ll apply Bayesian Inference to the COVID testing situation…

================

WARNING: The math may be unsettling to some readers. If it is, either (1) just trust me, gloss over the math and accept my conclusions, or (2) hit delete and have a nice day … none the wiser.

To get started, we need a couple pieces of data…

The first is the accuracy of COVID tests being administered.

Admiral Brett Giroir, in charge of testing for the Feds, says the current tests have a 90% accuracy rate, with errors evenly balanced between false positives and false negatives. Source

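Let’s take Adm. Giroir at his word and assume that the tests are 90% accurate: an infected person tests positive 90% of the time, and an uninfected person tests negative 90% of the time.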

If the accuracy rate is 90%, what is the likelihood that, if you test positive, you really are infected by the virus?

Most folks confidently answer: 90%.

Good, but not great … and, unfortunately, probably very wrong.

=============

Here’s where Bayesian Inference comes into the picture.

We need another piece of data called a base rate.

That is, given where you live (i.e. the prevalence of COVID in your locale), your behavior (e.g. sheltering-in-place, out working with patient or customer contact, socializing) and your symptomology (i.e. no, mild or severe COVID-like symptoms), what are the chances that somebody with your profile (i.e. a member of your “reference group”) is currently infected with the virus?

This is a tough number to estimate, but we can take a stab at it and then test the sensitivity by trying other values to see what difference they make.

In a prior post, we squeezed the NY antibody test results and estimated that in NY — the hottest of the hot spots — about 4% or 5% of the population was probably infected at any point in recent times.

We’ll use NY’s 5% as our base rate (of virus prevalence among our reference group).

And, we’ll assume that Adm. Giroir’s 90% accuracy claim is correct.

=============

Here are the resulting numbers. Don’t let the table scare you, we’ll walk through the steps.


=============

Set-up

First step: we plug the known (or assumed) numbers into a Bayes’ matrix.

Row 1 displays positive test results.
Row 1, Line 1 displays the accuracy rate which is labeled “R,W%” which stands for Right or Wrong %.
For Row 1, Line 1, we plugged in the accuracy rates: 90% of the time an infected person will test positive (right); 10% of the time an uninfected person will also test positive (wrong).
Those rates are reversed in Row 2, Line 1 (e.g. 90% of the time a person who is not infected will get an accurate negative test result)
We plugged in the 5% base rate assumption (the yellow box in Column 3)
We arbitrarily assumed that 1,000 people would be in the sample to be tested (the orange box in Column 5).
Technical note: The sample size is completely arbitrary and doesn’t change the answer; it just lets us do some of the work in ‘numbers of people’ instead of percentages.

=============

Calculations

Now we can start cranking some numbers …

Start with Row 3 — the total number of people being tested. Since our base rate is 5%, we expect that 50 of the 1,000 people are likely to be infected (based on the prevalence of the virus in our reference group: our locale, our behavior and our symptomology). Conversely, 950 of the 1,000 are likely to be uninfected.
Next let’s work through Column 3 and distribute the 50 infected people into Column 3’s results’ boxes. Since the test is 90% accurate, 45 infected people get accurate positive results (Column 3, Row 1); the other 5 people (10% of 50) get false negatives (Column 3, Row 2).
Then, we do the same for Column 4. Since the test is 90% accurate, 855 uninfected people get accurate negative results (Column 4, Row 2); the other 95 uninfected people (10% of 950) get false positive results (Column 4, Row 1).
Summing Columns 3 and 4, we get 140 people (14% of our 1,000 sample) testing positive and 860 (86%) testing negative.
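For readers who would rather check the arithmetic than trust it, here is a minimal Python sketch of the steps above. The function name `bayes_matrix` and the dictionary layout are my own illustrative choices, not part of the original table; the inputs are the same assumptions used in the post (5% base rate, 90% accuracy, 1,000 people).

```python
def bayes_matrix(base_rate, accuracy, sample_size):
    """Return people counts for each cell as (infected, uninfected, total)."""
    infected = round(base_rate * sample_size)   # Row 3, Column 3
    uninfected = sample_size - infected         # Row 3, Column 4
    true_pos = round(accuracy * infected)       # Row 1, Column 3
    false_neg = infected - true_pos             # Row 2, Column 3
    true_neg = round(accuracy * uninfected)     # Row 2, Column 4
    false_pos = uninfected - true_neg           # Row 1, Column 4
    return {
        "positive": (true_pos, false_pos, true_pos + false_pos),
        "negative": (false_neg, true_neg, false_neg + true_neg),
        "total": (infected, uninfected, sample_size),
    }


# Assumed inputs from the post: 5% base rate, 90% accuracy, 1,000 people.
matrix = bayes_matrix(base_rate=0.05, accuracy=0.90, sample_size=1000)

print(f"{'':10}{'infected':>10}{'uninfected':>12}{'total':>8}")
for row, (inf, uninf, total) in matrix.items():
    print(f"{row:10}{inf:>10}{uninf:>12}{total:>8}")
```

The printed rows should line up with Rows 1, 2 and 3 of the matrix: 45 and 95 positives, 5 and 855 negatives, 50 and 950 in total.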

==============

Analysis

Look at row 2: the people getting negative test results…

The good news is that very few infected people (only 1/2% of the total sample) are slipping through the cracks as false negatives, and practically all of the negative results (855 of 860) are accurate.

That’s pretty good, but…

Now, laser in on row 1: the people getting positive test results…

Again, as shown in row 1, 140 people (14%) get positive test results (45 + 95 = 140).

If the test had been 100% accurate, only 50 people would have tested positive (our base rate 5% x 1,000)

Where did the extra positive results come from?

Note in Column 3, Row 1 that only 32.1% of the 140 people getting positive test results are true positives … infected people getting positive test results.

But, as displayed in Column 4, Row 1, 67.9% are false positives – 95 uninfected people who get positive test results.

Think about that for a moment.

Given a test that the gov’t claims to be about 90% accurate and an assumed viral prevalence rate of 5% (based on a reference group’s locale, behavior and symptomology), roughly two-thirds of the positive test results are false positives.
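The same result can be reached without a sample of 1,000 people by applying Bayes’ theorem directly to the percentages. The sketch below is one way to write it. The parameter names `sensitivity` and `specificity` are my labels for the two halves of the single “accuracy” figure; both are set to 90% here, on the assumption that the errors really are evenly balanced.

```python
def prob_infected_given_positive(base_rate, sensitivity, specificity):
    """Bayes' theorem: P(infected | positive test)."""
    # Share of everyone tested who is infected AND tests positive.
    true_pos = sensitivity * base_rate
    # Share of everyone tested who is uninfected BUT tests positive.
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Assumed inputs from the post: 5% base rate, 90% accuracy both ways.
p = prob_infected_given_positive(base_rate=0.05, sensitivity=0.90, specificity=0.90)

print(f"true positives:  {p:.1%}")      # 32.1%
print(f"false positives: {1 - p:.1%}")  # 67.9%
```

No sample size appears anywhere in the function, which is why the 1,000 in the matrix is arbitrary.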

============

Takeaway: If our assumptions are correct (5% prevalence, 90% test accuracy) … and if the Bayesian logic is correct … then the number of “confirmed cases” being reported is probably way high.

That would be another partial explanation for the apparent disconnect between the number of cases and resulting number of deaths.
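Since the base rate is the shakiest input, it is worth trying other values to see what difference they make. Here is a rough sketch of that check. The function name `false_positive_share` is my own, and the base rates other than 5% are hypothetical values chosen only for illustration (1% roughly echoes the IHME figure cited earlier).

```python
def false_positive_share(base_rate, accuracy=0.90):
    """Share of positive results that come from uninfected people."""
    true_pos = accuracy * base_rate
    false_pos = (1 - accuracy) * (1 - base_rate)
    return false_pos / (true_pos + false_pos)


# Hypothetical base rates for illustration; only 5% is used in the post.
for base_rate in (0.01, 0.05, 0.10, 0.20, 0.50):
    share = false_positive_share(base_rate)
    print(f"base rate {base_rate:>4.0%}: {share:.1%} of positives are false")
```

Under these assumptions, the false-positive share falls from about 92% at a 1% base rate to 50% at a 10% base rate and 10% at a 50% base rate. In other words, the conclusion depends heavily on who is being tested.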

=============
Thanks to Prof. Robin Dillon-Merrill for nudging me to dig into this topic and for linking me to an inspiring source article: The Importance of Prior Probabilities in Coronavirus Testing
=============
For background on testing procedures and accuracy, see: Latest in Coronavirus Testing Methods and COVID-19 tests are far from perfect

This entry was posted on September 16, 2020 at 10:00 am and is filed under Uncategorized.

The Homa Files