The answer may surprise you, and it has big implications for test & trace.
=============
In a prior post, we reported that “Asymptomatics” are not rushing to get tested and provided some subjective reasons why that might be (e.g. no doctor referral, high hassle factor, privacy concerns).
OK, let’s up our game a notch or two and throw some math & economics at the problem.
==============
I’m a fan of “Freakonomics” … the popular call sign for a discipline called Behavioral Economics … the study of the rationality that underlies many seemingly irrational decisions that people sometimes make.
And, in my strategic business analytics course, I used to teach something called Bayesian Inference … a way to calculate probabilities by combining contextual information (called “base rates” or “priors”) with case-specific observations (think: testing or witnessing).
Today, we’ll connect Freakonomics and Bayesian Inference and apply them to the COVID testing situation…
================
WARNING: The math may be unsettling to some readers. If it is, either (1) just trust me, gloss over the math and accept my conclusions, or (2) hit delete and have a nice day … none the wiser
To get started, we need a couple pieces of data…
The first is the accuracy of the COVID tests being administered.
Admiral Brett Giroir, in charge of testing for the Feds, says the current tests have a 90% accuracy rate, with errors evenly balanced between false positives and false negatives (source).
Let’s take Adm. Giroir at his word and assume that the tests are 90% accurate.
If the accuracy rate is 90%, what is the likelihood that, if you test positive, you really are infected by the virus?
Most folks confidently answer: 90%.
Good, but not great … and, unfortunately, probably very wrong.
=============
Here’s where Bayesian Inference comes into the picture.
We need another piece of data called a base rate.
That is, given where you live (i.e. the prevalence of COVID in your locale), your behavior (e.g. sheltering-in-place, out working with patient or customer contact, socializing) and your symptomology (i.e. no, mild or severe COVID-like symptoms), what are the chances that somebody with your profile (i.e. a member of your “reference group”) is currently infected with the virus?
This is a tough number to estimate, but we can take a stab at it and then test the sensitivity by trying other values to see what difference they make.
In a prior post, we squeezed the NY antibody test results and estimated that in NY — the hottest of the hot spots — about 4% or 5% of the population was probably infected at any point in recent times.
We’ll assume that we’re asymptomatic, have been sheltering-in-place (i.e. minimal social contacts outside of our homes) and don’t work in a COVID-prevalent environment … and we’ll use NY’s 5% as our base rate (of virus prevalence among our reference group).
We’ll assume that Adm. Giroir’s 90% accuracy claim is correct.
=============
Here are the resulting numbers. Don’t let the table scare you, we’ll walk through the steps.
[Table: Bayes’ matrix of test results (positive / negative / total) by infection status (infected / uninfected / total sample)]
=============
Set-up
First step: we plug the known (or assumed) numbers into a Bayes’ matrix.
- Row 1 displays positive test results.
- Row 1, Line 1 displays the accuracy rate, labeled “R,W%” for Right or Wrong %.
- For Row 1, Line 1, we plugged in the accuracy rates: 90% of the time an infected person will test positive; 10% of the time they test negative.
- Those rates are reversed in Row 2, Line 1 (e.g. 90% of the time a person who is not infected will get an accurate negative test result)
- We plugged in the 5% base rate assumption (the yellow box in Column 3)
- We arbitrarily assumed that 1,000 people would be in the sample to be tested (the orange box in Column 5).
- Technical note: The sample size is completely arbitrary and doesn’t change the answer — it just lets us do some work in ‘numbers of people’ instead of percentages.
=============
Calculations
Now we can start cranking some numbers …
- Start with Row 3 — the total number of people being tested. Since our base rate is 5%, we expect that 50 of the 1,000 people are likely to be infected (based on the prevalence of the virus in our reference group: our locale, our behavior and our symptomology). Conversely, 950 of the 1,000 are likely to be uninfected.
- Next let’s work through Column 3 and distribute the 50 infected people into Column 3’s results’ boxes. Since the test is 90% accurate, 45 infected people get accurate positive results (Column 3, Row 1); the other 5 people (10% of 50) get false negatives (Column 3, Row 2).
- Then, we do the same for Column 4. Since the test is 90% accurate, 855 uninfected people get accurate negative results (Column 4, Row 2); the other 95 uninfected people (10% of 950) get false positive results (Column 4, Row 1).
- Summing Columns 3 and 4, we get 140 people (14% of our 1,000 sample) testing positive and 860 (86%) testing negative.
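For readers who would rather see the arithmetic than the table, here is a minimal Python sketch of the same walkthrough. The variable names are mine; the 90% accuracy, 5% base rate and 1,000-person sample are the assumptions stated above.

```python
# A minimal sketch of the Bayes' matrix arithmetic above.
# Assumed inputs (from the walkthrough): 90% test accuracy for both
# positives and negatives, a 5% base rate, and a 1,000-person sample.

accuracy = 0.90    # chance an infected person tests positive (and an uninfected person tests negative)
base_rate = 0.05   # assumed prevalence of the virus in our reference group
sample = 1000      # arbitrary sample size; it cancels out of the percentages

infected = base_rate * sample             # 50 people  (Row 3, Column 3)
uninfected = sample - infected            # 950 people (Row 3, Column 4)

true_positives = accuracy * infected             # 45  (Column 3, Row 1)
false_negatives = (1 - accuracy) * infected      # 5   (Column 3, Row 2)
true_negatives = accuracy * uninfected           # 855 (Column 4, Row 2)
false_positives = (1 - accuracy) * uninfected    # 95  (Column 4, Row 1)

total_positive = true_positives + false_positives   # 140 (14% of the sample)
total_negative = true_negatives + false_negatives   # 860 (86% of the sample)

print(f"Positive tests: {total_positive:.0f} "
      f"({true_positives:.0f} true, {false_positives:.0f} false)")
print(f"Negative tests: {total_negative:.0f} "
      f"({true_negatives:.0f} true, {false_negatives:.0f} false)")
```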
==============
Analysis
Look at row 2: the people getting negative test results…
The good news is that very few infected people (only 5 people, or 0.5% of the total sample) are slipping through the cracks as false negatives, and practically all uninfected people are getting accurately diagnosed.
That’s pretty good, but…
Now, laser in on row 1: the people getting positive test results…
Again, as shown in row 1, 140 people (14%) get positive test results (45 + 95 = 140).
If the test had been 100% accurate, only 50 people would have tested positive (our base rate 5% x 1,000)
Where did the extra positive results come from?
Note in Column 3, Row 1 that only 32.1% of the 140 people getting positive test results are true positives … infected people getting positive test results.
But, as displayed in Column 4, Row 1, 67.9% are false positives – 95 uninfected people who get positive test results.
Think about that for a moment.
Given a test that the gov’t claims to be about 90% accurate and an assumed viral prevalence rate of 5% (based on a reference group’s locale, behavior and symptomology), roughly two-thirds of the positive test results are false positives.
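The same two-thirds figure falls straight out of Bayes’ rule. Here is a short Python sketch, again using the assumed 90% accuracy and 5% base rate:

```python
# Bayes' rule:  P(infected | positive) =
#   P(positive | infected) x P(infected) / P(positive)
# using the assumed 90% accuracy and 5% base rate.

accuracy, base_rate = 0.90, 0.05

p_positive = accuracy * base_rate + (1 - accuracy) * (1 - base_rate)   # 0.14
ppv = accuracy * base_rate / p_positive                                # ~0.321

print(f"Chance a positive result is a true positive:  {ppv:.1%}")      # ~32.1%
print(f"Chance a positive result is a false positive: {1 - ppv:.1%}")  # ~67.9%
```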
============
This is where Freakonomics comes in…
Except for over-reporting “confirmed cases” and scaring the daylights out of the people getting the false positive test results, this wouldn’t be a big deal if all people testing positive were retested to confirm the diagnosis.
Trust me, if the retest is done by a different medical crew on a different day using a different test kit, it’s statistically unlikely that they would get another false positive test result.
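To put a rough number on that claim, here is a sketch that treats the answer from the first positive test as the prior for the retest. It assumes the two tests err independently of each other (a strong assumption, but a reasonable starting point):

```python
# Rough sketch of the confirmatory retest, assuming the two tests
# err independently of each other (a strong assumption).

accuracy, base_rate = 0.90, 0.05

def prob_infected_after_positive(prior, accuracy):
    """Posterior probability of infection after one positive result."""
    p_positive = accuracy * prior + (1 - accuracy) * (1 - prior)
    return accuracy * prior / p_positive

after_one = prob_infected_after_positive(base_rate, accuracy)   # ~32%
after_two = prob_infected_after_positive(after_one, accuracy)   # ~81%

print(f"After one positive test:  {after_one:.0%} chance of infection")
print(f"After two positive tests: {after_two:.0%} chance of infection")

# And for someone whose first positive was false (i.e. who is uninfected),
# an independent second test has only a 10% chance of coming back
# positive again.
```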
But these days — under the ramping-up of test & trace programs — it’s more likely that these false positive people would be told to self-quarantine, monitored to ensure that they were doing so and asked for their contact lists so that their contacts can be traced and tested.
What’s the rub with that?
If our base rate and test accuracy assumptions are in the ballpark, most of the test & trace efforts will be wild goose chases … resulting in unnecessary (and intrusive) contact tracing. That may discourage some asymptomatic people from getting tested.
The simple solution to save money, reduce hassle and encourage testing: before chasing the geese, give the people who test positive a second test to confirm the diagnosis. Then start chasing their traced contacts.
=============
In subsequent posts, we’ll run through more scenarios … interpret their implications … and give you a tool to run scenarios with your own assumptions.
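In the meantime, here is a minimal Python sketch of what that kind of scenario tool might look like; the accuracy and base rate values in the loop are purely illustrative assumptions:

```python
# Minimal scenario sketch: how the share of true positives changes with
# the assumed base rate and test accuracy (illustrative values only).

def true_positive_share(base_rate, accuracy):
    p_positive = accuracy * base_rate + (1 - accuracy) * (1 - base_rate)
    return accuracy * base_rate / p_positive

for accuracy in (0.90, 0.95, 0.99):
    for base_rate in (0.01, 0.05, 0.20):
        share = true_positive_share(base_rate, accuracy)
        print(f"accuracy {accuracy:.0%}, base rate {base_rate:.0%}: "
              f"{share:.0%} of positives are true positives")
```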
=============
Thanks to Prof. Robin Dillon-Merrill for nudging me to dig into this topic and for linking me to an inspiring source article: The Importance of Prior Probabilities in Coronavirus Testing
=============
For background on testing procedures and accuracy, see: Latest in Coronavirus Testing Methods and COVID-19 tests are far from perfect