2020-04-12

Tested Positive. Do I Have It?

As antibody tests for SARS-CoV-2 are being carried out, and people start to explain (e.g., on Twitter) what it means to test positive, I want to elaborate on a concept that most elementary probability textbooks cover, since people may be overly anxious (or too careless) about a positive result.
The estimate of how many people in your region are positive (the prior) strongly affects the estimated chance that you have it, given that you tested positive (the posterior).

What is known and unknown?

Let’s clarify some terms: which quantities are known, which are unknown, and which require an assumption.

$P(+|test+)$ : what people actually want to know. The probability that I have it, given that I test positive. However, this is very commonly confused with:
$P(test+|+)$ : the sensitivity of the test. According to the FDA document, the Cellex antibody test has $93.8\%$ sensitivity¹.
$P(test-|-)$ : the specificity of the test. Cellex has $96.0\%$ , so its false positive rate is $P(test+|-)=1-0.960=0.040$ .
$P(+)$ : the fraction of the population that is infected, or $\frac{N.\ positive}{population}$ . You need to assume a value in order to compute $P(+|test+)$ from the known values using Bayes' rule. This is the prior.
$P(test+)$ : the fraction of tests that come back positive. Computing this depends on $P(+)$ . More specifically: $P(test+)=P(test+|+)P(+)+P(test+|-)P(-)$ , where $P(-)=1-P(+)$ .

The Bayes Rule

Understanding Bayes' rule for the first time requires getting past some intuitions, but once that is done, things become intuitive again.
Bayes' rule starts from the fact that the joint probability can be factorized in two ways:

$P(test+, +)=P(test+|+)P(+)$
$P(+, test+)=P(+|test+)P(test+)$

The left hand sides of the above two equations are the same, so we have:

$P(test+|+)P(+)=P(+|test+)P(test+)$

In other words:

$P(+|test+)=\frac{P(test+|+)P(+)}{P(test+)}$
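
The formula above can be written as a few lines of Python. This is only a minimal sketch: the function name and argument names are my own, and the 5% prior in the usage line is an arbitrary illustrative value, not an estimate for any region.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(+|test+) from P(+), P(test+|+) and P(test-|-)."""
    false_positive_rate = 1.0 - specificity            # P(test+|-)
    # Law of total probability: P(test+) = P(test+|+)P(+) + P(test+|-)P(-)
    p_test_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' rule: P(+|test+) = P(test+|+)P(+) / P(test+)
    return sensitivity * prior / p_test_positive


# Cellex figures quoted in the text; the 5% prior is a made-up example value.
print(f"{posterior_given_positive(0.05, 0.938, 0.960):.2%}")
```

Keeping the prior as an explicit argument makes it obvious that the answer is not a property of the test alone.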

Now we can plug in the values to see the probability of actually having it given a positive test.

What are the results?

The results depend on the prior assumption. A lot. Here is how.

First scenario
If you assume that there are plenty of SARS-CoV-2 tests, and that there are not too many asymptomatic carriers (in other words, $P(+)$ is close to $\frac{N. \ test\ positive}{population}$ , which is around $0.1\%$ in the US right now ²), then:

$P(test+)=0.938\times 0.001 + 0.040\times 0.999 = 0.0409$
$P(+|test+) = \frac{0.938 \times 0.001}{0.0409} = 2.29\%$

Which means you have only slightly more than a one-in-fifty chance of actually having it when your test result is positive.
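
One way to make this number less surprising is to count people instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 100,000 people (a size picked only for round numbers) who all take the test, with the first-scenario prior and the Cellex figures from above.

```python
def positive_counts(population, prior, sensitivity, specificity):
    """Split the expected positive tests into true and false positives."""
    infected = population * prior
    uninfected = population - infected
    true_positives = infected * sensitivity            # infected and test+
    false_positives = uninfected * (1 - specificity)   # not infected but test+
    return true_positives, false_positives


# Hypothetical cohort; prior, sensitivity and specificity as in the text.
tp, fp = positive_counts(population=100_000, prior=0.001,
                         sensitivity=0.938, specificity=0.960)
print(f"true positives:  {tp:.1f}")
print(f"false positives: {fp:.1f}")
print(f"share of positive tests that are real: {tp / (tp + fp):.2%}")
```

Because the uninfected group is about a thousand times larger than the infected group, even a 4% false positive rate produces far more false positives than there are true positives.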

Second scenario
If you assume that COVID-19 tests are insufficient, and that many people with slight symptoms just stayed at home and recovered, so that only those with serious symptoms went to get a test, then $\frac{N. \ test\ positive}{population}$ underestimates $P(+)$ . Let’s say it does so by a factor of ten, and assume $P(+)=0.01$ , i.e. $1\%$ of people in the country have got COVID-19 or some light-symptom variant. Then:

$P(test+)=0.938\times 0.01 + 0.040\times 0.99 = 0.0490$
$P(+|test+) = \frac{0.938 \times 0.01}{0.0490} = 19.15\%$

Which means you have less than a one-in-five chance of actually having it.

Third scenario
If you think far more people have it than the COVID-19 positive test results show, for example because the mortality rate appears overwhelmingly high in some countries (e.g., more than $10\%$ in the UK vs. around $1\%$ in Germany), you might go to the extreme and assume that $10\%$ of people in this country have it, with most ( $\sim 90\%$ ) of them not even thinking they need to be tested. In this case, let’s set the prior to $P(+)=0.10$ , then:

$P(test+)=0.938\times 0.10 + 0.040\times 0.90 = 0.1298$
$P(+|test+) = \frac{0.938 \times 0.10}{0.1298} = 72.27\%$

This is surprisingly high, almost three out of four! You can see how much ridiculous conspiracy theories can change your results.
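
The three scenarios differ only in the prior, so they can be reproduced side by side. The snippet below is a small sketch using the Cellex figures quoted earlier; the scenario labels and variable names are mine.

```python
SENSITIVITY = 0.938   # P(test+|+), Cellex figure from the text
SPECIFICITY = 0.960   # P(test-|-), Cellex figure from the text


def posterior(prior, sensitivity=SENSITIVITY, specificity=SPECIFICITY):
    """P(+|test+) by Bayes' rule for a given prior P(+)."""
    p_test_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_test_positive


# Priors assumed in the first, second and third scenario.
scenarios = {"first": 0.001, "second": 0.01, "third": 0.10}
results = {name: posterior(prior) for name, prior in scenarios.items()}

for name, value in results.items():
    print(f"{name:>6} scenario: P(+)={scenarios[name]:.3f} -> P(+|test+)={value:.2%}")
```

The test itself is identical in all three rows; only the assumed prevalence moves the result from about 2% to about 72%.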

What do the varying numbers mean?
Don’t scare yourself to death by building conspiracy theories into the prior, please. The reported counts are not that far off. Even if all countries underreport the numbers, my intuition is that $\frac{N. \ test\ positive}{population}$ can’t possibly underestimate $P(+)$ by, let’s say, a factor of ten.
In short, I think³ what happened is closer to the first scenario in Canada, Japan, Korea, China except Hubei (a month ago), and most states in the US, and closer to the second scenario in Hubei, New York / New Jersey, the UK, Spain, and Italy.

How about the diagnosis?

Following a similar line of thought, one might ask whether there is such a high chance of “not having it when testing positive” for the COVID-19 virus test (not the antibody test).
I tend to believe no. Here’s why.
Since the majority of COVID-19 virus tests are done on those who have symptoms, we need to condition everything on the variable “symptom=True”. Therefore, $P(+)$ would be on the scale of, let’s say, $0.1 \sim 0.5$ , estimated by⁴ $\frac{N.\ test\ positive}{symptomatic}$ . Assuming the sensitivity of COVID RNA tests is high (e.g., $90\%$ ), the posterior probability, the “chance of having it when tested positive”, would not differ much from the test sensitivity.
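
To get a feel for this case, here is a rough sketch over the prior range mentioned above. Note the loud assumption: the text gives no specificity for the RNA test, so the two specificity values below are placeholders I made up for illustration, not measured figures.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """P(+|test+) by Bayes' rule, here conditioned on symptom=True."""
    p_test_positive = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    return sensitivity * prior / p_test_positive


SENSITIVITY = 0.90                     # example value from the text
ASSUMED_SPECIFICITIES = (0.96, 0.99)   # placeholders, NOT measured values
PRIORS = (0.1, 0.3, 0.5)               # prior range among symptomatic people

table = {
    (spec, prior): posterior_given_positive(prior, SENSITIVITY, spec)
    for spec in ASSUMED_SPECIFICITIES
    for prior in PRIORS
}

for (spec, prior), value in table.items():
    print(f"specificity={spec:.2f}  P(+)={prior:.1f}  ->  P(+|test+)={value:.2%}")
```

Under these assumed values the posterior ranges from roughly 0.71 to 0.99, far above the first two antibody scenarios, because the prior among symptomatic people is so much higher.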

Conclusion

A good estimate of the prior probability in your region is essential for an accurate $P(+|test+)$ value. Depending on the regional situation, we should be neither overly anxious nor too careless about the test results.

Footnotes

1. These are actually positive percent agreement values; I'm calling them sensitivity for the purpose of this blog. ↩
2. Data source: worldometers ↩
3. I am not a healthcare professional. I study computer science (AI), and have taken probability & stats courses. All of the sources for my analysis are public. If you want medical advice, ask a healthcare professional, please. ↩
4. Data source: 1point3acres ↩