An Initiate of the Bayesian Conspiracy

An Intuitive Explanation of Bayesian Reasoning is an extraordinary piece on Bayes' theorem that starts with this simple puzzle:


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2007/04/an-initiate-of-the-bayesian-conspiracy.html

Given no google, comments, or reading the article, I turned to the next best thing: My MATH Probability 3310 class notes. In my second test review sheet (written by me), I found this amazing gem:

Bayes Formula: P(E) = P(EF) + P(EF^c)

Given that I don’t know what the letter ‘c’ denotes, I came up with the answer of 125.2%. This is why a Computer Science degree is important for any good programmer. And why I need to learn to take better notes.

IMAD (I’m not a doctor) rather a math teacher - and actually presented Bayes’ formula the other day, so it wasn’t too hard to get the 7,76%.

Inspired by Alan Crowe here is a little Scheme program that simulates a number of trials. After 1000000 trials I got 0.077307 as the result.

(define (experiment)
(if (= (random) 0.01)
; breast cancer
(if (= (random) 0.80)
'cancer-pos
’cancer-neg)
; well
(if (= (random) 0.096)
'well-pos
’well-neg)))

(define (trials n)
(if (= n 0)
0
(let ([result (experiment)])
(case result
[(cancer-pos well-pos)
(case result
[(cancer-pos) (+ 1 (trials (sub1 n)))]
[(well-pos) (trials (sub1 n))])]
[else
(trials n)]))))

(let ([m 1000000])
(/ (trials m)
(* 1.0 m)))

No, I have no idea how Bayes’ Rule works, so here’s a go with highschool math and common sense:

Of routinely screened women:
99% dont have it, BUT 99 * .096 = 9.504% will test positive anyways
1% do have it, BUT ONLY 1 * .8 = .8% will test positive

So, of those routinely screened women:
.8 / (9.504 + .8) = .0776
There’s a 7.76% chance that she’s actually got it. Less than 10% true positive! Yikes!
This seems way low. Am I wrong to take the 1% occurrance rate into account?

7.76%

From 10±year-old memories:

Draw up a 2x2 table. The columns represent testing positive and testing negative; the rows represent having cancer and not. We will work with probabilities in each cell, although using cardinalities from a population of, say, 10000 will yield whole numbers throughout, which some may find easier.

Statement 1 tells us that the row totals are in the proportion of 1:99.
Statement 2 tells us that in row 1, the columns are in the proportion of 80:20.
We know that row 1 sums to 0.01, so row 1 is 0.008 | 0.002.
Statement 3 tells us that in row 2, the columns are in the proportion of 9.6:(100-9.6).
We know that row 2 sums to 0.99, so row 2 is 0.09504 | 0.8946

We have the complete table. And the figure we are asked for is, given we are in column 1, what is the probability we are in row 1. This is (row 1, column 1) / [(row 1, column 1) + (row 2, column 1)] (*), or

0.008 / (0.008 + 0.09504)
= 0.008 / 0.10304
= 25/322 ~= 7.76 %

(*) This is Bayes Theorem, right? P(A | B) = P(A B) / P(B) ?

I made it the correct 7.8% without reading the solution, but I did have to lookup the bayes forumla. Still, it isn’t that tough.

Got the reasoning right, then had trouble multiplying numbers on paper.

As you can guess, I have a degree in math, and working on another in stats…

P(True Positive)/ (P(True Positive)+P(False Positive))
0.8 / (0.8 + 0.096)
= 89.3%

Of course the calculation is of no importance (as always people here are distracted by the examples given =P ) - the important finding to me is that general public has a hard time dealing with set, logic, probability, or other non-arithmetic maths.

I forget how many times I convince people that all types of games in casino have negative expected value.

Keith, your reasoning is wrong - 9.6% of women without breast cancer get a positive result anyway. That does not imply that 90.4% of positives are correct, but rather that 90.4% of women without breast cancer get a negative - very different things.

Kevin, your reasoning is wrong, since the chance of a true positive given a random person isn’t 0.8, but rather 0.010.8, and the chance of a false positive is 0.990.096.

The end result then is (0.010.8) / (0.010.08 + 0.99*0.0.96) =~ 0.078. A woman with a positive test result has only a 7.8% chance of actually having breast cancer.

Okay, now I’ve read the article, and it turns out I got it about right (7.76%)… although the article really is painfully slow. Given a nice separation of the relevant facts, like in the stated problem, this should be pretty straightforward.

So, if 15% of doctors get the correct answer, 99% of people don’t believe that doctor when given the (correct) extraordinarily low figure, and 99% of people DO believe the doctor when he gives the wrong (much higher) figure, how likely are you as a patient to understand the correct information concerning your condition when you get a positive result on your test? This doesn’t bode well…

I tentatively suggest 7.76% (or a shade over).
I did study this at university, but it was so long ago that I think the Rev. Bayes took the lecture himself. I definitely remember a forumula along the lines of David’s, and I too have no idea what “c” denotes. However, I worked it out as follows:

1% of women have breast cancer
of these, 80% will have positive tests
= 0.8%

99% of women do not have breast cancer
of these, 9.6% will have positive tests
= 9.504%

So:
(adding these mutually exclusive groups): 10.304% of women will have positive tests.

of these 10.304%, the 0.8% group are the ones with breast cancer,
so if you are in the 10.304% who received the positive test, your probability of having cancer is
088%/10.304% =~ 7.76%

I humbly await correction.

I eyeballed this in about 15 seconds.

1000 patients. 1% cancer rate = 10 have cancer, 990 don’t.

Of the 10, 8 test positive, 2 don’t.
Of the 990, ~10% false positive rate means ~90 test positive, ~900 don’t.

Of those testing positive, 8 / ~90 gives a chance just shy of %10.

Your’re all wrong. If the original group is 10k women, 1030.4 will test positive, and 100 will have cancer, so the answer is about 1 in 10.

I got 0.8333 but I was working without paper and dropped a clanger.

Reasoning went:

Say there’s 1000 women. 10 have breast cancer, so 8 have positive test results.

990 of them don’t have breast cancer, so 990 * 0.096 = 95 have positive tests results.

Then, like an idiot, I divided 8 by 95 and got .083 instead of dividing 8 by 103 and getting 0.077, so out of a 1000 women who get a positive result from a breast cancer screening test, 77 of them, on average, actually have breast cancer.

I’m pretty confident that, if the original problem had been couched in whole number terms, more people would get the right answer. Probabilities expressed as percentages can obscure at least as much as they reveal, at least, in matters of public health like breast cancer screening.

Yep, 7.76%. On reading the first wording of the question I couldn’t really figure it out, but then as soon as I read the words ‘10 out of 1000 women…’ it clicked - I won’t repeat the working above.

I find it really surprising how many intelligent people have trouble with probability questions (stuff like this, and the Monty Hall/game show problem), even after sitting down with pen and paper. They just refuse to believe numerical reasoning over their own intuition.

It is frightening that only 15% of doctors get this right. But what does “get this right” mean? One can just look at two statements - 1% of women have cancer, and 9.8% of results are positive - and instantly see that fewer than 1 in 10 of these test results are a valid positive, without doing any real math. For a doctor, this is probably enough, but EVERY doctor needs to see this right away.

AIDS is another example of this phenomenon. A false positive AIDS results occurs up to 0.0007% of the time (7 in 1,000,000). Given 23,000 heterosexual AIDS cases in the US, a purely heterosexual non-drug using male with a positive test has a 1 in 20 chance of being AIDS free. But this number falls depending on the person’s Bayesian priors - has he been a monk for the past 40 years? Has he visited prostitutes? Is he white? Has he been in prison?

Remco: You’ve proved to me that I shouldn’t read this blog and do math during office hour. Now just read the articles and thanks for correcting me!

Interesting blog, as always.

This is just high school data management. let B represent having breast cancer, !B not having breast cancer, P testing positive, !P testing negative. (I filled in the known values and since each set of branches must = 1, the other branch is just 1-possibility. for example you know that the chance of having breast cancer is 0.01 (B), so the chance of not having it is just 1-0.01=0.99(!B)
Now to just draw a tree diagram with each level representing a different stage(Sorry for the bad ascii art):
/
/
/
/
0.01/ \0.99
B !B
0.8/ \0.2 .096/ \0.904
/ \ /
P !P P !P

Since probabilities are: possibility

Chris Moorhouse: Am I wrong to take the 1% occurrance rate into account?

I think the important question is: “Where do you get a 1% occurance rate from?”

The only 1% was for number of people who get tested…