Our title, which is indistinguishable from a flood of others^{1}, might read, “Reading Articles About The Misuse Of Statistics Increases Risk Of Apoplexy.”

Yes, for every article you read like this one, your risk of becoming apoplectic over the improper use of statistics increases 2.0 times.

What does that “2.0-fold increase in risk” mean? Not just for this finding, but for any that reports results in the form of “increased risk” of suffering from a malady after exposure to some “risk factor.” In this study, “exposure” is reading this blog, which is the risk factor, and “non-exposure” is not reading.

Suppose (somehow) you knew the probability of developing the malady given you were not exposed to the “risk factor.” Call it prob_{not exposed}. You also have to know (somehow) the probability of developing the malady given you were exposed; call it prob_{exposed}. Relative risk is

RR = prob_{exposed} / prob_{not exposed}.

You could also calculate the odds ratio. First, know that odds are a one-to-one function of probability, *viz*:

Odds = prob / (1 – prob).

The odds ratio is like the risk ratio, but the ratio of the odds, not probabilities:

OR = odds_{exposed} / odds_{not exposed}.

Now suppose that prob_{not exposed} = 0.000001, which is a one in a million chance of developing the malady given you were not exposed. If you then hear that being exposed “increases the risk by 2.0-fold”, this means the risk ratio must be 2.0. Back-solving gives the probability of developing the malady after exposure as 0.000002. (Similar calculations can be done for odds ratios.)
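These definitions and the back-solving step can be sketched in a few lines of code; the probabilities below are the hypothetical ones from the text, not real data.

```python
# Relative risk and odds ratio from two probabilities, and back-solving
# a probability from a reported "2.0-fold" risk ratio.

def odds(p: float) -> float:
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1.0 - p)

def relative_risk(p_exposed: float, p_not_exposed: float) -> float:
    return p_exposed / p_not_exposed

def odds_ratio(p_exposed: float, p_not_exposed: float) -> float:
    return odds(p_exposed) / odds(p_not_exposed)

# One-in-a-million baseline risk, as in the text.
p_not = 1e-6

# A reported "2.0-fold increase in risk" means RR = 2.0; back-solve:
rr = 2.0
p_exp = rr * p_not  # 0.000002, i.e. two in a million

print(relative_risk(p_exp, p_not))  # 2.0
print(odds_ratio(p_exp, p_not))     # ~2.0 (odds ~ probability when p is tiny)
```

For rare maladies the odds ratio and risk ratio are nearly identical, as the last line shows; they diverge when the probabilities are large.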

In this case, exposure drove your risk from one in a million to just two in a million. We can already see that presenting results in raw probability will not be as pulse-pounding as speaking in terms of risk or odds. Information is also lost in giving the risk ratio: the customer has no idea what the risk is in the control group. So one fix would be to give emphasis to the actual probabilities of suffering, and not just the risk ratio.

But even if that is done, something would still be wrong. Can you spot what?

For the apoplexy finding, we do not *know* what the probability of apoplexy is for this blog’s readers. Nor do we *know* what the probability of apoplexy is for non-readers. Therefore, we cannot know the risk. We can, through statistical formulas, estimate it. But that estimate will exaggerate the true risk.

For example, suppose that we witnessed 18 cases of sputtering apoplexy in 40 readers of this blog, but found it only 9 times in 40 non-readers. That gives an estimated “statistically significant” risk ratio of 2.0. But this exaggerates risk in the following sense.

Now, we can guess prob_{exposed} is about 0.45 and prob_{not exposed} is about 0.225 for *new* groups of readers and non-readers “similar” to the ones sampled here. (Incidentally, those probabilities and that RR are *exact* for *these* 80 readers and non-readers.)
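The raw (“plug-in”) estimates come straight from the fictional counts:

```python
# Raw ("plug-in") estimates from the fictional data: 18 apoplexies in
# 40 readers, 9 in 40 non-readers.
cases_exposed, n_exposed = 18, 40
cases_not, n_not = 9, 40

p_exp_hat = cases_exposed / n_exposed  # 0.45
p_not_hat = cases_not / n_not          # 0.225
rr_hat = p_exp_hat / p_not_hat         # 2.0

print(p_exp_hat, p_not_hat, rr_hat)
```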

We are not interested in these folks anymore, but in *new* ones. That is the point, after all, of doing this study. The actual probability^{2} that the next new blog reader develops apoplexy is 0.452, which is close to, but just over, the raw estimate of 0.45. And the actual probability that the next new non-reader develops apoplexy is 0.232, which is also higher than the raw estimate of 0.225. This puts the actual risk ratio at 1.95, which is *under* the raw estimate of 2.0.
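The post does not spell out the calculation behind 0.452, 0.232, and 1.95. One predictive model that approximately reproduces them, and which I assume here as a sketch rather than as the exact method behind the footnote’s link, is a binomial model with a Jeffreys Beta(1/2, 1/2) prior, whose posterior predictive probability for the next new person is (cases + 0.5) / (n + 1):

```python
# Predictive probability that the NEXT new person develops apoplexy,
# under an ASSUMED binomial model with a Jeffreys Beta(1/2, 1/2) prior.
# Posterior predictive: (cases + a) / (n + a + b), with a = b = 0.5.

def predictive_prob(cases: int, n: int, a: float = 0.5, b: float = 0.5) -> float:
    """Posterior predictive P(next person is a case) with a Beta(a, b) prior."""
    return (cases + a) / (n + a + b)

p_exp = predictive_prob(18, 40)  # ~0.451, just over the raw 0.45
p_not = predictive_prob(9, 40)   # ~0.232, just over the raw 0.225
rr_actual = p_exp / p_not        # ~1.95, under the raw plug-in 2.0

print(round(p_exp, 3), round(p_not, 3), round(rr_actual, 2))
```

Because the prior mean is 0.5, both predictive probabilities move toward 0.5 relative to the raw estimates, which is exactly the behavior the footnote describes, and why the predictive risk ratio shrinks below 2.0.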

Not a huge difference in this fictional example, to be sure, but the discrepancy between the raw and actual estimates will always be in the direction of exaggerating the risk. Taken over the tens of thousands of studies reporting risk, the overall effect is large.

These differences exist because the traditional method reports parameter estimates, and not actual probabilities or actual risk ratios. Parameters are the internal, unobservable parts of the probability models used to quantify uncertainty in the data. They are also the focus of nearly all statistical methods (because of inertia, custom, and lack of knowledge of alternatives).

Reporting in terms of actual observables not only gives a true impression of the probabilities and risks, but allows us to answer more complicated questions about the data and to provide richer information. For example, reporting on observables we can picture the probability that each of 0, 1, 2, …, 40 new readers/non-readers develop apoplexy. That’s done in the picture.
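The picture itself is not reproduced here, but a distribution like it can be sketched with a beta-binomial posterior predictive for the number of apoplexies among 40 new people, again assuming (my assumption, as above) a binomial model with a Jeffreys Beta(1/2, 1/2) prior:

```python
import math

def log_beta(a: float, b: float) -> float:
    """Log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k: int, n: int, a: float, b: float) -> float:
    """P(k cases among n new people) under a Beta(a, b) posterior."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + log_beta(k + a, n - k + b) - log_beta(a, b))
    return math.exp(log_p)

# Posteriors after the fictional data, with an ASSUMED Jeffreys prior:
# readers Beta(18 + 0.5, 22 + 0.5), non-readers Beta(9 + 0.5, 31 + 0.5).
n_new = 40
pmf_readers = [beta_binomial_pmf(k, n_new, 18.5, 22.5) for k in range(n_new + 1)]
pmf_nonreaders = [beta_binomial_pmf(k, n_new, 9.5, 31.5) for k in range(n_new + 1)]

# Most likely number of new apoplexies among 40 new non-readers:
mode = max(range(n_new + 1), key=lambda k: pmf_nonreaders[k])
print(mode)  # 8, matching the "most likely value is 8" remark in the comments
```

Plotting `pmf_readers` and `pmf_nonreaders` against 0, 1, …, 40 gives a picture of the kind described: the full predictive probability of each possible number of new cases, not just a single parameter estimate.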

This kind of picture is extraordinarily important because it will give superior estimates for cost and benefit analyses, which are guaranteed to be exaggerated using parameter-based methods.

——————————————————————————-

^{1}Search for the terms “increase(s)(ed) risk of”; millions of hits.

^{2}See the “modern stats” at this link for how to calculate these. The actual probabilities will always move closer to 0.5 than the raw parameter estimates.

25 September 2011 at 12:43 pm

I think that the probabilities are reversed in the paragraph beginning, “We are not interested in these folks anymore … ”

In any case, I’ve learned that I can double the odds of winning the Mega Millions lottery by buying two tickets instead of one — from one in 175,711,536 to two in 175,711,536. And as they always remind me, “If you don’t play, you can’t win.” The odds of winning are infinitely better if I buy a ticket than if I don’t.

25 September 2011 at 12:47 pm

Speed.

Thanks, fixed.

25 September 2011 at 12:54 pm

I like the updated graph too. Much more convincing.

25 September 2011 at 12:58 pm

I’m not apoplectic just yet, but I’m closely monitoring myself.

Please put me in the database as a “possible.”

I wonder what the statistics are for those who become apoplectic upon seeing Morgan Freeman proclaim Obama’s foes to be racists. These are the same racists who awarded Herman Cain 1st place in Florida’s straw poll.

25 September 2011 at 6:38 pm

Dr. Briggs: Apoplexy hits me every time I hear a new health-related statistic on the news. There is a long string of alarms raised by the news media reporting that the latest medical studies show that I have an x% increased chance of having a serious medical problem from using salt. This information, in this form, is worthless because, among other things, the conditions assumed are totally unspecified. I know that salt has valuable minerals such as iodine that help prevent thyroid problems. I can’t do without it. I guess people just accept this statistical inference and change their lifestyles accordingly. Because so many people are innumerate, especially when it comes to statistical inferences, they have no way of assessing the implications for themselves. Do you have the same apoplectic response? Of course the field of journalism does not require numeracy. Do they ever check with someone who has it?

25 September 2011 at 9:23 pm

Label on graph – New Apolplexies??

25 September 2011 at 11:36 pm

I don’t get apoplexy anymore since my doctor now has me on a regular diet of apoplexy medication. Apoplexy is a side effect of Statistical Questioning Syndrome and that’s really what I have. This is a new disease recently voted in by the AMA who have begun to see how the APA methodology helps its practitioners (chiropractors have long used this methodology, too). One symptom of SQS is an eyebrow raised at AMA dicta.

26 September 2011 at 8:16 am

Tony,

Yes, out of the 40 new readers/non-readers, this is the probability for the number of apoplexies we might see. The most likely value of new apoplexies out of 40 for non-readers is 8, which will happen with about a 12% chance, etc.

The 40 is arbitrary. I could have picked just 1, or 1000, or whatever was of interest to me.

26 September 2011 at 11:25 am

How do you know that 1.95 is the ACTUAL risk ratio? Maybe the actual risk ratio is 2.2, and then both numbers underestimate the risk ratio. Plus your calculations can be sensitive to the choice of a prior distribution.

26 September 2011 at 12:39 pm

JH,

Prior sensitivity is not problematic, and in any case, in simple situations like this it gives the exact same answers as the frequentist solution, as you know. We know that 1.95 is the actual risk ratio of the observables, given the model is true and the data observed.

Of course, I do not claim the model is true, nor the data flawless. But given they are, then my statement is true.

26 September 2011 at 10:51 pm

Briggs,

re. New Apolplexies.

Seems to be an extra ‘L’.

Got me wondering about Steve Jobs :)

28 September 2011 at 4:53 am

Of course we often read that stress is a major causative factor in cancer. So I used to bait my greenie friends with the observation that, if this is so, then, given their alarmism over chemicals causing cancer (when there was little if any physical evidence that they did), the increasing cancer rate was a result of their alarmism.