Statistics

Statistical Results Associated With Increased Risk Of Exaggerating Risk

Our title, which is indistinguishable from a flood of others [1], might read, “Reading Articles About The Misuse Of Statistics Increases Risk Of Apoplexy.”

Yes, for every article you read like this one, your risk of becoming apoplectic over the improper use of statistics increases 2.0 times.

What does that “2.0-fold increase in risk” mean? Not just for this finding, but for any finding that reports results in the form of “increased risk” of suffering from a malady after being exposed to some “risk factor”? In this study, “exposure” is reading this blog, which is the risk factor, and “non-exposure” is not reading it.

Suppose (somehow) you knew the probability of developing the malady given you were not exposed to the “risk factor.” Call it prob_not_exposed. You also have to know (somehow) the probability of developing the malady given you were exposed; call it prob_exposed. Relative risk is

     RR = prob_exposed / prob_not_exposed.

You could also calculate the odds ratio. First, note that odds are a one-to-one function of probability, viz:

     Odds = prob / (1 – prob).

The odds ratio is like the risk ratio, but it is the ratio of the odds, not of the probabilities:

     OR = odds_exposed / odds_not_exposed.

Now suppose that prob_not_exposed = 0.000001, which is a one in a million chance of developing the malady given you were not exposed. If you then hear that being exposed “increases the risk by 2.0-fold”, then this means the risk ratio must be 2.0. Back-solving gives the probability of developing the malady after exposure as 0.000002. (Similar calculations can be done for odds ratios.)
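The back-solving step can be checked in a few lines. This is only a minimal sketch of the definitions given above; the function names are made up for illustration and are not from any library.

```python
def odds(prob):
    """Convert a probability to odds: prob / (1 - prob)."""
    return prob / (1.0 - prob)


def relative_risk(p_exposed, p_not_exposed):
    """Risk ratio: probability if exposed over probability if not exposed."""
    return p_exposed / p_not_exposed


def odds_ratio(p_exposed, p_not_exposed):
    """Odds ratio: the same comparison, but on the odds scale."""
    return odds(p_exposed) / odds(p_not_exposed)


def exposed_prob_from_rr(rr, p_not_exposed):
    """Back-solve RR = p_exposed / p_not_exposed for p_exposed."""
    return rr * p_not_exposed


def exposed_prob_from_or(or_value, p_not_exposed):
    """Back-solve an odds ratio for p_exposed (odds -> probability)."""
    exposed_odds = or_value * odds(p_not_exposed)
    return exposed_odds / (1.0 + exposed_odds)


p_not = 0.000001                          # one in a million, as in the text
p_exp = exposed_prob_from_rr(2.0, p_not)  # "2.0-fold increase in risk"

print("probability if exposed:", p_exp)
print("risk ratio:", relative_risk(p_exp, p_not))
print("odds ratio:", odds_ratio(p_exp, p_not))
```

With probabilities this tiny the odds ratio and the risk ratio are nearly identical; they only drift apart as the probabilities get larger.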

In this case, exposure drove your risk from one in a million to just two in a million. We can already see that presenting results in raw probability will not be as pulse-pounding as speaking in terms of risk or odds. Information is also lost in giving the risk ratio: the customer has no idea what the risk is in the control group. So one fix would be to give emphasis to the actual probabilities of suffering, and not just the risk ratio.

But even if that is done, something would still be wrong. Can you spot what?

For the apoplexy finding, we do not know what the probability of apoplexy is for this blog’s readers. Nor do we know what the probability of apoplexy is for non-readers. Therefore, we cannot know the risk. We can, through a statistical formula, estimate it. But that estimate will exaggerate the true risk.

For example, suppose that we witnessed 18 cases of sputtering apoplexy in 40 readers of this blog, but found it only 9 times in 40 non-readers. That gives an estimated “statistically significant” risk ratio of 2.0. But this exaggerates risk in the following sense.

Now, we can guess prob_exposed is about 0.45 and prob_not_exposed is about 0.225 for new groups of readers and non-readers “similar” to the ones sampled here. (Incidentally, those probabilities and that RR are exact for these 80 readers and non-readers.)

[Figure: Risk ratios]

We are not interested in these folks anymore, but in new ones. That is the point, after all, of doing this study. The actual probability [2] that the next, new blog reader develops apoplexy is 0.452, which is close to, but just over, the raw estimate of 0.45. And the actual probability that the next, new non-reader develops apoplexy is 0.232, which is also higher than the raw estimate of 0.225. This puts the actual risk ratio at 1.95, which is under the raw estimate of 2.0.
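Here is one way such numbers could be computed. The post does not say which model or prior was used, so treat everything in this sketch as an assumption: it uses a binomial model with a Beta(1/2, 1/2) prior (the Jeffreys prior), for which the probability that the next new person develops the malady is (cases + 1/2) / (n + 1). Under that assumption the results land close to the figures quoted above (roughly 0.451, 0.232, and a ratio near 1.95). A different prior would give somewhat different numbers.

```python
def raw_estimate(cases, n):
    """Raw (parameter) estimate: observed fraction of cases."""
    return cases / n


def predictive_prob(cases, n, a=0.5, b=0.5):
    """Probability the NEXT new person is a case.

    ASSUMPTION: binomial data with a Beta(a, b) prior. The defaults
    a = b = 0.5 (Jeffreys prior) are a choice made for this sketch,
    not something stated in the post.
    """
    return (cases + a) / (n + a + b)


readers = (18, 40)      # 18 cases among 40 readers
non_readers = (9, 40)   # 9 cases among 40 non-readers

raw_rr = raw_estimate(*readers) / raw_estimate(*non_readers)

pred_readers = predictive_prob(*readers)
pred_non_readers = predictive_prob(*non_readers)
pred_rr = pred_readers / pred_non_readers

print("raw risk ratio:", raw_rr)
print("predictive probability, readers:", round(pred_readers, 3))
print("predictive probability, non-readers:", round(pred_non_readers, 3))
print("predictive risk ratio:", round(pred_rr, 2))
```

Both predictive probabilities sit a little closer to 0.5 than the raw fractions do, and the smaller one moves proportionally more, which is why the ratio shrinks.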

Not a huge difference in this fictional example, to be sure, but the difference between the raw and actual values will always be in the direction of exaggerating the risk. Taken over the tens of thousands of studies reporting risk, the overall effect is large.

These differences exist because the traditional method reports parameter estimates, and not actual probabilities or actual risk ratios. Parameters are the internal, unobservable parts of the probability models which are used to quantify uncertainty in the data. They are also the focus of nearly all statistical methods (because of inertia, custom, and lack of knowledge of alternatives).

Reporting in terms of actual observables not only gives a true impression of the probabilities and risks, but allows us to answer more complicated questions about the data and to provide richer information. For example, by reporting on observables we can picture the probability that each of 0, 1, 2, …, 40 new readers/non-readers develops apoplexy. That’s done in the picture.
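A distribution over 0, 1, 2, …, 40 new cases can be sketched with a beta-binomial predictive distribution. As before, this is an illustration under assumptions, not necessarily the method behind the picture: the Beta(1/2, 1/2) prior is assumed, and the function names are invented for this example. Only the Python standard library is used.

```python
from math import exp, lgamma


def log_beta(x, y):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)


def log_choose(n, k):
    """Log of the binomial coefficient 'n choose k'."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def predictive_pmf(cases, n, m, a=0.5, b=0.5):
    """P(k cases among m NEW people), for k = 0..m.

    ASSUMPTION: binomial data with a Beta(a, b) prior, so the
    predictive distribution is beta-binomial. The prior is a choice
    made for this sketch.
    """
    post_a = cases + a          # posterior Beta parameters
    post_b = n - cases + b
    return [
        exp(log_choose(m, k)
            + log_beta(k + post_a, m - k + post_b)
            - log_beta(post_a, post_b))
        for k in range(m + 1)
    ]


pmf_readers = predictive_pmf(18, 40, 40)
pmf_non_readers = predictive_pmf(9, 40, 40)

for label, pmf in (("readers", pmf_readers), ("non-readers", pmf_non_readers)):
    most_likely = max(range(len(pmf)), key=lambda k: pmf[k])
    print(label, "most likely count:", most_likely,
          "probability:", round(pmf[most_likely], 3))
```

The 40 new people is arbitrary; changing `m` gives the same kind of distribution for 1, or 1000, or whatever number is of interest.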

This kind of picture is extraordinarily important because it will give superior estimates for cost and benefit analyses, which are guaranteed to be exaggerated using parameter-based methods.

——————————————————————————-

[1] Search for the terms “increase(s)(ed) risk of”; millions of hits.

[2] See the “modern stats” at this link for how to calculate these. The actual probabilities will always move closer to 0.5 than the raw parameter estimates.


12 replies »

  1. I think that the probabilities are reversed in the paragraph beginning, “We are not interested in these folks anymore … ”

    In any case, I’ve learned that I can double the odds of winning the Mega Millions lottery by buying two tickets instead of one: from one in 175,711,536 to two in 175,711,536. And as they always remind me, “If you don’t play, you can’t win.” The odds of winning are infinitely better if I buy a ticket than if I don’t.

  2. I’m not apoplectic just yet, but I’m closely monitoring myself.

    Please put me in the database as a “possible.”

    I wonder what the statistics are for those who become apoplectic upon seeing Morgan Freeman proclaim Obama’s foes to be racists. These are the same racists who awarded Herman Cain 1st place in Florida’s straw poll.

  3. Dr. Briggs: Apoplexy hits me every time I hear a new health-related statistic on the news. There is a long string of alarms from the news media reporting that the latest medical studies show that I have an x% increased chance of having a serious medical problem from using salt. Now, information in this form is worthless because, among other things, the conditions assumed are totally unspecified. I know that salt has valuable minerals such as iodine that help prevent thyroid problems. I can’t do without it. I guess people just accept this statistical inference and change their lifestyles accordingly. Because so many people are innumerate, especially when it comes to statistical inferences, they have no way of assessing the implications for themselves. Do you have the same apoplectic response? Of course the field of journalism does not require numeracy. Do they ever check with someone who does?

  4. I don’t get apoplexy anymore since my doctor now has me on a regular diet of apoplexy medication. Apoplexy is a side effect of Statistical Questioning Syndrome and that’s really what I have. This is a new disease recently voted in by the AMA who have begun to see how the APA methodology helps its practitioners (chiropractors have long used this methodology, too). One symptom of SQS is an eyebrow raised at AMA dicta.

  5. Tony,

    Yes, out of the 40 new readers/non-readers, this is the probability for the number of apoplexies we might see. The most likely value of new apoplexies out of 40 for non-readers is 8, which will happen with about a 12% chance, etc.

    The 40 is arbitrary. I could have picked just 1, or 1000, or whatever was of interest to me.

  6. This puts the actual risk ratio at 1.95, which is under the raw estimate of 2.0. … but the difference between the raw and actual difference will always be in the direction of exaggerating the risk.

    How do you know that 1.95 is the ACTUAL risk ratio? Maybe the actual risk ratio is 2.2, and then both numbers underestimate the risk ratio. Plus your calculations can be sensitive to the choice of a prior distribution.

  7. JH,

    Prior sensitivity is not problematic, and in any case, in simple situations like this it gives the exact same answers as the frequentist solution, as you know. We know that 1.95 is the actual risk ratio of the observables, given the model is true and the data observed.

    Of course, I do not claim the model is true, nor the data flawless. But given they are, then my statement is true.

  8. Of course we often read that stress is a major causative factor in cancer. So I used to bait my greenie friends with the observation that, if this is so, and given their alarmism over chemicals causing cancer (when there was little if any physical evidence that they did), then the increasing cancer rate was a result of their alarmism.
