Only a couple of days left, everybody. Then back to more interesting subjects. Incidentally: when is the best time to contact a statistician? Before you begin to collect data. You do not know how to store data as well as we do.
Today we learn the main difference, in philosophy and in practice, between objective, predictive Bayesian analysis and classical analysis. To illustrate, I’m going to ask you some questions that will seem like tricks, but they are not. The answers are all obvious.
Here’s the situation: you are an emergency room physician evaluating patients for suspected appendicitis. Suppose you are interested in whether temperature helps predict appendicitis. Another way to state this: suppose temperature is correlated with appendicitis. Yet another way: does temperature have any relationship with appendicitis? Get it? We want to check whether temperature and appendicitis have any relationship. This is what you want to know.
Now, let’s imagine that temperature is in no way associated with appendicitis. That is, knowing whether a person has a fever, is normal, or is hypothermic tells you nothing about appendicitis. It is the same as the association between appendicitis and the number of green lollipops your corner grocery store stocks. That is, knowing the temperature or the number of lollipops tells you nothing about appendicitis.
Got it? Good, then answer this: Given this evidence, what is the probability that more people with high temperatures have appendicitis than do people with low temperatures? The answer is the same as this: what is the probability that more people whose corner stores have many lollipops have appendicitis than do people whose corner stores have few lollipops?
If these variables are in no way related to appendicitis, then the answer is obvious: any change in the number of people with appendicitis is due to other factors (which we may or may not have measured). If we did see more people with appendicitis in the high lollipop (temperature) group, then this is just a coincidence. The probability is 50%.
OK so far? Now let’s imagine we ran a standard (logistic) regression of temperature to predict appendicitis. The software would spit out a p-value for the classical hypothesis test with the “null” hypothesis that the parameter associated with temperature is 0. Which is to say, that temperature has no effect. Let’s imagine the p-value associated with this test is 0.003. What would a classical statistician do?
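(To make the mechanics concrete, here is a minimal sketch of that sort of fit in Python with statsmodels, on simulated data. All of the numbers, such as the sample size, the assumed effect, and the resulting p-value, are invented for illustration and will not match the 0.003 of the story.)

```python
# A minimal sketch (not the post's actual analysis): fit a logistic regression
# of simulated appendicitis status on temperature and read off the classical
# p-value for the null hypothesis "the temperature coefficient is 0".
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
temp = rng.normal(37.0, 1.0, n)                    # body temperatures, deg C
log_odds = -2.0 + 0.3 * (temp - 37.0)              # assumed true relationship
prob = 1 / (1 + np.exp(-log_odds))
appendicitis = rng.binomial(1, prob)               # 1 = has appendicitis

X = sm.add_constant(temp)                          # intercept + temperature
fit = sm.Logit(appendicitis, X).fit(disp=0)
print(fit.params[1])                               # estimated temperature coefficient
print(fit.pvalues[1])                              # p-value for "coefficient = 0"
```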
Yes, “reject” the hypothesis that the parameter associated with temperature is 0 and say, therefore, that it is non-zero. That is, he would conclude that temperature is associated with appendicitis. Very well. But our statistician at least wants to check the (subjective) Bayesian way and so computes the posterior probability distribution for the parameter.
He discovers that, given the model is correct and the data he saw, the probability the parameter associated with temperature is greater than 0 is, say, 99.9%. Which is to say, he is pretty darn sure that temperature is positively associated with appendicitis: higher temperatures predict more appendicitis cases.
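(A rough sketch of that posterior check, using the usual large-sample normal approximation to the posterior under a flat prior; the coefficient estimate and standard error below are invented numbers standing in for whatever the fitted model would report.)

```python
# Approximate posterior probability that the temperature coefficient exceeds 0,
# treating the posterior as Normal(estimate, standard error^2).
from scipy.stats import norm

beta_hat = 0.30     # hypothetical estimated temperature coefficient
se = 0.10           # hypothetical standard error of that estimate

prob_positive = norm.sf(0.0, loc=beta_hat, scale=se)   # P(coefficient > 0 | data, model)
print(round(prob_positive, 4))                         # about 0.9987, i.e. roughly 99.9%
```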
The statistician now rests. He should not. Here’s why.
Suppose we know with certainty a parameter in a logistic regression model has the value 0.000001. What is the probability that this parameter is greater than 0? Think carefully now. The answer should be obvious.
In all cases I have ever heard of, 0.000001 > 0. Therefore, the parameter is greater than zero. If this were so, the p-value we would get in any hypothesis test would (with enough data) shrink toward 0. Now that’s significant! The posterior probability would also show that the probability the parameter is greater than 0 is 100%. Thus, we could publish our paper saying that the variable associated with this parameter is “highly” statistically significant, even to Bayesians.
But none of these facts answer our main question: how does temperature affect appendicitis? That’s what we set out wanting to know, and with the classical analysis we stop short of learning the complete answer. Sure, the temperature parameter’s value of 0.000001 is greater than 0, but it is so small as to be useless.
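(The quick arithmetic behind “useless”, assuming the coefficient sits on the usual log-odds scale:)

```python
# How little a logistic-regression coefficient of 0.000001 means in practice:
# a one-degree rise in temperature multiplies the odds of appendicitis by
# exp(0.000001), which is essentially no change at all.
import math

odds_ratio = math.exp(0.000001)
print(odds_ratio)    # 1.0000010000005: the odds change by about 0.0001%
```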
What we should be doing is calculating the probability that we see more (new) people with appendicitis and high temperatures than (new) people with appendicitis and low temperatures. We already agreed that if temperature had no effect, then this probability would be 50%.
If, then, after we fit our model (with the low p-values and the high posterior probabilities), we push ahead and compute the uncertainty in new observations and so discover that the probability we see more people with high temperatures and appendicitis is only 52%, what have we learned?
Well, that while temperature may formally be related to appendicitis, its predictive value is very low, and probably negligible. Knowing the value of a person’s temperature barely—ever so barely—changed our uncertainty in the values of new observables. It changed it so little that even though all classical measures say we should consider it, we really probably shouldn’t.
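(Here is a sketch of that predictive calculation, with every number invented: pretend the approximate posterior for the intercept and the temperature coefficient is as given below, simulate new groups of patients who run hot and cool, and count how often the hot group produces more appendicitis cases, splitting ties evenly to match the 50% baseline.)

```python
# A sketch of the predictive question, with invented numbers: given posterior
# uncertainty about the intercept and the (tiny but "significant") temperature
# coefficient, how often would new high-temperature patients produce more
# appendicitis cases than new low-temperature patients?
import numpy as np

rng = np.random.default_rng(1)
n_sims, n_per_group = 20000, 50

# Hypothetical approximate posterior draws for the logistic-regression parameters
intercept = rng.normal(-2.0, 0.05, n_sims)
beta_temp = rng.normal(0.000001, 0.0000003, n_sims)    # almost surely > 0, yet tiny

def new_cases(temp_above_normal):
    """Simulate appendicitis counts in a new group of patients."""
    log_odds = intercept + beta_temp * temp_above_normal
    prob = 1 / (1 + np.exp(-log_odds))
    return rng.binomial(n_per_group, prob)

hot = new_cases(+2.0)     # new patients running 2 degrees hot
cool = new_cases(-2.0)    # new patients running 2 degrees cool
more_hot = (hot > cool).mean() + 0.5 * (hot == cool).mean()   # split ties evenly
print(more_hot)           # hovers right around 0.5: temperature barely helps predict
```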
And that is the difference between the old and the new.
I’m in a terrible hurry today and know that I’ve explained this badly. I’ll try again another day.
No, Sir, you did a good job explaining a very difficult subject.
Thank you for taking time to try to educate people about one of the most used (and least understood) methods for decision making.
My field is radiation physics (medical and health physics). You constantly see studies that look at the correlation between nonionizing (i.e., radio) or ionizing (x-ray) radiation exposure and any health issue known to man (various types of cancer most commonly, but other diseases as well).
The researchers will look at 100 types of cancer (for example) and correlate cancer rates with radiation exposure. They see a positive correlation with 3 cancers and say (I guess accurately but pointlessly) that at the 95% confidence level we can say that these three types of cancer are “caused by” (they seldom say correlated with) radiation exposure. They certainly correlate, but we would expect (as I understand statistics) that 5 cancers (out of 100) would correlate (at the 95% confidence level) with any independent, measurable quantity in the study subjects’ lives (whether it was education level, color blindness, size of credit card debt, etc.).
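(A back-of-the-envelope check of that expectation, assuming for simplicity 100 independent tests at the 5% level and no real effects anywhere:)

```python
# With 100 independent tests at the 5% level and nothing really going on,
# how many "significant" cancers should chance alone produce?
from scipy.stats import binom

n_tests, alpha = 100, 0.05
expected_false_positives = n_tests * alpha
prob_at_least_3 = binom.sf(2, n_tests, alpha)   # P(3 or more chance "hits")
print(expected_false_positives)                 # 5.0
print(round(prob_at_least_3, 3))                # about 0.88
```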
Then this result is used to make decisions about monetary compensation for future health problems, amount of shielding needed around X-ray rooms, outlawing use of nuclear power plants, etc., etc.
We have a society that is so risk averse that we are strangling ourselves. Now we use the “precautionary” principle so we don’t even have to have bogus statistical correlations to prohibit whatever is the newest “risk” du jour. We just have to convince some judge that the activity might someday be found to correlate with something bad and we will ban it just as a precaution.
In my mind, the biggest risk any of us face is the possibility that modern, industrial civilization may collapse and we will go back to an average lifespan of 40 years, as it was about 100 years ago, after a lot of major public health measures (vaccination, clean water, dental hygiene, controlling insect vectors of disease) had already helped out a lot. If we lose the ability to purify water, prepare and distribute vaccines, make window screens, etc., we will go back to who knows what quality and length of life.
Please let me know if I am misunderstanding your point, as I really am not a good statistician, but I think it is very important to have some basic literacy in this area.
Thanks again.
Briggs:
I imagine two possible experiments.
1. Sample a large population measuring temperature and presence/absence of appendicitis.
2. Sample two populations, one with appendicitis and one without, measuring temperature of each.
How should the statistics be handled in each case?