I’ve pointed out many times some of the weaknesses of traditional statistical practices, and how the routine use of “hypothesis testing” leads to an ever-growing mountain of Broken Science.
Some say the cure for this is to switch to something called “Bayesian analysis.” Alas, while Bayesian procedures help some, they are far from a panacea, because they rely on the same foundational error those traditional procedures do.
Let me explain, but I warn you it won’t be easy going.
When data is collected in science to aid in answering a question, something like the following often happens.
First, “parameterized” probability (or statistical) models are created. These quantify the uncertainty of some measurable thing, and usually show how that uncertainty changes with different assumptions. The “parameters” can be thought of as knobs or dials that, when turned, change the probabilities.
Let’s use weight loss for an example, but keep in mind that what I have to say goes for any use of probability models.
We want to quantify in an experiment the uncertainty we have in weight loss among groups of people. Let’s suppose half the people in our experiment took some new drug which is touted to lead to greater weight loss.
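To make the “knobs and dials” picture concrete, here is a minimal sketch of such a parameterized model in code. Everything in it is invented for illustration: a normal model for kilograms lost, with the mean and spread as the dials, one setting per group.

```python
# A toy, parameterized normal model for kilograms lost. The mean and spread
# are the "knobs"; turning a knob changes the probabilities the model assigns.
# All numbers are invented for illustration.
from scipy.stats import norm

def pr_loss_exceeds(threshold_kg, mean_kg, sd_kg):
    """Probability the model assigns to losing more than threshold_kg."""
    return norm.sf(threshold_kg, loc=mean_kg, scale=sd_kg)

# One knob setting per group (hypothetical values).
knobs = {
    "control": {"mean_kg": 1.0, "sd_kg": 3.0},
    "drug":    {"mean_kg": 2.5, "sd_kg": 3.0},
}

for group, k in knobs.items():
    p = pr_loss_exceeds(5.0, k["mean_kg"], k["sd_kg"])
    print(f"{group}: Pr(lose more than 5 kg) = {p:.2f}")
```

Turn the drug group’s mean dial up and the model assigns more probability to large losses; turn it down and it assigns less. The dials are ours, not the world’s.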
The first mistake, and one that is ubiquitous, is to assume that probability models can prove the drug caused any observed weight loss. The drug may or may not work, but the model used to analyze it cannot be used as proof of this. These models, after all, begin with the assumption that the drug is important or causal. To conclude that cause has been shown because the model fits the data well is to argue in a circle.
The opposite error is also common. The drug could be a cause of weight loss, but the model is poor at representing it, or the data, which is part of the model, is insufficient or flawed.
Incidentally (and you can skip this), it is curious that, while the data is an integral, unremovable part of the model, data is often thought of as something independent of the model. That independence is true mathematically, but only in the formal academic creation of model types. It is never true philosophically or in practice. Failure to recognize this exacerbates the misconception that probability models can identify cause.
Here is where it becomes interesting, and a lot more difficult.
There are two main streams of statistical thinking on models, both of which mistakenly take probability as a real thing. It is not obvious at first, but this is the fundamental error. This mistaken idea that probability is a real property of the world is why it is thought probability models can identify cause. Instead, probability is only a way to quantify our uncertainty. It is part of our thoughts, not part of the world.
Those two modeling streams are frequentism and Bayesianism, and a lot of fuss is made about their differences. But it turns out they’re much the same.
Frequentism leads to “null hypothesis significance testing”, and the hunt for wee p-values based on statements about the parameters of models. Bayesianism does testing in a slightly different way, but it, too, keeps its focus on the parameters of models. Again, these parameters are the guts inside models, which both frequentism and Bayesianism treat as real things. In our example, weight loss is real, but thoughts about weight loss are not real and not part of the world.
The idea behind hypothesis testing or Bayesian analysis is the same. Both suppose that, if the model fits well, the parameter associated with weight loss (or whatever we’re interested in) in reality takes some definite value (hypothesis testing), or that the uncertainty in its value can be quantified (Bayes). If parameters were part of the world, like rocks or electrons or whatever are, then it would make sense that their taking the “wrong” or the “right” value says something about the world, just as weight says something real about the world. Yet parameters are like thoughts of weight loss, not weight loss itself.
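To see that shared focus on parameters in action, here is a toy sketch on simulated, made-up data (my own illustration, not anything from the post). The frequentist t-test delivers a p-value about the unobservable mean-difference parameter; the flat-prior Bayesian calculation delivers a posterior probability about that same parameter. Neither says a word about how much weight the next patient will actually lose.

```python
# Toy comparison on simulated, made-up data: both approaches end up making
# statements about the unobservable parameter delta, the "true" mean
# difference in weight loss, not about any future, observable weight loss.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(1.0, 3.0, size=50)   # invented kg lost, control group
drug    = rng.normal(2.5, 3.0, size=50)   # invented kg lost, drug group

# Frequentist: hunt for a wee p-value on the hypothesis "delta = 0".
t_stat, p_value = stats.ttest_ind(drug, control)
print(f"p-value for the parameter statement 'delta = 0': {p_value:.4f}")

# Bayesian (a crude sketch: flat prior on delta, group variances treated as
# known): the posterior for delta is approximately normal, centred on the
# observed difference. Still a statement about the parameter.
delta_hat = drug.mean() - control.mean()
post_sd = np.sqrt(drug.var(ddof=1) / len(drug) + control.var(ddof=1) / len(control))
print(f"Pr(delta > 0 | data) ~= {stats.norm.sf(0, loc=delta_hat, scale=post_sd):.3f}")
```

Different rituals, same object of devotion: delta, a knob inside the model.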
Proving that parameters are purely matters of thought, and are not real, takes some doing (I have a book on the subject, but beware, it is not light reading). However, I offer a simple intuitive argument here.
Models are like maps of reality, and maps are not the territory. They are only abstractions of it. The figures and symbols on maps are not real, but they help give an idea of reality. Confusing the symbols on the map with reality is the same mistake as assuming the name of a thing is the thing itself. And do not forget that there are many different maps that can be made of the same location, some more useful than others. Those symbols on the maps can’t all be real. Same thing with models and their parameters.
What needs to happen in statistical modeling, then, is to remove the focus from inside models, stop obsessing over parameters, and put the focus back on observable reality. Have models make testable (probability) predictions about reality, and stop making indirect, unverifiable statements about parameters. Make predictions of real things, actual measurable entities in the world.
If models are good, they will make good predictions. If they are bad, they won’t. It’s really as simple as that.
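Here is a sketch of what that predictive discipline could look like, again with invented data and two hypothetical models: each issues a probability prediction for fresh, observable weight losses, and each is scored on how much probability it gave to what actually happened.

```python
# Toy predictive check on invented data: two hypothetical models each issue a
# probability prediction for fresh, observable weight losses, and each is
# scored on the probability it gave to what actually happened (the log score).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
old = rng.normal(2.0, 3.0, size=100)   # data the models are fit to
new = rng.normal(2.0, 3.0, size=50)    # fresh observations to be predicted

# Model A: normal predictive distribution fit to the old data.
# Model B: same centre, but it wildly understates the spread.
pred_A = stats.norm(loc=old.mean(), scale=old.std(ddof=1))
pred_B = stats.norm(loc=old.mean(), scale=0.5)

# Higher mean log score is better: reality itself punishes the bad model.
for name, pred in [("A", pred_A), ("B", pred_B)]:
    print(f"model {name}: mean log score = {pred.logpdf(new).mean():.2f}")
```

No statement about any parameter appears in the verdict; only predictions of measurable weight losses, checked against measurable weight losses.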
Not surprising that the priciest piece of [toy] software calculator fails at reliable prediction:
In Chat~AGI~, enter this prompt:
.answer terse. fill in the ___ blanks, precisely and unique: You are in the savanna and if you can see a stork and also see an artificial ___ of it, then you know that the ___ idea was created on planet earth. tell what the ___ idea is
Change the word artificial to your taste, e.g. physical; don’t expect the system to understand that art can only exist if someone has an idea not just in mind … (I haven’t tried to teach it so, his stepuncle claims he already did).
The entire pharmaceutical investment industry is fixated on Wee Ps. This is aided by University statistics departments globally.
While I agree with everything you’ve written, trying to break the use of hypothesis testing is like Sisyphus pushing that stone.
If product approvals were based upon predictive success then there would be few drugs on the market. That’s not a bad thing, in my view.
Weight loss?!? I protest! Why read, let alone write, an article premised on fatphobia?
I have to agree with Robin (above): “trying to break the use of hypothesis testing is like Sisyphus pushing that stone.” However, I think that wee p’s dominate thinking because there really are no widely accepted, easily presented to the non-technical, alternatives. So maybe invent a better way to compare this to that?
I like the map analogy. Interesting that rearranging mental maps is a way to better manage actual territory. When mental maps are misleading travelers get lost. When they are accurate the goods get delivered, the bad guys are thwarted, and the hero — a tall fellow with blond hair and blue eyes — gets the girl.
As usual you’re addressing the wrong problem. Rulers used to get “experts” to examine chicken entrails to give an “objective” justification for their actions. Nowadays the “experts” use Wee Ps, or, if they want to be edgy, Bayes; or, for more money and prestige, they program computer models to give out the required scenarios.
Tilting at windmills is amusing for sure but let’s not forget it’s only useful for entertainment purposes.
Ephesians 6:12 (KJV)
I take it this was what you alluded to when I asked on twitter a while back about your thoughts on Judea Pearl.
I was sincerely skeptical of Bayesian analysis outright, made more skeptical still by the ardor of those who promote it as God’s own ritual, and that book he wrote about AI understanding causality sealed the grave shut. Your post is like its eulogy.