News Pass on today’s post to anyone you know who uses statistics, please.
Good news and bad news. Good news is that there is a growing number of people who are aware that p-values are sinking science and who want to do something, really do something, to fix the situation.
Bad news: the proposed fixes are only tweaks, like requiring wee p-values be weer (wee-er?) or to use Bayes factors.
There is also terrific news. We can fix much—but not all, never all—of what is broken by eliminating p-values, hypothesis tests, and Bayes factors altogether. Banish them! Bar them! Blacklist, ban, and boycott them! They do not mean what you think they do.
The main critique comes in a new paper co-authored by a blizzard of people: “Redefine Statistical Significance”. The lead author is Daniel J. Benjamin, and the second is Jim Berger, who is well known. To quote: “One Sentence Summary: We propose to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.”
There is at least one blog devoted to the reproducibility or replication crisis in science (thanks to Gary Boden for finding this one). There is growing recognition of severe problems, which led one fellow to tell us “Why most of psychology is statistically unfalsifiable”. And see this: “The new astrology: By fetishising mathematical models, economists turned economics into a highly paid pseudoscience.”
What are P-values?
Despite what frequentist theory says, the great bulk of hypothesis test users believe wee p-values (and large Bayes factors) have proved, or given great weight to, causal relations. When a p-value is wee, they say X and Y are “linked” or “associated”, and by that they always mean, even if they protest they do not mean, a causal relationship.
A wee p-value means only one thing: the probability of seeing an ad hoc statistic larger than the one you did see is small given a model you do not believe. This number is as near to useless as any number ever invented, for it tells you nothing about the model you don’t believe, nor does it even whisper anything about the model you do believe.
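The definition above can be put in code. A minimal sketch with hypothetical numbers: the "null" model says the data are fair-coin flips (a model nobody believes), and the p-value is merely the probability, under that disbelieved model, of a statistic at least as large as the one observed.

```python
import random

# A p-value sketch: the probability of seeing a statistic at least as large
# as the one observed, computed under a null model you do not believe.
# Hypothetical observation: 34 heads in 50 flips; null model: fair coin.
random.seed(1)
observed_heads = 34
n, sims = 50, 100_000

# Simulate the disbelieved null model and count how often the statistic
# (number of heads) is at least as large as the observed one.
exceed = sum(
    sum(random.random() < 0.5 for _ in range(n)) >= observed_heads
    for _ in range(sims)
)
p_value = exceed / sims

# Note what p_value is NOT: it says nothing about the model you do believe,
# nor about whether the coin (or treatment, or theory) "works".
print(p_value)
```

The number that comes out (roughly 0.008 here) is exactly and only what the paragraph says: a tail probability under a model assumed false, nothing more.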
Every use of the p-value, except in the limited sense just mentioned, involves a fallacy. I prove—as in prove—this in the award-eligible book Uncertainty: The Soul of Modeling, Probability & Statistics. How embarrassing not to own a copy!
Also see this blog’s Book Page, which has links to many articles on relevant topics.
I have an upcoming JASA paper in discussion of Blakeley McShane and David Gal’s also-upcoming “Statistical Significance and the Dichotomization of Evidence”, in which I outline the replacement for p-values. Academic publishing is lethargic, so look for this in August or even December.
Meanwhile, here are elements of a sketch of a condensation of an abbreviation of the outline. The full thing is in Uncertainty. I will answer below any new criticism that I have not already answered in Uncertainty—meaning, if I don’t answer you here, it means I already have in the book.
We have interest in proposition Y, which might be “This patient gets better”. We want the probability Y is true given we know X_0 = “The patient will be treated by the usual protocol” or X_1 = “The patient will be treated by the New & Improved! protocol”. We have a collection of observations D detailing where patients improved or not and which protocol they received. We want
Pr(Y | X_i D).
This could be deduced in many cases using finite, discrete probability, but that’s hard work; instead, a probability model relating D and X to Y is proposed. This model will be parameterized with continuous-valued parameters. Since all observations are finite and discrete, this model will be an approximation, though it can be an excellent one. The parameters are, of course, of no interest whatsoever to man or beast; they serve only to make the model function. They are a nuisance and no help in answering the question of interest, so they are “integrated out”. The end result is this:
(1) Pr(Y | X_i D M),
where M is a complicated (compound) proposition that gives details about the model proposed by the statistician. This is recognized as the predictive posterior distribution given M. M thus also contains assumptions made about the approximate parameters; i.e. whether to use “flat” priors and so on.
This form has enormous benefits. It is in plain language; specialized training isn’t needed to grasp model statements, though advanced work (or better software) is needed to implement it. Everything is put in terms of observables. The model is also made prominent, in the sense that it is plain there is a specific probability model with definite assumptions in use, and thus it is clear that answers will be different if a different model or different assumptions about that model are used (“maxent” priors versus “flat”, say).
Anybody can check (1)’s predictions, even if they do not know D or M’s details. Given M and D, authors might claim there is a 55% chance Y is true under the new protocol. Any reader can verify whether this prediction is useful for him or not, whether the predictions are calibrated, etc. We do not have to take authors at their word about what they discovered. Note: because finding wee p-values is trivial, many “novel” theories will vanish under (1) (because probabilistic predictions made using and not using the “novel” theory will not differ much; p-values wildly exaggerate differences).
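Equation (1) can be computed directly in simple cases. A minimal sketch, assuming a beta-binomial M with a “flat” Beta(1,1) prior and hypothetical data D (30 of 60 improve under the usual protocol, 40 of 60 under the new one); the parameter is integrated out, leaving only the probability of the observable Y:

```python
# Posterior predictive Pr(Y | X_i, D, M) for a beta-binomial model M.
# With a Beta(a, b) prior on the improvement rate and s improvements in
# n patients, the rate integrates out to (a + s) / (a + b + n).
def pr_improve(successes, trials, a=1, b=1):
    return (a + successes) / (a + b + trials)

# Hypothetical data D: usual protocol 30/60 improved; new protocol 40/60.
p_usual = pr_improve(30, 60)   # Pr(next patient improves | X_0, D, M)
p_new   = pr_improve(40, 60)   # Pr(next patient improves | X_1, D, M)
print(round(p_usual, 3), round(p_new, 3))  # prints: 0.5 0.661
```

Those two numbers are statements about observables any reader can check against future patients, which is the whole point: no parameters, no tests, just Pr(Y | X_i D M).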
A prime reason p-values were embraced was that they made automatic, universal decisions about whether to “drop” variables or to keep them (in a given model schema). But probability is not decision; p-values conflated the concepts. P-values cannot discover cause.
There are an infinite number of “variables” (observations, assumptions, information, premises, etc.) that can be added to the right-hand-side of (1). In our example, these can be anything—they can always be anything!—from a measure of hospital cleanliness to physician sock color to the number of three-teated cows in Cleveland. The list really is endless. Each time one is put into or removed from (1), the probability changes. Which list of variables is correct? They all are. This is true because all probability is conditional: there is no such thing as unconditional probability (this is also proven in Uncertainty).
The goal of all modeling is to find a list of true premises (which might include data, etc.) which allow us to determine or know the cause of Y. This list (call it C) will give extreme probabilities in (1); i.e.
Pr(Y | X_i D C) = 0 or 1.
Note that to determine and to cause are not the same; the former means to ascertain, while the latter is more complex. Readers generally think of efficient causes, and that is enough for here, though these comprise only one aspect of cause. (Because of underdetermination, C is also not unique.) Discovering cause is rare because of the complexity of C (think of the myriad causes of patient improvement). It is still true that the probabilities in (1) are correct when M is not C, for they are calculated based on different assumptions.
What goes into M? Suppose (observations, assumptions, etc.) some W is considered. The (conditional) probability of Y with and without W in (1) is found; if these differ such that the model user would make different decisions, W is kept; else not. Since decision is relative there is thus no universal solution to variable selection. A model or variable important to one person can be irrelevant to another. Since model creators always had infinite choice, this was always obvious.
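The decision rule just described can be sketched in a few lines. The threshold below is hypothetical and deliberately user-relative: a difference in predictive probability that changes one person’s decision may be irrelevant to another, which is why no universal variable-selection rule exists.

```python
# Sketch of predictive variable selection: keep candidate W only if adding
# it to (1) moves Pr(Y | ...) enough to change THIS user's decision.
def keep_variable(p_without_w, p_with_w, decision_threshold=0.05):
    # decision_threshold is relative to the decision maker, not universal.
    return abs(p_with_w - p_without_w) >= decision_threshold

print(keep_variable(0.55, 0.56))  # False: W changes nothing that matters here
print(keep_variable(0.55, 0.70))  # True: W would change this user's decision
```

Run with a different `decision_threshold` and the answer changes, which is the point: probability is not decision.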
Bad news for New Agers. We are not made of God, or part of Him.
1 Things already said make it quite clear that the soul is not of God’s substance.
2 For it was shown in Book I of this work that the divine substance is eternal, and that no perfection of it has any beginning. Human souls, however, did not exist before bodies, as we have just shown. Therefore, the soul cannot be made of God’s substance.
3 It was likewise shown in Book I that God cannot be the form of anything. But the human soul is, as proved above, the form of the body. Therefore, it is not of the divine substance.
4 Moreover, everything from which something is made is in potentiality to that which is made from it. But the divine substance is not in potentiality to anything, since it is pure act, as was shown in Book I. Therefore, neither the soul nor anything else can possibly be made from God’s substance.
5 Then, too, that from which something is made is in some way changed. But God is absolutely unchangeable, as was proved in Book I. It is, therefore, impossible for anything to be made from Him.
6 Furthermore, that the soul suffers variations in knowledge and virtue, and their opposites, is a fact of observation. But in God there is absolutely no variation, either through himself or by accident.
Notes Recall God is metaphysically simple. Unfortunate word, since we think of complexity as superiority. But complexity is restrictive.
7 Also, it was shown in Book I that God is pure act, completely devoid of potentiality. But in the human soul we find both potentiality and act, since it contains the possible intellect, which is in potentiality to all intelligibles, as well as the agent intellect, as was shown above. Therefore, it is not of God’s nature that the human soul is made.
8 Again, since the divine substance is utterly indivisible, the soul cannot be part of it, but only the whole substance. But the divine substance can be one only, as shown in Book I. It therefore follows that of all men there is but one soul so far as intellect is concerned. And this was disproved above. Therefore, the soul is not made of God’s substance.
9 Now, the theory that the soul is part and parcel of God’s own substance or nature seems to have had three sources: the doctrine that no substance is incorporeal; the doctrine that there is but one intellect for all men; the very likeness of our soul to God. As to the first source, some, having denied that any substance is incorporeal, asserted that God is the noblest body, whether it be air or fire or anything else putatively a principle, and that the soul was of the nature of this body. For, as Aristotle points out [De Anima I, 2], the partisans of this doctrine all attributed to the soul whatever to their mind had the character of a principle. So, from this position, it followed that the soul is of the substance of God. And from this root sprang the theory of Manes, who held that God is a luminous body extended through infinite space, and of this body, he said, the human soul is a fragment.
10 This theory, however, was previously refuted by the demonstration that God is not a body, as well as the proof that neither the human soul nor any intellectual substance is a body.
11 As to the second source indicated above, some have held that of all men there is but a single intellect, whether an agent intellect alone, or an agent and a possible intellect together, as we explained above. And since the ancients attributed divinity to every separate substance, it followed that our soul, the intellect by which we understand, is of the nature of the divine. And that is why in this age certain persons who profess the Christian faith and who posit a separately existing agent intellect explicitly identify the agent intellect with God.
12 Now, this whole doctrine of the unicity of man’s intellect has already been refuted.
Notes It’s anyway disproved daily in practice.
13 In the very likeness of our soul to God may be found the third source of the theory that the soul is of the substance or nature of God Himself. For we find that understanding, which is thought to be proper to God above all, is possessed by no substance in this lower world except man—and this on account of his soul. It might, then, seem that the soul partakes of the nature of God; and this notion might appeal especially to persons firmly convinced of the immortality of the human soul.
14 This idea even seems to find support in the Book of Genesis (1:26), where, after the statement, “Let us make man to Our image and likeness,” it is added: “God formed man of the slime of the earth; and breathed into his face the breath of life.” From this text some wished to infer that the soul is of the very nature of God. For, since he who breathes into another’s face puts forth into the latter numerically the same thing that was in himself, holy Scripture itself would here seem to imply that God put into man something divine in order to give him life.
15 But the likeness in question is no proof that man is a part of the divine substance, for man’s understanding suffers from many defects—which cannot be said of God’s.
This likeness, then, is rather indicative of a certain imperfect image than of any consubstantiality. And, indeed, Scripture implies this in saying that man was made “to the image” of God. And thus the “breathing” of which Genesis speaks signifies the pouring forth of life from God into man according to a certain likeness, and not according to unity of substance. So, too, “the spirit of life” is said to have been “breathed into his face,” for, since the organs of several senses are located in this part of the body, life is more palpably manifested in the face. God, therefore, is said to have breathed the spirit into man’s face, because He gave man the spirit of life, but not by detaching it from His own substance. For he who literally breathes into the face of someone—and this bodily breathing is evidently the source of the Scriptural metaphor—blows air into his face, but does not infuse part of his substance into him.
I am only about two years behind on my emails. Every email folks send me is worthy of a full post, but obviously I am so far behind I’ll never catch up. But it’s all juicy material and readers need to see it.
From reader Weston, a link to a paper about conspiracies. More quantifying of the unquantifiable.
Thought I’d elect my own bad science paper of the year, which claims that if the following conspiracies were real, the math shows they would probably have been debunked by now. The 4 conspiracies chosen are the moon landing, global warming, harmful vaccinations, and the cure for cancer. That’s a lively bunch.
The author, of course, has all sorts of great citations at the beginning to understand how a conspiracy works. And then the author starts by using a failure model to say that a conspiracy will become debunked if one person talks.
Going a bit further, the paper assumes that there was a single event for which the truth about the situation is known. A moon landing is a good example of this. The moon landing, one time, either happened or it didn’t. Climate change is not a single event. Vaccinations are not single events. So not only do I think his premise is flawed, but he’s already undercut himself with his examples. He’s making an enormous number of assumptions, then applying them to situations that do not match those assumptions. Oh, and no test data. Which is the kicker.
There are even some wee p-values!
As an avid reader, thought you’d enjoy
From Warren, a peer-reviewed paper on the supposed racial differences in pain assessment and treatment recommendations. The academy has race on the brain, the racists.
Subject: Candidate for Bad Science award, racial bias in prescribing pain meds
Found news releases of this article floating around on Facebook. The basic point touted in the news articles is that research claims that racial bias is one reason why blacks are treated disparately when given pain medicine. It deals with white laypeople and medical professionals (or trainees) endorsing false statements about biological differences between blacks and whites, and how that influenced their ability to rate the pain and provide recommendations for prescribing pain medicines in (just) two hypothetical cases, one white and one black patient.
Of course none of the news articles link to the original article, but it’s PNAS and behind a paywall. I would consider it a candidate for your Bad Science award.
Right off the bat, they create a “composite” of false beliefs regarding blacks by averaging together people’s survey responses for each of the false questions (which was on an ordinal scale) and then used continuous statistics with it. The differences between the groups (without any sense of uncertainty) also is suspect. There may be other major flaws with it, but I’m only a fledgling.
Here is the link to the full article: LINK
Don’t worry, be happy
Lastly, a Blog Challenge. This is from Christopher. The link is to the article “Does A ‘Happiness Gene’ Exist?”:
I’m at the end of my rope. The only way I can cope with this latest “science” nightmare is a Briggs-style deconstruction. Please consider.
The challenge is critique the paper (which you’ll have to get) in the way in which you have been taught. Then write a Guest Post which I’ll happily post.
Theodore Dalrymple in City Journal reviewing Paul Hollander’s From Benito Mussolini to Hugo Chavez: Intellectuals and a Century of Political Hero Worship said (a long but necessary quotation; the emphasis in the third paragraph is mine):
Imprisoned serial killers of women are often the object of marriage proposals from women who know nothing of them except their criminal record. This curious phenomenon indicates the depths to which self-deception can sink in determining human action. The women making such offers presumably believe that an essential core of goodness subsists in the killers and that they are uniquely the ones to bring it to the surface. They thereby also distinguish themselves from other women, whose attitude to serial killers is more conventional and unthinkingly condemnatory. They thus see further and deeper, and feel more strongly, than their conventional sisters. By contrast, they show no particular interest in petty, or pettier, criminals.
Something similar can be noted in the attitude of at least some intellectuals toward dictators, especially if those dictators claim to be in pursuit of a utopian vision. Paul Hollander, professor emeritus of sociology at the University of Massachusetts, Amherst, has long had an interest in political deception and self-deception—not surprising in someone with first-hand experience of both the Nazis and the Communists in his native Hungary. In 1981, he published his classic study of Western intellectuals who traveled, mainly on severely guided tours, to Communist countries, principally Stalin’s Russia, Mao’s China, and Castro’s Cuba, and returned with glowing accounts of the new (and better) worlds under construction there. The contrast between their accounts and reality would have been funny had reality itself not been so terrible.
In From Benito Mussolini to Hugo Chavez, Hollander turns his attention to intellectuals’ views of a wider range of dictators and authoritarian leaders. His study makes no pretension to scientific, or rather pseudoscientific, quantification, for example by first defining random groups of dictators and intellectuals and then administering structured questionnaires to the intellectuals about their attitude to the dictators. This kind of precision is often mistaken for rigor, but measurement is not meaning, and humans inhabit a world of meaning. Hollander’s study is therefore qualitative: none the worse, and a lot more interesting, for that. Even if only 10 percent of Western intellectuals, however defined, were apologists for, or admirers and supporters of dictators—sometimes serially, so that when one dictator finally dies or disappoints, another is adopted as a political hero—the phenomenon would still be significant and important. The list of influential intellectuals who have given their blessing to the most obviously terrible regimes is impressive: H. G. Wells, George Bernard Shaw, Romain Rolland, Jean-Paul Sartre (a serial offender), Norman Mailer, C. Wright Mills, Michel Foucault, and scores of others.
Pseudoscientific quantification. Amen and amen.
Two problems. Statistics, which is to say classical statistical modeling, is now everywhere accepted as a replacement for reality. We can no longer just say “Here is what happened.” No, we have to say what happened might not have happened because what happened wasn’t consonant with some ad hoc model. That’s called hypothesis testing, and is the first leg of pseudo-quantification.
Second, putting numbers where numbers don’t belong. I recently wrote this to a client about analyzing a set of survey questions:
Remember: do not over interpret. Most times correlations on surveys are only because the questions are near re-wordings of each other. Q1 = “Do you like corn?” Q2 = “Do you eat a lot of corn?” would have high correlation. But if you called Q1 “Maize Appreciation” and called Q2 “Oleic acid ingestion” (because corn oil has that substance), then you’d be foolish to write a paper which claims, “Oleic acid linked to increased maize appreciation.”
But it happens ALL the time.
It does. All the time.