# Idols With Wee P Values. Statistics As Ritual

Or, rather, wee p-values are idolized. And it isn’t just me saying so. Reader Dan Hughes points us to Gerd Gigerenzer and Julian Marewski’s *peer-reviewed* paper “Surrogate Science: The Idol of a Universal Method for Scientific Inference” in the *Journal of Management*.

The paper can be read by anybody (well, you get the idea), but here are the juicy quotes and my comments. It’s long, but boy oh boy is it fun!

> Determining significance has become a surrogate for good research.

Amen! Preach it, brother. Sing it loud. Hallelujah.

> One of us reviewed an article in which the number of subjects was reported as 57. The authors calculated that the 95% confidence interval was between 47.3 and 66.7 subjects. Every figure was scrutinized in the same way, resulting in three dozen statistical tests. The only numbers with no confidence intervals or p values attached were the page numbers.

That author was nuts and forgot that *statistics are never needed to tell us what happened.* Though, yes, this unnecessary duplication and absurd quantification are the lifeblood of frequentism.

> …in physics, Newton’s theory of simple cause and effect was replaced by the probabilistic causes in statistical mechanics and, eventually, by quantum theory.

The consequence is cause has long been forgotten. Or, rather, cause is whatever the research says it is. Terrible harm has been done because of this.

> To understand how deeply the inference revolution changed the social sciences, it is helpful to realize that routine statistical tests, such as calculations of p values or other inferential statistics, are not common in the natural sciences. Moreover, *they have played no role in any major discoveries in the social sciences.* [emphasis mine]

Nor in any other science. P-values only prove—or “prove”—(a) what is already known (proof), or (b) what is probably false (“proof”).

> The Null Ritual
>
> The null ritual is an invention of statistical textbook writers in the social sciences.
>
> …spearheaded by humble nonstatisticians who composed statistical textbooks for education, psychology, and other fields and by the editors of journals who found in “significance” a simple, “objective” criterion for deciding whether or not to accept a manuscript.

Thus has laziness triumphed and become ingrained in science. Gigerenzer is right: statistics is pagan ritual. And now just as effective as offering sacrifices to a volcano.

> Some of the most prominent psychologists of their time vehemently objected…the founder of modern psychophysics, complained about a “meaningless ordeal of pedantic computations.” …one of the architects of mathematical psychology, spoke of a “wrongheaded view about what constituted scientific progress,”…

Not that it mattered. The Wee P-value is triumphant.

> Unlike many of his followers, Savage carefully limited Bayesian decision theory to “small worlds” in which all alternatives, consequences, and probabilities are known. And he warned that it would be “utterly ridiculous” to apply Bayesian theory outside a well-defined world; for him, “to plan a picnic” was already outside because the planners cannot know all consequences in advance (Savage, 1954/1972: 16).

Amen again! Decision analysis was pushed far, far past the breaking point years ago. The EPA, and pretty much every other agency that wants to “prove” pre-decided conclusions, never remember that (unknown probability) x (unknown costs) = (who the hell knows what’s best). Instead, scientism and false quantification run amok.
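The (unknown probability) x (unknown costs) complaint can be made concrete with the textbook cost-loss model, brought in here purely as an illustration (it is not from the paper). Pay a cost to take precautions, or risk a loss that happens with probability p; expected-value reasoning says act when p times the loss exceeds the cost. A minimal sketch, assuming p and the loss are only known to lie in ranges: if those ranges straddle the break-even point, the arithmetic cannot pick an action. The function name `decision` and all numbers are hypothetical.

```python
# Cost-loss decision model, used here only as an illustration (not from the paper).
# Paying `cost` buys protection against a loss of size `loss` that occurs with
# probability p. Expected-value reasoning says: act when p * loss > cost.

def decision(p_range, loss_range, cost):
    """Return "act", "don't act", or "undetermined" when p and loss are only
    known to lie in ranges (low, high). All quantities are assumed positive."""
    p_low, p_high = p_range
    loss_low, loss_high = loss_range
    smallest_expected_loss = p_low * loss_low
    largest_expected_loss = p_high * loss_high
    if smallest_expected_loss > cost:
        return "act"          # acting wins everywhere in the ranges
    if largest_expected_loss <= cost:
        return "don't act"    # not acting wins everywhere in the ranges
    return "undetermined"     # the ranges straddle the break-even point


# Made-up numbers: precautions cost 10.
print(decision((0.01, 0.05), (50, 100), cost=10))  # don't act
print(decision((0.20, 0.50), (60, 100), cost=10))  # act
print(decision((0.05, 0.20), (50, 100), cost=10))  # undetermined
```

Because expected loss only grows with p and with the loss, checking the two extreme corners is enough; the third case is the one agencies tend to report as if it were the first or second.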

> A second version of Automatic Bayes can be found in the heuristics-and-biases research program, a program that is widely taught in business education courses. One of its conclusions is that the mind “is not Bayesian at all” (Kahneman & Tversky, 1972: 450). Instead, people are said to ignore base rates, which is called the base rate fallacy and attributed to cognitive limitations. According to these authors, all one has to do to find the correct answer to a textbook problem is to insert the numbers in the problem into Bayes’ rule; the content of the problem and content-related assumptions are immaterial. The consequence is a “schizophrenic” split between two standards of rationality: If experimental participants failed to use Bayes’ rule to make an inference from a sample, this was considered irrational. But when the researchers themselves made an inference about whether their participants were Bayesians, they did not use Bayes’ rule either. Instead, they went through the null ritual, relying only on the p value. In doing so, they themselves committed the base rate fallacy.

Hilarious.
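“Insert the numbers into Bayes’ rule” means something like the following, shown for the well-known taxicab problem from the heuristics-and-biases literature: 85% of a city’s cabs are Green and 15% Blue, and a witness who identifies colors correctly 80% of the time says the cab was Blue. This minimal sketch performs only the mechanical step the paper criticizes; whether these numbers describe the real situation is exactly the content-related question that step ignores. The function `posterior` is a hypothetical helper, not any library’s API.

```python
def posterior(prior: float, hit_rate: float, false_alarm_rate: float) -> float:
    """Bayes' rule for a binary hypothesis H given a positive report E.

    prior            = P(H)
    hit_rate         = P(E | H)
    false_alarm_rate = P(E | not H)
    """
    evidence = hit_rate * prior + false_alarm_rate * (1 - prior)  # P(E)
    return hit_rate * prior / evidence                            # P(H | E)


# Taxicab problem: P(Blue) = 0.15; the witness says "Blue" for a Blue cab 80%
# of the time and wrongly says "Blue" for a Green cab 20% of the time.
p_blue = posterior(prior=0.15, hit_rate=0.80, false_alarm_rate=0.20)
print(round(p_blue, 2))  # 0.41
```

The answer usually scored as the fallacy is 0.80, the witness’s reliability, which ignores the 15% base rate.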

> …an automatic use of Bayes’ rule is a dangerously beautiful idol. But even for a devoted Bayesian, it is not a reality: Like frequentism, Bayesianism does not exist in the singular.

This isn’t so. But Gig and pal assume, naturally enough, that Bayes means subjective probability. Logical probability does not suffer from singularity. And any statistical method which is part of the Cult of the Parameter must eventually fall to ritual.

> We use the term surrogate science in a more general sense, indicating the attempt to infer the quality of research using a single number or benchmark. The introduction of surrogates shifts researchers’ goal away from doing innovative science and redirects their efforts toward meeting the surrogate goal.

Laziness again. It’s everywhere—and government sponsored.

> SPSS and other user-friendly software packages that automatically run tests facilitate this form of scientific misconduct: A hypothesis should not be tested with the same data from which it was derived…
>
> A similarly bad practice, common in management, education, and sociology, is to routinely fit regressions and other statistical models to data, report R² and significance, and stop there.

The first point should be shouted at every PhD defense. It is the key (really the only) difference between good and bad science. It is a point so important that you should read it twice. Don’t forget to visit the Classic Posts page to see the common abuses of regression.
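Testing a hypothesis on the data it came from is easy to see in a small simulation, offered as an illustration and not taken from the paper. Everything below is pure noise: an outcome and 20 candidate predictors with no relation to it. The script “derives” a hypothesis by picking the predictor most correlated with the outcome, then asks whether that correlation is “significant” and how much R² it buys, first on the same data and then on freshly drawn data. The sample size, number of predictors, seed, and hard-coded critical value are all assumptions chosen for the sketch.

```python
import random

N = 30          # observations per data set (arbitrary choice)
K = 20          # candidate predictors, all pure noise (arbitrary choice)
R_CRIT = 0.361  # approx. two-sided 5% critical |r| for N = 30 (t = 2.048, df = 28)


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def noise(rng, n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def one_trial(rng):
    """Derive a 'hypothesis' from one data set, then check it on the same and on fresh data."""
    y = noise(rng, N)
    predictors = [noise(rng, N) for _ in range(K)]
    # The hypothesis comes from the data: whichever predictor best 'explains' y.
    best = max(range(K), key=lambda j: abs(pearson_r(predictors[j], y)))
    r_same = pearson_r(predictors[best], y)
    # Fresh data: measure predictor `best` and the outcome again. Since everything
    # here is noise, that amounts to two new independent samples.
    r_fresh = pearson_r(noise(rng, N), noise(rng, N))
    return r_same, r_fresh


def simulate(trials=1000, seed=1):
    rng = random.Random(seed)
    results = [one_trial(rng) for _ in range(trials)]
    share_sig_same = sum(abs(r) > R_CRIT for r, _ in results) / trials
    share_sig_fresh = sum(abs(r) > R_CRIT for _, r in results) / trials
    # For a one-predictor least-squares fit, R^2 equals r^2.
    mean_r2_same = sum(r * r for r, _ in results) / trials
    mean_r2_fresh = sum(r * r for _, r in results) / trials
    return share_sig_same, share_sig_fresh, mean_r2_same, mean_r2_fresh


if __name__ == "__main__":
    sig_same, sig_fresh, r2_same, r2_fresh = simulate()
    print(f"share 'significant', same data:  {sig_same:.2f}")
    print(f"share 'significant', fresh data: {sig_fresh:.2f}")
    print(f"mean R^2, same data:             {r2_same:.3f}")
    print(f"mean R^2, fresh data:            {r2_fresh:.3f}")
```

With 20 noise predictors, the share of “significant” findings on the same data should come out somewhere near 1 − 0.95²⁰, roughly 64%, against about 5% on fresh data, and the in-sample R² of the chosen predictor is inflated in the same way. Reporting R² and a wee p-value from the discovery data alone hides all of this.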

> Surrogate science does not end with statistical tests. Research assessment exercises tend to create surrogates as well. Citation counts, impact factors, and h-indices are also “inferential statistics” that administrators and search committees may (ab)use to infer the quality of research. …hiring committees and advisory boards study these surrogate numbers rather than the papers written by job candidates and faculty members.

Did somebody say laziness and pseudo-quantification again? Yes: somebody did.
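The h-index named in the quote shows how much a surrogate throws away: it is the largest h such that h of a researcher’s papers have at least h citations each. This sketch computes it for two invented citation records (hypothetical data, not from the paper or any real researcher).

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Invented records: many modest papers vs. a few heavily cited ones.
salami = [5, 5, 5, 5, 5, 4, 3, 2, 2, 1, 1, 1]
few_big = [900, 450, 120, 80, 5]
print(h_index(salami), h_index(few_big))  # 5 5
```

The first record totals 39 citations and the second 1,555, yet the surrogate reports the same number for both.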

> An even greater danger is that surrogates transform science by warping researchers’ goals. If a university demands publication of X journal articles for promotion, this number provides an incentive for researchers to dissect a coherent paper into small pieces for several journals. These pieces are aptly called just publishable units. Peter Higgs, the 2013 Nobel Prize winner in physics, once said in an interview, “Today I wouldn’t get an academic job. It’s as simple as that. I don’t think I would be regarded as productive enough” (Aitkenhead, 2013). He added that because he was not churning out papers as expected at Edinburgh University, he had become “an embarrassment to the department when they did research assessment exercises” (Aitkenhead, 2013).

Did somebody say laziness and pseudo-quantification *again*, even though he just said it? Yes.

**Update** I forgot to include the popular press article which highlighted the paper: “Science is heroic, with a tragic (statistical) flaw: Mindless use of statistical testing erodes confidence in research.”

**Update** I also forgot to give you the current status on my book, which talks about all these kinds of things *and* gives a solution. It’s thisclose to being done.

> Amen again! Decision analysis was pushed far, far past the breaking point years ago. The EPA, and pretty much every other agency that wants to “prove” pre-decided conclusions, never remember that (unknown probability) x (unknown costs) = (who the hell knows what’s best). Instead, scientism and false quantification run amok.

Combine with (unknown probability) x (unknown benefits), stir or shake well, and you get the “Precautionary Principle”!

Well, this all has to end somewhere — either as a stagnant heap of mouldering press releases nobody reads or (when the dialectic ultimately generates an antithesis) a more satisfactory way of determining the causes of effects. Predictions?

“The only numbers with no confidence intervals or p values attached were the page numbers.” What a great quote!

The EPA really doesn’t care about the numbers–it’s just a way to trick people into thinking science is somehow involved. They’d read goat entrails if that got them what they want.

The comments on social science are quite interesting. In my psych courses, statistics were minimal, and the class required for a degree was basically only concerned with “p” values, etc. There was no real explanation of how or why or even when the statistics were appropriate. In consumer psych, one of the textbooks was “How to Lie with Statistics,” where I at least learned how easy it was to manipulate data and get the answer you needed.

Gary: I don’t think it will be the latter.

I’ve had four graduate-level statistics/research-techniques courses in educational research (some time ago), so my question is: in educational research, what is the role of statistics, or how can we really know which teacher actions are better than others?

Coast Ranger: I am not a statistician, but my experience is that everyone—parents, students, fellow teachers—“know” who the good teachers are. They also know which teachers are incompetent. But good teaching doesn’t necessarily boil down to “actions” in the classroom. A poor teacher can be schooled in what “techniques” to use, and what “scripts” to follow, but with a poor vessel to start with there are limits to improvements that can be gained. That said, the first mark of a “good” teacher is a mastery of and a passion for their subject matter.

1. There have been major discoveries in the social sciences?

2. Otherwise, the claim is utter horsedung. The statistics are there, and were a supporting player in the build-up. “Wee p-values” were never supposed to be the “final” proof. Prediction and replication have that role. “Wee p-values” are falsifiers, not provers.

The observation that a group of semi-numerate “scientists” grossly misuse a highly analytic technique doesn’t invalidate the technique. “Wee p-values” only indicate that there is something wrong with the initial hypotheses. (note the plural)

Briggs

Speaking of using and abusing statistics:

Have you been following Steve McIntyre’s reanalysis of the “proof” of the Patriots deflating the footballs?

I’m curious about your thoughts on the matter.

Some insight into the whole mathematics vs. statistics vs. computer modelling business can be found in this old paper:

http://people.physics.anu.edu.au/~tas110/Teaching/Lectures/L1/Material/WEAVER1947.pdf

YOS, such a fine paragraph in your reference:

Briggs, I’m not sure of what the authors meant by

“Newton’s theory of simple cause and effect was replaced by the probabilistic causes in statistical mechanics and, eventually, by quantum theory.”

If you have an equation $F = ma$, which is the cause and which is the effect?

If you have an equation $F = G M_1 M_2 / R^2$, which is the cause and which is the effect?

Newton did not offer an explanation for gravity (“hypotheses non fingo”).

And the Schrödinger equation of quantum mechanics, like the equations of classical mechanics, can be derived from an extremum principle applied to an appropriate Lagrangian; no probabilities required, except in terms of measurement.

Monday, 6 July

I have to update this daily! If I don’t, please remind me.

Just espied this

Newton’s mechanics involved a few bodies and variables that could be modeled by simple mathematical formulas. But *simplicity* often required simplification: point-masses, perfect elasticity, frictionless planes, perfect vacuum, and so on. But this worked well enough for phenomenological laws: simple descriptions of how the bodies behaved.

The equations do not identify cause and effect. They are mathematical formalism. The physics does. Newton held that ponderable matter somehow produces gravity. He didn’t know how, but the body was the *cause* of the attraction and the attraction was the *cause* of motion. His equations don’t identify time as a variable, either; but that hardly means he thought time does not exist. (And we now believe that gravity propagates at a finite speed, so…)

The discovery of Coulomb’s law was a stunning confirmation of the Newtonian mechanics (and the atom was promptly imagined as a miniature solar system). So were the laws of luminosity. Inverse square ruled, dude. (It’s also why, with a bit of thought, more Phillie fans appear south and west of the Lawrence Line while more Yankee fans appear north and east.)

But Newton’s mechanics broke down with only three bodies. He never solved the orbit of the Moon, which together with the Earth and Sun constitutes the original 3-body problem. But the same issue appears in general: analytical solutions tend to collapse when the number of terms gets too large.

Statistics could replace mathematics when there were a large enough number of bodies *and they all behaved in essentially the same way.* You can replace each of the many bodies with a single “average” body. Hence, thermodynamics and actuarial tables. This is *disorganized complexity.*

The most intractable problems are those of *organized complexity.* This is when, as von Hayek put it in his Nobel lecture:

Some of us may recognize “the individual elements of which they are composed” and “the manner in which the … elements are connected with each other” to be a modern resurrection of “matter” and “form.”
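The “average body” move behind disorganized complexity rests on a standard fact: the average of many independent, similar units varies far less than any single unit does. A minimal sketch, assuming exponentially distributed “bodies” purely for illustration, shows the spread of the average shrinking roughly like 1/√n. The function name and all settings are hypothetical.

```python
import random
import statistics


def spread_of_average(n_bodies, trials=500, seed=0):
    """Standard deviation, across trials, of the average of n_bodies
    independent exponential(1) 'bodies' (an arbitrary stand-in distribution)."""
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_bodies))
        for _ in range(trials)
    ]
    return statistics.stdev(averages)


for n in (1, 10, 100, 1000):
    # For exponential(1) bodies, theory says the spread of the average is 1/sqrt(n).
    print(n, round(spread_of_average(n), 3), round(n ** -0.5, 3))
```

Nothing like this works for organized complexity, where the units differ and their connections matter; averaging them away discards exactly what one wants to know.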

Hayek goes on to say:

IOW, neither old-style mathematics nor new-style statistics is adequate. And that is how computer modelling entered the picture.

YOS, you’ve said some very interesting things, all worthwhile, but I maintain, as a scientific empiricist, that interpretations (e.g., Newton ascribing gravity to mass) are just that. If one goes to a General Relativity view, one could say that mass causes the warp in space-time that is represented by gravity. But that is also an interpretation.

It is the equation that gives the information; the equation is descriptive. Whether the equation for a particular situation is susceptible to an analytic solution or has to be “solved” approximately by computer techniques is irrelevant to the fact that the “Cause” of what happens is not given in the fundamental equations. What “causes” bonding in the H2 molecule? What “causes” hydrogen bonding in water? What “causes” the particle passing through a double slit to land in one particular spot on the detecting screen rather than another?

To ask what a cause might be for a situation to be analyzed by physical theory beclouds what really can be learned, in my opinion. (Despite Newton’s interpretation.)

Bob, that puts you squarely in the instrumentalist camp. Laws in this metaphysics are entirely phenomenological. They simply describe what happens in a black-boxy kind of way, but they give us no insight into nature. On the one hand, that is all that is needed to make Useful Stuff. The Scientific Revolution, after all, subordinated Science to Engineering by making Useful Stuff the telos of science, replacing the Appreciation of the Beauty of Nature. But Modern Science is collapsing (along with the rest of the Modern Ages) and a new paradigm is emerging. What it might look like, who knows?

http://joelvelasco.net/teaching/120/cartwright-How_the_Laws_of_Physics_Lie.pdf

Postscript:

The Einsteinian return to a more Aristotelian view actually clarified the Newtonian construct. Newton had postulated a spooky-action-at-a-distance for the action of gravity, but Einstein reimagined it as mass effecting a curvature in the space-time manifold and the bent space would affect the passage of light and other forms of matter. IOW, there was direct “touching” as Aristotle had postulated.

The double-slit experiment has been physically duplicated at the macro scale using a standing wave, so that if we don’t yet know what causes what it may only be because we don’t know yet. The question always sounds like the questioner assumes that “caused” = “predictable.” That’s like saying Newton’s equations don’t predict which apple will fall or when.

YOS, being called an “instrumentalist” by someone who knows something about science is like my priest calling me a Calvinist. Let me say that I am a Scientific Agnostic, verging on being an empiricist (as with Bas van Fraassen). I do believe there is a fundamental reality there (unlike Gertrude Stein’s “there is no there there,” said of Oakland), but that reality is veiled, as shown by quantum mechanics.

Gee, YOS, being called an “instrumentalist” by someone knowledgeable in science, is like my priest calling me a Calvinist. I’d describe myself as science agnostic, or Science agnostic, making the upper-case distinction. I lean toward the views of Bas van Fraassen, an empiricist, who says there are no scientific laws, but only descriptive equations and models which fit empirically to observation.

As an addendum to thoughts about the completeness of science, you might be interested in a book by Thomas Nagel, “Mind and Cosmos: Why the Materialist Neo-Darwinian Conception of Nature is Almost Certainly False”. Nagel says that science is inadequate because it neglects a teleological principle that operates through the “laws of nature.”

There’s a free online pdf version (via MIT) of the book at

http://web.mit.edu/philosophy/religionandscience/nagel4.pdf

Sorry about the duplicating… something’s strange with the website! The first part wasn’t there, and then I rewrote it… sigh.

If anyone is tired of proving to frequentists for the millionth time how wrong they are and would rather do something productive, like extending the Bernoulli, Laplace, Jeffreys, Jaynes viewpoint of the subject, then consider http://www.bayesianphilosophy.com.

Tukey had a saying, “Statistics work is DETECTIVE work!” If only more people had taken that to heart and not made it into judge-and-jury work all the time.

I used to teach P-values, by compulsion, and would often refer to “that great tally sheet in the sky”. Hardly any student twigged to the implicit sarcasm, but it kept me amused.

I don’t live in the USA, but it seems to me the EPA has nothing on the FDA, which wrote its rules in a way that just about ensured people had to use SAS, at its most egregious. The consequences are not worth contemplating too deeply.

YOS, I forgot to ask: do you have a link for that interesting bit about the double-slit experiment being reproduced by a standing wave? One does get diffraction patterns for a traveling wave passing through a double-slit screen; indeed, that’s the standard optical explanation for it. See, for example (amongst many other sites listed by Google):

http://www.acoustics.salford.ac.uk/feschools/waves/diffract3.php

Briggs, you say “I also forgot to give you the current status on my book, which talks about all these kinds of things and gives a solution”. I am eager to read it. I want to know what we should use instead of “wee p-values” in clinical trials, how to calculate sample size without using the standard frequentist tools, how to make two groups you want to compare comparable for known and unknown factors without using randomization, etc.

We use these tools because we get taught these tools. The bulk of clinicians who use frequentist statistics do not do so because they are evil/stupid, or just to get published! Some, or indeed many, use them to find out the causes or cures of disease. If we can’t use wee p-values, what should we use?

Pop article here: http://www.wired.com/2014/06/the-new-quantum-reality/

Thanks for the reference, YOS. I’ve read the article, and as I understand it, one of the defects of de Broglie–Bohm theory is that it isn’t Lorentz invariant. Another, looking at the article, is that I don’t see how it explains the delayed-choice experiment. Is there an analog for the delayed-choice experiment in the bouncing-droplet model?

Investigating further, it does seem to be the case, from the Wikipedia article on the double-slit experiment, that no pilot-wave analog of the delayed-choice experiment has been carried out. Moreover, the article states that no hydrodynamic pilot-wave experiment has been carried out to demonstrate entanglement. See doi:10.1146/annurev-fluid-010814-014506.