# Bem, ESP, And Hypothesis Testing

Daryl Bem, Cornell professor of psychology, once again believes he has proven the validity of ESP (extra-sensory perception). Bem is a long-time researcher of the paranormal who once found notoriety by gluing ping-pong ball halves to people’s eyes.

Yes, and on these ping-pong balls, he shined a red light. And in the ears of those so afflicted, he piped a gentle hiss. This was to create the *ganzfeld*, or “total field”, a state sort of like sensory deprivation in which a person would be maximally open to psychic vibrations (it’s always vibrations).

Bem used statistics, p-values in particular, to prove that the *ganzfeld* worked. Trouble arose when other workers tried to replicate Bem’s success. None could, and the *ganzfeld* was abandoned. (I write more about this in my *So You Think You’re Psychic?*)

Now Bem is back with a new, *peer-reviewed* paper in *The Journal of Personality and Social Psychology*. Once more Bem is able to display wee p-values in support of his theory that people can see, as through a glass darkly, a short distance into the future. (We can discuss the specifics of this kind of ESP at another date.)

It is true Bem’s p-values are all publishable, but it is not true that p-values are what he thinks they are. But Bem only makes the same mistakes that plague those who rely on frequentist statistical methods. These misperceptions are so rife that even the *New York Times* has noticed them, using Bem’s paper to discuss “one of the longest-running debates in science.” (Thanks to Bob Ludwick for the link.)

The paper quotes Jim Berger (Duke, one of the men responsible for the Bayesian revolution) as saying, “I was on a mini-crusade about this 20 years ago and realized that I could devote my entire life to it and never make a dent in the problem.” I know exactly how he feels. In my own class, I teach Bayesian and frequentist methods, emphasizing Bayesian. By the time we arrive at frequentist methods, students are already skeptical of frequentism because of the hints I have given about it.

After I lay the theory out, and lay it out fairly, students become especially wary after they learn the *precise* definitions of p-values, confidence intervals, and so forth. But then comes the show-and-tell portion of the class, where we touch actual data. Before I let them have at it, I give them this favorite speech:

Even though you know the proper definition of a p-value (how it tells you nothing about what you really want to know, how it is silent on whether your hypothesis is true or false, how it is mute on whether your model is appropriate or not), you will not be able to resist it. When you see a publishable (less than the magic number of 0.05) p-value you too, like everybody else, will not be able to help yourselves. You *will* believe you have proved your theory. You will be unable to ignore the call of the p-value.

Right after this, we launch into regression examples in which the software spits out tables of, *inter alia*, p-values. By the second or third iteration, students are already pointing at their screens saying, “Why can’t I keep this variable? The p-value is low.” And when they describe their data they readily slip into the same kind of inappropriate causative language Bem does.

Solution? Eliminate teaching of frequentist statistics to all but specialists, mathematical Masters and PhDs and so forth. Do not expose undergraduates in any field, and graduates in non-mathematical fields, to the ideas of p-values, confidence intervals, or hypothesis testing.

These tools have been in the hands of scientists for nearly a century, and all experience has shown that they are subject to regular, even ritualized abuse. It is far too easy to “prove” what isn’t so using frequentist methods; at the least, the answers this form of statistics gives are not in response to the questions asked by researchers.

As the *Times* says (and correctly), “a team of statisticians led by Leonard Savage at the University of Michigan showed that the classical approach could overstate the significance of the finding by a factor of 10 or more.”

Switching to Bayesian statistics as the standard won’t eliminate biases and mistakes—no statistical procedure can—but it will reduce a vast amount of over-certainty.

I apologize for the likely stupidity of this question, but Briggs, do you suppose that statistics are employed in science where they have no business? I’m thinking of analyses where the hypothesis and any method of statistical appraisal can have no connection with each other, not analyses where there might be a rational way to do the statistical analysis and the doer has simply screwed it up.

“You will be unable to ignore the call of the p-value.”

That is so true. I have a statistics book in which the author correctly explains in the introduction that confidence intervals have to do with repeated experiments, then later in the book he proceeds to use confidence intervals to make a probability statement about a parameter. It’s like he forgot what he said in the introduction.

Ray,

I should have better phrased it:

Your author is one of a legion of like authors. The problem is so rife that I have never seen an error-free applications paper. By this, I meant the interpretation is always wrongly stated. Of course, the authors of these papers might be right in what they are saying, but my point is that they are not allowed to say what they do if they hold with the philosophy behind the methods.

I keep opening up “studies” hoping that I will find one where the significance couldn’t be chalked up to just “chance”. They seem to all have forgotten one of the first lessons we get in statistics. At least it is one of the early probability problems presented.

Q: If I roll a die seven times, what is the probability that I will get at least one 6?

The chance of getting a six on one roll (using a non-weighted die of course) is 1/6.

Dr. Briggs is probably one of the few who didn’t first attempt to sum up the probabilities, 1/6 + 1/6 + 1/6…, only to realize that probabilities > 100% were somehow NOT CORRECT, and instead went straight to the correct method of 1 - (5/6)^7.
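The arithmetic in this comment can be checked with a short sketch. It assumes a fair die and independent rolls, as the comment states; the helper name `prob_at_least_one` is made up for illustration.

```python
from fractions import Fraction


def prob_at_least_one(p_single: Fraction, n_trials: int) -> Fraction:
    """P(at least one success in n independent trials) = 1 - (1 - p)^n."""
    return 1 - (1 - p_single) ** n_trials


p_six = Fraction(1, 6)  # chance of a six on one roll of a fair die

# Naive approach: add the per-roll chances. Seven rolls give 7/6,
# which is greater than 1 and so cannot be a probability.
naive_sum = 7 * p_six

# Correct approach: one minus the chance of seeing no six in seven rolls.
correct = prob_at_least_one(p_six, 7)

print("naive sum:", naive_sum, "=", float(naive_sum))
print("correct  :", correct, "=", round(float(correct), 4))
```

Exact fractions are used so the comparison does not depend on floating-point rounding; the correct answer comes to roughly 72%.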

In all the studies of a health nature, they seem to focus on the 1/6 and not the (5/6)^n.
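One way to read this point is as a statement about many tests run at once. The sketch below applies the same "one minus the chance of none" reasoning to significance tests. It is an illustration only: it assumes every tested effect is truly absent, that the tests are independent, and that each uses the 0.05 cutoff mentioned in the post. The function name `prob_any_false_positive` is hypothetical.

```python
def prob_any_false_positive(alpha: float, n_tests: int) -> float:
    """Chance that at least one of n independent tests falls below alpha
    when no real effect exists: 1 - (1 - alpha)^n."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Assumed cutoff of 0.05 per test; the test counts are arbitrary examples.
for n_tests in (1, 5, 20):
    chance = prob_any_false_positive(0.05, n_tests)
    print(f"{n_tests:2d} tests -> P(at least one p < 0.05) = {chance:.3f}")
```

Under these assumptions a single test gives a 5% chance of a spurious "finding", but the chance grows quickly as more comparisons are made, just as the chance of at least one six grows with more rolls.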

I have become almost cynical to anything with the word “study” in it.

Someone recently did a study of Monopoly and which properties to buy to maximize your chance of winning. Apparently Orange are the properties that will most make you win. A strategy that will most likely make you lose. Probability cuts both ways. Owning orange may help you win, owning any monopoly is going to help you win sooner. Keeping others from having a monopoly by owning properties that break up the potential for a monopoly will help you. Focussing solely on getting orange will almost guarantee that you won’t win because you will miss out on buying the properties you do land on. My strategy for monopoly is buy everything you land on and see where the dice take you.

The statistics presented in the Monopoly discussion are worth considering pragmatically. The concept of the GO TO JAIL space being a sink and the JAIL space being a source is actually quite useful. You can apply that to other situations in life and place your bets accordingly. A Starbucks on the entrance to a freeway is doing just that.

I do love the quote “resist the call of the p-value”.

MLMs are sort of similar. Present an opportunity to a business man of being a supplier to an MLM and their eyes will light up. They know it is a scam, but they can’t resist trying to reap money from the unwashed.

It is too easy to calculate a regression. Everyone has Excel. It doesn’t matter that no one knows how to properly set up the data. Advanced software packages will run non-linear regressions, etc. that allow for more severe data manipulation.

OK, you’ve piqued my interest. I am (obviously) no statistician. I want to understand Bayesian vs Frequentist at a very simple level, sufficient to explain to executives and pre-school kids. Not much difference between the two sometimes 🙂

Is the following a valid extension to my oversimplified application of uncertainty to a measurement model?

1) Hypothesis: siblings’ ages can be modeled as: Age(Brother1) = N * Age(Brother2)

2) I initially estimate 90% probability this is true

3) I measure and find Age(John)=4.0, Age(Pete)=2.0, so Age(John) = 2.0 * Age(Pete)

At this point, a Frequentist would say I have one valid data point in support of my hypothesis. Not enough data to publish but things are looking good. A Bayesian would say this data increases the probability that my hypothesis is correct.

4) I measure again and find Age(John)=4.1, Age(Pete)=2.1, so Age(John)=1.95*Age(Pete)

A Frequentist would say I need to adjust the uncertainty of my data measurements and/or model parameter, but within the uncertainty bounds, my hypothesis is still supported. A Bayesian would say this data decreases the probability that my hypothesis is correct.

…and so forth.

Is that anywhere CLOSE to correct?

Thanks!

I KNEW IT!!! I knew they were going to prove ESP is real…

Jim Berger … “I was on a mini-crusade about this 20 years ago and realized that I could devote my entire life to it and never make a dent in the problem.”

Thanks, that’s comforting. Early in my PhD (a million years ago) my supervisor suggested that I might work on introducing Bayesian statistics into our field of science. Instead I spent my time on experimental work. Perhaps I was right.

The statistical genie cannot be put back in the bottle. You cannot now not teach “the classical approach” because almost all of the literature up until now uses it, and not teaching it would leave non-mathematicians without any chance to understand the foundational statistics of their field. We can and should make Bayesian analysis the standard, but for the next few generations frequentist statistics will have to be taught as well.

Long time ago I realised that AGW was a hoax and Armstrong had used drugs, is that ESP?

Maybe I should have posed it another way.

Can scientific studies whose results at first seem to require the application of statistics ever be reconstituted to be binary? In other words, can studies be devised to either prove the premise or fail to prove it? The thing I’m pondering is that the resort to creative statistical analysis may substitute for not having spent enough time refining the “question” to one answerable yea or nay.

Once again I am pleased and gratified not to be considered a peer of that bunch.

Hmmm, take a look at this article in the New Yorker:

http://www.newyorker.com/reporting/2010/12/13/101213fa_fact_lehrer#ixzz1BYjefYnF

A case in point?

The results from “studies” are not being replicated in broader application and – at least part of – the explanation is selection bias in the original study subjects.

In a long discussion yesterday with my wife (PhD in Economics) it seemed to me that this was a major part of the improvements a Bayesian approach had – taking account of sampling errors in observational studies. I didn’t think this would have the same impact on experimental studies because my particular field (molecular biology) allowed for experiments under highly controlled conditions (and using a few billion identical bacteria takes away a lot of the sampling issues).

But as soon as we try to apply this in almost any real-world setting, sample selection issues become critical. Plants are pretty good (I can work with genetically homozygous lines and control conditions in a test site), but how does my selection of a test site (or test conditions) affect the size of my treatment effect? How many different test sites should I use to address this and who decides if I have chosen those test sites free of any bias?

At the same time, as a non-mathematician (who only went into molecular biology because it seemed like a good way to avoid doing stats; hey, the bacteria grows or it doesn’t – who needs stats?), the thought of using Bayesian methodology scares the pants off me. This goes beyond just needing to get specialist help to design the experiments and analyse the results. I can also see a massive increase in testing costs for regulatory reviews (my current area of application), without any real increase in safety, if the agencies were to insist on using a Bayesian approach. Our current highly risk averse (product development) culture could get tied up in even bigger knots.

Sorry to be so pessimistic – I really appreciated this posting and it has led to some excellent discussion and – for me – a better understanding of frequentist stats as well. I am just not too sure where it leaves me…..