# Journal Bans Wee P-values—And Confidence Intervals! Break Out The Champagne!

Well, it banned all p-values, wee or not. *And* confidence intervals! The journal of *Basic and Applied Social Psychology*, that is. Specifically, they axed the “null hypothesis significance testing procedure”.

I’m still reeling. Here are excerpts of the Q&A the journal wrote to accompany the announcement.

Question 2. What about other types of inferential statistics such as confidence intervals or Bayesian methods?

Answer to Question 2. Confidence intervals suffer from an inverse inference problem that is not very different from that suffered by the NHSTP. In the NHSTP, the problem is in traversing the distance from the probability of the finding, given the null hypothesis, to the probability of the null hypothesis, given the finding. Regarding confidence intervals, the problem is that, for example, a 95% confidence interval does not indicate that the parameter of interest has a 95% probability of being within the interval. Rather, it means merely that if an infinite number of samples were taken and confidence intervals computed, 95% of the confidence intervals would capture the population parameter. Analogous to how the NHSTP fails to provide the probability of the null hypothesis, which is needed to provide a strong case for rejecting it, confidence intervals do not provide a strong case for concluding that the population parameter of interest is likely to be within the stated interval. Therefore, confidence intervals also are banned from BASP.

Holy moly! This is almost exactly right about p-values. The minor flaw is not pointing out that there is no unique p-value for a fixed set of data. There are many, and researchers can pick whichever they like. And did you *see* what they *said* about confidence intervals? Wowee! That’s right!
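The coverage reading of a confidence interval that the journal describes can be checked by simulation. What follows is only a rough sketch under stated assumptions: normally distributed data, a made-up true mean and standard deviation, and a hypothetical helper named `ci_coverage`. None of it comes from the journal.

```python
import random
import statistics

def ci_coverage(true_mean=10.0, sigma=2.0, n=30, trials=2000, seed=1):
    """Share of nominal 95% intervals that contain true_mean (illustrative only)."""
    rng = random.Random(seed)
    t_crit = 2.045  # approximate 97.5% quantile of Student's t with n - 1 = 29 df
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = statistics.mean(sample)
        half_width = t_crit * statistics.stdev(sample) / n ** 0.5
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials

coverage = ci_coverage()
print(f"share of intervals capturing the true mean: {coverage:.3f}")
```

The share comes out near 0.95 across the repeated samples. That is a statement about the procedure over many samples; it says nothing about the probability that any one computed interval holds the parameter, which is the distinction the journal draws.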

They continue:

…The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist. The Laplacian assumption is that when in a state of ignorance, the researcher should assign an equal probability to each possibility…However, there have been Bayesian proposals that at least somewhat circumvent the Laplacian assumption, and there might even be cases where there are strong grounds for assuming that the numbers really are there…thus Bayesian procedures are neither required nor banned from BASP.

Point one: they sure love to say *Laplacian assumption*, don’t they? Try it yourself! Point two: they’re a little off here. But they were just following what theorists have said.

If you are in a “state of ignorance” you can *not* “assign an equal probability to each possibility”, whatever that means, because why? Because you are in a *state of ignorance*! If I ask you how much money George Washington had in his pocket the day he died, your only proper response, unless you be an in-the-know historian, is “I don’t know.” *That* neat phrase sums up your probabilistic state of knowledge. You don’t even know what each “possibility” is!

No: assigning equal probabilities logically implies you have a very *definite* state of knowledge. And if you really do have that state of knowledge, then you must assign equal probabilities. If you have another state of knowledge, you must assign probabilities based on that.

The real problem is lazy researchers hoping statistical procedures will do all the work for them—and over-promising statisticians who convince these researchers they can deliver.

*Laplacian assumption*. I just had to say it.

Stick with this, it’s worth it.

Question 3. Are any inferential statistical procedures required?

Answer to Question 3. No…We also encourage the presentation of frequency or distributional data when this is feasible. Finally, we encourage the use of larger sample sizes than is typical in much psychology research…

Amen! Many, many, and even many times you don’t need statistical procedures. You just *look* at your data. How many in this group vs. that group. Just count! Why does the difference exist? Who knows? *Not* statistics, that’s for sure. Believing wee p-values *proved* causation was the biggest fallacy running. We don’t need statistical models to tell us what happened. The data can do that alone.

We only need models to tell us what *might happen* (in the future).

…we believe that the p < .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research.

Regular readers will know how lachrymose I am, so they won’t be surprised I cried tears of joy when I read that.

We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking. The NHSTP has dominated psychology for decades; we hope that by instituting the first NHSTP ban, we demonstrate that psychology does not need the crutch of the NHSTP, and that other journals follow suit.

*Stultifying structure of hypothesis testing*! I was blubbering by this point.

—————————————————–

*Thanks to the multitude of readers who pointed me to this story.*

Without the crutch, I suspect that “creative thinking” is liable to turn even more speculative. Any suggestions beyond counting lots more noses on how to channel it to improve research? Most of us just want to be able to say that treatment A has some effect B with some assurance that it’s real and we’re not fooling ourselves. The less vague we are in explaining our confidence, the happier we are.

I am also a fan of the Thin Man movies. Still cinema’s best era.

I have no doubt you were in tears! This is a remarkable development! The idea that you can just look at the data is just so revolutionary!

It could spell trouble if your stock-in-trade is teaching people to come up with p-values and confidence intervals. Never fear, “advanced counting” is undiscovered territory.

I really am not certain about statistics any more. Maybe I need a refresher, but I think it’s just that I have a disconnect between Statistics of the 70s and the Statistics of today?

I remember t-tests and Chi-Squared tests and maybe a few variants; but I don’t honestly remember Wee P Values. Did I miss something or was that advanced Statistics? (When I first heard about Wee P, I thought of Bill Paxton in “True Lies” or Pee Wee Herman in general.)

How can I put Wee P’s in perspective?

If you remember t-tests and Chi square analysis, then you probably remember significance levels like 0.10 or 0.05…the “wee value of p” is that level of significance…that is, for a p=0.05, if one sampled an infinite number of times from a population whose distribution is as thought, then about 95% of the samples’ confidence intervals would have the real value of that parameter inside them…

you know something like mu+/- 3sigma?

well, at p=0.05, 95% of the sample sets would have the actual value of the parameter inside of their respective CIs

Hope I didn’t bungle that too badly, briggs.

The p-values were used in psychology in the 70’s. I remember calculating them for papers I wrote for classes.

I’m confused: what are the journal’s proposed alternatives to hypothesis testing or Bayesian analysis? It’s hard looking at a mass of data, even if presented in a non-parametric context (e.g. my favorite, boxplots) to discern what’s happening.

On that Laplacian assumption:

Isn’t that what you do here (just for example)

when you say

That is not the answer, unless you are making an additional, unstated assumption, that all states are equally likely. Without some assumption like that, there is no way to assign any probability.

But I only studied probability theory as a branch of mathematics, and never statistics *per se* in depth, so I might be missing something. However, I think I know what the journal editors are getting at with their carping on the “Laplacian assumption”, because I think I’ve noticed people who said they were taking a Bayesian approach do what you did above, and assign probabilities arbitrarily.

“The real problem is lazy researchers hoping statistical procedures will do all the work for them—and over-promising statisticians who convince these researchers they can deliver.”

That really is the problem, but unfortunately this has gone beyond lazy researchers to lazy regulators as well. In my professional life I have to justify every finding which has a “wee-p”, irrespective of biological function or significance and woe betide me if I have a non-parametric character which is not subject to classical ANOVA. Regulatory scientists* are leaning on these crutches like no-one else and pretty much force us to use tests which we know are meaningless – often making me feel decidedly uneasy.

* I use the term scientists here in the full knowledge that many may disagree with me.

That is not the answer, unless you are making an additional, unstated assumption, that all states are equally likely. Without some assumption like that, there is no way to assign any probability.

You seem to be thinking of probability as a frequency and not as a measure of knowledge level. If the only thing you know is M has three states your level of knowledge of the next state is 1/3.

—

Good for BASP. It’s a sign that psychology could yet become a science.

Dav, I don’t know what this means: “your level of knowledge of the next state is 1/3”. How can a level of knowledge be a number, and how do you decide what number it is?

Lee,

You can’t always assign a number but in this case you can. There are only three possibilities. If this is all you know, then the next state could be any one of them. You really have no choice (because you have no evidence to the contrary) in saying they all have equal likelihood of appearing.

Dav, I think you are expressing the “Laplacian assumption” that the journal editors object to. It seems to me that you do have another choice, the only rational choice you can take when you don’t know and have no way of knowing: say “I don’t know”. The assumption of equal likelihood seems to be arbitrary.

Lee,

We could go round and round on this all day. Substitute “uncertainty in outcome” for “level of knowledge”. You are saying you wouldn’t have any idea how uncertain you are?

Perhaps a thought experiment might help. Instead of machine states suppose they were courses of action and you had to pick one. Maybe it’s a selection of one of three paths that would lead you back to civilization after being lost. Nobody knows you’re lost so no one is looking for you. You can only return to civilization through your own actions. You don’t know which path (if any) is the quickest way. Which one would you pick and why? Apply the same answer to the machine states. There really is no difference between the two problems.

I have to agree with Lee here. It seems that our WmBriggs doesn’t like the “Laplacian assumption,” except when he does. Lee has found one of many cases where Briggs assigns equal probability to possible events with no other knowledge. How this differs from the Laplacian assumption escapes me. I do agree that it’s fun to write Laplacian assumption though.

I think the Journal makes a good point about simply publishing the data, or plotting a distribution. Too often I see small sample size papers where numerous summary statistics are provided when it would be more informative to simply show me the data.

Probability is a measure of certainty — a level of knowledge. If you know only three states are possible isn’t it fair to say your certainty (or conversely uncertainty) in state 1 appearing is equal to that of states 2 or 3? On what basis would you be more certain of one state than the others?

I suppose there is no point in promoting likelihood ratios then? I like to look at data as evidence of some sort and likelihood ratios present the evidence of this versus that well. I really hate the idea of saying “just look at the data.” A lot of bad science goes along with such an analysis.

If we just lay out the data, people can use any method they want to analyze it. This does not say you cannot torture or tease the data in any fashion you want to whatever end you want. It just says the data is the most important thing and it will no longer be truncated, homogenized and otherwise manipulated before publishing. This reduces the tendency of people to just go along with the headline on the paper and never really look at the data. It’s entirely possible that better theories will come out of data that is not presented in a predetermined fashion. Yes, people will have to think. That’s the idea.

Hahaha 😀 Congratulations, however psychology journals banning NHST is something that happened before and if you follow their crowd they will give you more moments of joy.

Ahhh psychologist, those crazy people 😀

Briggs,

Suppose that George on the day he died had empty pockets. That surely implies a lower bound of zero. Now imagine the least valuable coin or note he could have carried. That gives you a discrete interval, but for simplicity just think of $20 gold pieces. The upper limit has to be less than 1000 even with large pockets.

So to an accuracy of +/- $20 we can estimate the value of George’s pockets as somewhere in the interval [0, 20000] and having no other knowledge, after all I’m a Canadian, we should assign each discrete possibility an equal probability in the construction of our prior.

I think the real insight here is if we understand the statement of the problem our knowledge is always > zero. Usually this enables the construction of a prior.

PS I think ETJ might approve of this line of thought:)
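The prior proposed in the comment above can be written out explicitly. This is a minimal sketch, not anyone's endorsed method: the $20 coin value and the 1000-coin bound are the commenter's premises, while the names `uniform_prior` and `prob_at_most` and the alternative 100-coin bound are invented here purely for comparison.

```python
from fractions import Fraction

def uniform_prior(max_coins, coin_value=20):
    """Equal probability on 0, 20, 40, ... up to max_coins * coin_value dollars."""
    amounts = [k * coin_value for k in range(max_coins + 1)]
    p = Fraction(1, len(amounts))
    return {amount: p for amount in amounts}

def prob_at_most(prior, limit):
    """Probability that the pocket held no more than `limit` dollars."""
    return sum(p for amount, p in prior.items() if amount <= limit)

wide = uniform_prior(1000)    # the commenter's bound: up to $20,000
narrow = uniform_prior(100)   # a hypothetical tighter bound: up to $2,000

print(len(wide), prob_at_most(wide, 100))      # 1001 6/1001
print(len(narrow), prob_at_most(narrow, 100))  # 101 6/101
```

The same question gets a different answer under each bound, so the resulting number is only as defensible as the premises that went into the prior.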

Charles West,

Not too bad. I accept the lower bound of 0—assuming, of course, he didn’t have copies of IOUs made out to others. But to “imagine” coins of any kind is to supply very definite positive information. You have moved from ignorance to knowledge.

Now since all probability is conditional on whatever premises are supplied, you can then announce a probability. Your difficulty will be in justifying your list of premises to doubters.

Which is, after all, the real point of science.

This is not *per se* a Bayesian act. It treats probability as argument, as a branch of logic. And that is the way to do prob & stats.

Fran,

You’re right. I think it was a medical journal. Can’t recall which. Policy changed back to the mysticism of wee p’s upon change of editor.

Didn’t the rot set in when cheap desktop computing let researchers hunt for significance until they randomly found something they could get away with publishing? Something that would have been too laborious to do with pencil and paper back in the early or mid 70’s.
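The hunting described in the comment above is easy to imitate on a desktop machine. The sketch below is an illustration under invented assumptions: 20 unrelated outcomes per study, 30 subjects per group, pure noise with known unit variance, and hypothetical helpers `two_sided_p` and `share_with_a_hit`. It counts how often a study with no real effects still turns up at least one p < .05.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(a, b):
    """z-test p-value for a difference in means, variance assumed known and equal to 1."""
    z = (mean(a) - mean(b)) / (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def share_with_a_hit(n_outcomes=20, n_per_group=30, studies=500, seed=7):
    """Share of all-noise studies that find at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        for _ in range(n_outcomes):
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if two_sided_p(a, b) < 0.05:
                hits += 1
                break  # one publishable result is enough; stop hunting
    return hits / studies

share = share_with_a_hit()
print(f"studies with at least one p < .05: {share:.2f}")
```

With 20 independent looks the share should land near 1 - 0.95^20, roughly 0.64, even though every effect in the simulation is zero.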

As DAV mentions I think people are confusing the probability as used to measure the relative frequency of a random variable versus the measure of what action you would take in a situation. Faced with three options and no further knowledge, assigning probability 1/3 to each is equivalent to saying “I don’t know”.

Briggs,

In other interesting news, it seems psychologists have noticed that a lot of accidents occur at road intersections with traffic lights, and the editorial board of BASP is now calling for the removal of all traffic lights.

Yes. We are now going to get a lot more useless psychology papers that claim wonderful new findings that are the result of yet even more poorly interrogated data.

If the journal called for more analysis of the data, and for additional different ways of analyzing the data (data plots, charts, tables, different statistical tests) – that would make sense. No test is in reality perfect. Most professional researchers take any finding of statistical significance with a grain of salt. But cut the data a lot of different ways and you get more confidence in the findings. Maybe even require larger samples (interesting question: how do you know if a sample is big enough if you do not do some formal statistics?). But ditching conventional testing standards is just dumb – no matter what business you are in.

Your posts are always fun, but this one is a little reckless.

Keep morale high,

Casey.

Casey,

See the Classic Posts page (linked on top right) to see why hypothesis tests, p-values, and confidence have got to go go go.

It is interesting how many people believe research must be translated, decoded, whatever, to be published. It seems the researchers are supposed to tell us what to think—wait, that’s what happened in climate science and we got such a super dooper great result there. I can’t see how giving the data only could work out to much worse. It does seem science is supposed to tell us how to think, not to encourage us to think, at least looking at the objections in this thread.

Remember, these are journal articles. They are for people who work in the field. I would hope these people would be capable of independent thought. If not, no amount of statistics is ever, ever going to fix that.

Briggs,

Your Classic posts are very interesting, and quite good fun to read. You describe many “truths” about applied statistical analysis. However, your antipathy to classical testing seems more religious than pragmatic.

You might consider this perspective on testing. It is the paradigm that is generally applied by people working in my field of finance.

Consider a patient who gets a blood test and asks her physician if she is in good health.

The doctor has a p-value for all the blood components, maybe thirty items. Consider some cases:

1. All the scores are within the 95% range of average: Is the patient necessarily healthy?

2. Some of the scores fall outside the 95% range of average: Is the patient necessarily unhealthy?

3. What other things might the physician consider before making a diagnosis?

4. What other tests might the physician undertake before giving a diagnosis?

Well, the answers to those questions might be long and complicated. But the point is this: the initial p-values contain information, but usually not enough information to definitively resolve a diagnosis.
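A little arithmetic shows why case 2 is hard to read. The sketch below assumes, purely for illustration, that the thirty components are independent and that each falls inside its 95% range with probability 0.95 in a healthy patient; real blood markers are correlated, so the figures are indicative only. The function name `prob_any_flag` is made up for this example.

```python
def prob_any_flag(n_tests=30, in_range=0.95):
    """Chance that at least one of n independent scores falls outside its range."""
    return 1 - in_range ** n_tests

def expected_flags(n_tests=30, in_range=0.95):
    """Average number of out-of-range scores for a healthy patient."""
    return n_tests * (1 - in_range)

any_flag = prob_any_flag()
print(f"P(at least one score out of range) = {any_flag:.3f}")
print(f"expected out-of-range scores = {expected_flags():.1f}")
```

Under these assumptions a healthy patient shows at least one out-of-range score about 79% of the time, and 1.5 such scores on average, so a single flag on its own says little.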

Equally, the reporting of classical statistical tests in research papers contains diagnostic information.

Throwing away the p-values on blood tests does not help the diagnosis, even when such blood tests (for all sorts of reasons) are not especially reliable or meaningful. The blood tests contain information!

The challenge in published research is to ensure that analysts follow the strategy of our hypothetical physician. They need to ask more questions, do more tests, and consider all aspects of the problem. Then, either formally or informally, act Bayesian to formulate a conclusion.

The “bitch” of a lot of published research is not that it reports conventional statistics, but rather that the analysis stops at the t-stats.

Keep morale high.

Casey

Forest,

Don’t confuse rousing rhetoric with religion.

Anyway, which p-value? I know, I know. A trick question. Doesn’t matter which you choose, it is always misleading. And it’s a false dichotomy to suggest if we throw away p-values we have no information. There is plenty and *superior* information without them. It’s also not a dichotomy between Bayes and frequentist. There is a third way.