William M. Briggs

Statistician to the Stars!


Gonorrhea, Wee P-values, and Tax Increases


Adventurous reader Ted Poppke discovered a peer-reviewed paper that, according to the American Enterprise Institute (AEI), proved that increasing sales tax on booze “caused a 24% decrease in gonorrhea cases reported to the U.S. National Notifiable Disease Surveillance System, but had no effect on chlamydia.”

Caused. Strong word! The strongest there is in science. When a scientist says X causes Y, he has reached the pinnacle of Y-studies, for once we have learned the cause of Y, we have learned most of what science can tell us of Y. Discovering cause is thus a terrible burden. Unfortunately, in many fields discovering a wee p-value has taken the place of discovering true cause, the consequences of which I detail in my just-in-time-for-Labor-Day Uncertainty: The Soul of Modeling, Probability & Statistics.

The paper is “Maryland Alcohol Sales Tax and Sexually Transmitted Infections: A Natural Experiment” in American Journal of Preventive Medicine by Stephanie A.S. Staras, Melvin D. Livingston, and Alexander C. Wagenaar. From the paper’s beginning:

Alcohol tax increases may decrease sexually transmitted infection rates overall and differentially across population subgroups by decreasing alcohol consumption in general and prior to sex, thus decreasing sexual risk taking and sexually transmitted infection acquisition…

Results strengthen the evidence from prior studies of alcohol taxes influencing gonorrhea rates and extend health prevention effects from alcohol excise to sales taxes. Alcohol tax increases may be an efficient strategy for reducing sexually transmitted infections.

To say “Alcohol tax increases may decrease sexually transmitted infection rates” is to invoke causal language. Somehow increasing the amount the government collects on bottles of beer will cause people who would have otherwise contracted gonorrhea to not contract gonorrhea.

Before examining the paper, think how this assertion could be proved. At least one man (or woman) who would have got gonorrhea when the sales tax was low would not have got it when the sales tax was high. How could raising a sales tax cause the absence of gonorrhea where that same gonorrhea would necessarily have been present under the low sales tax?

Obviously, a sales tax rate has no causative powers on gonorrhea, so the contention fails immediately. But the sales tax might have caused some other thing or things to happen, like setting off a chain of dominoes, which blocked the gonorrhea from setting up. And this is what the paper implies. A man who would have drunk to excess and engaged in sex (or sex-like activities) with an infected woman (or vice versa) will now be stopped from drinking that little bit extra he would have under a cheaper tax, with the consequence that he'll now retain enough judgment to realize beer goggles blur. Perhaps he'll read a book instead.

Now the only way to tell this for sure is to run experiments on actual men and women, raising the tax for some, lowering it for others. But even this is dodgy, because we'll always be left with a counterfactual question. Would this man, had the sales tax been lower, have contracted the gonorrhea he safely avoided tonight when the tax was high? How can we ever know this? Answer: we cannot; we can only assume it.

This bewildering point is belabored and bothered to emphasize the over-certainty that results when mere correlations in external datasets take the place of (even imperfect) experiments. What the authors did was to collect gonorrhea and chlamydia rates before Maryland's tax increase, and then collect them again after. Enter the statistical model, a regression-like creation with parameters for the state's general tax rate, the state's alcohol tax rate, a rate of gonorrhea and chlamydia infections for times and places other than Maryland under the new rate, and an "ARIMA noise model", which we can ignore.
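The flavor of that regression-like creation can be sketched in a few lines. This is not the authors' actual model (their paper adds comparison-state rates and the ARIMA noise model, which we ignore as above); it is a minimal step-change regression on invented monthly counts, where the coefficient on the post-tax indicator plays the role of their alcohol-tax parameter:

```python
import numpy as np

# Invented monthly gonorrhea rates: 24 months before and 24 after a tax
# increase, with a gentle trend and noise.  All numbers are hypothetical.
rng = np.random.default_rng(0)
months = np.arange(48)
post_tax = (months >= 24).astype(float)          # step indicator for the new tax
rates = 100 + 0.1 * months - 20 * post_tax + rng.normal(0, 5, size=48)

# Regression-like model: intercept + trend + tax-change step.
X = np.column_stack([np.ones(48), months, post_tax])
beta, *_ = np.linalg.lstsq(X, rates, rcond=None)
print(f"estimated step change at the tax increase: {beta[2]:.1f}")
```

The fitted step coefficient recovers (roughly) the built-in drop of 20, which is the whole trick: the model reports a change in the data, and the wee p-value attached to that coefficient is what gets promoted to "caused".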

The p-value associated with the alcohol tax rate parameter (under various manipulations) was wee for gonorrhea and not wee for chlamydia. What about parameters for the rates of other sexually transmitted diseases? They didn't check; or if they did, they remained silent about them (I'm guessing they didn't check).

From the p-value weeness, they concluded “A 2011 Maryland 50% increase in alcohol-specific sales tax decreased statewide gonorrhea rates by an estimated 24%—preventing nearly 1,600 gonorrhea cases annually.”

Mighty bold claim. All gleaned from mixing databases and calculating a parameter inside a dicey statistical model.

The kicker was noticed by the AEI:

However, Staras et al. do not establish that alcohol consumption decreased as a result of the sales tax increase. In fact, the National Institute on Alcohol Abuse and Alcoholism, in Haughwout and colleagues, estimates that annual alcohol consumption per capita in Maryland increased by 0.03%, from 2.2058 gallons of ethanol in 2010 to 2.2065 gallons of ethanol in 2012 for people aged 14 years and older (the group studied by Staras et al.).

This too is indirect evidence, because we don't know whether some who would have drunk more drank less because of the tax increase, or whether those who would have drunk even more still drank enough. And so on.

Not for the last time I ask all to abjure all hypothesis tests.

Is Young Fatherhood Causally Related To Midlife Mortality? Wee P-values Say Yes!

Table 1 from the paper.


The title of today’s post is culled from the peer-reviewed paper “Is young fatherhood causally related to midlife mortality? A sibling fixed-effect study in Finland” by Elina Einiö and two others in the Journal of Epidemiology and Community Health. The question the title poses is answered “Yes.”

What is it with these academic attacks on marriage? Last week having a kid was said to be worse than death. “Having A Kid Worse Than Divorce Or Death? Wee P-values Say Yes.” Yesterday another academic asked, “Two thirds of married people admit to or desire an affair. Is it time to rethink sexual morality?” She says yes. (At least she didn’t abuse any p-values.) And everything about this making-the-rounds “Tinder-hookup culture” article is depressing.

What is striking in today’s academic foray is the word “causally” in the title. Wee p-values are being used to claim a causal relationship, which is exactly the wrong thing to do (video). It might be true that having kids young kills men, but proving it via wee p-values cannot be done.

It also sounds preposterous to claim that young men fathering children kills those men. That’s, after all, what “mortality” means. Kills. Rubs out. Knocks on the head. Sends to the Great Beyond. Being “at risk” for mortality because of having kids early means that the act of fathering is somehow killing some men.

Now this study looked at a sample of Finns. The authors followed men from age 45 to death or age 54, at which time they were “censored” (in the usual statistical way). All-cause mortality or censoring was the end point.

That's a common approach, but it's a silly one. A bus runs over a man, which is a cause of death (no p-value needed to confirm). If that man fathered a child young, this database counts his death against young fatherhood, not the bus. In order for this to be true, it has to be that this young father walked (or was pushed? or slipped? it's Finland, after all) into the bus's path because he had a kid before gray hair set in. Does that sound plausible?

No. It doesn’t sound impossible, of course. But it is implausible. Especially when you consider the same thing can and must be said for every other “mode” of death. And listen, since it is obvious that the authors are wrong and that young fatherhood in and of itself isn’t killing men, we’re not after direct causes, but causes of the cause of the death. A cause of the cause of death in the bus example is that a young father was forced to take a bus to work because he was a young father.

How could this happen? Well, I don’t know, but that is, of course, no proof that it cannot.

Here’s the conclusion: “Men who had their first child before the age of 22 or at ages 22–24 had higher mortality as compared with their brothers who had their first child at the median or mean age of 25–26.” Smells like an arbitrary cut-off, no? Like maybe, just perhaps—I make no accusation—that ages were played with until a wee p-value from the model came forth.

But this is ungenerous. Nobody really hunts for wee p-values, right? The real story is in their Table 1, which is reproduced at the top of this post.

More (but not all) fathers under 22 had only a "basic" education; more were unmarried; more were divorced than among older fathers. This suggests it's not so much fatherhood which killed the 6.6% of the young men, but other activities. What might these be?

We have no idea, at least, no idea from this data set. For, you (don’t) see, the authors never examined the stated or measured cause of death in any case. They should have—but that’s too much work!

This is a very important point, which we must repeat. Something caused each young and each old father to die (of those who died, naturally). If we say it is young fatherhood that is killing some young men, then it must be something else that killed the old fathers. What was that or what were those causes? Why and how did they differ?

The problem with classical statistical analysis is that it substitutes formulaic manipulation for hard work and hard thinking. And it’s a lousy substitute.

What is needed is (A) proper understanding that statistics can’t prove cause, and more importantly than anything else (B) a new (old)—dare we say a third?—way of doing analyses.


Thanks to KA Rogers for alerting us to this article.

Having A Kid Worse Than Divorce Or Death? Wee P-values Say Yes

This woman is obviously sadder than she would have been had her husband died after divorcing her.


When I first ascend to Emperor, after throwing into the dungeon any within earshot who cannot speak a full sentence without using ‘like’, my first act will be to create a year-long moratorium on all science publishing.

I’ll do this out of kindness. The system is rigged to tempt people beyond endurance to write papers that are either (A) nonsense or (B) what everybody already knows re-packaged as “research.” This must be stopped because it is having a terrible effect on the sanity of the nation.

As proof, I offer the peer-reviewed “Parental Well-being Surrounding First Birth as a Determinant of Further Parity Progression” by Rachel Margolis and Mikko Myrskylä in the journal Demography, a paper which was announced by the Washington Post with the headline, “Parenthood is worse than divorce, unemployment — even the death of a partner”.

The Post’s headline is possibly the result of insanity; it is certainly nonsense.

Margolis and Myrskylä begin their Abstract with: “A major component driving cross-country fertility differences in the developed world is differences in the probability of having additional children among those who have one.”

This is false. Probability, in differences or straight up, doesn’t drive anything. Probability isn’t a cause, and neither can statistical models discern cause. Some thing or some things caused each couple to have each child. Probability won’t be one of these things.

But could a cause of not having more children be the dissatisfaction that arose from having previous children?

Yes; yes, of course. So obvious is this “yes” that we haven’t any need, unless we’re an academic forced to publish, to “study” the question.

It was “studied”, however, by Margolis and Myrskylä. Sort of. The pair looked at already existing data from a thing called the “German Socio-Economic Panel Study” containing answers given by folks to questions designed for other purposes. “We include in our analytical sample individuals whom we observe from three years before a first birth through at least two years after the first birth”. Only those couples who had kids were examined. Some of these couples (about 58%) had a second or third after the first.

The main outcome was having a second child paired with this question (which, again, had nothing per se to do with child-rearing):

Respondents were asked annually, “How satisfied are you with your life, all things considered?” Responses range from 0 (completely dissatisfied) to 10 (completely satisfied).

I wonder why they didn’t start the scale at -17? And go to 5 x π? Any time you see numbers put to non-numerical things (like attitudes) you know you’re in for a rough ride. Anyway, these data are massaged further, mostly by calling the answer given on this question “well being” and then pretending it was well-being.

Then they do this: I know it’s long, but please read it:

[1] We measure levels of subjective well-being over the transition to parenthood, measured from two years before a child is born until the year after a first birth…

[2] We capture the gain in well-being in anticipation of a first birth. First, we calculate a baseline level of life satisfaction for each respondent by averaging their life satisfaction level for three, four, and five years before a first birth. Then we sum deviations from this base level for the period two years before, one year before, and the year of first birth.


[3] We calculate the size of the drop in subjective well-being around a first child’s birth. We measure the difference between the maximum level of life satisfaction before a child is born (from two years before the birth through the year the child’s birth is reported) and the minimum level of life satisfaction after the birth (measured in the year the child is reported and the year after the birth is reported). This is a continuous measure that ranges from 0 if there is no drop or a gain, to 9, the maximum drop we observe in the data. This measure captures the issues raised by new parents who reported that the most common high is just before or just after the child arrives and that the most common low is during the first year after birth.

Got it? Non-numerical things assigned numbers and then the numbers are manipulated in a quirky way…and finally submitted to an unjustifiable statistical model! Cox proportional hazards regressions (see their p. 1154) with linear and interactive effects. Because of course these things are additive to understanding the probability of having a second kid or not.
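To see how quirky the manipulation is, here is the three-step recipe from the quote applied to a single hypothetical respondent. Every score below is invented; the dictionary keys are years relative to the first birth:

```python
# Yearly 0-10 "life satisfaction" answers for one invented respondent,
# indexed by year relative to the first birth (negative = before).
satisfaction = {-5: 7, -4: 7, -3: 8, -2: 8, -1: 9, 0: 8, 1: 5}

# Step [2a]: baseline = average of years -5, -4, -3 before the birth.
baseline = sum(satisfaction[y] for y in (-5, -4, -3)) / 3

# Step [2b]: "gain in anticipation" = summed deviations from baseline
# over the two years before and the year of the birth.
gain = sum(satisfaction[y] - baseline for y in (-2, -1, 0))

# Step [3]: "drop" = max satisfaction before the birth (years -2..0)
# minus min satisfaction after (years 0..1), floored at zero.
drop = max(0, max(satisfaction[y] for y in (-2, -1, 0))
              - min(satisfaction[y] for y in (0, 1)))

print(baseline, gain, drop)
```

For this invented respondent the baseline is about 7.3, the anticipation "gain" is 3, and the "drop" is 4 — numbers which are then fed, as continuous measures of a non-numerical thing, into the hazards regression.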

The results are presented as if the mean of a group applied to everybody in that group. That’s how the Washington Post took it. This unfortunate practice is so common that it’s not even seen as the problem it is. “Those who have a second birth gained more in life satisfaction around the time of a first child’s birth than those who stayed at parity…”

The Discussion begins, “A standing puzzle in demography is why fertility in many developed countries is so far below replacement level.” Puzzle? The answer is people aren’t having kids. Why? Contraception, abortion, this kind of nonsense, narcissism, both parents having jobs and chasing money, and all the rest which everybody already knows about.

I gave up and wished these authors would have done their study in the way I suggested yesterday. By making and verifying actual predictions.

Journal Bans Wee P-values—And Confidence Intervals! Break Out The Champagne!

I’ll drink to that!


Well, it banned all p-values, wee or not. And confidence intervals! The journal Basic and Applied Social Psychology, that is. Specifically, they axed the “null hypothesis significance testing procedure”.

I’m still reeling. Here are excerpts of the Q&A the journal wrote to accompany the announcement.

Question 2. What about other types of inferential statistics such as confidence intervals or Bayesian methods?

Answer to Question 2. Confidence intervals suffer from an inverse inference problem that is not very different from that suffered by the NHSTP. In the NHSTP, the problem is in traversing the distance from the probability of the finding, given the null hypothesis, to the probability of the null hypothesis, given the finding. Regarding confidence intervals, the problem is that, for example, a 95% confidence interval does not indicate that the parameter of interest has a 95% probability of being within the interval. Rather, it means merely that if an infinite number of samples were taken and confidence intervals computed, 95% of the confidence intervals would capture the population parameter. Analogous to how the NHSTP fails to provide the probability of the null hypothesis, which is needed to provide a strong case for rejecting it, confidence intervals do not provide a strong case for concluding that the population parameter of interest is likely to be within the stated interval. Therefore, confidence intervals also are banned from BASP.

Holy moly! This is almost exactly right about p-values. The minor flaw is not pointing out that there is no unique p-value for a fixed set of data. There are many, and researchers can pick whichever they like. And did you see what they said about confidence intervals? Wowee! That’s right!

They continue:

…The usual problem with Bayesian procedures is that they depend on some sort of Laplacian assumption to generate numbers where none exist. The Laplacian assumption is that when in a state of ignorance, the researcher should assign an equal probability to each possibility…However, there have been Bayesian proposals that at least somewhat circumvent the Laplacian assumption, and there might even be cases where there are strong grounds for assuming that the numbers really are there…thus Bayesian procedures are neither required nor banned from BASP.

Point one: they sure love to say Laplacian assumption, don’t they? Try it yourself! Point two: they’re a little off here. But they were just following what theorists have said.

If you are in a “state of ignorance” you cannot “assign an equal probability to each possibility”, whatever that means, because why? Because you are in a state of ignorance! If I ask you how much money George Washington had in his pocket the day he died, your only proper response, unless you be an in-the-know historian, is “I don’t know.” That neat phrase sums up your probabilistic state of knowledge. You don’t even know what each “possibility” is!

No: assigning equal probabilities logically implies you have a very definite state of knowledge. And if you really do have that state of knowledge, then you must assign equal probabilities. If you have another state of knowledge, you must assign probabilities based on that.

The real problem is lazy researchers hoping statistical procedures will do all the work for them—and over-promising statisticians who convince these researchers they can deliver.

Laplacian assumption. I just had to say it.

Stick with this, it’s worth it.

Question 3. Are any inferential statistical procedures required?

Answer to Question 3. No…We also encourage the presentation of frequency or distributional data when this is feasible. Finally, we encourage the use of larger sample sizes than is typical in much psychology research…

Amen! Many, many times you don’t need statistical procedures. You just look at your data. How many in this group vs. that group. Just count! Why does the difference exist? Who knows? Not statistics, that’s for sure. Believing wee p-values proved causation was the biggest fallacy running. We don’t need statistical models to tell us what happened. The data can do that alone.
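"Just count" really is all it takes to say what happened in the data you have. A trivial sketch, with two invented groups:

```python
from collections import Counter

# Hypothetical outcomes for two groups of 100.  No model, no test:
# the counts themselves say what happened in THIS data.
group_a = ["infected"] * 12 + ["healthy"] * 88
group_b = ["infected"] * 7 + ["healthy"] * 93

counts_a, counts_b = Counter(group_a), Counter(group_b)
print(f"group A: {counts_a['infected']} infected of {sum(counts_a.values())}")
print(f"group B: {counts_b['infected']} infected of {sum(counts_b.values())}")
```

Twelve versus seven: that is the whole of what this (invented) dataset can tell us. Why the groups differ, and what either group would do next time, are questions counting alone cannot answer — which is where models about the future come in.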

We only need models to tell us what might happen (in the future).

…we believe that the p < .05 bar is too easy to pass and sometimes serves as an excuse for lower quality research.

Regular readers will know how lachrymose I am, so they won’t be surprised I cried tears of joy when I read that.

We hope and anticipate that banning the NHSTP will have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking. The NHSTP has dominated psychology for decades; we hope that by instituting the first NHSTP ban, we demonstrate that psychology does not need the crutch of the NHSTP, and that other journals follow suit.

Stultifying structure of hypothesis testing! I was blubbering by this point.


Thanks to the multitude of readers who pointed me to this story.


© 2017 William M. Briggs
