William M. Briggs

Statistician to the Stars!

Johnson’s Revised Standards For Statistical Evidence

Valen Johnson: Notice how his shirt matches his hair.

Thanks to the many readers who sent me Johnson’s paper, which is here (pdf). Those who haven’t will want to read “Everything Wrong With P-values Under One Roof“, the material of which is assumed known here.

Johnson’s and Our Concerns

A new paper¹ by Valen Johnson is creating a stir. Even the geek press is weighing in. Ars Technica writes, “Is it time to up the statistical standard for scientific results? A statistician says science’s test for significance falls short.” Johnson isn’t the only one. It’s time for “significance” tests to make their exit.

Why? Too easy, as we know, to claim that the “science shows” the sky is falling. Johnson says the “apparent lack of reproducibility threatens the credibility of the scientific enterprise.” The only thing wrong with that sentiment is the word “apparent.”

The big not-so-secret is that most experiments in the so-called soft sciences, which—I’m going to shock you—philosopher David Stove called the “intellectual slums”, are never reproduced. Not in the sense that the exact same experiments are re-run looking for similar results. Instead, data is collected, models are fit, and pleasing theories generated. Soft scientists are too busy transgressing the boundaries to be bothered to replicate what they already know, or hope, is true.

What happens

I’ve written about how classical (frequentist) statistics works in detail many times and won’t do so again now (see the Classic Post page under Statistics). There is only one point to remember. Users glop data into a model, which accommodates that data by stretching sometimes into bizarre shapes. No matter. The only thing which concerns anybody is whether the model-data combination spits out a wee p-value, defined as a p-value less than the magic number.
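For readers who want to see the ritual rather than read about it, here is a minimal sketch: a two-sample permutation test on made-up numbers, compared against the conventional magic number 0.05. The data, the helper name `permutation_p_value`, and the choice of a permutation test (rather than, say, a t-test) are assumptions for illustration only. Note what the number actually is: the proportion of random relabelings of the data that produce a difference in means at least as large as the one observed, and nothing more.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_p_value(a, b, n_perm=10_000, seed=1):
    """Hypothetical helper: two-sided Monte Carlo permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend the group labels mean nothing
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            at_least_as_extreme += 1
    # Add-one correction so a Monte Carlo p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


MAGIC_NUMBER = 0.05  # the conventional cutoff

treatment = [5.1, 6.0, 5.8, 6.3, 5.9, 6.4]  # made-up data
control = [4.9, 5.2, 5.0, 5.5, 4.8, 5.3]    # made-up data
p = permutation_p_value(treatment, control)
print(f"p = {p:.4f}, wee? {p < MAGIC_NUMBER}")
```

Nothing in the output says how probable the researcher's theory is; it only reports how unusual the observed gap would be if the labels were arbitrary.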

Nobody ever remembers what a p-value is, and nobody cares that they do not remember. But everybody is sure that the p-value’s occult powers “prove” whatever it is the researcher wanted to prove.

Johnson, relying on some nifty mathematics which ties certain frequentist and Bayesian procedures together, claims the magic number is too high. He advises a New & Improved! magic number ten times smaller than the old magic number. He would accompany this smaller magic number with a (Bayesian) p-value-like measure, which says something technical, just like the p-value actually does, about how the data fits the model.

This is all fine (Johnson’s math is exemplary), and his wee-er p-value would pare back slightly the capers in which researchers engage. But only slightly. Problem is that wee p-values are as easy to discover as “outraged” Huffington Post writers. As explained in my above linked article, it will only be a small additional burden for researchers to churn up these new, wee-er p-values. Not much will be gained. But go for it.
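Why wee-er p-values are only a small additional burden can be seen with one line of arithmetic. This is a minimal sketch, assuming the researcher runs many independent tests on questions where nothing is actually going on, so that each p-value is uniformly distributed between 0 and 1 (an idealization). The function name is hypothetical, and 0.005 is simply the old 0.05 made ten times smaller.

```python
def chance_of_a_wee_p(alpha, n_tests):
    """Hypothetical helper: probability that at least one of n_tests independent
    tests of nothing gives p < alpha, assuming Uniform(0, 1) null p-values."""
    return 1 - (1 - alpha) ** n_tests


for alpha, n_tests in [(0.05, 20), (0.005, 20), (0.005, 200)]:
    print(f"alpha = {alpha}, tests = {n_tests}: {chance_of_a_wee_p(alpha, n_tests):.3f}")
```

Under these assumptions, twenty tries at 0.05 give roughly a 64% chance of at least one wee p-value. The stricter cutoff drops twenty tries to under 10%, but two hundred tries bring it back to about 63%. Extra questions, subgroups, and model variants are cheap, so the new cutoff mostly costs patience.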

What should happen

What’s needed is not a change in mathematics, but in philosophy.

First, researchers need to stop lying, stop exaggerating, restrain their goofball stunts, quit pretending they can plumb the depths of the human mind with questionnaires, and dump the masquerade that small samples of North American college students are representative of the human race. And when they pull these shenanigans, they ought to be called out for it.

But by whom? Press releases and news reports bear little relation to what happened in the data. The epidemiologist fallacy is epidemic. Policy makers are hungry for verification. Do you know how much money government spends on research? Scientists are people too and no better than civilians, it seems, at finding evidence contrary to their beliefs. Though they’re much better at confirming their opinions.

This is all meta-statistical, i.e. beyond the model, but it affects the probability of the questions at hand to a far greater degree than the formal mathematics does. (Johnson understands this.) The reason we give abnormal attention to the model is that it is just that part of the process which we can quantify. And numbers sound scientific: they are magical. We ignore what can’t be quantified and fix our eyes on the pretty, pretty numbers.

Second: remember sliding wooden blocks down inclined planes back in high school? Everything set up just so and, lo, Newton’s physics popped out. And every time we threw a tiny chunk of sodium into water, festivities ensued, just like the equations said they would. Replication at work.

That’s what’s needed. Actual replication. The fancy models fitted by soft scientists should be used to make predictions, just like the models employed by physicists and chemists. Every probability model that spits out a p-value should instead spit out guesses about what data never² seen before would look like. Those guesses could be checked against reality. Bad models unceremoniously would be dumped, modest ones fixed up and made to make new predictions, and good ones tentatively accepted.

“Tentatively” because scientists are people and we can’t trust them to do their own replication.

The technical name for predictive statistics is Bayesian posterior predictive analysis, where all memories of parameters disappear (they are “integrated out”). There are no such things as p-values or Bayes factors. All that is left is observables. A change in X causes this change in the probability of Y, the model says. So, we change X (or look for a changed X in nature) and then see if the probability of Y accords with the actual appearance of Y. Simple!
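Here is a minimal sketch of posterior predictive analysis in its simplest case. It assumes a yes/no observable Y, a uniform Beta(1, 1) prior on the unobservable chance of a “yes”, and made-up counts under two settings of X; the function names are hypothetical. The point is that the parameter never appears in the answers: it has been integrated out, leaving only probabilities of things we can see.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive_next(successes, trials, a=1.0, b=1.0):
    """P(next Y = 1 | data), with the Beta(a, b) parameter integrated out."""
    return (successes + a) / (trials + a + b)


def predictive_count(k, m, successes, trials, a=1.0, b=1.0):
    """Beta-binomial posterior predictive: P(k of the next m trials give Y = 1 | data)."""
    a_post = a + successes
    b_post = b + trials - successes
    return comb(m, k) * exp(log_beta(k + a_post, m - k + b_post) - log_beta(a_post, b_post))


# Made-up data: (number of Y = 1, number of trials) under two settings of X.
data = {"X=0": (12, 40), "X=1": (21, 40)}
for setting, (s, n) in data.items():
    p_next = predictive_next(s, n)
    dist = [predictive_count(k, 10, s, n) for k in range(11)]
    likeliest = max(range(11), key=lambda k: dist[k])
    print(f"{setting}: P(next Y=1) = {p_next:.3f}; most probable count of Y=1 in next 10 = {likeliest}")
```

Each printed number is a claim about future observations, which is exactly what can later be checked against reality.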

This technique isn’t used because (a) the math is hard, (b) it is unknown except by mathematical statisticians, and (c) it scares the hell out of researchers who know they’d have far less to say. Even Johnson’s method will double current sample sizes. Predictive statistics requires a doubling of the doubling—and much more time. The initial data, as before, is used to fit the model. Then predictions are made and then we have to wait for new data and see if the predictions match.
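The fit, wait, then check cycle might look like this minimal sketch. It assumes a yes/no observable, the uniform-prior predictive rule (Laplace’s rule of succession), and the Brier score as the yardstick. The post does not name a scoring rule, so the Brier score and the naive always-0.5 reference forecast are swapped-in choices, and all the data are made up.

```python
def predictive_next(successes, trials):
    """Laplace's rule: P(next Y = 1) under a uniform prior, parameter integrated out."""
    return (successes + 1) / (trials + 2)


def brier(prob, outcomes):
    """Mean squared difference between a forecast probability and what happened (0 or 1)."""
    return sum((prob - y) ** 2 for y in outcomes) / len(outcomes)


# Step 1: fit on the initial data (made up).
old_successes, old_trials = 21, 40
p = predictive_next(old_successes, old_trials)

# Step 2: wait for genuinely new observations (also made up).
new_outcomes = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

# Step 3: score the model against a naive forecast that always says 0.5.
model_score = brier(p, new_outcomes)
reference_score = brier(0.5, new_outcomes)
skill = 1 - model_score / reference_score  # > 0 means the model beat the naive forecast
print(f"model {model_score:.3f}, reference {reference_score:.3f}, skill {skill:.3f}")
```

A model that cannot beat a trivial forecast on new data has not earned its theory, however wee its p-value was.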

Right climatologists? Ain’t that so educationists? Isn’t this right sociologists?

Caution: even if predictive statistics are used, it does not solve the meta-statistical problems. No math can. We will always be in danger of over-certainty.

——————————————————————-

¹Actually a summary paper. See his note 21 for directions to the real guts.

²This is not cross validation. There we re-use the same data multiple times.



Briggs Exposes His Better Side

Cool breeze.

These are the pants I wore yesterday through airports, airplanes, taxis, and the streets of San Francisco, thus exposing my better side to roughly half the country.

Notice that the tear begins at the belt and continues half way down the thigh.

I got out of the taxi and stood on a corner tucking my receipt away. A gentleman sidled up next to me and gave me the old up-and-down. This had the feeling of a touch (this is, after all, San Francisco), so I curtly nodded back and scurried forward. The guy fell in behind me, matching my pace.

We met again at the next corner waiting for the light. Weak smiles were exchanged. I pushed forward, he followed. Two blocks later he was still there. I passed a bar which was playing, I swear, Glenn Miller’s In the Mood. My kind of place.

I stood to peruse the bill o’ fare, thinking of having a cold one, and the guy was forced to walk ahead of me, which he did. But only four or five paces, after which he searched his pockets until he found a piece of paper which so fascinated him he didn’t move. He stole a glance or two back at me.

Ahead, some bus or car honked loudly, and we both looked up. His attention was off me, so I slid around the corner, happy in my ruse. If he followed now, I figured, it would be too obvious.

He didn’t.

It wasn’t until about two hours later when I went to change that I noticed the gaping chasm and realized the gentleman was trying to find a polite way to tell me. What could he say? “Excuse me sir, you have a hole the size of the Grand Canyon on your posterior.”



Racist Researchers, The Racists, Accuse Gun Owners Of Racism, You Racist

That kick-butt guy in the black wig is me, kung fu-ing the stuffing out of asinine research. Listen to the klaxon clang as evildoers discover we are among them!

Or something.

Don’t know about you guys, but I grow weary of studies like “Racism, Gun Ownership and Gun Control: Biased Attitudes in US Whites May Influence Policy Decisions” by Kerry O’Brien and others. The product of lazy scientists relying on re-re-reanalyzing a set of data.

The set of data arose from a bunch of questions they thought sounded cool—this is how most sociology is conducted—while our crew asked what would happen if they dumped a select few of those questions into a statistical chopper. Wee p-values were ejected. Theories were generated. It’s all so tiring.

In 2008-2009, over four thousand folks were asked if they owned a firearm. 565 said yes, 615 said no; which means three thousand people weren’t asked, or refused to answer, or whatever. Would you tell some white-coated stranger if you had a gun? Depending on the googleyness of his eyes, I might say anything from no to “You better believe it and it’s pointed at you.”
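The arithmetic of the missing answers is worth doing explicitly. This sketch assumes exactly 4,000 people were asked (the post says only “over four thousand”, so the no-answer count below is a lower bound). If we assume nothing about those who stayed silent, the share of gun owners can only be bracketed by worst-case bounds: every silent person a non-owner at one extreme, every one an owner at the other.

```python
asked = 4000               # assumption: the post says "over four thousand"
yes, no = 565, 615         # answers reported in the post
silent = asked - yes - no  # no answer, for whatever reason

answered_share = (yes + no) / asked
yes_among_answerers = yes / (yes + no)

# Worst-case bounds on the ownership share among everyone asked.
low = yes / asked               # every silent person a non-owner
high = (yes + silent) / asked   # every silent person an owner

print(f"no answer: {silent}")
print(f"answered: {answered_share:.1%}; 'yes' among those who answered: {yes_among_answerers:.1%}")
print(f"share owning a gun: somewhere between {low:.1%} and {high:.1%}")
```

With 2,820 or more people silent, the data alone pin the ownership share only to somewhere between about 14% and 85%; anything narrower comes from assumptions about who stayed quiet and why.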

How did our lazy research team account for this kind of measurement error?

As I like to say, you just wait here for an answer.

Racism dreadfully concerns O’Brien (and most academics): “Blacks are disproportionately represented in US firearm homicides (14.6 per 100,000), and would benefit most from improved gun controls.” A racist statement if there ever was one, and an admission of outright bias.

O’Brien sought to categorize “racism” in two ways. The first was “implicitly”, measured by showing pictures of blacks to whites and asking how the whites felt about it. Since this measure of (almost) real racism didn’t play in their results, we don’t hear much about it in their paper.

The second measure was “The Symbolic Racism 2000 Scale” by P. J. Henry and David O. Sears (Political Psychology, 2002, Vol. 23, pp. 253-283). “Symbolic racism (racial resentment) [is] an explicit but subtle form and measure of racism.” They stress this is not “old-fashioned or overt/blatant racism which had seen blacks as amoral and inferior”.

Symbolic “racism”, if it isn’t already obvious, thus means “not racism”. It instead probably means, as you will see, “knowledge of the racial politics.” A screwy thing about the scale is that it is only eight questions, any subset of which may be used as “the” scale: “the scale could be shortened or lengthened as needed”.

From Henry and Sears, here are four (the final eight were winnowed from many, hence the strange numbering):

2. Irish, Italian, Jewish, and many other minorities overcame prejudice and worked their way up. Blacks should do the same. (1, strongly agree; 2, somewhat agree; 3, somewhat disagree; 4, strongly disagree)…

9. How much of the racial tension that exists in the United States today do you think blacks are responsible for creating? (1, all of it; 2, most; 3, some; 4, not much at all)

11. How much discrimination against blacks do you feel there is in the United States today, limiting their chances to get ahead? (1, a lot; 2, some; 3, just a little; 4, none at all)…

16. Over the past few years, blacks have gotten more economically than they deserve. (1, strongly agree; 2, somewhat agree; 3, somewhat disagree; 4, strongly disagree)

O’Brien said their team used four of the eight questions, but I couldn’t discover which four. That’s science for you. I did double check, though: “Al Sharpton” was not a legitimate response to number 9. Neither was “Affirmative Action” listed for 16.

Anyway, you get the idea which answers biased, coddled, gun-shy researchers would consider “racist”. There is only one politically correct view; everything else is “racist.”

Those who scored ever-so-slightly higher on the politics/“racism” scale were a tiny bit more likely to admit to strangers to owning a gun. This is not the same as saying those who scored higher on the politics/“racism” scale were a tiny bit more likely to own a gun, because gun ownership was never measured. Recall only a fraction of the respondents had anything to say on the subject of gun ownership.

Racism (“implicitly”) was not statistically important. So how do we explain newspaper headlines like this?

White racists more likely to be gun owners: study

And conclusions like this (from the authors)?

Opposition to gun control in US whites is somewhat paradoxical given the statistics on gun-related deaths, and such opposition may be undermining the public health of all US citizens.

Racism has to be the answer.



A Veteran’s Day Tale

Pretty, no?

From 1986 to 1989, I was stationed at Kadena Air Force Base, smack in the middle of Okinawa. Which wasn’t hard. Being in the middle, I mean. The island is small, about 60 miles long, a long string bean with an outgrowth in the north, the whole floating in the East China Sea.

I was with the 1962nd Communications Group. We fixed telephone lines, teletype machines, and (me) cryptographic whoozits. But since some of those scramblers and descramblers had to go over telephone lines, I had to fix those, too.

We spent endless hours “running lines.” Two guys at one remote location with an “o-scope” and two guys somewhere else shooting a tone down the line. We’d fiddle with some doodads and ensure the impedance of the wire was just so (the crypto stuff was finicky). Mostly it wasn’t. The wires had been run right after World War II in a hurry. The Okinawan telephone company had just begun to replace them.

At least the phone lines were buried, which meant they weren’t snapping off in the frequent typhoons the island experienced.¹ But many of them ran shallow and the cables were leaky, so every time it rained, which was always, out we’d go and readjust.

We could only adjust so much, and if this wasn’t enough we’d have to swap out pairs of lines. Meaning we’d have to search the thick trunk (a cable) for unused telephone numbers with our “butt-sets”, a portable part rotary, part DTMF phone with alligator clips which we could use to listen in on calls, or even make them. Sometimes there weren’t any free pairs. Oh well, some colonel’s wife would have to do without a second line.

Sometimes even this wasn’t enough and we’d have to call to the switch, then still mechanical, a building-sized tangle of wires and relays, and have the Japanese phone company swap out the carbon block at a line’s termination. Think of these like charcoal filters which eliminated noise. Since the Japanese didn’t speak English and only Airman Enos (guess what his nickname was) could speak Japanese, things did not always go as planned.

So it was a relief to volunteer for temporary duty as NCOIC of Correctional Custody. Six weeks of guarding mostly minor offenders and a few of those being “PCSed out”, i.e. booted dishonorably. The bulk of inmates committed Article 15 offenses. It sounds grand to call these petty offenders “inmates” since the brig was just another building on base which was less well guarded than my ordinary station.

Article 15 covered infractions such as failing to show up to duty, passing bad checks at the BX, insubordination, reckless driving and the like. These were people who were being rehabilitated, i.e. punished, and who would go back to their units after serving their time, usually three to six weeks, and maybe losing a stripe or two. Those engaged in large-scale blackmarketing (usually buying booze from the Class VI store and reselling it to Okinawans) were kicked out or held awaiting their courts martial. Blackmarketing was tempting because, say, a bottle of American whiskey bought on base for a few bucks could fetch ten times that amount off base.

Anyway, the crew had light duties. Marching from paperwork appointment to paperwork appointment or policing the grounds for stray bits of paperwork. I would daily mark down on paperwork that the inmates had completed their paperwork.

There was a TV in the barracks which inmates would be allowed to use for an hour or two at night if they had behaved. People looked forward to this time, but it’s not clear why. We only had AFRTS (pronounced A-farts) which ran ten-year-old sitcoms, some sports, and old movies. Like all TV, the level of programming was aimed at the lowest common denominator, i.e. marines.

One afternoon we couldn’t locate Airman Jones. He was supposed to go out with the rest of the crew and march from A to B. The tangle nearby was searched, the toilets were searched, a nearby building was searched. But no Airman Jones. This was bad because if we couldn’t find him we’d have to fill out more paperwork.

Finally, another sergeant called me to the TV room. There was Airman Jones, crouched behind the TV in the corner, holding a pair of rabbit ears above his head hoping we would mistake him for the base of the antenna. He didn’t want to miss his soap opera.


—————————————————————-

¹Most of the island, unlike the P.I., is built in concrete and rebar, so typhoons were only a problem for the water supply. We liked typhoons because all the planes took off for Guam and we got the day off.



© 2015 William M. Briggs
