
Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

June 15, 2018 | 10 Comments

Statistics Are Now Hate Facts

Hate facts are true statements about reality that our elites demand remain occult and unuttered.

Elites don’t yet say that members of the elite cannot know hate facts, but, being good gnostics, they do try to control the spread of hate facts among the indigenous populace of these once United States.

Examples? We had many here. See the old post “Black And White Homicide Rates: Who’s Killing Whom?” Using official statistics, numbers which are therefore beyond dispute (“That’s a joke, son”), it was demonstrated that blacks murder at much higher rates than whites, that more blacks (proportionally) kill whites than whites kill blacks, and so on.

Like all hate facts, people know the truth of these statements, but you can see from the tone of the comments that some thought it in poor taste to state in public what we all knew to be true.

The fear of hate fact haters (our elites) is that hate facts will be used to generate hate, which is to say, to infer undesirable or incorrect explanations for the hate facts. Now it has been observed that blacks kill at higher rates than whites, and have done so for many decades, but the numbers themselves do not say why the difference exists. Some will say the difference arises because blacks and whites are different. Which is trivially true: if it were not, we would never be able to tell the difference between the races. The numbers do not, however, say why blacks as blacks kill at higher rates than whites as whites.

What to do about the difference in murder rate (or about any hate fact) is an altogether separate question. The answer can never be found in the hate facts themselves. The numbers are barren of cause. Cause and action have to be discovered outside them. Hate fact haters say that when the wrong people learn of hate facts, the cause they ascribe will invariably be some -ism or -phobia and the action (if required) will usually or always be hate. These conclusions do not necessarily follow.

It will be true sometimes that incorrect causes and unpalatable actions are proposed. But it is also true that hate fact haters come to incorrect causes and suggest actions that do more harm than good. We all know this story well enough about crime and race not to repeat it here.

Finally we come to confirmation of all this in a story discovered by reader Vince Lee: “Scholars claim that statistics ‘serve white racial interests’”.

Three British professors recently claimed that statistical analyses have been weaponized to “serve white racial interests” within academia and beyond.

Led by David Gillborn, a professor at the University of Birmingham, the professors argue that math serves white interests because it can “frequently encode racist perspectives beneath the facade of supposed quantitative objectivity.”

“Numbers are social constructs and likely to embody the dominant (racist) assumptions that shape contemporary society.”

“Contrary to popular belief, and the assertions of many quantitative researchers, numbers are neither objective nor color-blind,” Gillborn and his team assert in their article for the journal Race, Ethnicity, and Education.

7! 14! 23.5!

There’s some n-words for you, baby. N-umbers. Weapons. I slipped ’em in and you didn’t even notice.

I won’t tell you how “7” encodes a racist perspective beneath the facade of supposed quantitative objectivity because I’m already risking the censor by printing it. Saying what it means could land me in prison.

Enough dumb jokes. The truth is these professors are frightened of hate facts. They know what numbers mean, and they know you know what numbers mean when you see them, but they wrongly suspect you will always ascribe incorrect causes and that you will propose harmful actions when you learn of the numbers.

These men have formed the field of “QuantCrit”—a portmanteau of “quantitative analysis” and “critical race theory”. They say “quantitative data is often gathered and analyzed in ways that reflect the interests, assumptions, and perceptions of White elites”, which is nonsense because it is impossible for any number to contain its cause. An analysis can be wrong when it ascribes the wrong cause. But the numbers themselves can never be wrong unless they are lied about (I’m excepting mistakes).

Whatever else this is, it is a play for power. “The professors also acknowledge the tension between social justice and quantitative analysis, saying that while statistics can be used to point out the failures of social justice programming, ‘data is often used to shut down, silence, and belittle equity work.'”

In other words, hate facts undercut and disprove the theses of equality and diversity, and they aren’t happy about it. Solution? Ban hate facts (in effect) by calling the hate facts themselves racist, sexist, etc., etc., etc.

June 7, 2018 | 9 Comments

The Epidemiologist Fallacy Strikes Again: Premature Birth Rate Edition

Hypothesis testing leads to more scientific nonsense than any other practice, including fraud. Hypothesis testing, as regular readers know, cannot identify cause. It conflates decision with probability and leads to vast, vast over-certainties.

Why is it so liked? Two reasons. One, it is magic. When the wee p shows itself after the incantation of an algorithm, it is as if lead has been transmuted into gold, dross into gold. Significance has been found! Two, it saves thinking. Wee ps are taken to mean the cause—or “link”, which everybody reads as “cause”—that was hoped for has been certified.

What is “significance”? A wee p. And what is a wee p? Significance.

And that is it.
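To watch the trick performed in the open, here is a minimal sketch (mine, not anyone’s published analysis; plain Python with numpy and scipy assumed). Both groups are drawn from the very same distribution, so there is no effect to find, yet the wee p dutifully appears about 5% of the time.

```python
# A toy demonstration, not any real study: test two batches of identical
# noise over and over and count how often "significance" shows up anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_runs, n = 10_000, 50
wee = 0
for _ in range(n_runs):
    a = rng.normal(size=n)        # group A: pure noise
    b = rng.normal(size=n)        # group B: the same pure noise
    _, p = stats.ttest_ind(a, b)  # classical two-sample t-test
    wee += p < 0.05               # a wee p! Significance!

print(f"'Significant' results with no effect anywhere: {wee / n_runs:.1%}")
# Prints close to 5%. The wee p arrives on its own schedule, cause or no.
```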

Here’s the headline: “Premature Birth Rates Drop in California After Coal and Oil Plants Shut Down: Within a year of eight coal- and oil-fired power plant retirements, the rate of preterm births in mothers living close by dropped, finds new study on air pollution.”

Shutting down power plants that burn fossil fuels can almost immediately reduce the risk of premature birth in pregnant women living nearby, according to research published Tuesday.

Researchers scrutinized records of more than 57,000 births by mothers who lived close to eight coal- and oil-fired plants across California in the year before the facilities were shut down, and in the year after, when the air was cleaner.

The study, published in the American Journal of Epidemiology, found that the rate of premature births dropped from 7 to 5.1 percent after the plants were shuttered, between 2001 and 2011. The most significant declines came among African American and Asian women. Preterm birth can be associated with lifelong health complications.

Now this is a reporter, therefore we cannot expect her to know not to use causal language. The peer-reviewed study is “Coal and oil power plant retirements in California associated with reduced preterm birth among populations nearby” by Joan Casey and six other women.

The journal editors, all good peer reviewed scientists, surely know the difference between cause and correlation though, yes?

No. For in the same issue in which the paper ran, there appeared an editorial praising the article in causal terms. The editorial was by Pauline Mendola. She said, “We all breathe.”

Who knew?

She also said “Casey and colleagues have shown us that retiring older coal and oil power plants can result in a significant reduction in preterm birth and that these benefits also have the potential to lower what has been one of our most intractable health disparities.”

Casey did not show this. Casey found wee p-values in (we shall soon see) an overly complicated statistical model. Casey found a correlation, not a cause. But the curse of hypothesis testing is that everybody assumes, while preaching the opposite, that correlation is causation.

On to Casey.

One would assume that near power plants, and even near recently closed power plants, we’d find folks unable to afford the best obstetrical services, and that we’d also find “disparities”—always a code word for differences between races and the like. So we’d expect differences in birthing. That’s causal talk. But with excellent evidence behind it.

Casey’s Table 1 says 7.5% of kids whose mothers’ address was near a power plant were preterm. They called this address the “exposure variable”. These power plants were all over California (see the news article above for a map).

Casey & Co. never measured any effect of any power plant—such as “pollution” or PM2.5 (i.e. dust), or stray electricity, or greater power up time, or etc. Since Casey never measured anything but an address, but could not help but go on about pollution and the like, the epidemiologist fallacy was committed. This is when the thing blamed for causing something is never measured and when hypothesis testing (wee p-values) are used to assign cause.

Anyway, back to that 7.5% out of 316 births. That’s with the plants. Sans plants it was 6.1% out of 272. Seems people moved out with the plants. But the rate did drop. Some thing or things caused the drop. What?

Don’t answer yet. Because we also learn that, miles away from the plants, the preterm rate was 6.2% out of 994 births before the closings, and 6.5% out of 1,068 after. That’s worse! It seems disappearing plants caused an increase in preterm babies! What Cali needs to do is build more plants, fast!

Dumb reasoning, I know. But some thing or things caused that increase and one of the candidates is the closed plants—the same before and after.
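For the curious, here is a back-of-the-envelope check of those before-and-after rates (my sketch, not Casey’s; the counts are reconstructed from the percentages quoted above, so treat them as approximate). Python with statsmodels assumed.

```python
# Rough two-proportion comparisons using the rates quoted in the post.
# Counts are back-calculated from percentages, hence approximate.
from statsmodels.stats.proportion import proportions_ztest

# Near plants: 7.5% of 316 births before closure, 6.1% of 272 after.
# Miles away: 6.2% of 994 before, 6.5% of 1,068 after (the rate went UP).
groups = {
    "near": ([round(0.075 * 316), round(0.061 * 272)], [316, 272]),
    "far":  ([round(0.062 * 994), round(0.065 * 1068)], [994, 1068]),
}

for label, (preterm_counts, total_births) in groups.items():
    z, p = proportions_ztest(preterm_counts, total_births)
    print(f"{label}: z = {z:+.2f}, p = {p:.2f}")
# Neither difference is impressive on its own: with counts this small, the
# "drop" near plants and the "rise" far away both sit well inside the noise.
```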

So how did Casey reason it was plant removal that caused—or was “linked” to—preterm babies to decrease? With a statistical model (if you can find their paper, see their Eq. [1]). The model not only included terms for plant distance (in buckets), but also “maternal age (linear and quadratic terms), race/ethnicity, educational attainment and number of prenatal visits; infant sex and birth month; and neighborhood-level poverty and educational attainment.”

Linear and quadratic terms for mom’s age? Dude. That’s a lot of terms. Lo, they found some of the parameters in this model evinced wee ps, and the rest of the story you know. They did not look at their model’s predictive value, and we all know by now that reporting just on parameters exaggerates evidence.
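For flavor, here is a hedged guess at the shape of a model like their Eq. [1]. Every variable name below is a hypothetical stand-in, the covariate list is abridged, and the data are synthetic noise; it shows the form of the thing, not their actual fit. Python with pandas and statsmodels assumed.

```python
# A sketch of a logistic regression in the style described above; all names
# are made up and the data are random, so any wee p here is pure luck.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
births = pd.DataFrame({
    "preterm": rng.binomial(1, 0.07, n),       # ~7% baseline preterm rate
    "dist_bucket": rng.choice(["0-5km", "5-10km", ">10km"], n),
    "mom_age": rng.uniform(18, 44, n),
    "race": rng.choice(["A", "B", "C"], n),
    "visits": rng.poisson(10, n),              # prenatal visits
    "hood_poverty": rng.uniform(0, 0.4, n),    # neighborhood-level poverty
})

# Distance buckets as the "exposure", plus linear AND quadratic maternal
# age, plus a pile of covariates (abridged).
fit = smf.logit(
    "preterm ~ C(dist_bucket) + mom_age + I(mom_age**2)"
    " + C(race) + visits + hood_poverty",
    data=births,
).fit(disp=0)
print(fit.summary())  # the wee ps live in this table...
# ...but a parameter table is not a prediction. To learn what the model says
# about the observable, you must ask it: fit.predict(new_mother_records).
```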

Nevertheless they concluded:

Our study shows that coal and oil power plant retirements in California were associated with reductions in preterm birth, providing evidence of the potential health benefits of policies that favor the replacement of oil and coal with other fuel types for electricity generation. Moreover, given that effect estimates were stronger among non-Hispanic Black women, such cleaner energy policies could potentially not only improve birth outcomes overall but also reduce racial disparities in preterm birth.

Inappropriate causal language and politics masked as science. Get ready for a lot more of this, friends.

June 5, 2018 | 7 Comments

Lovely Example of Statistics Gone Bad

The graph above (biggified version here) was touted by Simon Kuestenmacher (who posts many beautiful maps). He said “This plot shows the objects that were found to be ‘the most distant object’ by astronomers over time. With ever improving tools we can peak further and further into space.”

The original is from Reddit’s r/dataisbeautiful, a forum where, I am happy to say, many of the commenters noticed the graph’s many flaws.

Don’t click over and don’t read below. Take a minute to first stare at the pic and see if you can see its problems.

Don’t cheat…

Try it first…

Problem #1: The Deadly Sin of Reification! The mortal sin of statistics. The blue line did not happen. The gray envelope did not happen. What happened were those teeny tiny too-small black dots, dots which fade into obscurity next to the majesty of the statistical model. Reality is lost, reality is replaced. The model becomes realer than reality.

You cannot help but be drawn to the continuous sweet blue line, with its guardian gray helpers, and think to yourself “What smooth progress!” The black dots become a distraction, an impediment, even. They soon disappear.

Problem #1 leads to Rule #1: If you want to show what happened, show what happened. The model did not happen. Reality happened. Show reality. Don’t show the model.

It’s not that models should never be examined. Of course they should. We want good model fits over past data. But since good model fits over past data are trivial to obtain—they are even more readily available than student excuses for missing homework—showing your audience the model fit when you want to show them what happened misses the point.
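To see just how cheap a good fit is, consider this toy (assumptions all mine): crank up a polynomial’s degree on pure noise and watch the fit climb to perfection while saying nothing true about the next point.

```python
# Pure noise, increasingly flexible models: in-sample fit is trivially good.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12)
y = rng.normal(size=12)  # nothing here to explain

for degree in (1, 5, 11):
    p = Polynomial.fit(x, y, degree)
    ss_res = np.sum((y - p(x)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    print(f"degree {degree:2d}: R^2 = {1 - ss_res / ss_tot:.3f}")
# Degree 11 threads all 12 points exactly (R^2 = 1.000), a fit even easier
# to obtain than a student excuse for missing homework.
```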

Of course, it’s well to separately show model fit when you want to honestly admit to model flaws. That leads to—

Problem #2: Probability Leakage! What’s the y-axis? “Distance of furthest object (parsecs).” Now I ask you: can the distance of the furthest object in parsecs be less than 0? No, sir, it cannot. But do both the blue line and the gray guardian drop well below 0? Yes, sir, they do. And does that imply the impossible happened? Yes: yes, it does.

The model has given real and substantial probability to events which could not have happened. The model is a bust, a tremendous failure. The model stinks and should be tossed.

Probability leakage is when a model gives positive probability to events we know are impossible. It is more common than you think. Much more common. Why? Because people choose the parametric over the predictive, when they should choose predictive over parametric. They show the plus-or-minus uncertainty in some who-cares model parameters and do not show, or even calculate, the uncertainty in the actual observable.
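Here is a minimal sketch of leakage (a toy of my own, assuming nothing about the Reddit plot’s actual model): fit a plain Gaussian to data that cannot go negative, then ask the fitted model how much probability it puts below zero.

```python
# Fit a normal distribution to strictly positive data, then compute the
# probability the fitted model assigns to impossible negative values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
distances = rng.lognormal(mean=1.0, sigma=1.2, size=30)  # strictly positive

mu, sigma = distances.mean(), distances.std(ddof=1)
leak = stats.norm.cdf(0, loc=mu, scale=sigma)  # P(distance < 0), per model

print(f"Probability the model gives to negative distances: {leak:.1%}")
# A healthy chunk of probability sits on events that cannot happen; that
# mass was stolen from somewhere, so every probability the model reports
# for possible events is correspondingly off.
```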

I suspect that’s the case here, too. The gray guardians are, I think, the uncertainty in the parameter of the model, perhaps some sort of smoother or spline fit. They do not show the uncertainty in the actual distance. I suspect this because the gray guardian shrinks to near nothing at the end of the graph. But, of course, there must still be some healthy uncertainty in the most distant objects astronomers will find.

Parametric uncertainty, and indeed even parameter estimation, are largely of no value to man or beast. Problem #2 leads to Rule #2: You made a model to talk about uncertainty in some observable, so talk about uncertainty in the observable and not about some unobservable non-existent parameters inside your ad hoc model. That leads to—

Problem #3: We don’t know what will happen! The whole purpose of the model should have been to quantify uncertainty in the future. By (say) the year 2020, what is the most likely distance for the furthest object? And what uncertainty is there in that guess? We have no idea from this graph.

We should, too. Because every statistical model has an implicit predictive sense. It’s just that most people are so used to handling models in their past-fit parametric sense that they always forget the reason they created the model in the first place. And that was because they were interested in the now-forgotten observable.

Problem #3 leads to Rule #3: always show predictions for observables never seen before (in any way). If that was done here, the gray guardians would take on an entirely different role. They would be “more vertical”—up-and-down bands centered on dots in future years. There is no uncertainty in the year, only in the value of most distant object. And we’d imagine that that uncertainty would grow as the year does. We also know that the low point of this uncertainty can never fall below the already known most distant object.
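Here is a sketch of what Rule #3 asks for, under assumptions entirely my own (a straight-line fit to made-up log-distances with normal errors): a prediction of a future year’s observable, with a band floored at the record already in hand.

```python
# Predictive (not parametric) uncertainty for a new year's observation,
# using an ordinary regression predictive interval on fabricated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
years = np.arange(1960, 2019)
log_dist = 0.08 * (years - 1960) + rng.normal(0, 0.2, years.size)  # fake record

res = stats.linregress(years, log_dist)
n = years.size
resid = log_dist - (res.intercept + res.slope * years)
s = np.sqrt(np.sum(resid**2) / (n - 2))

year_new = 2025
# Standard error for a NEW observation, not for the fitted mean line:
se_pred = s * np.sqrt(1 + 1/n + (year_new - years.mean())**2
                      / np.sum((years - years.mean())**2))
center = res.intercept + res.slope * year_new
lo, hi = stats.t.interval(0.95, df=n - 2, loc=center, scale=se_pred)

best_so_far = np.exp(log_dist).max()
lo_dist = max(np.exp(lo), best_so_far)  # can't fall below the known record
print(f"{year_new}: most likely {np.exp(center):,.0f}, 95% predictive band "
      f"{lo_dist:,.0f} to {np.exp(hi):,.0f} (toy parsecs)")
```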

Conclusion: the graph is a dismal failure. But its failures are very, very, very common. See Uncertainty: The Soul of Probability, Modeling & Statistics for more of this type of analysis, including instruction on how to do it right.

Homework: Find examples of time series graphs that commit at least one of these errors. Post links below so that others can see.

June 4, 2018 | 10 Comments

Why Younger Evangelicals Waver in Support for Israel

Update: It appears some think I believe unquestioning support of Israel is a good thing, or that I agree with evangelical interpretations of prophecy. If so, I wrote badly. Both are false.

The title is a slight modification of the Ian Lovett WSJ story with subtitle “Generational split reflects concern over Palestinians, spurring outreach by some churches and groups.”

Lovett’s piece opens with a quote from a young evangelical who says he was taught from birth that “Christians are supposed to back Israel on everything.”

Alas, this young man found both that he could no longer do so given Israel’s latest treatment of Palestinians, and that he was not alone in his disappointment with the Mideast’s great limited democracy.

Lovett says “A generational divide is opening up among evangelical Christians in the U.S. over an issue that had long been an article of faith: unwavering support for the state of Israel.” But he also rightly points out that this “shift is part of a wider split within the evangelical movement, as younger evangelicals are also more likely to support same-sex marriage, tougher environmental laws and other positions their parents spent a lifetime opposing.”

That the young are drifting left everywhere is doubtless one reason for the slide away from blind support for Israel (whose cherishment was always labeled a “right-wing” cause), and concern over its brutalities is another. But these are not the only reasons for the discrepancy between old and young.

Here Lovett missed one of the most obvious of reasons, even though he had the evidence of it right in front of him.

Gary Burge, a professor at Calvin Theological Seminary and former professor at Wheaton College, an evangelical school, said the younger generation is less likely to quote Bible passages about Jerusalem, and more concerned with ethics and treatment of the downtrodden.

Now-codgers-then-whippersnappers quoted Bible passages because they believed, usually tacitly but often enough explicitly, that their support of Israel would hurry prophecy along. The sooner Israel was “fully restored”, including the (re-)building of the Third Temple, the sooner Our Lord would return. The greater the efforts—including the dispensing of gobs and gobs of money and Congressional votes—expended on Israel’s behalf, the quicker we could get this all over with amen.

Perhaps it’s difficult to recall how influential Hal Lindsey and his brother preachers were in the 1970s, most especially among evangelicals. Lindsey’s (and ghost-writer CC Carlson’s) Late, Great Planet Earth was a monumental success and genuine cultural phenomenon, read and discussed by everybody; you would have had better luck finding a reactionary on a college campus than an evangelical who didn’t give Lindsey at least some credence. A movie of the book was made in 1976, narrated by no less than Orson Welles, prodded just long enough from his alcoholic stupor. (Watch on YouTube.) However drunk he may have been, his voice was compelling. My God, he believes this! It could be true!

The earth was going to end, and end soon, because, Lindsey promised, the Bible foretold that it would within “a generation” of Israel’s restoration. Israel, of course, became a state in 1948, seventy years ago.

Now seventy years is a tad long for a generation, especially given Lindsey’s guess in Late Great that this length of time most likely meant we would never see the 1980s. Yet the achievement of the book and film, and the lack of progress toward the real End Time, gave rise to a host of imitators who in earnest tweaked the prophecies. The generation didn’t start in 1948, but at some later date; or it would start only with the Third Temple; or “generation” was imprecise, but here’s how this event that happened the other week means the countdown has finally begun; the rapture was imminent. And so on ad infinitum.

Those who were adults or coming of age in the 1970s and who identified with these prophecies found, and still find, it difficult to give up on them. It would be like giving up hope, a position with which it is easy to find great sympathy. Ceasing adoration of Israel would be admitting the failure of the would-be prophecies.

The last surge of supporting Israel-for-the-sake-of-prophecy came in the 1990s with Tim LaHaye and Jerry B. Jenkins’s Left Behind series, the lead book also being made into a movie starring our age’s master emoter Nicolas Cage. The movie was left behind, and the books sold well among evangelicals, but not all that well with a general audience, which took little notice. Catholics and protesting Christians of the 1970s knew of Late Great Planet Earth, but the same can’t be said of the Left Behind dozen (or were there more?).

Fervor waned. The upcoming generation naturally looked with less interest to Israel as being any kind of hope for Christians. Not needing to be reflexive apologists, the young are freer to consider politics in the (now) usual way. They’re also learning that the love evangelicals bear Israel is not reciprocated. This naturally leads to indifference.