
Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

February 6, 2019 | 6 Comments

Another Reason To Cheer For Global Warming: More Male Births!

Men, we must all admit, are the better sex. If something needs killing, you call a man. We’re taller and our looks improve whilst sporting a moustache. And talk about the ability to reach things on a high shelf? Boy.

These being incontrovertible truths, the world would be a happier place if there were a whole lot more men. And, thanks to global warming—which is going to strike any day now: soon, soon—there will be lots more men!

They may have wee p-values, though. No, wait. Rather, it is thanks to wee p-values we know men will shoulder the fairer sex into scarcity, they presumably not being able to take the heat.

This is the judgment of science; therefore, it is true. Just ask Misao Fukuda—whose last name is a slur in Russian, da?—and a slew of others who wrote “Climate change is associated with male:female ratios of fetal deaths and newborn infants in Japan” in the journal Environment and Epidemiology.

We imagine Fukuda said to him- or herself something like this: “Say. Global warming’s going to mosey along some day, and it would look bad if I didn’t have a research paper on the subject. I’m going to get in on it. What question can I ask? How about is global warming ‘having any impact on the sex ratio of newborn infants’?”

Because, the good Lord knows, global warming can only do bad things, and messing with the sex ratio is a bad thing.

To prove the connection, Fukuda popped a diskette into his floppy drive, or something similar, and used “Microsoft Excel Statistics 2012 for Windows” to analyze some hardcore data.

“What statistical methods did they employ, Briggs?”

Glad you asked, friend. Let me quote them: “Pearson correlation coefficients (r) were used to evaluate whether yearly mean temperature differences were associated with either male:female ratios of spontaneous fetal deaths or male:female ratios of births.”

Now if that isn’t science, I don’t know what is.

“You’re the expert. But tell me, how do changes in yearly mean temperatures meddle with the sex ratios averaged over all Japan?”

Nobody knows. My guess is that yearly global warming particles, which are everywhere during years, seep into lady parts at—ahem—just the right moment. They get in there and whisper to the X sperm, “Psst. Are you as hot as me?” I mean metaphorical whispering, of course: this is science. The Xs distracted, the Ys are free to swim upstream and do their manly duty.

“So if my wife and I want a daughter, we should wait to try during winter?”

You’d think so, but no. Everybody knows abnormally cold temperatures are especial proof the world is warming. It’s complicated, but it has to do with polar vortexes.

“But aren’t polar vortexes part of the climate? And therefore if they’re making extra cold weather, shouldn’t we doubt global warming theory?”

I’ll take your question to mean you want to know about the wee p-values Fukuda found. Here’s the money quote: “a statistically significant negative association was found between temperature differences (the exposure of interest) and sex ratios of births (the outcome of interest) from 1968 to 2012 (r = -0.518, P = .0002).”

“That is a wee P.”

Yes, sir, it is. About as wee as they come. Therefore, since p-values have nothing whatever to say about any hypotheses whatsoever of interest, as I have proved—not just argued, proved—it must necessarily be the case this wee P has nothing to say about confirming a temperature-sex-ratio connection; therefore, the connection between temperature and sex-ratios, which can only be a causal connection, has been proved.

“Hold up. You just said p-values can’t be used for that purpose. Yet you went ahead and used them for that purpose anyway!”

It’s science, son. It’s complicated. If it were easy, anybody could do it.

And did I mention CNN picked up on the story? They figure “conceptions of boys especially vulnerable to external stress factors,” yet, echoing Fukuda, they say higher temperatures will give us more boys.

“So it must follow that higher temperatures give rise to less stress?”

You got it. Yet another reason to welcome global warming. Just don’t sit up waiting for it.

February 4, 2019 | 1 Comment

How To Do Predictive Statistics: Part X: Verification 1

We are finally at the most crucial part of the modeling process: proving whether the damned thing works.

Not “works” in the sense that we get good unobservable parameter estimates, but works in the sense that the model makes useful, skillful predictions. No model should ever be released, by any researcher, until it has been verified at least using old data, and preferably using never-before-seen-or-used-in-any-way data. I’ll later have a paper on this subject, but I go on and on about it in this award-eligible book.

We’re going to use the Boston Housing data available in the mlbench package in R.

library(mlbench)
data(BostonHousing)
?BostonHousing
The last line of code will explain the dataset, which is 506 observations of median housing prices (in $1,000s), from the 1970 Census, in different Boston neighborhoods. The key measure was nox, atmospheric “nitric oxides concentration (parts per 10 million)”, which we will take as having been measured without error. Which is not likely. Meaning that if we could take into account whatever uncertainty in the measure exists, we should use it, and the results below would be even less certain.

The idea was that high nox concentrations would be associated with lower prices, where “associated” was used as a causal word. To keep the example simple yet informative, we only use some of the measures: crim, per capita crime rate by town; chas, Charles River border indicator; rm, average number of rooms per dwelling; age, proportion of owner-occupied units built prior to 1940; dis, weighted distances to five Boston employment centres; tax, full-value property-tax rate; and b, a function of the proportion of blacks by town.

The ordinary regression gives this coefficient table:

fit = glm(medv ~ crim + chas + nox + rm + age + dis + tax + b, data=BostonHousing)

              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -11.035298   4.193677  -2.631  0.00877  
crim         -0.110783   0.036748  -3.015  0.00270  
chas1         4.043991   1.010563   4.002 7.25e-05 
nox         -11.068707   4.145901  -2.670  0.00784  
rm            7.484931   0.381730  19.608  < 2e-16 
age          -0.068859   0.014473  -4.758 2.57e-06 
dis          -1.146715   0.207274  -5.532 5.12e-08 
tax          -0.006627   0.002289  -2.895  0.00395 
b             0.012806   0.003142   4.076 5.33e-05 

Look at all those excitingly wee p-values. Glorious, no? No. We'll soon see they lead to bitter disappointment.

Let's now fit the Bayesian version of the same regression, using defaults on the priors as we've been doing. We'll fit two models: one with nox, one without.

library(rstanarm)
fit.s = stan_glm(medv ~ crim + chas + nox + rm + age + dis + tax + b, data=BostonHousing)
fit.s.n = stan_glm(medv ~ crim + chas + rm + age + dis + tax + b, data=BostonHousing)

We could look at the summaries of these models, and you should, but they would only give information about the posterior parameter distributions. Since we're using numerical approximations (MCMC methods) to give answers, we should see if the algorithms are working. They are. We could do better by tuning the approximation (larger resamples), but the defaults are close enough for us here.

So how do we check if the model works?

We have to ask a question that is pertinent to a decision we would make using it. The third quartile observed housing price was $35,000. What is the predicted probability prices would be higher than that given different levels of nox for data not yet observed? The answer for old data can be had by just looking. Why this question? Why not? If you don't like it, change it!

In order to answer our question, we also have to specify values for crim, chas, and all the other measures we chose to put into the model. I picked median observed values for all. If you have other questions or other values you like, try them!

Let's look at the predictions of housing price for future varying levels of nox: I used a sequence from the minimum to maximum observed values, with all other measures at their medians.

nnox = seq(min(BostonHousing$nox),max(BostonHousing$nox),by=.01)
s = length(nnox)
newdata = data.frame(crim = rep(median(BostonHousing$crim),s) , 
                     chas = rep("0",s) , 
                     nox = nnox ,
                     rm = rep(median(BostonHousing$rm),s) ,
                     age = rep(median(BostonHousing$age),s) ,
                     dis = rep(median(BostonHousing$dis),s) ,
                     tax = rep(median(BostonHousing$tax),s) ,
                     b = rep(median(BostonHousing$b),s) )
p.s.n = posterior_predict(fit.s,newdata)
p.s.non = posterior_predict(fit.s.n,newdata)

We built predictions many times before, so if you've forgotten the structure of this code, review! Now let's see what that looks like, in a relevance plot:

plot(nnox, colMeans(p.s.n > 35), type='l', lwd=3, xlab='nox (ppm)', ylab='Pr(Price > $35,000 | XDM)')
lines(nnox, colMeans(p.s.non > 35), lwd=2, col=4)
legend('topright', c('With nox','No nox'), col=c(1,4), lty=1, lwd=3, bty='n')

The predictive probability of high housing prices goes from about 4% at the lowest levels of nox to something near 0% at the maximum nox values. The predictive probability in the model without nox is about 1.8% on average. The original p-value for nox was 0.008, which all would take as evidence of a strong effect. Yet for this question the probability changes are quite small. Are these differences (a +/- 2% swing) in probability enough to make a difference to a decision maker? There is no single answer to that question. It depends on the decision maker. And there would still not be an answer until it was certain the other measures were making a difference. I'll leave that as homework.

We need this helper function, which is a simple adaptation of the original, which I don't love. If you'd rather use the original, have a ball.

ecdf <- function (x) {
    x <- sort(x)
    n <- length(x)
    vals <- unique(x)
    rval <- cumsum(tabulate(match(x, vals)))/n
    return(list(ecdf = rval, vals = vals))
}

Now this is a fun little plot. It shows the probability prediction of Y for every old observed X, supposing that old X were new. This assumes my version of ecdf.

# the old data is reused as if it were new
p.s = posterior_predict(fit.s) 

P = ecdf(p.s[,1])
plot(P$vals, P$ecdf, type='l', xlim=c(-20,60), xlab="Price = s", ylab="Pr(Price < s | XDM)")
abline(v = 0, lty = 3, col = 2)   # impossible prices lie to the left of $0
for (j in 2:nrow(BostonHousing)){
  P = ecdf(p.s[,j])
  lines(P$vals, P$ecdf, type='l')
}

Price (s) is on the x-axis, and the probability of future prices less than s, given the old data and M(odel), is on the y-axis. A dotted red line at $0 is shown. Now we know, based on knowledge external to M, that it is impossible for prices to be less than $0. Yet the model far too often gives positive probabilities to impossible prices. The worst prediction gives about a 65% chance to prices less than $0. You remember we call this probability leakage.

So right now we have good evidence this model has tremendous failure points. It is not a good model! And we never would have noticed had we not examined the model in its predictive form---and we also remember all models have a predictive form: even if you don't want to use that form, it's there. What we should do at this point is change the model to remove the leakage (and surely you recall we know how to do that: review!). But we won't: we'll keep it and see how the rest of the model verifies.
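Leakage is easy to quantify directly from the predictive draws. A minimal sketch, with made-up draws standing in for one column of posterior_predict output (in the Boston example you would feed in a column of p.s):

```r
# Probability leakage: the predictive probability assigned to impossible
# values (here, prices below $0). Draws are hypothetical, for illustration.
leakage <- function(draws, lower = 0) mean(draws < lower)

draws <- c(-5, -1, 2, 10, 22, 35)   # made-up predictive draws for one X
leakage(draws)                      # 2 of 6 draws are impossible: 1/3
```

A model with no leakage, such as one that puts all predictive mass above $0 by construction, would score exactly 0 here.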

Let's look at four predictions, picking four old X, because the all-predictions plot is too busy. And on this we'll over-plot the ECDF of the observations, which are, of course, just step functions.

for (j in base::sample(1:nrow(BostonHousing), 4)){
  P = ecdf(p.s[,j])
  plot(P$vals, P$ecdf, type='l', xlim=c(-20,60), xlab="Price = s", ylab="Pr(Price < s | XDM)")
  lines(stepfun(BostonHousing$medv[j], c(0,1)), xval = P$vals, do.points = FALSE)
}


From this plot we have the idea that the closer the ECDF of the prediction is to the ECDF of the observation, the better the model does. This is a good idea, indeed, a great one. It leads to the idea of a score that measures this distance.

One such, and very well investigated, score is the continuous ranked probability score, or CRPS. It is not the only one, just the one we'll use today.

Let F_i(s) = Pr( Y < s | X_i D_n M), i.e. a probabilistic prediction of our model for (past) measure X_i (which can be old or new; the D_n is the old data, as we have been writing). Here we let s vary, so that the forecast or prediction is a function of s, but s could be fixed, too. Let Y_i be the i-th observed value of Y. Then

    CRPS(F_i, Y_i) = ∫ ( F_i(s) - I( s ≥ Y_i ) )^2 ds,

where I() is the indicator function. If s is fixed and Y dichotomous, CRPS is called the Brier score. We can look at CRPS for each i, or averaged over the set. There are various ways to calculate CRPS, depending on the model assumed. Here we'll just use straight-up numerical approximation.
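To make the numerical approximation concrete, here is a self-contained sketch of CRPS for one observation, approximated on the forecast sample's own support. The draws and the observation are invented for illustration; this mirrors, but is not identical to, the Boston code below:

```r
# CRPS for one predictive distribution F (given as a sample of draws) and one
# observed y, approximated as an average over the sample's unique values.
# Uses base R's stats::ecdf, not the post's helper. All numbers are made up.
crps_one <- function(draws, y) {
  s  <- sort(unique(draws))          # evaluation points s
  Fs <- stats::ecdf(draws)(s)        # forecast CDF F(s)
  Hs <- as.numeric(s >= y)           # indicator I(s >= y)
  sum((Fs - Hs)^2) / length(s)       # average squared distance
}

draws <- c(1, 2, 2, 3, 4, 5)         # hypothetical predictive draws
crps_one(draws, 3)                   # small is good; lower is better
```

A perfect point forecast (all draws equal to the observation) scores exactly 0, which is why a floor above 0 in real CRPS values signals irreducible uncertainty in Y.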

Why not use RMSE or MAD or R^2? Because those scores are not proper. I'll write elsewhere of this, but the idea is those scores throw away information, because they first have to compress the probability (and all our predictions are probabilities) into a point, thus they remove information in the prediction, leading to suboptimal scores. Proper scores have a large literature behind them---and you already know which award-eligible book writes about them!

Let's calculate, for both models with and without nox, the CRPS for each old observation.

# no-nox model predictions at the old data, to match p.s
p.s.no = posterior_predict(fit.s.n)

crps = rep(NA, nrow(BostonHousing))
crps.n = rep(NA, nrow(BostonHousing))
for (j in 1:nrow(BostonHousing)){
  sf = stepfun(BostonHousing$medv[j], c(0,1))   # ECDF of the observed price
  P = ecdf(p.s[,j])
  crps[j] = sum((P$ecdf - sf(P$vals))^2)/length(P$ecdf)
  P = ecdf(p.s.no[,j])
  crps.n[j] = sum((P$ecdf - sf(P$vals))^2)/length(P$ecdf)
}


plot(BostonHousing$medv,crps, xlab="Price ($1,000s)", ylab="CRPS (with nox)",ylim=c(0,.35))

The stepfun creates a function which we use to compute the value of the ECDF of the observed price, which we need in our approximation of CRPS. Note, too, it uses the already computed posterior predictions, p.s. The crps is for the nox model; and crps.n is for the non-nox model. The plot is of the individual CRPS at the values of prices.

See that "floor" in the CRPS values? Looks like CRPS can go so low, but no lower. This turns out to be true. Given a model form, we can calculate an expected CRPS, and then bust up that expectation into pieces, each representing part of the score. One part is the inherent variability of the Y itself. Meaning, given a model, we have to accept the model will only be so good, that uncertainty will remain. There is much more to this, but we'll delay discussion for another day.

We could also do this:

plot(crps.n,crps, ylab="CRPS (with nox)",  xlab="CRPS (no nox)")

I won't show it, because it's hard to see, by eye, whether the nox model is doing better. But you can look on your own.

Enter the idea of skill---which was another of Fisher's inventions. Again, you know where you can read about it.

Skill scores K have the form:

    K(F,G,Y) = ( S(G,Y) - S(F,Y) ) / S(G,Y),

for some score function S(., Y) (like CRPS), where F is the prediction from what is thought to be the superior or more complex model and G the prediction from the inferior. Skill is always relative. Since the best possible score is S(F,Y) = 0, and given the normalization, a perfect skill score has K = 1. Skill exists if and only if K > 0; else it is absent. Skill, like proper scores, can be computed as an average or at individual points.

Models without skill should not be used!

Why not? Because the simpler model beats them! If you don't like CRPS, then you should, always should, use the score that reflects the cost-loss of the decisions you will make using the model. As always, a model may be useful to one man and useless to another.
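As a sketch with invented numbers (the real per-observation scores come from the CRPS loop above), the overall skill computation is one line:

```r
# Skill score K of model F (complex) against model G (simple) under a score S,
# here the mean CRPS over observations. The scores below are hypothetical.
skill <- function(score.f, score.g) (mean(score.g) - mean(score.f)) / mean(score.g)

crps.f <- c(0.10, 0.08, 0.12, 0.09)  # made-up per-observation CRPS, with nox
crps.g <- c(0.09, 0.10, 0.11, 0.10)  # made-up per-observation CRPS, no nox
skill(crps.f, crps.g)                # positive: F has skill over G
```

Swap the two arguments and K flips sign, which is why the plot below hunts for points above and below zero.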

Here is the skill plot:

plot(BostonHousing$nox,(crps.n-crps)/crps.n, ylab="Skill",  xlab="nox")

Every point below 0 marks a case where the non-nox model did better.

The overall average skill score was K = -0.011, indicating the more complicated model (with nox) does not have skill over the less complicated model. This means, as described above, that if the CRPS represents the actual cost-loss score of a decision maker using this model, the prediction is that in future data, the simpler model will outperform the more complex one.

We don't actually know how the model will perform on future data; we can only guess. So model scores on old data are themselves predictions of future performance. How good those predictions are is an open research question; i.e., nobody knows.

Whether this insufficiency in the model is due to probability leakage, or whether the CRPS is simply not the best score for this situation, remains to be seen.

We have thus moved from delightful results as indicated by p-values to more sobering results when testing the model against reality: nox is not useful. And recall this is only a guess of how the model will actually perform on future data. Since the possibility of over-fitting is always with us, future skill measures would likely be worse than those seen in the old data.

Most of this post is extracted from an upcoming (already accepted) paper, which will be posted when it is published. This post is only a bare sketch.

January 23, 2019 | 9 Comments

What’s With The Sudden Push Against Meat?

Lots of animals eat nothing but, or nearly all, meat and suffer no ill effects from it. Man, an animal, eats a lot of it, too, and the evidence is that a meat-based diet is good for man. And, vice versa, that an entirely plant-based diet is not so good. An all-veggie diet becomes more and more dubious for animals who are more designed (by evolution or whatever) to eat meat.

Of course, it matters where the meat comes from. It’s right to be suspicious of whatever it is processed food companies push as meat (to man or beast) in raw or cooked forms. Surely at least some of that stuff can’t be good to eat. I don’t know of anybody who disagrees with that, so we’ll let it pass.

Quality meat, including that which you stalk, kill, and process yourself, is exceptionally healthy. Meat produced by many quality farmers might be second best, not for the quality of the meat, but because the thrill of the hunt is missing.

Even with all this good news, there is an increasing push by our elites against eating meat. Why?

There are, as I see it, two thrusts: (1) Meat should not be eaten because animals are people, too, and (2) meat is bad for “the planet”.

Now (1) is just silly. If we ban rising up, killing and eating, because animals must die, and animals have “rights” (or whatever), then we’d have to police the entire animal kingdom to stop the relentless, continuous, bloody red slaughter. Didn’t anybody pay attention to those nature documentaries? Meat-eaters rule. It is an ineradicable built-in fact of our fallen world that in order for some animals to live other animals must die.

It’s organizations like PETA who push bans on man eating meat. They argue it would make men manlier. Yet who would want to look like the PETA fellow in the picture heading this post? If you can stomach it, they have an entire video (complete with lousy music) with him and others in it.

Strange PETA would claim celery makes men manly, especially in a climate that condemns masculinity. All elites say it’s “toxic.” Increasing masculinity would thus increase toxicity. Thus we’re right to doubt PETA is telling the truth about wanting more manliness. Incidentally, the Babylon Bee wins the best headline: “Least Masculine Society In Human History Decides Masculinity Is A Growing Threat“.

Push (2), where meat must be eschewed and not chewed because it is “bad” for “the planet”, is being backed by organizations like Kellogg’s. It is probably just a coincidence that the companies backing the organization pushing veganism do not sell meat. Anyway, somehow man eating meat will harm “the planet”, but other animals eating meat won’t.

The organization that wants you to buy the products by global food companies because these products are supposedly better for “the planet” is EAT: “a global, non-profit startup dedicated to transforming our global food system through sound science, impatient disruption and novel partnerships.”

Impatient disruption?

Don’t we hear that kind of thing from social justice warriors? Answer: Yes, we do.

EAT says don’t eat meat. “Human diet causing ‘catastrophic’ damage to planet: study.”

It is, as it should go without saying, impossible that eating meat will cause “catastrophic” damage to “the planet.” The planet will be just fine, and will survive whatever it is we might be able to do to it. We, however, may not survive what our elites want us to do to each other.

EAT says their planet-saving “diet allows for about seven grammes (a quarter of an ounce) of red meat per day.” That’s about as much meat as you can balance on your pinkie fingernail, maybe less if you have large manly hands like I do.

Even this won’t save us, say these scientists. “‘We can no longer feed our population a healthy diet while balancing planetary resources,’ said The Lancet editor-in-chief Richard Horton.”

This is like saying “It’s us or the planet, and I choose the planet.”

It again should be obvious, and go without saying, that if so many people are so sick because farming practices are so bad, then the mostly-veggie-and-carbohydrate processed-food diet they’re eating is a much more likely culprit than unprocessed meat. So why not fix that problem? To stop eating the vast majority of products grocery stores sell in the middle aisles would fix health and bad farming practices, too, since there would no longer be a need for all the processed grains and oils and low-quality meat. Do you think we can get woke corporations and scientists to back that kind of initiative?

And incidentally, if we care so much about farming, do we really need to grow corn for the express purpose of turning it into automobile fuel?

Well, you can shrug all this EAT stuff off as the nonsense it is. But then don’t claim surprise when laws based on their “scientific” findings start being passed. And you’ll start seeing the media pushing the idea that health is right wing and white supremacist.

January 22, 2019 | 6 Comments

Equality Is False — In Aptitude, Intelligence, Ability. What Are The Consequences?

Listen to this:

Brothers and sisters: There are different kinds of spiritual gifts but the same Spirit; there are different forms of service but the same Lord; there are different workings but the same God who produces all of them in everyone. To each individual the manifestation of the Spirit is given for some benefit. To one is given through the Spirit the expression of wisdom; to another, the expression of knowledge according to the same Spirit; to another, faith by the same Spirit; to another, gifts of healing by the one Spirit; to another, mighty deeds; to another, prophecy; to another, discernment of spirits; to another, varieties of tongues; to another, interpretation of tongues. But one and the same Spirit produces all of these, distributing them individually to each person as he wishes.

Christians are obliged to believe this. God did not hand out aptitude, intelligence, and ability equally. Why this is so is His business. Therefore, Equality is false. It follows immediately that to call the unequal distribution of talents “unfair” is idiotic, or, if you like, sinful.

It is obvious, even to non-Christians, that not every person is born with the same set of equipment. This is, of course, denied by some, because they are under the sway of Equality, the theory. They believe and love the theory. This love is, in fact, the only evidence for the theory. All Reality speaks against it. No positive evidence has ever been in favor of it. It is therefore irrational to believe in Equality. But Love is often irrational. So we will assume below, at least for the sake of argument, that Equality is false.

Now I still owe my article (half finished; been swamped) on IQ and intelligence, but I will give you (without proof here) some of its conclusions. Intelligence is not, and cannot be, entirely biological. This follows from the immateriality of the intellect and will. Accepting this (for now), it follows that genetics and evolution will never provide complete explanations of intelligence and other mental abilities. Scores on tests of questions thought clever (inside a culture) will only explain a fraction of intelligence and other abilities. Statistical summaries of those scores, with some being labeled things like ‘g’, underestimate variability. Correlates of test scores and genes use wee p-values as evidence. Over-certainty abounds. Which will come as no surprise to regular readers.

Yet it is also true that over-certainty is not no-certainty. Many things are perfectly clear, such as the consistent, long-running differences in performance on certain tests and other performance measures between American blacks and Asians. (The reverse is true for physical abilities.) We can call race those groups with which we self-identify. By “differences” I mean: take a man from each race; knowing only the race of both, there is a better than fifty percent chance the Asian would outscore the black. After we measure both men, race no longer matters.

The question is why do these differences exist? Are Asians and blacks equal in mental abilities—-and here I mean “on average”, as should be clear—but the Asians are racist and somehow have powers to apply their racism and keep blacks from performing at their true level? Or do blacks simply refuse to assimilate, not caring about the kinds of material that go into these tests? Perhaps a third group is tamping blacks down or elevating Asians. But if that’s so, then this third group must be credited with having the smarts to impose its will in such a profound way; meaning this third group has provided additional evidence of its superiority in at least these dimensions.

Whatever the answer is, we can’t tell from the data on scores (and race) alone. Cause isn’t in data, as we have been preaching. We have to look outside.

Some look to genes. There are obvious genetic and outward differences between peoples of different races, so it is plausible, and in no way ruled out by any theory of biology (except Equality), that inward differences can exist, too. We have already admitted that much of this evidence is necessarily incomplete and exaggerated, based on flawed metaphysics and statistics, so it’s not clear how it would all shake out under a systematic re-doing of all these measures in the predictive way.

It is clear, though, that the possibility is real, albeit of smaller effect than previously considered. Which makes the case of James Watson interesting. The man is near death and wields no influence anywhere, but the mob is baying for his blood, again, because he suggested what must almost certainly be true to some extent.

It is, of course, a political matter whether such information as genetics differences in mental abilities by races is useful or how this information should be acted upon. That it can be suppressed by bullying is true to an extent. It is not hard to imagine geneticists in these fields wearing false moustaches. But that it is suppressed only serves to make it more fascinating, and the differences prone to exaggeration (if only to make a point).

None of this would be really important except for the race war our elites seem intent on forcing upon us. This can only cause people to be forced to take sides. How dreadful.