February 7, 2019

More Asinine Global Warming Research — Introducing ‘Climate Liar’

This is a follow-up to yesterday’s post. None of these “studies” merits a full-scale analysis, but they are silly enough to warrant a moment or two of your attention.

Item Non-scientist activist progressive Naomi Klein "proposes new name for skeptics: 'Don't call them deniers, they are arsonists'"

[Blue Cheka] Eric Holthaus readily agreed with Klein:

“A word of warning to Americans: Your government is literally cheering on planetary destruction. It’s time to get angry. It’s time to demand a better world,” Holthaus wrote on February 6.

We need a new term for those who knowingly or through gross negligence deny the reality that the climate and weather are not as bad as we have been screechingly told these last three decades.

Climate liar.

Their grip on the asinine and logically idiotic climate denier is stronger than an abortionist clutching his Hoover. Climate denier is a political name, like fascist or Nazi, and it need not have any relation to any Reality. It is used only as an insult to quiet a too-intelligent opponent.

Climate liar is just as political, and you’ll feel like a fool using it, and would not want to bring it out too often, or at all. But it is a term at least consonant with the evidence, it is accurate, and can be used against opponents of any intelligence (typically low). It rhymes with denier, too, which is important in any childish debate. “You’re a climate denier with funding from big oil!” “Yeah? Well you’re a climate liar with funding from the Cathedral!” If anybody gives it a try, report your success below.

Item Climate change poses large-scale threat to mental health (Thanks to Dan Hughes for the tip)

Wellbeing falters without sound mental health. Scholars have recently indicated that the impacts of climate change are likely to undermine mental health through a variety of direct and indirect mechanisms. Using daily meteorological data coupled with information from nearly 2 million randomly sampled US residents across a decade of data collection, we find that experience with hotter temperatures and added precipitation each worsen mental health, that multiyear warming associates with an increased prevalence of mental health issues, and that exposure to tropical cyclones, likely to increase in frequency and intensity in the future, is linked to worsened mental health. These results provide added large-scale evidence to the growing literature linking climate change and mental health….

Average maximum temperatures greater than 30C amplify the probability of mental health
issues by over 1% point compared with 10C to 15C (coefficient: 1.275, P < 0.001, n = 1,961,743).

Mental health issues. By over 1%! Are these issuances like bugs that crawl out of ears? Never mind. I sadly remind the reader that with sample sizes this large, you have to work at not getting wee p-values.
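The sample-size point is easy to demonstrate. A toy simulation of my own, not the paper's data: give x and y a correlation so small nobody could care, then let n run to the millions.

```r
# Toy simulation (my own, not the paper's data): a practically
# nonexistent correlation still earns a wee p-value when n is huge.
set.seed(42)
n <- 2e6                      # about the paper's sample size
x <- rnorm(n)                 # stand-in "temperature"
y <- 0.005 * x + rnorm(n)     # outcome with a trivial true effect
ct <- cor.test(x, y)
ct$estimate                   # correlation near 0.005: practically nothing
ct$p.value                    # yet wee, far below 0.001
```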

This article is emblematic of asinine global warming research. So typical is it that it is not irrational to suppose it was the product of an algorithm designed to pump out meaningless publications.

Step (1): decide upon a horror. Step (2): gather weather data. Step (3): search for correlations between the horror and weather data. Step (4): discover wee p-value (one can always be found). Step (5): theorize theorize theorize about the causative “link” between weather data and horror. Step (6): publish.

These steps are invariable. It would be the work of hours to make this algorithm. Thus I would not be shocked to learn it exists. The fellow Tyler Vigen missed an opportunity to become lauded as a world-class scientist, if only his spurious correlations collection had used weather and not economic data.
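The algorithm really is the work of minutes, not hours. A sketch of my own construction: both the "horror" and the "weather" are pure noise, yet hunting across enough series means the discover-a-wee-p-value step succeeds by multiplicity alone.

```r
# Sketch of the algorithm (my construction): both "horror" and "weather"
# are pure noise, yet hunting across many series always finds a wee p.
set.seed(7)
n.years <- 30
horror  <- rnorm(n.years)                  # step: decide upon a horror
weather <- replicate(500, rnorm(n.years))  # step: gather "weather" data
p.vals  <- apply(weather, 2, function(w) cor.test(w, horror)$p.value)
min(p.vals)          # a wee p-value can always be found
sum(p.vals < 0.05)   # dozens of "publishable" findings, by chance alone
```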

Item Earth system impacts of the European arrival and Great Dying in the Americas after 1492 (We flip this one and start at the very end: my emphasis)

Acknowledgements

We thank Andrew Sluyter for discussions on the extent of pre-Columbian land use in Northern Mexico and its representation in LUC datasets, Jed Kaplan for providing the KK10 dataset, William A. Huber for statistical advice, and Joyce Chaplin and Matt Liebmann for discussions on an adequate name for the depopulation event.

What is this depopulation event? Spaniards. And what is the name for this great horror? The Great Dying! I kid you not, dear reader. The Conclusion (again, my emphasis; do read it all):

We estimate that 55 million indigenous people died following the European conquest of the Americas beginning in 1492. This led to the abandonment and secondary succession of 56 million hectares of land. We calculate that this led to an additional 7.4 Pg C being removed from the atmosphere and stored on the land surface in the 1500s. This was a change from the 1400s of 9.9 Pg C (5 ppm CO2). Including feedback processes this contributed between 47% and 67% of the 15–22 Pg C (7–10 ppm CO2) decline in atmospheric CO2 between 1520 CE and 1610 CE seen in Antarctic ice core records. These changes show that the Great Dying of the Indigenous Peoples of the Americas is necessary for a parsimonious explanation of the anomalous decrease in atmospheric CO2 at that time and the resulting decline in global surface air temperatures. These changes show that human actions had global impacts on the Earth system in the centuries prior to the Industrial Revolution. Our results also show that this aspect of the Columbian Exchange — the globalisation of diseases — had global impacts on the Earth system, key evidence in the calls for the drop in atmospheric CO2 at 1610 CE to mark the onset of the Anthropocene epoch (Lewis and Maslin, 2015, 2018). We conclude that the Great Dying of the Indigenous Peoples of the Americas led to the abandonment of enough cleared land in the Americas that the resulting terrestrial carbon uptake had a detectable impact on both atmospheric CO2 and global surface air temperatures in the two centuries prior to the Industrial Revolution.

The article piles speculation upon speculation upon wild guesses and even wilder estimates, all leading to the certain sure conclusion that the Great Dying…was good for the planet? Are they advocating the introduction of a people-killing, tree-restoring disease? At least they had to admit the existence of the Little Ice Age.

Item How Wearing Business Suits Makes Cities Warmer And Causes Thermostat Gender Bias

The inertia of business culture will likely dictate that business suits are worn in certain situations. However, science suggests that there may be an indirect impact on the urban heat, carbon dioxide emissions, gender equity issues and climate change. Innovative new fabrics and approaches offer some hope.

No. No hope at all.

Item Good News, Pilots—The Weather’s Getting Better: And cleaner urban air may be the reason.

The headline proves that they haven’t all got their stories straight or together yet. If the activist crowd discovers some authorities are acknowledging improvements, look for the bananas to fly.

February 6, 2019

Another Reason To Cheer For Global Warming: More Male Births!

Men, we must all admit, are the better sex. If something needs killing, you call a man. We’re taller and our looks improve whilst sporting a moustache. And talk about the ability to reach things on a high shelf? Boy.

These being incontrovertible truths, the world would be a happier place if there were a whole lot more men. And, thanks to global warming—which is going to strike any day now: soon, soon—there will be lots more men!

They may have wee p-values, though. No, wait. Rather, it is thanks to wee p-values we know men will shoulder the fairer sex into scarcity, they presumably not being able to take the heat.

This is the judgment of science; therefore, it is true. Just ask Misao Fukuda—whose last name is a slur in Russian, da?—and a slew of others who wrote “Climate change is associated with male:female ratios of fetal deaths and newborn infants in Japan” in the journal Environment and Epidemiology.

We imagine Fukuda said to him- or herself something like this: “Say. Global warming’s going to mosey along some day, and it would look bad if I didn’t have a research paper on the subject. I’m going to get in on it. What question can I ask? How about is global warming ‘having any impact on the sex ratio of newborn infants’?”

Because, the good Lord knows, global warming can only do bad things, and messing with the sex ratio is a bad thing.

To prove the connection, Fukuda popped a diskette into his floppy drive, or something similar, and used “Microsoft Excel Statistics 2012 for Windows” to analyze some hardcore data.

“What statistical methods did they employ, Briggs?”

Glad you asked, friend. Let me quote them: “Pearson correlation coefficients (r) were used to evaluate whether yearly mean temperature differences were associated with either male:female ratios of spontaneous fetal deaths or male:female ratios of births.”

Now if that isn’t science, I don’t know what is.

“You’re the expert. But tell me, how do changes in yearly mean temperatures meddle with the sex ratios averaged over all Japan?”

Nobody knows. My guess is that yearly global warming particles, which are everywhere during years, seep into lady parts at—ahem—just the right moment. They get in there and whisper to the X sperm, “Psst. Are you as hot as me?” I mean metaphorical whispering, of course: this is science. The Xs distracted, the Ys are free to swim upstream and do their manly duty.

“So if my wife and I want a daughter, we should wait to try during winter?”

You’d think so, but no. Everybody knows abnormally cold temperatures are especial proof the world is warming. It’s complicated, but it has to do with polar vortexes.

“But aren’t polar vortexes part of the climate? And therefore if they’re making extra cold weather, shouldn’t we doubt global warming theory?”

I’ll take your question to mean you want to know about the wee p-values Fukuda found. Here’s the money quote: “a statistically significant negative association was found between temperature differences (the exposure of interest) and sex ratios of births (the outcome of interest) from 1968 to 2012 (r = -0.518, P = .0002).”

“That is a wee P.”

Yes, sir, it is. About as wee as they come. Therefore, since p-values have nothing whatever to say about any hypotheses whatsoever of interest, as I have proved—not just argued, proved—it must necessarily be the case this wee P has nothing to say about confirming a temperature-sex-ratio connection; therefore, the connection between temperature and sex-ratios, which can only be a causal connection, has been proved.

“Hold up. You just said p-values can’t be used for that purpose. Yet you went ahead and used them for that purpose anyway!”

It’s science, son. It’s complicated. If it were easy, anybody could do it.

And did I mention CNN picked up on the story? They figure “conceptions of boys especially vulnerable to external stress factors,” yet, echoing Fukuda, they say higher temperatures will give us more boys.

“So it must follow that higher temperatures give rise to less stress?”

You got it. Yet another reason to welcome global warming. Just don’t sit up waiting for it.
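For the curious, the quoted r and P do cohere arithmetically. A back-of-envelope check of my own, assuming 45 yearly observations (1968 to 2012 inclusive): the usual t transform of Pearson's r recovers a p-value of the same order as the reported .0002.

```r
# My own check, not from the paper: recover the p-value implied by the
# quoted correlation, assuming n = 45 yearly observations.
r <- -0.518
n <- 45
t.stat <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t transform of Pearson's r
p <- 2 * pt(-abs(t.stat), df = n - 2)       # two-sided p-value
p                                           # same order as the reported .0002
```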

February 5, 2019

Pope Signs Document Nobody Asked Him To Sign

So the Pope did what nobody was asking him to do: sign a document that appears to have emanated from Harvard’s SJW dungeon, with the same serpent that had such a hard time convincing Eve to eat that apple now whispering into the authors’ ears.

Why the Pope signed I leave for you to tell me. Here, the highlights (all emphases mine) from “A document on human fraternity for world peace and living together.”

Through faith in God, who has created the universe, creatures and all human beings (equal on account of his mercy), believers are called to express this human fraternity by safeguarding creation and the entire universe and supporting all persons, especially the poorest and those most in need.

Creatures are equal to human beings? Mary remove mousetraps what a dumb thing to say. Surely they can’t mean it. Look at those parentheses: I’m misreading it. Maybe they only meant to imply the false and pernicious lie that all people are equal, a belief contradicted directly in scripture?

Now if this description of this Netflix cleaning show is accurate, people have a hard enough time safeguarding their sock drawers, so I don’t know how they’re going to begin to safeguard the entire universe. Who’s going to be in charge of the extra-galactic patrols?

This transcendental value served as the starting point for several meetings characterized by a friendly and fraternal atmosphere where we shared the joys, sorrows and problems of our contemporary world.

The meeting was, after all, sponsored by Kleenex™.

There followed some words about “therapeutic achievements“, “social injustice”, “inequality”, “discrimination”, and so on, cut and pasted from the New York Times opinion page. Then came the real meat.

In the name of God who has created all human beings equal in rights, duties and dignity…

No, no, and no.

In the name of human fraternity that embraces all human beings, unites them and renders them equal…

No.

Now war, poverty, torture, calamity, and other assorted horrors the document decries are bad and undesirable. Which everybody knows. And which everybody has always known. While there does exist the odd bloodlusting fool who calls for torture and terrorism, these people are not what anybody would consider to be a pressing problem. Not when — ahem — people are apostasizing on the pretext that today is Tuesday.

Surely the eternal souls of his flocks are of more importance than their attitude about recycling? The document nods in that direction:

[T]he most important causes of the crises of the modern world are a desensitized human conscience, a distancing from religious values and a prevailing individualism accompanied by materialistic philosophies that deify the human person and introduce worldly and material values in place of supreme and transcendental principles.

This is profoundly true. And so is this, more or less:

While recognizing the positive steps taken by our modern civilization in the fields of science, technology, medicine, industry and welfare, especially in developed countries, we wish to emphasize that, associated with such historic advancements, great and valued as they are, there exists both a moral deterioration that influences international action and a weakening of spiritual values and responsibility. All this contributes to a general feeling of frustration, isolation and desperation leading many to fall either into a vortex of atheistic, agnostic or religious extremism, or into blind and fanatic extremism, which ultimately encourage forms of dependency and individual or collective self-destruction.

Over-reaction is not as good as reaction — be a reactionary — but it is not a surprise it occurs, especially in a declining civilization.

But then comes this, the most curious and inexplicable bullet point, which is here broken in two pieces.

Freedom is a right of every person: each individual enjoys the freedom of belief, thought, expression and action.

That there is or should be freedom of expression and action is utterly false. It is as far from truth as infinity is from 0; this sentiment is even the opposite of what they preached earlier about having no freedom of action to commit torture, etc. In children, there is not and should not be freedom in thought and belief. Consciences have to be formed, not discovered. Keep children in mind when you finish reading the paragraph.

Here is where the meat turns rancid.

The pluralism and the diversity of religions, colour, sex, race and language are willed by God in His wisdom, through which He created human beings. This divine wisdom is the source from which the right to freedom of belief and the freedom to be different derives. Therefore, the fact that people are forced to adhere to a certain religion or culture must be rejected, as too the imposition of a cultural way of life that others do not accept;

No. But if it were true God wanted diversity of religions, then He should not have issued that first Commandment. And then we may as well embrace or have “dialogue” with Santeria Voodists, worshippers of Santa Muerta, Wiccans, Satanists, Baalites, Aztec heart surgeons, Planned (Un)Parenthood baby-blood drinkers, whatever demon Nancy Pelosi bows to, and on and on. If what this paragraph says is true, then there is no need for the Church, and thus no need for the Pope. Smart money says he doesn’t resign, though.

The rest of the document is littered with “rights”, with no remembrance that Christ said he came to bring the sword.

If you cannot say This Is Right, you must bow to somebody who will.

February 4, 2019

How To Do Predictive Statistics: Part X: Verification 1

We are finally at the most crucial part of the modeling process: proving if the damned thing works.

Not “works” in the sense that we get good unobservable parameter estimates, but works in the sense that the model makes useful, skillful predictions. No model should ever be released, by any researcher, until it has been verified at least using old data, and preferably using never-before-seen-or-used-in-any-way data. I’ll later have a paper on this subject, but I go on and on about it in this award-eligible book.

We’re going to use the Boston Housing data available in the mlbench package in R.


install.packages("mlbench")
require(mlbench)
data(BostonHousing)
?BostonHousing

The last line of code will explain the dataset, which is 506 observations of median housing prices (in $1,000s), from the 1970 Census, in different Boston neighborhoods. The key measure was nox, atmospheric “nitric oxides concentration (parts per 10 million)”, which we will take as measured without error. Which is not likely. Meaning that if we could take into account whatever uncertainty in the measure exists, we should use it, and the results below would be even less certain.

The idea was that high nox concentrations would be associated with lower prices, where “associated” was used as a causal word. To keep the example simple yet informative, we only use some of the measures: crim, per capita crime rate by town; chas, Charles River border indicator; rm, average number of rooms per dwelling; age, proportion of owner-occupied units built prior to 1940; dis, weighted distances to five Boston employment centres; tax, full-value property-tax rate; and b, a function of the proportion of blacks by town.

The ordinary regression gives this coefficient table:


fit = glm(medv ~ crim + chas + nox + rm + age + dis + tax + b, data=BostonHousing)
summary(fit)

              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -11.035298   4.193677  -2.631  0.00877  
crim         -0.110783   0.036748  -3.015  0.00270  
chas1         4.043991   1.010563   4.002 7.25e-05 
nox         -11.068707   4.145901  -2.670  0.00784  
rm            7.484931   0.381730  19.608  < 2e-16 
age          -0.068859   0.014473  -4.758 2.57e-06 
dis          -1.146715   0.207274  -5.532 5.12e-08 
tax          -0.006627   0.002289  -2.895  0.00395 
b             0.012806   0.003142   4.076 5.33e-05 

Look at all those excitingly wee p-values. Glorious, no? No. We'll soon see they lead to bitter disappointment.

Let's now fit the Bayesian version of the same regression, using defaults on the priors as we've been doing. We'll fit two models: one with nox, one without.


require(rstanarm)  # provides stan_glm
fit.s = stan_glm(medv ~ crim + chas + nox + rm + age + dis + tax + b, data=BostonHousing)
fit.s.n = stan_glm(medv ~ crim + chas + rm + age + dis + tax + b, data=BostonHousing)

We could look at the summaries of these models, and you should, but they would only give information about the posterior parameter distributions. Since we're using numerical approximations (MCMC methods) to give answers, we should see if the algorithms are working. They are. We could do better by tuning the approximation (larger resamples), but the defaults are close enough for us here.

So how do we check if the model works?

We have to ask a question that is pertinent to a decision we would make using it. The third quartile observed housing price was $35,000. What is the predicted probability prices would be higher than that given different levels of nox for data not yet observed? The answer for old data can be had by just looking. Why this question? Why not? If you don't like it, change it!

In order to answer our question, we also have to specify values for crim, chas, and all the other measures we chose to put into the model. I picked median observed values for all. If you have other questions or other values you like, try them!

Let's look at the predictions of housing price for future varying levels of nox: I used a sequence from the minimum to maximum observed values, with all other measures at their medians.


nnox = seq(min(BostonHousing$nox),max(BostonHousing$nox),by=.01)
s = length(nnox)
newdata = data.frame(crim = rep(median(BostonHousing$crim),s) , 
                     chas = rep("0",s) , 
                     nox = nnox ,
                     rm = rep(median(BostonHousing$rm),s) ,
                     age = rep(median(BostonHousing$age),s) ,
                     dis = rep(median(BostonHousing$dis),s) ,
                     tax = rep(median(BostonHousing$tax),s) ,
                     b = rep(median(BostonHousing$b),s) )
p.s.n = posterior_predict(fit.s,newdata)
p.s.non = posterior_predict(fit.s.n,newdata)

We built predictions many times before, so if you've forgotten the structure of this code, review! Now let's see what that looks like, in a relevance plot:


  plot(nnox,colMeans(p.s.n>35), type='l', lwd =3, xlab='nox (ppm)', ylab='Pr(Price > $35,000 | XDM)')
    lines(nnox,colMeans(p.s.non>35), lwd = 2, col=4)
    grid()
    legend('topright',c('With nox','No nox'), col=c(1,4), lty=1, lwd=3, bty='n')

The predictive probability of high housing prices goes from about 4% at the lowest levels of nox to something near 0% at the maximum nox values. The predictive probability in the model without nox is about 1.8% on average. The original p-value for nox was 0.008, which all would take as evidence of a strong effect. Yet for this question the probability changes are quite small. Are these differences (a +/- 2% swing) in probability enough to make a difference to a decision maker? There is no single answer to that question. It depends on the decision maker. And there would still not be an answer until it was certain the other measures were making a difference. I'll leave that as homework.

We need this helper function, which is a simple adaptation of the original, which I don't love. If you'd rather use the original, have a ball.


 ecdf <- function (x) 
{
    x <- sort(x)
    n <- length(x)
    vals <- unique(x)
    rval <- cumsum(tabulate(match(x, vals)))/n
    return(list(ecdf=rval,vals=vals))
}

Now this is a fun little plot. It shows the probability prediction of Y for every old observed X, supposing that old X were new. This assumes my version of ecdf.


# the old data is reused as if it were new
p.s = posterior_predict(fit.s) 

P = ecdf(p.s[,1])
plot(P$vals, P$ecdf, type='l',xlim=c(-20,60), xlab="Price = s", ylab="Pr(Price < s | XDM)")
for (j in 2:nrow(BostonHousing)){
  P = ecdf(p.s[,j])
  lines(P$vals, P$ecdf, type='l')
}
grid()
abline(v=0,lty=2,col=2,lwd=2)

Price (s) is on the x-axis, and the probability of future prices less than s, given the old data and M(odel), is on the y-axis. A dotted red line at $0 is shown. Now we know, based on knowledge external to M, that prices cannot be less than $0. Yet the model far too often gives positive probabilities to impossible prices. The worst prediction is about a 65% chance for prices less than $0. You remember we call this probability leakage.

So right now we have good evidence this model has tremendous failure points. It is not a good model! And we never would have noticed had we not examined the model in its predictive form---and we also remember all models have a predictive form: even if you don't want to use that form, it's there. What we should do at this point is change the model to remove the leakage (and surely you recall we know how to do that: review!). But we won't: we'll keep it and see how the rest of the model verifies.
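Leakage is easy to quantify from any predictive distribution. A toy sketch of my own, not the housing fit: a normal predictive for a quantity that cannot be negative still puts mass below $0 when its spread is wide enough.

```r
# Toy probability leakage (my construction, not the BostonHousing fit):
# a normal predictive for a price that cannot be negative still puts
# mass below $0 when its spread is wide enough.
set.seed(1)
pred <- rnorm(4000, mean = 8, sd = 9)   # predictive draws, in $1,000s
mean(pred < 0)                          # leakage: Pr of an impossible price
pnorm(0, mean = 8, sd = 9)              # closed-form check, about 0.19
```

The same computation on the model's draws, mean(p.s[,j] < 0), gives the leakage for neighborhood j.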

Let's look at four predictions, picking four old X, because the all-predictions plot is too busy. And on this we'll over-plot the ECDF of the observations, which are, of course, just step functions.


par(mfrow=c(2,2))
for (j in base::sample(1:nrow(BostonHousing),4)){
  P = ecdf(p.s[,j])
   plot(P$vals, P$ecdf, type='l',xlim=c(-20,60), xlab="Price = s", ylab="Pr(Price < s | XDM)")
   lines(stepfun(BostonHousing$medv[j],c(0,1)),xval = P$vals,do.points=FALSE)
   grid()

}

From this plot we have the idea that the closer the ECDF of the prediction is to the ECDF of the observation, the better the model does. This is a good idea, indeed, a great one. It leads to the idea of a score that measures this distance.

One such, and very well investigated, score is the continuous ranked probability score, or CRPS. It is not the only one, just the one we'll use today.

Let F_i(s) = Pr( Y < s | X_i D_n M ), i.e. a probabilistic prediction of our model for (past) measure X_i (which can be old or new; the D_n is the old data, as we have been writing). Here we let s vary, so that the forecast or prediction is a function of s, but s could be fixed, too. Let Y_i be the i-th observed value of Y. Then

    CRPS(F_i, Y_i) = ∫ ( F_i(s) - I( s ≥ Y_i ) )^2 ds,

where I() is the indicator function and the integral is over s; the score can be averaged over i. If s is fixed and Y dichotomous, CRPS is called the Brier score. There are various ways to calculate CRPS, depending on the model assumed. Here we'll just use straight-up numerical approximation.
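For the record, here is the numerical approximation against a case with a known answer. The closed form below is the standard result for a normal predictive (my check, not from the post):

```r
# Numerical CRPS on a grid versus the known closed form for a normal
# predictive (my check, not from the post).
crps.numeric <- function(mu, sigma, y) {
  s  <- seq(mu - 10 * sigma, mu + 10 * sigma, length.out = 20000)
  ds <- s[2] - s[1]
  Fs <- pnorm(s, mu, sigma)        # predictive CDF at each s
  sum((Fs - (s >= y))^2) * ds      # Riemann sum of the integral
}
crps.closed <- function(mu, sigma, y) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi))
}
crps.numeric(0, 1, 0.5)   # the two agree to a few decimals
crps.closed(0, 1, 0.5)
```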

Why not use RMSE or MAD or R^2? Because those scores are not proper. I'll write elsewhere of this, but the idea is those scores throw away information, because they first have to compress the probability (and all our predictions are probabilities) into a point, thus they remove information in the prediction, leading to suboptimal scores. Proper scores have a large literature behind them---and you already know which award-eligible book writes about them!

Let's calculate, for both models with and without nox, the CRPS for each old observation.


crps = NA
crps.n = NA
# p.s.n must hold predictions from the non-nox model at the old data,
# not the new-data predictions computed earlier under the same name
p.s.n = posterior_predict(fit.s.n)
for (j in 1:nrow(BostonHousing)){
  sf = stepfun(BostonHousing$medv[j], c(0,1))

  P = ecdf(p.s[,j])
  crps[j] = sum((P$ecdf - sf(P$vals))^2)/length(P$ecdf)

  P = ecdf(p.s.n[,j])
  crps.n[j] = sum((P$ecdf - sf(P$vals))^2)/length(P$ecdf)
}

plot(BostonHousing$medv,crps, xlab="Price ($1,000s)", ylab="CRPS (with nox)",ylim=c(0,.35))


The stepfun creates a function which we use to compute the value of the ECDF of the observed price, which we need in our approximation of CRPS. Note, too, it uses the already computed posterior predictions, p.s. The crps is for the nox model; and crps.n is for the non-nox model. The plot is of the individual CRPS at the values of prices.

See that "floor" in the CRPS values? Looks like CRPS can go so low, but no lower. This turns out to be true. Given a model form, we can calculate an expected CRPS, and then bust up that expectation into pieces, each representing part of the score. One part is the inherent variability of the Y itself. Meaning, given a model, we have to accept the model will only be so good, that uncertainty will remain. There is much more to this, but we'll delay discussion for another day.

We could also do this:


plot(crps.n,crps, ylab="CRPS (with nox)",  xlab="CRPS (no nox)")
  abline(0,1)

I won't show it, because it's hard to see, by eye, whether the nox model is doing better. But you can look on your own.

Enter the idea of skill---which was another of Fisher's inventions. Again, you know where you can read about it.

Skill scores K have the form:

    K(F,G,Y) = ( S(G,Y) - S(F,Y) ) / S(G,Y),

for some score function S(.,Y) (like CRPS), where F is the prediction from what is thought to be the superior or more complex model and G the prediction from the inferior. Skill is always relative. Since the best possible score is S(F,Y) = 0, and given the normalization, a perfect skill score has K = 1. Skill exists if and only if K > 0; else it is absent. Skill, like proper scores, can be computed as an average or over individual points.

Models without skill should not be used!

Why not? Because the simpler model beats them! If you don't like CRPS, then you should, always should, use the score that reflects the cost-loss of the decisions you will make using the model. As always, a model may be useful to one man and useless to another.
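The bookkeeping is one line. Toy numbers of my own, with CRPS-style scores where smaller is better:

```r
# Skill score K = (S(G,Y) - S(F,Y)) / S(G,Y), smaller scores better.
skill <- function(S.G, S.F) (S.G - S.F) / S.G
skill(0.30, 0.27)   # complex model scores lower (better): K = 0.1, skill
skill(0.30, 0.33)   # complex model scores worse: K = -0.1, no skill
```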

Here is the skill plot:


plot(BostonHousing$nox,(crps.n-crps)/crps.n, ylab="Skill",  xlab="nox")
  abline(h=0,lty=2,col=2)

Every point below 0 marks a time when the non-nox model did better.

The overall average skill score was K = -0.011, indicating the more complicated model (with nox) does not have skill over the less complicated model. This means, as described above, that if the CRPS represents the actual cost-loss score of a decision maker using this model, the prediction is that in future data, the simpler model will outperform the more complex one.

We don't actually know how the model will perform on future data; we can only guess. So model scores on old data are themselves predictions of future performance. How good they are is an open research question; i.e., nobody knows.

Whether this insufficiency in the model is due to probability leakage, or to CRPS not being the best score in this situation, remains to be seen.

We have thus moved from delightful results as indicated by p-values to more sobering results when testing the model against reality (recalling that this is itself only a guess of how the model will actually perform on future data): nox is not useful. Since the possibility of over-fitting is always with us, future skill measures would likely be worse than those seen in the old data.

Most of this post is extracted from an upcoming (already accepted) paper, which will be posted when it is published. This post is only a bare sketch.