Statistical Significance Does Not Mean What You Think. Climate Temperature Trends. Ithaca Teaching Journal, Day 7

There is a subset of professional statisticians (defined as folks who have had formal training in the field given to them by other professionals) who feel that the field of statistics is useless. That if you have to prove anything using statistics, what you have “proven” stands a good chance of being a chimera. That if you need “statistically significant” findings, what you’ve “found” might be nothing more than the reflection of your desires. (Climate example is below.)

Now, if I hadn’t gone to one too many Finger Lakes wineries on Saturday, not only would I expand that theme, but I would quote from a famous twentieth-century physicist who put the matter better than I ever could. As it is, you’re on your own.

We already know that the only crucial thing in any experiment is control. “Randomness” is not only not needed, it can be harmful. Again, I beg of you to consider physics.

If I desire, say, to measure the spin of a certain particle, I will design an apparatus that controls for every possible variable that might affect my results. I will leave as little as possible to “randomness” and I will certainly not purposely infect my experiment with it.

It is true—it is always true for any experiment of any kind—that I might miss controlling for the variable or variables that are responsible for my results. But if I am careful, diligent, and have paid strict attention to the prior evidence, it is less likely I will miss these variables.

Anyway, I conduct the experiment and publish my results. And that’s when other physicists try to reproduce what I have done; such reproduction further reduces the risk that I missed controlling some variable. The most commonly missed variable is my own bias, which causes me to misinterpret what I have seen, such that I write down not quite what happened, but what I wanted to happen. The people re-conducting the experiment (usually) won’t have my biases; thus, they stand a better chance of reporting what actually occurred.

Think now of standard sociological experiments, where I might be interested in how sex or race (are there any other topics?) affects some measure such as answers on a survey question. I then “randomize” my survey (i.e., introduce noise into it) by calling “random” people on the phone, or more usually by grabbing data from a publicly available database, itself gathered by calling people on the phone, etc.

I then statistically model how each survey question is a function of sex or race. If the software spits out a p-value less than the magic, never-to-be-questioned number of 0.05, I announce that my worst fears have been realized and that sex and race are associated with attitudes towards…whatever it was I was asking about.

It is absurdly easy to generate publishable p-values. I often say that if you can’t find a small p-value in your data, then you haven’t tried hard enough. A small p-value, which means you have found “statistically significant” results, does not say what you think it says. It says nothing about how people of different races or sexes think about your survey question. It instead says how improbable a certain function of your data is, but only if you assume some very dicey premises which have nothing to do with the data.
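How easy it is can be sketched with a small simulation. This is a minimal illustration, not anyone's actual survey analysis: the function names, the sample sizes, and the choice of a large-sample z approximation for the p-value are all assumptions made here. Every "question" is answered with pure noise, so group membership has no bearing on any answer, yet some p-values still land under 0.05.

```python
import math
import random
import statistics


def two_group_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    std_err = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    z = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    # Under the assumed null, z is roughly standard normal; erfc covers both tails.
    return math.erfc(abs(z) / math.sqrt(2.0))


def hunt_for_significance(n_questions=1000, n_per_group=50, seed=7):
    """Return p-values for many hypothetical survey questions answered with noise."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_questions):
        # Both groups come from the same distribution: the grouping is irrelevant.
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        p_values.append(two_group_p_value(group_a, group_b))
    return p_values


if __name__ == "__main__":
    p_values = hunt_for_significance()
    hits = sum(p < 0.05 for p in p_values)
    print(f"{hits} of {len(p_values)} noise-only questions reached p < 0.05")
    print(f"smallest p-value found: {min(p_values):.4f}")
```

Ask enough questions of the same database and roughly one in twenty will clear the bar with nothing behind it.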

Worst of all is that once other sociologists look at your results, they will almost immediately agree with them. Reproductions of sociological results are rarer than unicorns. Instead, you will find the same experiment carried out in different sub-populations, announced with, “Although much is known about X, nobody has yet written about X among under-served Eskimos…” What a depressing field.

Climate Temperature Trends

Here I must beg indulgence and will jump right to the math. This is for my climatological colleagues. If y is some temperature and t time, a common model fragment might be

    y_t = β_0 + β_1 t + other stuff

where if β1 is positive we can announce that there is a “trend” in the data. Of course, there is no reason in the world to create this model. All we have to do is look at the data and ask whether y has or hasn’t increased over the years we have observed. Perhaps that question is considered too simple for learned, peer-reviewed papers to be written answering it.

Anyway, the statistical model will spit out a p-value on the assumption that β1 = 0 and on the assumption that the model is perfectly true. The p-value will tell you how unlikely some statistic is given those assumptions.
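To make those two assumptions concrete, here is a rough sketch that estimates the p-value by brute force instead of reading it off software output. The ten "temperatures" are made up for illustration, the helper names are hypothetical, and the null model is taken to be a flat line plus independent normal noise. The code generates many data sets in which β1 = 0 and the model holds by construction, then counts how often the slope's t statistic is at least as extreme as the observed one.

```python
import math
import random


def slope_t_statistic(t, y):
    """Least-squares slope divided by its standard error."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    sse = sum((yi - y_bar - slope * (ti - t_bar)) ** 2 for ti, yi in zip(t, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope / std_err


def simulated_p_value(t, y, n_sims=20000, seed=11):
    """Share of null-model data sets with |t statistic| >= the observed one."""
    rng = random.Random(seed)
    observed = abs(slope_t_statistic(t, y))
    extreme = 0
    for _ in range(n_sims):
        # Null model: no trend, independent normal noise. The t statistic does
        # not depend on the intercept or noise scale, so mean 0 and sd 1 suffice.
        fake_y = [rng.gauss(0.0, 1.0) for _ in t]
        if abs(slope_t_statistic(t, fake_y)) >= observed:
            extreme += 1
    return extreme / n_sims


# Made-up anomalies for ten consecutive years (illustrative, not real data).
years = list(range(10))
temps = [0.0, 0.3, 0.1, 0.4, 0.2, 0.5, 0.3, 0.6, 0.4, 0.9]

if __name__ == "__main__":
    print(f"t statistic: {slope_t_statistic(years, temps):.2f}")
    print(f"simulated p-value: {simulated_p_value(years, temps):.4f}")
```

Everything the number says is about the simulated data sets, which exist only under the premises that β1 = 0 and that the model is true.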

The p-value does not tell you

    Pr(β_1 > 0 | model and observed data).

Nor will it tell you the probability β1 takes any value. It tells you nothing whatsoever about β1. Not one damn thing. Nothing. I hope it is plain that a p-value gives no evidence of any kind about β1.

Whether or not y has increased over time, we could have seen with just a glance. But perhaps we’re interested in predicting whether y will continue increasing (or decreasing). Then we need our model. Again, the p-value gives no evidence whether a trend will continue. As in none. As in zero.

But we might turn into Bayesians and compute

    Pr(β_1 > 0 | model and observed data).

And if that probability is sufficiently high—say it is 99%—we might claim there is a 99% chance that the trend will continue. If we did, we would be saying what is not so. For the posterior probability of β1 tells us nothing directly about future values of y. As in nothing. As in not a thing. As in knowledge of the parameters is not knowledge of the actual future temperatures.

What we can do—if we really are as interested in y as we say we are—is to compute things like this:

    Pr(y_{t+1} > y_t | model and observed data).

where the time t+1 is a time beyond what we have already observed. Or this:

    Pr(y_{t+1} > y_t + 1 °C | model and observed data).

or any other question we have about observable data. I hope it is unnecessary to say that we need not restrict ourselves to just y_{t+1} and that all these probabilities are conditional on the model we used being perfectly true.

But nobody ever does this. Why they don’t is a great psychological mystery. I have no complete answer, except to say that this lack is consistent with my theory that 95% of the human race is insane (a probability sufficiently high such that your author likely belongs to this majority).
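The difference between the parameter question and the observable question can be sketched numerically. What follows is one possible way to do it, under assumptions that are not in the text above: a straight-line normal model, the standard noninformative prior proportional to 1/σ², equally spaced times, a made-up ten-year series, and a hypothetical function name. It draws from the posterior of the parameters and then from the predictive distribution of the next observation.

```python
import math
import random


def trend_probabilities(t, y, n_draws=20000, seed=5):
    """Monte Carlo estimates of Pr(slope > 0) and Pr(next y > last observed y)."""
    rng = random.Random(seed)
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope_hat = sxy / sxx
    sse = sum((yi - y_bar - slope_hat * (ti - t_bar)) ** 2 for ti, yi in zip(t, y))
    t_next = t[-1] + 1  # assumes equally spaced times, one step ahead

    slope_positive = 0
    next_higher = 0
    for _ in range(n_draws):
        # sigma^2 given the data: SSE over a chi-square draw with n - 2 degrees of freedom.
        sigma = math.sqrt(sse / rng.gammavariate((n - 2) / 2.0, 2.0))
        # With time centred at t_bar, level and slope are independent given sigma.
        level = rng.gauss(y_bar, sigma / math.sqrt(n))
        slope = rng.gauss(slope_hat, sigma / math.sqrt(sxx))
        # Posterior predictive draw: a new observable, not a parameter.
        y_next = rng.gauss(level + slope * (t_next - t_bar), sigma)
        slope_positive += slope > 0
        next_higher += y_next > y[-1]
    return slope_positive / n_draws, next_higher / n_draws


# Made-up anomalies for ten consecutive years (illustrative, not real data).
years = list(range(10))
temps = [0.0, 0.3, 0.1, 0.4, 0.2, 0.5, 0.3, 0.6, 0.4, 0.9]

if __name__ == "__main__":
    p_slope, p_next = trend_probabilities(years, temps)
    print(f"Pr(slope > 0 | model, data)      ~ {p_slope:.3f}")
    print(f"Pr(next y > last y | model, data) ~ {p_next:.3f}")
```

On this made-up series the first probability comes out close to 1 while the second falls well below one half, because the last observed value sits above the fitted line. Both numbers are conditional on the model being true.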

19 Comments

  1. This series on why p-value doesn’t mean “statistical significance” & related cautionary essays is significant for anyone depending on, or influenced by, such misuse.

    Which is why you ‘ought’ compile this in something like a *.pdf in easy-to-read layman’s terms for widespread distribution.

    And, given the apathy with which you’ve found so many students at so many ‘places,’ “leaking” such a “why statistics really don’t matter” compilation to just those students (those miserable-if-they-cared-enough souls that are forced to take a stats class to meet degree requirements) will ensure it gets passed around & read (…”hey, check this out…this stats prof I’ve got has written this paper showing why stats don’t matter…”).

    Think about it.

  2. But nobody ever does this. Why they don’t is a great psychological mystery. I have no complete answer, except to say that this lack is consistent with my theory that 95% of human race is insane.

    Wonderfully, it’s also consistent with my theory that only one percent of the human race is insane but nobody knows which one per cent it is.

  3. Rewrite y(t+1)>y(t) as y(t+1)-y(t)>0.

    Replace y(t+1) and y(t) with β(0)+β(1)(t+1)+otherstuff(t+1) and β(0)+β(1)(t)+otherstuff(t), respectively.

    Now, see what has become of the probability statement
    Pr [ y(t+1)>y(t) | model and observed data ]

    So, it has been studied.

    There are a subset of professional statisticians…who feel that the field of statistics is useless.

    I know those professional statisticians don’t need my sympathy, but I am sorry that they feel this way.

  4. There are a subset of professional statisticians…who feel that the field of statistics is useless.

    Oh… Mr. Briggs, if you are one of these statisticians, I would definitely like to try to convince you otherwise.

  5. JH,

    Your calculations are wrong, wrong, wrong. Plus, they are not right. You have fooled yourself into switching between observables and parameters. You cannot manipulate them in the fashion you have done. People do, that is true, but it is in error.

    When I return to the computer, I will prove this to you. For those in the know, what I have explicated is the posterior predictive distribution.

  6. Excellent point, and a reasonable argument, but the over-the-top rhetoric and poor proofreading really detract from what you are trying to say. Unfortunately, in the social sciences at least, there is a very, very strong tendency to stick with conventions unless you have a very strong argument for using something new. Since many social scientists have a poor grasp of mathematics, they shy away from rocking the boat, and those of us with mathematical backgrounds have an uphill fight just to do our own analyses responsibly. I wish you good luck in your endeavour, but I fear rants like the above will not change anyone’s mind.

  7. We can measure E[y(t+1)] assuming that our model is true.

    And we can measure P(E[y(t+1)]+z>y(t+1)>E[y(t+1)]-z) assuming that our model is true.

    And as we collect out of sample data we look for outliers that tell us that our model is probably crap, and should be discarded. But, most people I know, don’t have out of sample data. With every new datum, they recalibrate the existing model.

  8. Fitting a straight line to data is not a trivial process. When you fit the line to data you are selecting the coefficients of the function Y=A+B*X to minimize a norm of the error vector where the error term is calculated as Ei=Yi-(A+B*Xi). Now, what is the correct norm? You could, for instance, minimize the absolute error, the squared error (Euclidean norm) or the maximum error (minimax norm). I have computer programs for doing all three. Most people are familiar with curve fitting to minimize the squared error because that is what regression theory requires. Every math and statistical library contains a least squares curve fit program. The assumption is that the errors are normally distributed about the regression line and you want to minimize the variance of the errors. How do you know this assumption is correct? You don’t. In general, you will get different trend lines depending on the norm being minimized.

    This problem becomes more interesting because there are at least three ways to do a least squares curve fit. The usual method is to solve the normal equations. There are inherent mathematical problems with this method. First, the coefficient matrix of the simultaneous equations is trying to become a Hilbert matrix. The Hilbert matrix is very ill conditioned. Second, there is an implicit weighting of the data points dependent on their distance from the center. This means if you are trying to fit a higher order polynomial to lots of data points, you will be lucky to get the right answer. I do not use this method. The other two methods are to fit orthogonal polynomials to the data, just like a Fourier series, or use a Gram-Schmidt orthonormalization procedure. I use the orthogonal polynomial method because it’s simple and I consider Gram-Schmidt overkill. Gram-Schmidt is really powerful.

    The problem gets really interesting because anytime you fit a straight line to a finite length of data you will have a spurious trend. As an example, suppose we have data points generated by the function y=Sin(X). There is no trend and the correct straight line fit is obviously Y=0. You will only obtain this result fitting a line to the data if the data starts at zero phase and contains an integral number of cycles. Otherwise you will have a trend depending on the end points and length of the data.

    Now you know why I don’t believe those temperature trends put out by Hansen and others. I don’t care what P value they calculate.

  9. All,

    Much of what is written in the comments is strictly false, but understandably so. But much of it is very useful. I have to run and will update the post or comments sometime this evening.

    JH, your equations are just false. You have forgotten epsilon, which is naughty of you.

    Like I always say, those exposed to the predictive/new methods who have first learnt classical statistics have the hardest time. It’s always difficult to unlearn what you thought you knew.

  10. It is a good thing that climate models are not based on statistics then, isn’t it? Climate models are based on the physics that has been done using controls. This is physics that is neither new (it was first published in 1859 by Tyndall) nor controversial.

    What climate modelers do is compare the results of the observed changes in the system with the observed inputs, and use the well-established physics to predict how the system will respond. That is where they use the statistics and where the p-values come in; not by saying “if this trend goes on” but by saying “this change should create a change of N degrees; what is the probability of such a change arising by chance?”

  11. Whether or not something is scientifically controversial is (or at least should be) irrelevant to the truth value of the theory. Sure, theories that are less controversial should tend to be more likely to be true…but this cannot be invoked as an argument against the “dissenters,” or as an argument of support for a theory.

    Those opposed to theories on ESP oppose it because of a lack of solid evidence. The fact that most people oppose it is irrelevant.

    Those opposed to theories on AGW oppose it because of a lack of solid evidence. The fact that most people support it is irrelevant.

  12. Mr. Briggs,

    I didn’t forget epsilon (=otherstuff)… YOU THINK I DID! Please point it out precisely where I forgot about it in my previous comments… and tell me exactly how I have switched between observable and parameters? And you may not “imagine” what I didn’t write.

    I just came back from the movie X-Men: First Class. My number-two daughter and I were immediately bothered that we couldn’t remember in which film we had seen the actor who plays Charles Xavier before. At the end of the movie, all of a sudden, I remembered that he plays Leto in Children of Dune.

    (I love superhero movies. This is not to say that all superhero movies are good, just to be clear. )

    And guess what else also came to my mind: Pr [ y(t+1)>y(t) | model and observed data ] is easy to calculate once you have the posterior predictive distribution! Anyway, please think about what this probability means and what kind of decision making can be based on it.

    I have always thought a “probability” means something in reality only when you have to make a decision. For example, weather forecast. Do I need to bring an umbrella?

    I stand by my previous comments.

    Furthermore, the following is really unnecessary and wrong! I think that the majority of well-known Bayesians have first learned classical statistics.

    Like I always say, those exposed to the predictive/new methods who have first learnt classical statistics have the hardest time. It’s always difficult to unlearn what you thought you knew.

  13. JH, All,

    I will write about this more tomorrow. Obviously, the probabilities I say you should calculate are from the posterior predictive distribution, one which is parameter free.

    Now, using ordinary regression as an example, the equation

        y_t = β_0 + β_1 t + ε

    is not of the y’s, but of the central parameter of the normal distribution which describes our uncertainty in y. That is, our uncertainty in y is represented as a normal distribution with parameters β0, β1, and σ. Mathematically,

      y ~ N(β_0 + β_1 t, σ)

    But understand this does not mean that y is “normally distributed.” No living nor no dead thing is ever “normally distributed.” Our uncertainty in y is represented by a normal.

    Thus, there is no way to take the equation as written and solve for y. That is because the equation isn’t really of y in the first place (as you know). You must integrate out the unobservable, and entirely metaphysical, parameters β0, β1, and σ. Once this is done, you can calculate the probabilities as I said.

    And let’s never forget that these probabilities are all conditional assuming the model we’ve chosen is perfectly true.

    And let’s also not forget that what I claimed is also true: if we want to know whether y increased, decreased, stayed the same, or did any damn thing over the period of time we observed, we need no models whatsoever. All we need do is look!

  14. Matt,

    I think you need to elaborate on y=A+Bt+eps not being the model of y but instead is the uncertainty of y. Most normally think of eps as the uncertainty term. IOW: if my model of y(t) is a straight line with two major parameters (i.e. A and B) which I don’t know but attempt to determine them using an ordinary regression then the model is suddenly transformed into a model of my uncertainty? You really need to explain that. Did you leave something out?

  15. You already know your Ys, as you used them to generate your parameter estimates. What you don’t know is what other parameters are lurking in epsilon that you didn’t include. As I recall, in time series, the Var(y)=Var(epsilon). So when you forecast, you are really modelling the var in epsilon+1, as that will produce the (hopefully) best prediction of y. Or maybe I am just missing Briggs’s point as well.

  16. Wade,

    Not quite. All statements of logic, knowledge, and probability are conditional on certain premises. Those probability statements I made are all true assuming the model is. I cannot learn whether the model is true or false given these probability statements. I must look outside the model to ascertain its truth.
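The sine example raised in comment 8 can be probed numerically. The sketch below uses hypothetical helper names and assumes evenly spaced samples with ordinary least squares. It fits a straight line to a trend-free periodic function over several windows. In this sketch the fitted slope changes with the window chosen, and sin(x) over one whole cycle starting at zero phase still gets a clearly nonzero slope, whereas cos(x) over the same window gets essentially none.

```python
import math


def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def windowed_slope(func, start, stop, n_points=201):
    """Least-squares slope of func sampled evenly on [start, stop], endpoints included."""
    step = (stop - start) / (n_points - 1)
    xs = [start + i * step for i in range(n_points)]
    return ols_slope(xs, [func(x) for x in xs])


if __name__ == "__main__":
    windows = [
        ("sin, one full cycle", math.sin, 0.0, 2.0 * math.pi),
        ("cos, one full cycle", math.cos, 0.0, 2.0 * math.pi),
        ("sin, quarter cycle", math.sin, 0.0, 0.5 * math.pi),
    ]
    for label, func, start, stop in windows:
        print(f"{label}: slope = {windowed_slope(func, start, stop):+.4f}")
```

None of these functions has a trend, yet the fitted "trend" is set by where the window happens to start and stop.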
