The objection which will occur to those, Lord help them, who have had some statistical training is that “increased” means a combination of “linear increase” and “significance.” These objections, as we’ll next see, are chimeras, but the fault that they are made at all lies with me. Mea culpa! I hereby accept blame for the poor statistical education most people receive. We statisticians often do a terrible job teaching our subject to outsiders (ask any student and they will agree with this). We know we do poorly because scarcely anybody remembers what these and other statistical concepts mean once they leave the classroom (however, their ignorance rarely affects their confidence). For my penance, this article of clarification.
Suppose you say that “increased” meant that the data did not decrease in a statistically significant linear fashion. That is, you are willing to allow that the actual data “had a downward trend”, but that this trend—by which you mean a straight line drawn through the data—was not “significant,” and that therefore an “increase” of some kind was still a possibility. The data is shown below, with a regression line drawn through.
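(A minimal sketch of that operation in Python, on invented numbers standing in for the series; the length 156, the drift, and the noise level are assumptions for illustration only, not the actual data.)

```python
# A minimal sketch on invented data (not the actual series discussed):
# draw 156 points with a slight downward drift and fit an OLS line.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
t = np.arange(1, 157)                               # times 1..156
x = 0.5 - 0.002 * t + rng.normal(0, 0.1, t.size)    # hypothetical data

slope, intercept = np.polyfit(t, x, 1)              # the regression line
plt.plot(t, x, ".", label="data")
plt.plot(t, intercept + slope * t, "-", label=f"OLS fit, slope={slope:.4f}")
plt.legend()
plt.show()
```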
First we have to focus on eq. (13). We must start, continue, and end with the idea firmly in mind that the actual data did not in fact “increase” (by our definition).
Second, the regression line is a model: call it M_r. It sounds as if we want to compute
(18) Pr(X decreased | M_r),
but this is not what people want when they think about classical statistics. If we mean by “decreased” the opposite of “increased”—that is, that X went down more often than it increased or stayed the same—we can calculate (18) (or any other function of the observed data), but nobody does. They instead calculate one of two different things, depending on whether they are a frequentist or Bayesian.
Before we get to that, we first have to understand what M_r means. We don’t have to get overly specific; all we have to know is that M_r is indexed by unobservable parameters, only one of which (for this simple regression) has to do with the “trend”: call this parameter θ. It helps to think of it as the slope of the line we drew. High school geometry tells us that if θ > 0, then the line will go up, and that if θ < 0 then the line will go down. If θ = 0 then the line will be flat.
A frequentist will calculate
(19) Pr( F(X) > f(X) | M_r, θ = 0),
where f(X) is an ad hoc function of the observed data, F(X) is the same function over data never seen, and where both are subjectively chosen from a very large supply of functions (usually the absolute values of the functions are taken). The probability assumes that the “experiment” that gave rise to X will be repeated indefinitely, and that for each repetition a new F(X) will be calculated. (19) is thus the probability of seeing a larger F(X) than the actual f(X) in these repetitions, assuming M_r is true but with its “slope” parameter set equal to 0. If (19) is “small”, θ is said to be “not zero”; if instead (19) is “large”, θ is said to be 0 and the trend “not statistically significant.”
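Here is a hedged sketch of computing an instance of (19), taking the usual |t-statistic| as the ad hoc f(X); scipy’s linregress reports the two-sided p-value for exactly this choice. The data are the invented numbers from the sketch above.

```python
# A sketch of (19): the standard frequentist test of theta = 0 for a
# simple regression, with the |t-statistic| playing the role of f(X).
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(42)
t = np.arange(1, 157)                               # invented data, as above
x = 0.5 - 0.002 * t + rng.normal(0, 0.1, t.size)

res = linregress(t, x)
print("estimated slope:", res.slope)
print("two-sided p-value for theta = 0:", res.pvalue)   # an instance of (19)
```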
A Bayesian will calculate[1]
(20) Pr( θ < 0 | M_r & X & E),
which is the probability that the slope is less than 0, but still assuming that the model is true and given the old data and something called “E”, which is the evidence we need to tell us about the parameters before we see any data. We call this information the “prior”, but we needn’t spend any time on it, because happily for simple regression models like M_r the frequentist and Bayesian will agree about θ. For when (20) is “large”, (19) will be “small”, and θ will be declared not to be 0; and when (20) is “small”, (19) will be “large”, and θ will be declared to be 0.
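A sketch of (20) under one conventional choice of E, a flat prior, for which the posterior of θ in a simple regression is a Student-t centred at the least-squares estimate (again using the invented data):

```python
# A sketch of (20): with a flat prior, Pr(theta < 0 | Mr & X & E) has a
# closed form: the Student-t posterior for the slope evaluated at 0.
import numpy as np
from scipy.stats import linregress, t as student_t

rng = np.random.default_rng(42)
tt = np.arange(1, 157)                              # invented data, as above
x = 0.5 - 0.002 * tt + rng.normal(0, 0.1, tt.size)

res = linregress(tt, x)
pr_dec = student_t.cdf(0, df=tt.size - 2, loc=res.slope, scale=res.stderr)
print("Pr(theta < 0 | Mr, X, E) ~", pr_dec)         # compare: 1 - pvalue/2
```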
It turns out that for this data, (19) is about 10^-16, which is “small”, and (20) is about 1 − (19)/2, which is “large.” For this data, classical statisticians would announce, “X did in fact decrease” or “There was a statistically significant decrease in X.” A person ignorant of any statistics will have calculated (13) long ago and concluded that, yes, the data did in fact decrease, because it did.
But suppose instead that (19) was “large” and (20) “small”, but that (13) still holds. Then the statistician would say, “The decrease in X was not statistically significant.” Unfortunately for the statistician, this is not equivalent to “X did not decrease”, because we have already agreed that it did. This situation is thus somewhat akin to Congresspersons who say “The budget is being cut” when what they mean is “We are reducing the amount of increase, but there is still an increase.” Well, that’s statistics for you.
Now, as stated earlier, we could have computed (18) and said something about the probability of the actual observable X itself decreasing given the model[2]. (18) is not (19) nor is it (20) (but all assume the model is true), and in general these probabilities won’t match. It turns out that (18) is easy to calculate, but in order to do so we must first supply a guess of the parameters of M_r or, if you are a “predictive” Bayesian, to guess the parameters and then “integrate them out.” That is, M_r = M_r(parameters), so before we can compute (18) we need to plug in guesses of the parameters. Bayesians can actually integrate out all uncertainty in the parameters; frequentists do something else. It doesn’t matter which method you choose—pick maximum likelihood if you’re a frequentist, or, say, a BLUP estimator, or on and on, all the way to frequentist predictive techniques, which sometimes mimic the Bayesian predictive techniques. All we need understand is that a guess for the parameters has been made and that the uncertainty in these guesses has been accounted for. (18) can then be calculated.
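One crude way to compute (18) is sketched below: plug in the fitted parameters (the frequentist shortcut; a predictive Bayesian would integrate them out), simulate new series from M_r, and count how often X “decreases” in the sense defined earlier, i.e. goes down more often than it goes up or stays the same.

```python
# A plug-in sketch of (18), on the invented data from above.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(42)
tt = np.arange(1, 157)
x = 0.5 - 0.002 * tt + rng.normal(0, 0.1, tt.size)

res = linregress(tt, x)
sigma_hat = np.std(x - (res.intercept + res.slope * tt), ddof=2)

sims, hits = 10_000, 0
for _ in range(sims):
    x_new = res.intercept + res.slope * tt + rng.normal(0, sigma_hat, tt.size)
    steps = np.diff(x_new)
    hits += (steps < 0).sum() > (steps >= 0).sum()   # "decreased"?
print("Pr(X decreased | Mr) ~", hits / sims)
```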
Suppose, after you’ve done this, (18) is “small”. We earlier saw that (18) was like (14) or (15), so just because (18) is “small” (or “large”), this does not change (13), which states that, given the observations and our definition of “decrease”, the data did in fact decrease. (18) is conditional on a model, which we assume is true. (13) is conditional on the observations, which we assumed were error-free.
If you’re unhappy about this, you have two statistical options. The first is to change the model. We pulled M_r out of a hat anyway, so why not try a different M? You’re bound to find one that agrees with what you wanted, which was to say that the data did not decrease. That is, you will surely, if you search hard enough, find an M which gives pleasing results for (18)-(20). After all, who said M_r was true? Nobody. We just assumed it. We’ll talk later about how to tell how good M_r is. All we have to understand here is that we can’t talk about “significance” or “trend” without assuming a model: you can’t have one without the others. It is an impossibility.
The second option is to include more data. Reject the original question, which was “Did X increase from time 1 to 156?”, or say it wasn’t really what you meant, and that you instead meant, “X increased over the longer term.” That’s certainly vague enough, and gives you room to play because it frees you from saying exactly what you mean by “longer term.”
But, invariably, there will be somebody who pins you to the wall and insists that you define, exactly, precisely what you mean by “over the longer term.” At this point you’re stuck[3], for when you pick an exact start date, say X_-n, where negative indexes indicate times before 1, all of what was outlined above still holds. That is, all we need do is glance at the data and compute the new (13) and even the new (18)-(20) for this X_-n. Depending on the start date, these four numbers will either agree or not: they will anyway change with every new start date. And for every new model M, including physical models. And all the while, (13) remains fixed and unbudgeable.
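A sketch of this start-date sensitivity on the invented data: hold the end date fixed, slide the start, and watch the slope and p-value move around, while the plain count behind (13) needs no model at all.

```python
# Recompute the trend and its p-value for several start dates.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(42)
tt = np.arange(1, 157)
x = 0.5 - 0.002 * tt + rng.normal(0, 0.1, tt.size)

for start in (1, 26, 51, 76, 101, 126):
    r = linregress(tt[start - 1:], x[start - 1:])
    print(f"start={start:3d}  slope={r.slope:+.5f}  p={r.pvalue:.3g}")
```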
Oh, my. We still haven’t gotten to what to do if X is measured with error, or what the physical models mean, or what is a good or bad model. Stick around.
————————————————————————————-
[1] A Bayesian might calculate a “Bayes factor” instead of (20), but this difference does not matter here, because the conclusion would be the same. I mean, the interpretation (the meaning of what follows) would be the same.
[2] We might have to modify the notation of (18) to indicate whether we’re computing this probability before seeing X or after it.
[3] Good joke!
‘Suppose you say that “increased” meant that the data did not decrease in a statistically significant linear fashion’
I really don’t understand why the discussion here continues to focus on definitions of warming that climatologists wouldn’t actually use, it seems to me a recipe for misunderstandings.
In most sciences it would be normal for there to be three possibilities: a statistically significant increase, a statistically significant decrease, or a rate statistically indistinguishable from zero.
To test for a statistically significant increase or decrease, the standard (but flawed) procedure would be to take the null hypothesis to be that the underlying trend is zero and claim a statistically significant increase or decrease if the probability of an observed trend at least as large under the null hypothesis were sufficiently unlikely. This has a degree of built-in self-skepticism, as it assumes there is no meaningful increase or decrease unless the possibility of there being a flat trend could be effectively ruled out.
What wouldn’t be done is to assume that there was an increase if there was not a statistically significant decrease, as this has no self-skepticism; it assumes that there is an increase unless proven otherwise.
The thing that is normally omitted is that if the trend is not statistically different from zero, the power of the test should be examined to see if we should be surprised by that.
Now if you want to criticise statistical practices used in climatology then that is fine, but you need to restrict the discussion to practices that are actually used in climatology.
“But, invariably, there will be somebody who pins you to the wall and insists that you define, exactly, precisely what you mean by “over the longer term.” At this point you’re stuck”
Not really, a reasonable definition of “over the longer term” in this case would be “long enough for the test to have sufficient statistical power that we should expect to be able to reject the null hypothesis when it is false”.
Of course the period also needs to be short enough that a linear approximation to the forced response is a reasonable assumption.
DM,
“‘long enough for the test to have sufficient statistical power’…short enough that a linear approximation to the forced response is a reasonable assumption.”
In other words, you are assuming you can fit a straight line somewhere over a range of data, as long as you can pick and choose that range. Well, this is likely true. That is, of course, no proof that this arbitrary statistical model is best or true.
But whether it is true, it does not change whether the data in fact “increased” (by our definition, or any other we accept) or “decreased” over that range. Or whether it “increased” etc. over some other range.
You are still stuck.
Dr Briggs “you are assuming you can fit a straight line somewhere over a range of data, as long as you can pick and choose that range.”
No, I’m sorry but I said nothing of the sort. If you use a trend period long enough for the test to have useful statistical power then the exact start and end dates are unlikely to make a difference. Use the most recent data point as the end date, and extend back far enough to have satisfactory statistical power.
This is pretty basic frequentist statistics, you determine the sample size using statistical power, you then test for significance.
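A hedged sketch of that two-step recipe, with the assumed trend magnitude and noise level invented purely for illustration: choose the sample size by simulated power, then test at that size.

```python
# Step 1: pick n by simulated power for an assumed trend and noise level.
# Step 2 (not shown): test the real data using that n.
import numpy as np
from scipy.stats import linregress

def power(n, trend=0.002, sigma=0.1, alpha=0.05, sims=2000, seed=0):
    """Fraction of simulated datasets in which H0: slope = 0 is rejected."""
    rng = np.random.default_rng(seed)
    tt = np.arange(n)
    hits = sum(
        linregress(tt, trend * tt + rng.normal(0, sigma, n)).pvalue < alpha
        for _ in range(sims)
    )
    return hits / sims

for n in (30, 60, 120, 240):
    print(f"n={n:4d}  power={power(n):.2f}")
```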
“That is, of course, no proof that this arbitrary statistical model is best or true.”
Nobody is claiming that it is proof that it is best or true (in fact we know a-priori it is neither). However, it is reasonable to assume that a linear model may be a reasonable local approximation to the underlying function about which we are trying to perform inferences.
“But whether it is true, it does not change whether the data in fact “increased” (by our definition, or any other we accept) or “decreased” over that range. Or whether it “increased” etc. over some other range.”
There is a difference between the measured trend and the forced response of the climate that is the focus for inference for the climatologists. The measured trend either is increasing, decreasing or exactly zero. Our inference of the forced response on the other hand is uncertain (and based on a model giving a local approximation), and it is either significantly warming, significantly cooling or statistically indistinguishable from flat. The last of these three outcomes is key to the discussion. If you use too short a timespan, the test will have little statistical power, and you should expect “statistically indistinguishable from flat” even if the forced trend is actually warming or cooling.
DM,
The start and end date do make a difference. Anyway, whether or not that is true, you are still fitting a probability model—whichever kind you pick—over some range. Over that range, it will either be unambiguously true that the data “increased” or “decreased” by whatever definitions we agree upon, regardless of what your probability model tells you. Now tell me: do you agree with this or not?
If you agree, then all that other “forcing, etc., etc.” is utterly beside the point. If you do not agree, then we must here part ways.
Incidentally, I do take on your “forcing” in Part IV. Also by the by, you seem to be arguing for your probability model using the argument of authority. As in “my model has good power over such and such conditions.” What evidence have you that your model is true? Custom? After all, there are an infinite number of models that can be fit to the data.
You are in danger of falling—or perhaps you have already tumbled—into the common statistical error of substituting a model for reality (the observations).
Dr Briggs, as I have pointed out there is a difference between the measured trend on the data (which is either increasing or decreasing, unless it is numerically zero) and the trend of the underlying process that gives rise to the data. The climatologists are interested in and talking about the latter. We can’t always unambiguously say whether the underlying trend is increasing or decreasing, because the inference is inherently uncertain and the error bars on the inferred trend may include both positive and negative values. That is why we have tests of statistical significance.
“You are in danger of falling—or perhaps you have already tumbled—into the common statistical error of substituting a model for reality (the observations).”
I find it rather difficult to see how you could come to such a conclusion given that I have already explicitly stated ON THIS THREAD that we know a-priori that the linear model is not true, but that it may be a reasonable local approximation to the underlying function of interest.
“If you agree, then all that other “forcing, etc., etc.” is utterly beside the point.”
yet again you are missing the point that the observations are a combination of the forced response and the unforced response. The climatologists are talking about the forced response, so the distinction is directly relevant to any discussion of what the climatologists claim. The observations tell you unambiguously about the combination of the forced and unforced response; they do not tell you unambiguously about the forced response (climatology would be much easier if they did). To ignore this distinction is merely to unwittingly set up a straw man.
Dr Briggs wrote “What evidence have you that your model is true?”
How many times do I have to quote GEP Box – “all models are wrong, but some are useful”. The idea that any model is true is fundamentally wrong-headed; the question is not whether the model is true, but whether it is a reasonable approximation over the region of interest.
DM,
You may quote Box until thy last breath. But it doesn’t make his aphorism true; indeed it is false. Under my Start Here tab I have a discussion of this.
If you say that your model is “approximately true”—whichever probability model you like: feel free to pick your juiciest, and why not pick any start and end dates you like?—then it will still be the case you have a model and whatever probability statements that model makes are irrelevant to the statement (13) makes.
Dr Briggs, your post on Box’s quote appears to me to be completely missing Box’s point. Of course in theory there is a true model, in the sense that an exact duplicate of the system is a “true model” (c.f. my example based on alternate Earths in parallel universes). However statistical models are necessarily simplified approximations to reality, simply because the “true model” is too complex to be analytically or computationally tractable.
So in any non-trivial statistical exercise, all statistical models are wrong, but some are useful approximations. As I said it is deeply wrong-headed to think that any statistical model of observed reality is true, the question is whether it is an adequate approximation.
As to why not pick any start and end dates I like. Well I could pick today and yesterday. Where I live it is about 1 degree colder today than it was yesterday. Is this evidence that the climate is cooling? If not, why not?
The pouched contributor still seems to have an obsession with ‘the trend’, an affliction that is widespread among climate scientists and their devoted followers (even worse, he speaks of ‘the forced trend’).
I have lost count of the number of times our patient host has written about this delusion, in one post (https://www.wmbriggs.com/blog/?p=5107) admitting to exasperation, and emphasizing
The lesson is, of course, that straight lines should not be fit to time series.
It’s refreshing to see a debate involving climate that doesn’t resort to name calling.
DM,
You say climatologists are interested in “the trend of the underlying process that gives rise to the data,” and “the observations are a combination of the forced response and the unforced response.”
If I understand what you said, climatologists are interested in a particular cause-effect relationship (forced response) that is obscured in the real world (observations) by something else (unforced response). But whatever is “unforcing” is a cause, too.
The assumption behind all this is that climate is an effect caused by many phenomena, and climatologists wish to separate out all the causes so as to extract the effect of one among many. The “interesting” cause is anthropogenic CO2, and the “interesting” effect is global warming.
Even though global temperatures have been decreasing for 10+ years, it does not mean that CO2 has no effect, it just means the other causal phenomena have over-ridden (out forced) CO2.
The observed temperatures reveal something. They show that the effect of (anthropogenic) CO2 is weak compared to other causal phenomena.
That should be comforting to climatologists who claim CO2 is a crisis (past the tipping point, runaway warming, oceans boiling, end of life as we know it, etc.) There is nothing to panic about, no need to re-order civilization, tax to the max, inflict deprivation and suffering, etc., because the observations indicate that CO2 is a weak causal phenomenon.
There are such climatologists, you know, who wish to raise a panic, and politicians who cater to that kind of thing. But as rational adults, we don’t need to play that game. We can make reasonable inferences from the data, as described above.
Thank you Dr. Briggs
This phiz cist student who has to make living as an Injuneer is still awake!
Well – at least kind of.
Uncle Mike, The unforced response is chaotic and quasi-cyclic, so over the longer term it cancels out. It is perfectly true that the forced response is small compared to the unforced response on a decadal scale, but unlike the unforced response it isn’t cyclic and doesn’t cancel out. So on the scales (that ought to be) relevant to policymakers, the forced response will dominate unforced variability. This is why decadal trends are not informative when it comes to climate projection.
As I said before, you need to understand the climatology to understand what the relevant questions are. A statistician that approaches the observations as just a set of numbers with no interest in the process generating the data is unlikely to perform a useful analysis.
As for tipping points etc. The IPCC report is a good representation of what mainstream science has to say about the plausibility of different outcomes. AGW is likely to cause plenty of problems for the world’s population without any tipping points being reached or a runaway greenhouse effect. If you want to make arguments about tax etc, you need to address what mainstream science actually says, rather than just the straw man of the most extreme outcomes that are possible, but not likely.
DM – If the unforced response is cyclical, and the forced response is small compared to the magnitude of the cyclical unforced response (despite the fact that unforced cancels out), why wasn’t any of this taken into account in any climate models – call them predictions, projections, whatever – that I have seen? Are there any climate models that project a long-term increase in GAT, but projected either a decrease or relative flatness in the past decade? If the unforced response is simply cyclical, it seems there should be some degree of predictability.
The unforced response is chaotic and hence inherently unpredictable. It is only quasi-cyclical; it goes up and down and tends to pretty much average out if you look at a timescale long enough to include a few (quasi-)cycles. The models do exhibit the same sort of unforced variability in individual model runs, so the existence of decadal periods of little or no warming is predicted by the models, see e.g. Easterling and Wehner
http://www.agu.org/pubs/crossref/2009/2009GL037810.shtml
however as unforced variability (e.g. ENSO) is chaotic, the models can’t be expected to predict when they will happen.
It is like a double pendulum: it goes back and forth, so it is quasi-cyclical, but you can’t predict its trajectory, which is chaotic (deterministic, but extremely sensitive to initial conditions). However if you average the position of the pendulum over a sufficient number of quasi-cycles, it will point in the direction in which gravity pulls on it. If we were to slowly bring a magnet towards it, it would still oscillate unpredictably, but the average position of the pendulum (which happens to be made of iron) would become increasingly biased towards the magnet. The effect of the magnet is small compared to the magnitude of the oscillations, but given sufficient data, it is detectable.
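A toy numeric version of the analogy, with all magnitudes invented: a noisy quasi-cycle plus a small steady pull. Over a single cycle the oscillation dominates; averaged over many cycles, the pull becomes visible.

```python
# Quasi-cyclic noise plus a weak "magnet" (steady drift).
import numpy as np

rng = np.random.default_rng(1)
tt = np.arange(5000)
oscillation = np.sin(2 * np.pi * tt / 200) + rng.normal(0, 0.3, tt.size)
series = oscillation + 0.0005 * tt          # drift: the weak magnet

print("mean over first cycle:", series[:200].mean())    # pull invisible
print("mean over last cycle: ", series[-200:].mean())   # pull apparent
```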
Fancy theory, DM. Unpredictability is just another word for we don’t know.
If the effect of the magnet (CO2) is small compared to the magnitude of the oscillations (the observed weather data), then we need not get all worked up about it, right? No need to tax carbon, or declare CO2 to be a pollutant, or huddle in the cold and dark.
CO2, after all, is the essential nutrient of life. And warmer is better for numerous reasons. All the malarky about tipping points and Thermageddon was based on a postulated massive effect, and we know the effect is so slight as to be difficult to detect. It can’t be found in the noise of the data.
What are f and F? All you say about them is that they’re chosen ad-hoc. Does it matter what functions you choose? If it affects the result, shouldn’t they be considered part of the model?
Perhaps it’s intentional, but from the way it’s phrased I can’t see why you’d want to introduce these functions at all. Maybe an example would help – in fact I’ve been thinking that a lot of your points might be clearer if you worked through the analysis as a demonstration.