March 9, 2008 | 17 Comments

You cannot measure a mean

I often say—it is even the main theme of this blog—that people are too certain. This is especially true when people report results from classical statistics, or use classical methods when implementing modern, Bayesian theory. The picture below illustrates exactly what I mean, but there is a lot to it, so let’s proceed carefully.

Look first only at the jagged line, which is something labeled “Anomaly”; it is obviously a time series of some kind over a period of years. This is the data that we observe, i.e. that we can physically measure. It, to emphasize, is a real, tangible thing, and actually exists independent of whatever anybody might think. This is a ridiculously trivial point, but it is one which must be absolutely clear in your mind before we go on.

I am interested in explaining this data, and by that I mean, I want to posit a theory or model that says, “This is how this data came to have these values.” Suppose the model I start with is

A: y = a + b*t

where y are the observed values I want to predict, a and b are something called parameters, and t is for time, or the year, which goes from 1955 to 2005. Just for fun, I’ll plug in some numbers for the parameters so that my actual model is

A’: y = -139 + 0.07*t

The result of applying model A’ gives the little circles. How does this model fit?

[Figure: time series data]

Badly. Almost never do the circles actually meet with any of the observed values. If someone had used our model to predict the observed data, he almost never would have been right. Another way to say this is

Pr(y = observed) ~ 0.04

or the chance that the model equals the observed values is about 4%.

We have a model and have used it to make predictions, and we’re right some of the time, but there is still tremendous uncertainty left in our predictions. It would be best if we could quantify this uncertainty so that if we give this model to someone to use, they’ll know what they are getting into. This is done using probability models, and the usual way to extend our model is called regression, which is this

B: y = a + b*t + OS

where the model has the same form as before except for the addition of the term OS. What this model is saying is that “The observed values exactly equal this straight line plus some Other Stuff that I do not know about.” Since we do not know the actual values of OS, we say that they are random.

Here is an interesting fact: model A, and its practical implementation A’, stunk. Even more, it is easy to see that there are no values of a and b that can turn model A into a perfect model, for the obvious reason that a straight line just does not fit through this data. But model B always can be made to fit perfectly! No matter where you draw a straight line, you can always add to it Other Stuff so that it fits the observed series exactly. Since this is the case, restrictions are always placed on OS (in the form of parameters) so that we can get some kind of handle on quantifying our uncertainty in it. That is a subject for another day.
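To make the "always fits perfectly" point concrete, here is a minimal sketch. The series is synthetic (generated around the line in A’ with made-up noise), not the data in the picture, and the function name `other_stuff` is only illustrative. For any a and b, setting OS to whatever is left over reproduces the observations exactly.

```python
import random

def other_stuff(years, observed, a, b):
    """Return the OS term that makes y = a + b*t + OS match the data exactly."""
    return [y - (a + b * t) for t, y in zip(years, observed)]

def largest_reconstruction_error(years, observed, a, b):
    """Rebuild the series from line + OS and report the worst mismatch."""
    os_term = other_stuff(years, observed, a, b)
    rebuilt = [a + b * t + e for t, e in zip(years, os_term)]
    return max(abs(r - y) for r, y in zip(rebuilt, observed))

# Synthetic stand-in for the anomaly series (an assumption, NOT the post's data).
random.seed(1)
years = list(range(1955, 2006))
observed = [-139 + 0.07 * t + random.gauss(0, 2) for t in years]

# Any straight line at all, even an absurd one, can be patched up with OS.
errors = {}
for a, b in [(-139, 0.07), (0, 0), (500, -3)]:
    errors[(a, b)] = largest_reconstruction_error(years, observed, a, b)
    print(f"a={a}, b={b}: largest reconstruction error = {errors[(a, b)]:.1e}")
```

The reconstruction error is zero up to floating-point rounding for every line tried, which is why an unrestricted OS tells us nothing.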

Today, we are mainly interested in finding values of a and b so that our model B fits as well as possible. But since no straight line can fit perfectly, we will weaken our definition of “fit” to say we want the best straight line that minimizes the error we make using that straight line to predict the observed values. Doing this allows us to guess values of a and b.
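As a rough illustration of what "the best straight line" can mean, the sketch below assumes the error being minimized is the sum of squared differences (ordinary least squares), which is the usual choice but is not stated in the text. The data are again synthetic stand-ins, so the fitted numbers will not match A’.

```python
import random

def least_squares_line(t, y):
    """Closed-form ordinary least squares estimates for y = a + b*t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_bar - b * t_bar
    return a, b

def sum_squared_error(t, y, a, b):
    """Total squared error made by using the line a + b*t to predict y."""
    return sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))

# Synthetic stand-in for the anomaly series (an assumption, NOT the post's data).
random.seed(1)
years = list(range(1955, 2006))
observed = [-139 + 0.07 * t + random.gauss(0, 2) for t in years]

a_hat, b_hat = least_squares_line(years, observed)
best = sum_squared_error(years, observed, a_hat, b_hat)
nudged = sum_squared_error(years, observed, a_hat + 0.1, b_hat)
print(f"a = {a_hat:.2f}, b = {b_hat:.4f}")
print(f"squared error at the fit: {best:.2f}; after nudging a by 0.1: {nudged:.2f}")
```

Any nudge away from the fitted a and b makes the squared error larger, which is all "best" means here.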

Using classical or Bayesian methods of finding these guesses leads to model A’. But we are not sure that the values we have picked for a and b are absolutely correct, are we? The value for b might have been 0.07001, might it not? Or a might have been -138.994.

Since we are not certain that our guesses are perfectly correct, we have to quantify our uncertainty in them. Classical methodology does this by computing a p-value, which for b is 0.00052. Bayesian methodology does this by computing a posterior probability of b > 0 given the data, which is 0.9997. I won’t explain either of these measures here, but you can believe me when I tell you that they are excellent, meaning that we are pretty darn sure that our guess of b is close to its true value.
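The two numbers are more closely related than they look. Under one common set of assumptions (normal OS and a flat prior on the parameters, which may or may not be the priors used here), the posterior probability that b > 0 equals one minus half the two-sided p-value when the estimated slope is positive. A quick arithmetic check of the figures quoted above:

```python
# Figure quoted in the text.
p_value = 0.00052  # two-sided classical p-value for b

# Assumption: flat prior and normal OS, so the posterior for b mirrors the
# classical sampling distribution and Pr(b > 0 | data) = 1 - p/2.
posterior_b_positive = 1 - p_value / 2

print(f"{posterior_b_positive:.4f}")  # 0.9997
```

That agreement is a property of this particular model and prior, not a general rule.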

Close, but not exactly on; nor is it for a, which means that we still have to account for our uncertainty in these guesses in our predictions of the observables. The Bayesian (and classical¹) way to approximate this is shown in the dashed blue lines. These tell us that there is a 95% chance that the expected value of y is between these lines. This is good news. Using model B, and taking account of our uncertainty in guessing the parameters, we can then say the mean value of y is not just a fixed number, but a number plus or minus something, and that we are 95% sure that this interval contains the actual mean value of y. And that interval looks pretty good!

Time to celebrate! No, sorry, it’s not. There is one huge thing still wrong with this model: we cannot ever measure a mean. The y that pops out of our model is a mean and shares a certain quality with the parameters a and b, which is that they are unobservable, nonphysical quantities. They do not exist in nature; they are artificial constructs, part of the model, but you will never find a mean(y), a, or b anywhere, not ever.

Nearly all of statistics, classical and Bayesian, focuses its attention on parameters and means and on making probability statements about these entities. These statements are not wrong, but they are usually beside the point. A parameter almost never has meaning by itself. Most importantly, the probability statements we make about parameters always fool us into thinking we are more certain than we should be. We can be dead certain about the value of a parameter, while still being completely in the dark about the value of an actual observable.

For example, for model B, we said that we had a nice, low p-value and a wonderfully high posterior probability that b was nonzero. So what? Suppose I knew the exact value of b to as many decimal places as you like. Would this knowledge also tell us the exact value of the observable? No. Well, we can compute the confidence or credible interval to get us close, which is what the blue lines are. Do these blue lines encompass about 95% of the observed data points? They do not: they only get about 20%. It must be stressed that the 95% interval is for the mean, which is itself an unobservable parameter. What we really want to know about is the data values themselves.

To say something about them requires a step beyond the classical methods. What we have to do is to completely account for our uncertainty in the values of a and b, but also in the parameters that make up OS. Doing that produces the red dashed lines. These say, “There is a 95% chance that the observed values will be between these lines.”

Now you can see that the prediction interval—which is about 4 times wider than the mean interval—is accurate. Now you can see that you are far, far less certain than what you normally would have been had you only used traditional statistical methods. And it’s all because you cannot measure a mean.
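Here is a hedged sketch of the difference between the two kinds of interval. It uses the textbook normal-theory formulas for simple regression, which give the same answer as the Bayesian calculation only under a flat prior; the data are synthetic, and the constant `T_CRIT` is an approximate Student-t quantile hard-coded for 51 points. It counts how many observations land inside each 95% band.

```python
import math
import random

T_CRIT = 2.01  # approx. 97.5% Student-t quantile for 49 degrees of freedom

def fit_line(t, y):
    """Least squares fit of y = a + b*t, plus the pieces the intervals need."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sxx
    a = y_bar - b * t_bar
    sse = sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y))
    s = math.sqrt(sse / (n - 2))  # estimated spread of OS
    return a, b, s, t_bar, sxx

def half_widths(t0, n, s, t_bar, sxx):
    """95% half-widths at time t0: (interval for the mean, interval for an observation)."""
    leverage = 1 / n + (t0 - t_bar) ** 2 / sxx
    return T_CRIT * s * math.sqrt(leverage), T_CRIT * s * math.sqrt(1 + leverage)

# Synthetic stand-in for the anomaly series (an assumption, NOT the post's data).
random.seed(1)
years = list(range(1955, 2006))
observed = [-139 + 0.07 * t + random.gauss(0, 2) for t in years]
a, b, s, t_bar, sxx = fit_line(years, observed)

in_mean = in_pred = 0
for t, y in zip(years, observed):
    w_mean, w_pred = half_widths(t, len(years), s, t_bar, sxx)
    miss = abs(y - (a + b * t))
    in_mean += miss <= w_mean  # inside the band for the mean
    in_pred += miss <= w_pred  # inside the band for observations

mean_coverage = in_mean / len(years)
pred_coverage = in_pred / len(years)
print(f"inside the mean interval: {mean_coverage:.0%}")
print(f"inside the prediction interval: {pred_coverage:.0%}")
```

The only difference between the two half-widths is the extra 1 under the square root, which stands for the spread of OS itself. That term does not shrink as more data arrive, while the uncertainty in the mean does.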

In particular, if we wanted to make a forecast for 2006, one year beyond the data we observed, the classical method would predict 4.5 with interval 3.3 to 5.7. But the true prediction for the observable, while still 4.5, has the interval 0.5 to 9, which is about three and a half times wider than the previous interval.
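The width comparison for the 2006 forecast is simple arithmetic on the endpoints quoted above; the helper `width` is just for illustration.

```python
# Interval endpoints quoted in the text for the 2006 forecast.
mean_interval = (3.3, 5.7)        # 95% interval for the mean
prediction_interval = (0.5, 9.0)  # 95% interval for the observable

def width(interval):
    """Length of an interval given as (low, high)."""
    low, high = interval
    return high - low

ratio = width(prediction_interval) / width(mean_interval)
print(round(ratio, 2))  # 3.54
```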

…but wait again! (“Uh oh, now what’s he going to do?”)

These intervals are still too narrow! See that tiny dotted line that oscillates through the data? That’s the same model as A’ but with a sine wave added on to it, to account for possible cyclicity of the data. Oh, my. The red interval we just triumphantly created is true given that model B is true. But what if model B was wrong? Is there any chance that it is? Of course there is. This is getting tedious (which is why so many people stop at means), but if we want to make good predictions, we also have to account for our uncertainty in the model. But we’re probably all exhausted by now, so we’ll save that task for another day.

¹ Given the model and priors I used, this is true.
March 7, 2008 | No comments

Afternoon at GISS

Tim Hall at the Goddard Institute for Space Studies invited me to give a seminar on statistical hurricane modeling. A link to my presentation is below.

Tim, with Stephen Jewson, is doing some interesting work on modeling hurricane tracks, so far mainly in the Atlantic. He has some papers on the GISS web site which you can download. He’s using this work to better quantify landfall frequencies, which are of obvious interest.

What I found most intriguing is that he’s able to show how the location of tropical storm cyclogenesis shifts towards Africa as sea surface temperature increases. Storms born there tend to be stronger, but they are also less likely to make landfall in the US because of the greater distance.

I got some good comments on my model. Some people did not like that I used the AMO and instead asked for direct SST measures. Well, some like the AMO and some don’t. But I’m perfectly happy to try SSTs. At the least, it’ll make my model a better forecast model.

Didn’t get to meet Hansen, as he’s obviously too busy most of the time. Tim told me that Hansen receives so many requests to come and give talks that some of the other staff sometimes take his place.

Here is my talk, in PDF format. Not too many words on the slides, I’m afraid, as I really hate words on slides. Nothing worse than having somebody read words on a slide that everybody in the room can already see. But you can go to my resume page and download the paper to get some words.

March 5, 2008 | 7 Comments

Titan TV’s short piece on the Heartland Conference

A couple of days ago I wrote that people from Titan TV interviewed me, and a slew of others, at the Heartland Climate Conference. Their piece is now on the web and can be found here. I didn’t make the cut, sadly; proving once again I have the perfect face for radio.

I gather, by the selection and arrangement of the sound bites presented, that the Titan TV reporter was attempting irony and humor, which I can tell you ain’t easy. Most who try fail.

Oh—and you’ll get this if you watch the two-minute video—I do not own a car, or motorcycle, or any other form of transportation, not even a bike, and I have not owned any of these for over a decade. I walk most places and I actually do use those miniature fluorescent light bulbs to illuminate my exorbitantly expensive 800 square feet, but only to foil Con Edison’s plan to take as much of my paycheck as the money-besotted Congress does.


Heartland Climate Conference Summary

This is an editorial that I sent out to various places.

I am one of the scientists that attended the recent Heartland Climate Conference in Manhattan, where I live. It is my belief that the strident and frequent claims of catastrophes caused by man-made global warming are stated with a degree of confidence not warranted by the data.

Although it is a logical fallacy to invoke this argument against opponents, let me say first that I have never accepted any money (except my graduate student tuition) for the work I have done in statistical meteorology and climatology. Incidentally, it isn’t because I wouldn’t, it’s just that nobody’s ever offered. I also did not get the one-thousand dollar honorarium from Heartland for speaking at this conference.

At the conference, I presented the same original research that I recently gave at the American Meteorological Society conference in New Orleans. I serve on the Probability and Statistics Committee of the AMS. This work was based on a paper I wrote, which is about to appear in the Journal of Climate, showing that tropical storms and hurricanes have not increased in number or intensity since we have had reliable satellite measurements. I also find that previous crude statistical methods others have used to analyze hurricanes have given misleading results.

It is trivially true that man, and every other organism, influences his environment, and hence his climate. It is only a question of how much, whether it is harmful, and whether the harm can be mitigated. It is indisputable that mankind causes climate change, even harmful change. But most of this change is local and due mainly to land use modifications. For example, replacing a forest with crop land creates different heat exchange characteristics in the boundary layer. These differences are easily measurable: cooler nighttime temperatures over crop land are an easy example.

It is important to recognize that some changes to our climate are beneficial. That converted crop land, for example, feeds people, which most would agree is a benefit. Diverted and dammed rivers provide water.

We also know with something near certainty that carbon dioxide has been increasing since the late 1950s. We are less certain, though nearly sure, that it has been increasing since about 1900. Before this date, we are even less certain of the global average amount. The reason is that before 1959 there were no consistent direct atmospheric measurements and so we must estimate the values based on proxies. Converting proxies to estimates requires statistical modeling. Part of every statistical model is, or should be, a quantification of the uncertainty of the estimates. This uncertainty is known by those who convert the proxies, but nearly always forgotten by those who use the estimates as input to climate or economic models.

It is absolutely clear that mankind is responsible for a portion of the carbon dioxide increase. What most people—not climatologists, but others—do not know is that this portion is only a fraction of the increase. The rest of the increase is due to other causes. These causes are not fully understood—a sentence you have often seen, and which means that we are not certain.

Temperatures have been directly measured for a little over a century. The number of locations at which temperature is taken has gradually increased, reaching something like full coverage only in the last thirty to forty years. It is certain that at many individual stations mankind has caused changes in measured temperature. Mankind has caused both warming, due to urban heat island effects, and cooling, such as by land use changes.

Joining these disparate measurements, and controlling for the changes and increases in locations, and the changes known to be due to urban heat island and other land use changes, to form an estimate of global average temperature again requires statistical modeling. And very difficult and uncertain statistical modeling at that. The resulting estimate should be presented with its error bounds, though it never is. These error bounds are currently larger than any projected increases in temperature, which makes it difficult or impossible to verify climate model output.

Surprisingly, climate models are not certain. We have deduced, and therefore know, the fundamental equations of motion, but there is some uncertainty in how to solve them inside a computer. We also are fairly sure of the physics of heat and radiative transfer, but there is large uncertainty in how to best represent these physics in computer code because climate models describe processes at very large scales and heat physics take place at the microscopic level. So these physics are parameterized, which increases the uncertainty in the climate model forecast.

All climate models undergo a “tuning” process, whereby the parameterizations and other parts of the computer code are tweaked so that the model better fits the past observed data. This necessary step always increases the uncertainty we have in predicting independent data, which is data that has not been used in any way to fit or tune the models. And it is a fact, and therefore certain, that, so far, climate models have over-forecast independent data, meaning that they have said temperatures would be higher than have actually occurred.

Lastly, there is the abundance of secondary research that uses climate model output as fixed input. This is the work that shows global warming causes every possible ill. I have never met one of these studies that quantified the uncertainty due to assuming climate models are error free. This means that their conclusions are vastly overstated.

Too many people are too confident about too many things. That was the simple message of the Heartland conference, and one that I hope sinks in.

Update 6 March: I have been getting some private questions, so I wanted to emphasize that I have not even gotten grant money to do my meteorology/climatology work. Any grant money I did get was from my advisor for my research fellowship in mathematical statistics when I was a graduate student. Since then it has been in the form of NIH and private foundation grants for biostatistical work. Unlike most climate researchers, I do it for fun and not for profit.