March 10, 2008 | 12 Comments

## It depends what the meaning of mean means.

Yesterday’s post was entitled, “You cannot measure a mean”, which is both true and false depending—thanks to Bill Clinton for the never-ending stream of satire—on what the meaning of mean means.

The plot I used was a numerical average at each point. This implies that at each year there were several direct measures that were averaged together and then plotted. This numerical average is called, among other things, a mean.

In this sense of the word, a mean is obviously observable, and so yesterday’s title was false. You can see a mean, they do exist in the world, they are just (possibly weighted) functions of other observable data. We can obviously make predictions of average values, too.

However, there is another sense of the word mean that is used as a technical concept in statistics, and an unfortunate sense, one that leads to confusion. I was hoping some people would call me on this, and some of you did, which makes me very proud.

The technical sense of mean is as an expected value, which is a probabilistic concept, and is itself another poorly chosen term, for you often cannot expect, and may never see, an expected value. A stock example is the throw of a die, which has an expected value of 3.5, a value no single roll can ever show.
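A few lines of simulation make the point for a fair six-sided die:

```python
import random

# The expected value of a fair die is 3.5, yet no roll can ever show 3.5:
# the "expected" value is a property of the model, not an observable.
faces = [1, 2, 3, 4, 5, 6]
expected = sum(faces) / len(faces)

rolls = [random.choice(faces) for _ in range(10_000)]
never_seen = all(r != expected for r in rolls)  # True: 3.5 never observed
```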

B: `y = a + b*t + OS`

I now have to explain what I passed over yesterday, the `OS`. Recall that `OS` stood for “Other Stuff”; it consisted of mystery numbers we had to add to the straight line so that model B reproduced the observed data. We never know what `OS` is in advance, so we call it random. Since we quantify our uncertainty in the unknown using probability, we assign a probability distribution to `OS`.

For lots of reasons (not all of them creditable), the distribution is nearly always a normal (the bell-shaped curve), which itself has two unobservable parameters, typically labeled μ and σ^2. We set μ=0 and guess σ^2. Doing this implies—via some simple math which I’ll skip—that the data, before we observe it, is itself described by a normal distribution, with two parameters `μ = a + b*t` and the same σ^2 that `OS` has.
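As a sketch, here is what simulating from model B with a normal `OS` looks like; the values of `a`, `b`, and σ below are my own illustrative guesses, not estimates from the plotted data:

```python
import random

# Model B with normally distributed OS (illustrative parameter values).
a, b, sigma = -139.0, 0.07, 1.0
years = list(range(1955, 2006))

# Each y is then normal with parameter mu = a + b*t and the same sigma^2
# that OS has: the straight line plus an unknown normal "Other Stuff" draw.
y = [a + b * t + random.gauss(0.0, sigma) for t in years]
```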

Unfortunately, that μ parameter is often called “the mean”. It is, however, just a parameter, an unobservable index used for the normal distribution. As I stressed yesterday (as I always stress), this “mean” cannot be seen or measured or experienced. It is a mathematical crutch used to help in the real work of explaining what we really want to know: how to quantify our uncertainty in the observables.

You cannot forecast this “mean” either, and you don’t need any math to prove this. The parameter μ is just some fixed number, after all, so any “forecast” for it would just say what that value is. Like I said yesterday, even if you knew the exact value of μ you still do not know the value of future observables, because `OS` is always unknown (or random).

We usually do not know the value of μ exactly. It is unknown—and here we depart the world of classical statistics where statements like I am about to make are taboo—or “random”, so we have to quantify our uncertainty in its value, which we do using a probability distribution. We take some data and modify this probability distribution to sharpen our knowledge of μ. We then present this sharpened information and consider ourselves done (these were the blue dashed lines on the plot yesterday).
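That sharpening step can be sketched with the standard conjugate normal update, here with σ^2 treated as known for simplicity (all numbers below are illustrative):

```python
# Conjugate normal update for mu with known sigma^2: posterior precision is
# prior precision plus n/sigma^2, so our distribution for mu sharpens as
# data accumulate -- but mu itself is still never observed.
def update_mu(prior_mean, prior_var, data, sigma2):
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / sigma2)
    return post_mean, post_var
```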

The unfortunate thing is that the bulk of statistics was developed to make more and more precise statements about μ: how to avoid bias in its measurement, what happens (actually, what never can happen) when we take an infinite amount of data, how estimates of it are ruled by the central limit theorem, and on and on. All good, quality mathematics, but mostly beside the point. Why? Again, because even if we knew the value of μ we still do not know the value of future observables. And because people tend to confuse their certainty in μ with their certainty in the observables, which, as we saw yesterday, usually leads to vast overconfidence.

From now on, I will not make the mistake of calling a parameter a “mean”, and you won’t either.

March 9, 2008 | 1 Comment

## Harmonica Convergence: classic column

#### The Second Harmonica Convergence, by Lem Polomski

I had missed the first Harmonica Convergence as I was touring Liechtenstein with the Borscht Five Polka band, which as you might remember was started by three ex-members of the Traveling Schmenges. So you can imagine my excitement when I discovered that I would be on tour in Arizona for the famed event. I took a diversion and arrived early. I worked feverishly the night before, polishing my best Steinowski—a forty-six holer. I would be ready. At five the next day, many hundreds were milling around on the Mesa, many still in their bathrobes. When would we play! I kept my beloved instrument under wraps as no one else had their harmonica out. Finally everyone joined hands in a large circle. They began humming. Concert C! This was it! They were about to tune up. I could stand no more and pulled out my brass beauty and began an impassioned rendition of the “She’s Too Fat For Me” Polka. I would show these Americans a true Harmonica Convergence! Imagine my confusion when no one joined in. Instead several of the bathrobed convergers chased after me with large crystalline rocks, chanting in a strange language. I left, dejected, and washed away my troubles with some tasty cabbage rolls and coffee and am now a wiser, more spiritually fulfilled person.

## You cannot measure a mean

I often say—it is even the main theme of this blog—that people are too certain. This is especially true when people report results from classical statistics, or use classical methods when implementing modern, Bayesian theory. The picture below illustrates exactly what I mean, but there is a lot to it, so let’s proceed carefully.

Look first only at the jagged line, which is something labeled “Anomaly”; it is obviously a time series of some kind over a period of years. This is the data that we observe, i.e. that we can physically measure. It, to emphasize, is a real, tangible thing, and actually exists independent of whatever anybody might think. This is a ridiculously trivial point, but it is one which must be absolutely clear in your mind before we go on.

I am interested in explaining this data, and by that I mean, I want to posit a theory or model that says, “This is how this data came to have these values.” Suppose the model I start with is

A: `y = a + b*t`

where `y` are the observed values I want to predict, `a` and `b` are something called parameters, and `t` is for time, or the year, which goes from 1955 to 2005. Just for fun, I’ll plug in some numbers for the parameters so that my actual model is

A’: `y = -139 + 0.07*t`
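As a quick check of the arithmetic, model A’ written as a plain function, using the parameter values just quoted:

```python
# Model A' as a function: a fixed straight line in the year t.
def model_a_prime(t):
    return -139 + 0.07 * t
```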

The result of applying model A’ gives the little circles. How does this model fit?

Badly. Almost never do the circles actually meet with any of the observed values. If someone had used our model to predict the observed data, he almost never would have been right. Another way to say this is

`Pr(y = observed) ~ 0.04`

or the chance that the model equals the observed values is about 4%.

We have a model and have used it to make predictions, and we’re right some of the time, but tremendous uncertainty still remains in our predictions. It would be best if we could quantify this uncertainty so that if we give this model to someone to use, they’ll know what they are getting into. This is done using probability models, and the usual way to extend our model is called regression, which is this

B: `y = a + b*t + OS`

where the model has the same form as before except for the addition of the term `OS`. What this model is saying is that “The observed values exactly equal this straight line plus some Other Stuff that I do not know about.” Since we do not know the actual values of `OS`, we say that they are random.

Here is an interesting fact: model A, and its practical implementation A’, stunk. Even more, it is easy to see that there are no values of `a` and `b` that can turn model A into a perfect model, for the obvious reason that a straight line just does not fit through this data. But model B always can be made to fit perfectly! No matter where you draw a straight line, you can always add to it `Other Stuff` so that it fits the observed series exactly. Since this is the case, restrictions are always placed on `OS` (in the form of parameters) so that we can get some kind of handle on quantifying our uncertainty in it. That is a subject for another day.
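That trivial “perfect fit” is easy to demonstrate: define `OS` as whatever is left over after the straight line, and adding it back reproduces the series exactly. (The toy numbers below are made up for illustration.)

```python
# Any straight line "fits" once OS is defined as the leftover. Toy data.
y = [0.2, -0.1, 0.4, 0.3]
t = [1955, 1956, 1957, 1958]
a, b = -139, 0.07

line = [a + b * ti for ti in t]
os = [yi - li for yi, li in zip(y, line)]        # Other Stuff: the leftover
recon = [li + oi for li, oi in zip(line, os)]    # line + OS

# The reconstruction matches the observations (up to float rounding).
exact = all(abs(ri - yi) < 1e-9 for ri, yi in zip(recon, y))
```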

Today, we are mainly interested in finding values of `a` and `b` so that our model B fits as well as possible. But since no straight line can fit perfectly, we will weaken our definition of “fit” to say we want the best straight line that minimizes the error we make using that straight line to predict the observed values. Doing this allows us to guess values of `a` and `b`.
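The usual least-squares recipe for those guesses can be sketched in a few lines (toy data again; the post’s actual series is not reproduced here):

```python
# Least-squares guesses for a and b: the straight line minimizing the
# squared error between the line and the observations.
def fit_line(t, y):
    n = len(t)
    tbar = sum(t) / n
    ybar = sum(y) / n
    b = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
         / sum((ti - tbar) ** 2 for ti in t))
    a = ybar - b * tbar
    return a, b
```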

Using classical or Bayesian methods of finding these guesses leads to model A’. But we are not sure that the values we have picked for `a` and `b` are absolutely correct, are we? The value for `b` might have been 0.07001, might it not? Or `a` might have been -138.994.

Since we are not certain that our guesses are perfectly correct, we have to quantify our uncertainty in them. Classical methodology does this by computing a p-value, which for `b` is 0.00052. Bayesian methodology does this by computing a posterior probability of `b > 0` given the data, which is 0.9997. I won’t explain either of these measures here, but you can believe me when I tell you that they are excellent, meaning that we are pretty darn sure that our guess of `b` is close to its true value.

Close, but not exactly on; nor is it for `a`, which means that we still have to account for our uncertainty in these guesses in our predictions of the observables. The Bayesian (and classical1) way to approximate this is shown in the dashed blue lines. These tell us that there is a 95% chance that the expected value of `y` is between these lines. This is good news. Using model B, and taking account of our uncertainty in guessing the parameters, we can then say the mean value of `y` is not just a fixed number, but a number plus or minus something, and that we are 95% sure that this interval contains the actual mean value of `y`. And that interval looks pretty good!

Time to celebrate! No, sorry, it’s not. There is one huge thing still wrong with this model: we cannot ever measure a mean. The `y` that pops out of our model is a mean and shares a certain quality with the parameters `a` and `b`, which is that they are unobservable, nonphysical quantities. They do not exist in nature; they are artificial constructs, part of the model, but you will never find a `mean(y)`, `a`, or `b` anywhere, not ever.

Nearly all of statistics, classical and Bayesian, focuses its attention on parameters and means and on making probability statements about these entities. These statements are not wrong, but they are usually beside the point. A parameter almost never has meaning by itself. Most importantly, the probability statements we make about parameters always fool us into thinking we are more certain than we should be. We can be dead certain about the value of a parameter, while still being completely in the dark about the value of an actual observable.

For example, for model B, we said that we had a nice, low p-value and a wonderfully high posterior probability that `b` was nonzero. So what? Suppose I knew the exact value of `b` to as many decimal places as you like. Would this knowledge also tell us the exact value of the observable? No. Well, we can compute the confidence or credible interval to get us close, which is what the blue lines are. Do these blue lines encompass about 95% of the observed data points? They do not: they only get about 20%. It must be stressed that the 95% interval is for the mean, which is itself an unobservable parameter. What we really want to know about is the data values themselves.

To say something about them requires a step beyond the classical methods. What we have to do is to completely account for our uncertainty in the values of `a` and `b`, but also in the parameters that make up `OS`. Doing that produces the red dashed lines. These say, “There is a 95% chance that the observed values will be between these lines.”
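A Monte Carlo sketch shows why accounting for `OS` widens things: draws for the “mean” use only the parameter uncertainty, while draws for a prediction add an `OS` draw on top. The posterior spreads below are invented for illustration, not fitted to the post’s data:

```python
import random

random.seed(1)
t_new = 2006
mean_draws, pred_draws = [], []
for _ in range(20_000):
    a = random.gauss(-139.0, 0.4)     # invented posterior spread for a
    b = random.gauss(0.07, 0.0002)    # invented posterior spread for b
    mu = a + b * t_new                # one draw of the "mean"
    mean_draws.append(mu)
    pred_draws.append(mu + random.gauss(0.0, 2.0))  # plus one OS draw

# Width of the central 95% interval of a sample.
def width95(xs):
    xs = sorted(xs)
    return xs[int(0.975 * len(xs))] - xs[int(0.025 * len(xs))]

# The prediction interval is necessarily wider than the mean interval.
```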

Now you can see that the prediction interval—which is about 4 times wider than the mean interval—is accurate. Now you can see that you are far, far less certain than what you normally would have been had you only used traditional statistical methods. And it’s all because you cannot measure a mean.

In particular, if we wanted to make a forecast for 2006, one year beyond the data we observed, the classical method would predict 4.5 with interval 3.3 to 5.7. But the true prediction interval, while still centered at 4.5, runs from 0.5 to 9, which is about three and a half times wider than the classical interval.

…but wait again! (“Uh oh, now what’s he going to do?”)

These intervals are still too narrow! See that tiny dotted line that oscillates through the data? That’s the same model as A’ but with a sine wave added on to it, to account for possible cyclicity of the data. Oh, my. The red interval we just triumphantly created is true given that model B is true. But what if model B were wrong? Is there any chance that it is? Of course there is. This is getting tedious—which is why so many people stop at means—but we also, if we want to make good predictions, have to account for our uncertainty in the model. But we’re probably all exhausted by now, so we’ll save that task for another day.

##### 1Given the model and priors I used, this is true.
March 7, 2008 | No comments

## Afternoon at GISS

Tim Hall at the Goddard Institute for Space Studies invited me to give a seminar on statistical hurricane modeling. A link to my presentation is below.

Tim, with Stephen Jewson, is doing some interesting work on modeling hurricane tracks, so far mainly in the Atlantic. He has some papers on the GISS web site which you can download. He’s using this work to better quantify landfall frequencies, which are of obvious interest.

What I found most intriguing is that he’s able to show how the location of tropical storm cyclogenesis shifts towards Africa as sea surface temperature increases. Storms born there tend to be stronger, but they are also less likely to make landfall in the US because of the greater distance.

I got some good comments on my model. Some people did not like that I used the AMO and instead asked for direct SST measures. Well, some like the AMO and some don’t. But I’m perfectly happy to try SSTs. At the least, it’ll make my model a better forecast model.

Didn’t get to meet Hansen, as he’s obviously too busy most of the time. Tim told me that he receives so many requests to come and give talks that some of the other staff sometimes take his place.

Here is my talk, in PDF format. Not too many words on the slides, I’m afraid, as I really hate words on slides. Nothing worse than having somebody read words on a slide that everybody in the room can already see. But you can go to my resume page and download the paper to get some words.