Yesterday’s post was entitled, “You cannot measure a mean”, which is both true and false depending—thanks to Bill Clinton for the never-ending stream of satire—on what the meaning of mean means.
The plot I used showed a numerical average at each point. That is, at each year several direct measurements were averaged together and then plotted. This numerical average is called, among other things, a mean.
In this sense of the word, a mean is obviously observable, and so yesterday’s title was false. You can see a mean, they do exist in the world, they are just (possibly weighted) functions of other observable data. We can obviously make predictions of average values, too.
However, there is another sense of the word mean that is used as a technical concept in statistics, and an unfortunate sense, one that leads to confusion. I was hoping some people would call me on this, and some of you did, which makes me very proud.
The technical sense of mean is as an expected value, which is a probabilistic concept, and is itself another poorly chosen term, for you often do not expect, and frequently cannot even see, an expected value. A stock example is a throw of a die, which has an expected value of 3.5.
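The arithmetic behind that number is just the probability-weighted sum of the faces, a value no single throw can ever show:

E[die] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5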
Yesterday’s model B was this:
B:
y = a + b*t + OS
I now have to explain what I passed over yesterday: the OS. Recall that OS stood for “Other Stuff”; it consisted of mystery numbers we had to add to the straight line so that model B reproduced the observed data. We never know what OS is in advance, so we call it random. Since we quantify our uncertainty in the unknown using probability, we assign a probability distribution to OS.
For lots of reasons (not all of them creditable), the distribution is nearly always a normal (the bell-shaped curve), which itself has two unobservable parameters, typically labeled μ and σ^2. We set μ = 0 and guess σ^2. Doing this implies—via some simple math which I’ll skip—that the unknown observed data is itself described by a normal distribution, with two parameters: μ = a + b*t and the same σ^2 that OS has.
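To see what that implication amounts to, here is a minimal simulation sketch of model B with made-up values of a, b, and σ (none of them from yesterday's data): adding normally distributed OS to the straight line is the same as drawing y from a normal whose μ parameter equals a + b*t.

```python
# A minimal sketch of model B with made-up values of a, b, and sigma (none of
# them from the actual data). Adding normal "Other Stuff" to the straight line
# is the same as drawing y from a normal whose mu parameter is a + b*t.
import numpy as np

rng = np.random.default_rng(0)
a, b, sigma = 10.0, 0.5, 2.0        # hypothetical values, for illustration only
t = np.arange(50)                   # hypothetical time points

OS = rng.normal(loc=0.0, scale=sigma, size=t.size)   # the "Other Stuff"
y = a + b * t + OS                                   # model B

# Equivalent way to say the same thing: y is normal with mu = a + b*t
y_direct = rng.normal(loc=a + b * t, scale=sigma)
```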
Unfortunately, that μ parameter is often called “the mean”. It is, however, just a parameter, an unobservable index used for the normal distribution. As I stressed yesterday (as I always stress), this “mean” cannot be seen or measured or experienced. It is a mathematical crutch used to help in the real work of explaining what we really want to know: how to quantify our uncertainty in the observables.
You cannot forecast this “mean” either, and you don’t need any math to prove this. The parameter μ is just some fixed number, after all, so any “forecast” for it would just say what that value is. Like I said yesterday, even if you knew the exact value of μ you still do not know the value of future observables, because OS is always unknown (or random).
We usually do not know the value of μ exactly. It is unknown—and here we depart the world of classical statistics, where statements like the one I am about to make are taboo—or “random”, so we have to quantify our uncertainty in its value, which we do using a probability distribution. We take some data and modify this probability distribution to sharpen our knowledge of μ. We then present this sharpened information and consider ourselves done (these were the blue dashed lines on the plot yesterday).
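For those who like to see the machinery, here is a toy sketch of that sharpening, assuming (purely for illustration) a normal prior on μ, a known σ, and invented data; none of these numbers come from the post.

```python
# A toy sketch of "sharpening" our uncertainty in mu: assume (for illustration
# only) a normal prior on mu, a known sigma, and invented data. The posterior
# spread shrinks as data accumulate.
import numpy as np

sigma = 2.0                      # spread of the observations, assumed known
m0, s0 = 0.0, 10.0               # hypothetical prior: mu ~ Normal(m0, s0^2)

rng = np.random.default_rng(1)
data = rng.normal(loc=3.0, scale=sigma, size=50)     # made-up observations
n, xbar = data.size, data.mean()

# Standard conjugate update for a normal mean with known variance
post_var = 1.0 / (1.0 / s0**2 + n / sigma**2)
post_mean = post_var * (m0 / s0**2 + n * xbar / sigma**2)
print(post_mean, post_var**0.5)  # knowledge of mu is now much sharper than the prior
```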
The unfortunate thing is that the bulk of statistics was developed to make more and more precise statements about μ: how to avoid bias in its measurement, what happens (actually, what never can happen) when we take an infinite amount of data, how estimates of it are ruled by the central limit theorem, and on and on. All good, quality mathematics, but mostly beside the point. Why? Again, because even if we knew the value of μ we still do not know the value of future observables. And because people tend to confuse their certainty in μ with their certainty in the observables, which, as we saw yesterday, usually leads to vast overconfidence.
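A back-of-the-envelope sketch of that overconfidence, with invented numbers: the spread of our uncertainty in μ shrinks like σ/√n as data pile up, but the spread for a brand-new observable never drops below roughly σ.

```python
# Back-of-the-envelope version of the overconfidence trap, with invented numbers.
import numpy as np

sigma, n = 2.0, 1000
sd_mu = sigma / np.sqrt(n)                  # uncertainty in mu: shrinks with n
sd_new_obs = np.sqrt(sigma**2 + sd_mu**2)   # uncertainty in a new observable

print(sd_mu)       # about 0.06: we know mu very well
print(sd_new_obs)  # about 2.0: the next observation is as uncertain as ever
```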
From now on, I will not make the mistake of calling a parameter a “mean”, and you won’t either.
Yes, very interesting.
I once used a package called Crystal Ball as an aid in business planning. It did Monte Carlo simulations. To use it, you had to take your critical variables, for instance margin percentage or unit sales over the years, and assign one of about 10 probability distributions to each.
Well, you can imagine. We were way beyond my own ability to guess intelligently, let alone that of the head of product marketing….. Do you think the distribution should be normal or exponential or Poisson…? And what should its parameters be…? Impossible.
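For anyone who never saw such a tool, the flavor of the exercise (with distributions and numbers invented here, not the ones we actually argued over) was roughly this:

```python
# A toy version of the Crystal Ball exercise: pick a distribution and parameters
# for each critical variable (all invented here), then simulate many scenarios.
import numpy as np

rng = np.random.default_rng(42)
n_sims = 10_000

margin_pct = rng.normal(loc=0.30, scale=0.05, size=n_sims)   # a guessed normal
unit_sales = rng.poisson(lam=5_000, size=n_sims)             # a guessed Poisson
price = 20.0                                                 # fixed assumption

profit = margin_pct * unit_sales * price
print(np.percentile(profit, [5, 50, 95]))   # a range, not a single point forecast
```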
The only good thing that came out of it was that it brought home to people that their forecasts should be in the middle of a range. You could show them they were not by asking what the upper end was of the range their forecast was supposed to sit in the middle of. If they told you it could not possibly go up that far, you knew, and could point out to them, that they were being too optimistic according to their own beliefs. They didn’t like it, but they had to agree.
It didn’t make them much better forecasters though. It just made them reconcile their plans with their real forecasts.
Anon, Good grief; I had no idea. Well, you can’t question the output from a computer model, can you? After all, a computer produced the numbers.
Another oddity. It turns out that better forecasts are made when people do not get together. Consensus usually leads to overconfidence.
Say, have we heard that somewhere before?
The title of this piece reminds me of a friend’s tongue-in-cheek saying: “You never know….. you know?”
Thanks, Dave. I might have to steal that. It’s actually fairly interesting. If you want to get technical, it is the same as saying that sentence A is true where sentence A is “The probability of event B, where B is contingent, is less than 1 but greater than 0.” You know A but you never know B.
Well, if that wasn’t boring, I don’t know what is.
I’m normally a lurker. A discussion at Anthony Watts’ weblog has become quite heated concerning Richard Lindzen’s comment about no significant warming since 1997.
“A note from Richard Lindzen on statistically significant warming”
http://wattsupwiththat.wordpress.com/2008/03/11/a-note-from-richard-lindzen-on-statistically-significant-warming/
Dr. Briggs, have you ever addressed this issue? I’d be interested in your analysis.
Thanks.
Dr. Biggs,
I’m another lurker. Really enjoy your blog. Wish I understood even half of it 😉
JM’s comment brought to mind a related question. Have you, or will you offer an essay on the capability (or not) of statistical models, in particular GCMs, to be used to forecast future events?
I did read your essay “Two differences in perception between global cooling and global warming”, but hope for something more specific about why these models can or can not be predictive.
Thank you
Sorry to misspell your name, Dr. Briggs. Too fast on the submit button.
JM,
Had a look at Lindzen’s pic. The term “statistical significance” is one I would like to see forever banned, but that’s a story for another time.
First, you cannot have SS without reference to some model or some formalized hypothesis about the data. Second, the SS makes some probability statement about some function of the data given some belief about a model’s parameters.
Example: in the linear regression I used in this post, the hypothesis is usually b = 0. The probability statement is “What is the probability this odd function of the data would be larger than what we got if we repeated our experiment an infinite number of times, given our hypothesis is certainly true?” That’s what a p-value is. Confused? You should be. I’ll try to make it simpler: we need a model and a probability statement about that model. I don’t see either in Lindzen’s pic. What model did he use? Our model B starting at 1993? Why pick 1993 and not 1992 or 1994? You have to take into account the uncertainty in picking the start date, which I very much doubt he has done.
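To make the mechanics concrete, here is a toy calculation with invented data (nothing to do with Lindzen’s figure or any real temperature series): fit a straight line and ask for the p-value attached to the classical test of b = 0.

```python
# A toy p-value for the hypothesis b = 0, using invented data (nothing to do
# with Lindzen's figure or any real temperature series).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
t = np.arange(1993, 2008)                              # hypothetical years
y = 0.01 * (t - t[0]) + rng.normal(0, 0.1, t.size)     # made-up "anomalies"

fit = stats.linregress(t, y)
print(fit.slope, fit.pvalue)   # the p-value for the classical test of b = 0
```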
Look at his fuzzy error bars, which I always say should be on plots of this kind. Are they for the expected value (the second definition of “mean”) or are they for the observed data? Looks like the former to me, but I’m guessing.
Best you can probably say is “The data do not appear completely consistent with the hypothesis of a strict linear increase if we assume the linear increase started in 1994.” But who knows?
Bruce,
There is no reason why climate models cannot provide skillful forecasts. In fact, some climate models do. The Climate Prediction Center (or did they just change their name again?) routinely produces skillful predictions. However, these are on the order of one to six months into the future and the skill is very modest.
Point is, there is no theoretical bar preventing skillful forecasts. So far, though, the big GCMs have not done well predicting independent data.
Lindzen’s uncertainties look like the 95% confidence intervals based on measurement uncertainties for individual data points (using GISS or Hadley estimates of the measurement uncertainty.) That would mean they should be for the observed value. But he didn’t say in that brief post.
Unfortunately, it’s true that the arguments about statistical significance in the Climate Blog Wars seem to be unconnected to any stated hypothesis. Also, many seem stuck on testing the hypothesis trend = 0 C/century, as if no other hypothesis can be tested.
Can you suggest specific papers that compare GCM predictions to independent data? Generally, when people point me to papers they contain pretty pictures and the comparisons tend to be “See. Don’t they look similar?” It would be nice to see something quantitative, if that exists.
Checking temperature every hour. Perfect 100% reliable instruments in a pristine location with no biases.
Day 1. 2 AM low of the day at 40. 4 PM high of the day at 80. Daily mean, taken as (high + low)/2, of 60. Every other hour of the day 52.
Day 2. 3 AM low of the day at 50. 5 PM high of the day at 70. Daily mean of 60 again. Every other hour of the day 62.
Can I even really compare these two numbers at a single location between the two days? No. Does the daily number show me anything? No. Do they even tell me anything about the conditions? No.
Much less combining them all into months, comparing the months with some 30 year mean of similar measurements for that month, coming up with offsets, and then combining every location with many other locations.
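A quick check of the arithmetic in that example (the numbers are the hypothetical ones above):

```python
# The two hypothetical days above: the (high + low)/2 daily "mean" is 60 both
# days, yet the hourly readings describe quite different conditions.
day1 = [52.0] * 24
day1[2], day1[16] = 40.0, 80.0        # 2 AM low, 4 PM high

day2 = [62.0] * 24
day2[3], day2[17] = 50.0, 70.0        # 3 AM low, 5 PM high

for day in (day1, day2):
    print((max(day) + min(day)) / 2,  # the daily "mean": 60 for both days
          sum(day) / len(day))        # the hourly average: about 52.7 vs 61.8
```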