We have before us X_{1} to X_{156}. We started by assuming that something, called T, caused these data to take the values they did. We agreed that for many physical (contingent) phenomena we can’t know T, but that we can approximate it via modeling. Very well.

For global temperatures, various physical models exist. One group thinks M_{1} is tops, another group is happier with M_{2}, while a third prefers M_{3}, and so on. It may be that M_{1} is better than the others at predicting precipitation, that M_{1} is best for X before time 1 and M_{2} best for X a very long time ago, and even that one of these models is best for some portions of the time 1 to 156 and others best for other portions. But considered as a whole (the entire time), only one model can be best.

We won’t know which is best until *after* we observe X. That is, before we see (or before we acknowledge having seen) X, we can calculate

(21) Pr( X_{1 – 156} | M_{j}),

which are the model predictions for models j = 1, 2, 3, … *After* we observe X we can compute

(22) G( (21)_{j} , X ),

where G() is a goodness function which measures how close the predictions (21) are to the actual observations X, and where j ranges over the models under consideration. There are many G(), and it may be that a model is best with one G() but not best with another. The G() you pick should reflect the decisions you make based on the forecasts (21). That is, a model said X would take certain values with a given probability, you acted on this information, and in so acting you suffered or gained. That suffering or gaining is quantified by G().

There are many off-the-shelf G() from which to choose if you are unable to think of how your predictions will be used. But, just as an aside, if you can’t imagine how your predictions will be used, you probably shouldn’t be making them. Anyway, assume we all agree on some G(). We can now order the models, from worst to best, according to G(). If we do not pick a G(), we cannot speak of goodness, badness, or even indifference. This is a necessary step.
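To make this concrete, here is a minimal sketch (in Python, with entirely made-up data and models) of one common off-the-shelf G(): the log score, which sums the log of the probability each model gave to what actually happened. The observations, model distributions, and parameters below are hypothetical, chosen only to show the mechanics of scoring the predictions (21).

```python
import math
import random

random.seed(1)

# Hypothetical observations X_1..X_156 (e.g., temperature anomalies)
X = [0.1 * math.sin(t / 10) + random.gauss(0, 0.2) for t in range(156)]

def normal_density(mu, sigma):
    """Predictive density a toy model assigns to each possible x."""
    def f(x):
        return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    return f

def log_score(density, data):
    """G(): sum of log predictive densities over the data; higher is better."""
    return sum(math.log(density(x)) for x in data)

# Each toy "model" issues its prediction (21) as a distribution for every X_t
G_M1 = log_score(normal_density(0.0, 0.25), X)  # a sharp model
G_M2 = log_score(normal_density(0.0, 1.00), X)  # a vague model

print(G_M1, G_M2)  # the sharper model earns the higher score on these data
```

With a different G() (say, one that only penalizes large errors), the ordering could differ, which is the point: the choice of G() is part of the analysis, not an afterthought.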

Included in the list of models under consideration are the probability models we discussed last time. Isn’t that strange? If it isn’t, it ought to be. After all, we’re mixing physical with probability models. Actually, it’s physical scientists who mix up, not mix, these models. To make things easy, suppose we are considering only two models, M_{phys} and M_{prob}. That is, suppose a “consensus” develops among all physical scientists that M_{phys} is the one and only physical model that anybody should use, and that M_{prob} is some probability model, like the regression model used last time. It needn’t be a regression model: imagine instead that M_{prob} is the grandest probability model you can think of.

With M_{phys} and M_{prob} in hand, we can compute (21) for each. We then wait until the X come in and then calculate (22). Either M_{phys} will be better or M_{prob} will be: there is a small chance that (22) will be equal for both. Statisticians have an automatic edge because they often do not reveal their M_{prob} until after the X have revealed themselves (this gives them the chance to “massage” things a bit: a perquisite of office).

Suppose that, given G() and X, M_{prob} is better. What does this mean? Well, that M_{prob} was better at describing the uncertainty in X than was M_{phys}. Does this mean that M_{prob} was therefore true and that M_{phys} false? No. Does it mean, as most scientists oddly believe that it does, that M_{phys} is still probably true but that M_{prob} is merely some kind of “helper” in understanding uncertainty? Not only no, but, well, just no.

We can calculate, if we want,

(23) Pr( M_{prob} | X ) = 1 – Pr( M_{phys} | X ).

Usually if G(M_{prob}) is better than G(M_{phys}), Pr( M_{prob} | X ) > Pr( M_{phys} | X ) (G() might be “strange” such that the relationship is inverted; these are degenerate situations). If this is so, if the probability of M_{prob} being true given the data is higher than the probability of M_{phys} being true given the data, would anybody believe it?

No, and neither would I; at least, not for temperature. It might be, and indeed is, true that for some physical/contingent phenomena probability models really are better than any (known) physical model at describing the uncertainty in the observable. But for temperatures, who would believe that a statistical model is better than, say, a sophisticated global climate model? As I said, not I. But this is because (23) is the wrong equation. (23) does not account for any prior understanding we have of the two models under consideration. We really want

(23′) Pr( M_{prob} | X & E) = 1 – Pr( M_{phys} | X & E),

where E is background information pertinent to the X, including our prior probabilities that M_{phys} and M_{prob} are true.
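A small numerical sketch of the difference between (23) and (23′), with invented numbers: even when M_{prob} assigns the observed X a higher probability, a strong prior (carried in E) can keep M_{phys} ahead. Every number here is hypothetical.

```python
# Hypothetical model predictions (21): probability each model gave to the observed X
pr_X_given_phys = 1e-6   # Pr(X | M_phys), made up
pr_X_given_prob = 5e-6   # Pr(X | M_prob), made up: a 5x better fit

# E: strong prior belief that the physical model is the true one
prior_phys = 0.95
prior_prob = 0.05

# Bayes: Pr(M | X & E) is proportional to Pr(X | M) * Pr(M | E)
num_phys = pr_X_given_phys * prior_phys
num_prob = pr_X_given_prob * prior_prob
total = num_phys + num_prob

post_phys = num_phys / total
post_prob = num_prob / total

print(round(post_phys, 3), round(post_prob, 3))
# The prior keeps M_phys ahead despite the worse fit; with flat priors,
# as in (23), M_prob would win.
```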

But now if we believe that M_{phys} is more likely true, even after we’ve seen X, and even if M_{prob} is better than M_{phys} with respect to G(), then just what are we saying? Recall it is still true that (13) says that temperatures decreased (because they did). It may be that either or both of M_{phys} and M_{prob} said that it was improbable that X would have decreased, but decrease it still did. You cannot refute the observation that X really did decrease (over times 1 – 156) by claiming, as we saw last time, that M_{phys} or M_{prob} really have to say something about X “over longer periods.” You’re stuck with the observations no matter what.

Why are we using M_{prob} anyway? Don’t we believe that M_{phys} is much more likely to be true? Well, maybe. We believe that *some* physical model is better than the statistical, but how do we know that the physical model before us is it? Before answering that, consider how strange it is to abandon the physical model we currently hold to entertain statistical evidence of temperature change. Because even if the probability model did in fact show a temperature increase (recalling (13) still holds), this does *not* mean that the physical model did. That is, the statistical model saying one thing or another is *not*, in *any* way, proof that the physical model is true.

I’ll repeat that, because it’s important. No matter what the probability model says, it is not proof for or against the physical model. Even if G(M_{prob}) is wonderful, this does not imply that G(M_{phys}) is any good. And if you claim, because G(M_{prob}) is good that therefore, if not our particular M_{phys}, that *some* physical model (with the same basic theory) is therefore true, you are saying what is unwarranted.

In short, M_{phys} must be judged on its own. If you consider M_{prob} as a *replacement* for M_{phys}, then it is very well to talk of G(M_{prob}) besting G(M_{phys}), or of (23) being “large.” It is no salvation for M_{phys} that G(M_{prob}) is “good”. If G(M_{phys}) is “bad”, then it is “bad” period. (The inverse is also true.)

Next time: we’re finally ready to handle X measured with error, i.e. “predictive” statistics.

Reading what you said about multiple models and which is true makes me think of a machine learning method called Genetic Programming (Holland, Koza).

The general idea is simple: start with a bunch of random equations, rate each equation based on how well it works against observations, and then apply ‘survival of the fittest’ to determine which equations get randomly perturbed copies and which get removed from the population. Repeat a few hundred thousand times and you’re done (in theory).

It’s an interesting, but very slow, way of demonstrating just how many different solutions there can be to the same problem.
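For the curious, here is a toy sketch of that loop in Python. It is not full tree-based genetic programming (no expression trees, no crossover); it just evolves the two coefficients of a line a·x + b against hypothetical observations, which is enough to show the rate–select–perturb cycle described above.

```python
import random

random.seed(0)

# Hypothetical "observations" generated by a known true process y = 2x + 1
xs = [x / 10 for x in range(-20, 21)]
ys = [2 * x + 1 for x in xs]

def fitness(ind):
    """Rate a candidate (a, b) against the observations; higher is better."""
    a, b = ind
    return -sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Start with a population of random candidate "equations"
pop = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(40)]

for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:10]  # 'survival of the fittest'
    # Survivors stay, and each spawns three randomly perturbed copies
    pop = survivors + [(a + random.gauss(0, 0.1), b + random.gauss(0, 0.1))
                       for a, b in survivors for _ in range(3)]

best_a, best_b = max(pop, key=fitness)
print(best_a, best_b)  # converges toward a ≈ 2, b ≈ 1
```

Many quite different candidates can score nearly as well on finite data, which is exactly the many-solutions point: fitness against observations alone does not single out the true equation.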

Will,

A quicker way is to get a bunch of people together (say, on the Internet) and ask their opinion. It’s pretty much the same thing. You don’t necessarily get a better answer. A lot depends on the selection criteria. Not to mention that local minima/maxima abound in many problems.

DAV,

RE: Chinese Emperor’s nose problem. Same thing.

All,

Incidentally, where have all the comments gone? This is the meatiest part yet, but everybody seems to have disappeared. How could this have happened? Tamino, where are you?

With respect to comments: I’m certainly no statistician, but you have explained the concepts in a way that makes them very plausible to a naive observer such as me.

My apologies for refreshing rather frequently — I’m looking forward to the counter-arguments.

Big Mike,

Refresh away!

I too was looking forward to counter-arguments: there are some to be made, some subtle ones, too. If nobody else asks, I’ll ask them myself.

As this series has been progressing, I have been thinking about how we discuss time series in financial and economic data.

In the financial world we describe an up-trend if both the most recent high is higher than the previous high, and the most recent low is also higher than the previous low.

It is always assumed that the current trend will eventually break.

@Doug M: I know little of finance and economics — how is your definition of “up-trend” (and I presume its counterpart “down-trend”) used in practice? Do you find it’s a useful indication of anything? Do you limit “previous” to a specific time period? In examining a time series of the price of a particular asset, do you apply any “E” (in Dr. Briggs’ terminology) to judge heteroscedasticity? For example, a given “asset” — say a share in a company — surely adopts different price-over-time characteristics as, say, financial reports are anticipated and ultimately released.

Thanks!

This is probably my sixth refresh today… I was really hoping for some debate. Looks like your title as statistician to the stars remains uncontested, Mr. Briggs.

Mr. Briggs: if I understand you correctly, you are saying that for any given set of observations there exist many explanations.

Is there a way to know if one model is closer to the true mechanics of a system than another? For example: I could use something like a neural network to help me model highway traffic congestion, or I could build a model of a “driver”, complete with a 3D physics simulation for the cars, and fill in all of the different things that I think are relevant to driving and highways. Which model is closer to the true model (reality), and does it even matter?

You ask for comments but you don’t respond to mine.

I haven’t had time to read this yet, but my first comment is to question how you can say that “considered as a whole (the entire time), only one model can be best.” Doesn’t identifying the “best” model depend on what discounting method you use to get convergence of the infinite tails?

Alan C,

Ask again after you have time to read.

Will,

Yes to your first observation, and yes to your first question: by calculating (23), for instance, or by calculating a G() that is meaningful to you. We can go over specific examples later.

Hopefully I’ve interpreted (23) correctly. Assuming I have:

By saying that the model which performs “better” is a truer model, are we not also saying that we just don’t know what reality is? I mean this in a fundamental umpteen dimensional string theory kind of way.

Sorry if that seems metaphysical… it’s just that recent advances in machine learning, and in processing power, have made it so that letting a computer decide the model type doesn’t seem like a bad idea. An inexpensive desktop can crunch a few billion numbers in no time. But I feel uncomfortable believing that some of these models are anything at all like reality. Boosted regression trees and support vectors don’t feel as solid as f=ma.

Ah! I see that I shouldn’t have let you “guilt” me into commenting prematurely as your reference to different “goodness” functions giving different “best” models clearly qualifies your reference in the first paragraph to only one model being best.

Hey, Will — I don’t pretend to have anywhere near Dr. Briggs’ level of expertise in this matter, but I would say: of course the need to apply a model means we don’t know what reality is. A model, in my understanding, is little more than a way of guessing how the underlying process may react under certain circumstances — it isn’t the underlying process itself.

f=ma is, in fact, a model. It’s known to hold quite well over a very useful range of values of “m” and “a”. But the problem is, assuming that the values “m” and “a” take are real (as opposed to rational) numbers, we can’t ever know them with complete precision, and so it’s possible to get several values of “f” for what we perceive to be the same values of “m” and “a” in an experiment where we accelerate a constant mass a number of times. Presumably, if we repeat the experiment very carefully a sufficient number of times, the measured values of “f” are likely to cluster around the “true” value — the existence of which we can only postulate, and certainly never know for sure. We may be able to isolate it to a smaller and smaller region by improving our experiment, and by taking more and more measurements, but the notion of an “exact” value of “f” is essentially meaningless in practice.

As far as I recall, this was the state of affairs (or at least the zenith of my understanding) when I studied 1st year physics so very long ago…
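A quick simulation (with hypothetical numbers) of the point above: repeated noisy measurements of f = m·a cluster around the postulated true value, and averaging more of them shrinks, but never eliminates, the uncertainty.

```python
import random

random.seed(42)

m, a = 2.0, 9.8       # hypothetical "true" values we can only postulate
true_f = m * a

def average_measurement(n):
    """Average of n noisy force measurements (measurement noise sd = 0.5)."""
    return sum(true_f + random.gauss(0, 0.5) for _ in range(n)) / n

small = abs(average_measurement(10) - true_f)     # error with few measurements
large = abs(average_measurement(10000) - true_f)  # error with many measurements
print(small, large)  # the larger experiment usually lands closer to true_f
```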

Thanks for responding to my previous (premature) question.

I now have another couple.

1) Did I miss something, or is this your first reference to quantities of the form P(M|X)?

(My reservations about your use of conditional probability notation for probabilities computed under various model assumptions are exceeded by those I have about the reverse conditionals in the absence of any specified Bayesian context.)

2) Your equation (23) (or even (23′)) seems to imply that in the presence of observations X (and maybe other information E) one or the other of Mphys or Mprob *has* to be exactly true, but perhaps by P(M|X) you just mean some kind of “probability” that G(M) is the bigger of the two G values (scare quotes because I still don’t know what kind of probability you mean in such a statement). Am I interpreting you correctly here and, either way on that, what *do* you mean by P(M|X)?

But what if M(prob) was accurate and precise at predicting future X, and M(phys) was a total bust? Wouldn’t that tell us something about M(phys)? For instance, if M(prob) was a regression line from 2000 to 2005, and the post-2005 data fell right on that (decreasing) line? If M(phys) of 2005 predicted an increase which didn’t happen, that’s a black eye of some degree, right?

What if there were some physical models — let’s call them M(skeptic) — that were accurate and precise, but the establishment refuses to consider them, or derides them as poor science? Such as solar (sunspot) models, cosmic ray models, Jupiter models, etc., which predicted the observed decrease, whereas the establishment (CO2) models got the sign wrong and shot off in the wrong direction?

No constructive critique from me.

I think you are doing a very good job in this series.

Do not tell me you are deliberately misleading me when I finally think I begin to understand.

Please.

@Will

There is a difference between a theory and a model. Theories are by definition meant to explain and predict the observable world. Models are built to explore the world. So some models are also theories, and all theories are models.

I don’t understand the fundamental difference between Mphys and Mprob. Aren’t they both arbitrary ways to predict the observables? Why should we be depressed if one type of model does a better job than another – doesn’t it just mean we should probably trust that model more?