Let’s Try This Time Series Thing Again: Part IV

By Briggs, February 8, 2012 (18 Comments)

Part I, II, III, IV, V.

We have before us X₁ to X₁₅₆. We started by assuming that something, called T, caused these data to take the values they did. We agreed that for many physical (contingent) phenomena we can’t know T, but that we can approximate it via modeling. Very well.

For global temperatures, various physical models exist. One group thinks M₁ is tops, another group is happier with M₂, while a third prefers M₃, and so on. It may be that M₁ is better than the others for predicting precipitation and that another model beats out the rest for snowfall specifically. I’ll still leave the idea of “model goodness” vague for the moment. Even if one model is better than another for one variable, it cannot be that all three (or however many) of these models are best for temperature, the X we are assuming we’re interested in. It may be that M₁ is best for X before time 1 and M₂ best for X a very long time ago, and it may even be that one of these models is best for some portions of the times 1 to 156 and others best for other portions; but considered as a whole (the entire time), only one model can be best.

We won’t know which is best until after we observe X. That is, before we see (or before we acknowledge having seen) X, we can calculate

(21) Pr( X_{1 – 156} | M_j),

which are the model predictions for models j = 1, 2, 3, and so on. After we observe X, we can compute

(22) G( (21)_j , X ),

where G() is a goodness function that measures how close the predictions (21) are to the actual observations X, and j indexes the models under consideration. There are many G(), and a model may be best under one G() but not under another. The G() you pick should reflect the decisions you make on the basis of the forecasts (21). That is, a model said X would take certain values with given probabilities, you acted on this information, and in so acting you suffered or gained. That suffering or gaining is quantified by G().
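To make “a model can be best under one G() but not another” concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the four observations are made up (they are not the temperature series), the two models’ predictive distributions Pr(X | M_j) are taken to be independent normals, and the two G() are the log predictive score and the negated mean squared error of each model’s point forecast.

```python
import math

# Made-up observations X, purely for illustration.
X = [1.0, -1.0, 1.0, -1.0]

# Hypothetical predictive distributions Pr(X_i | M_j): independent normals,
# given as (mean, standard deviation) for each model.
MODELS = {
    "M_1": (0.0, 0.1),   # centred well, but wildly overconfident
    "M_2": (0.3, 1.0),   # slightly off centre, honest about its spread
}

def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - 0.5 * z * z

def g_log_score(mu, sigma, xs):
    """G() as the log predictive score: larger is better."""
    return sum(normal_logpdf(x, mu, sigma) for x in xs)

def g_neg_mse(mu, sigma, xs):
    """G() as negative mean squared error of the point forecast mu: larger is better.
    sigma is ignored, which is why this G() can disagree with the log score."""
    return -sum((x - mu) ** 2 for x in xs) / len(xs)

def best_model(g, xs):
    """Name of the model with the largest G()."""
    return max(MODELS, key=lambda name: g(*MODELS[name], xs))

print(best_model(g_neg_mse, X))    # M_1
print(best_model(g_log_score, X))  # M_2
```

The overconfident model wins on squared error because its centre happens to be right, but its tiny spread makes the actual observations look nearly impossible, so it loses badly on the log score. Which ranking matters depends on how the forecasts were used.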

There are many off-the-shelf G() from which to choose if you are unable to think of how your predictions will be used. But, just as an aside, if you can’t imagine how your predictions will be used, you probably shouldn’t be making them. Anyway, assume we all agree on some G(). We can now order the models, from worst to best, according to G(). If we do not pick a G(), we cannot speak of goodness, badness, or even indifference. This is a necessary step.
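One hedged way to build a G() directly from a decision is the standard cost–loss setup from forecast verification (my choice for illustration, not anything specified in this series). A user pays an assumed COST to protect against an event and suffers an assumed LOSS if the event happens while unprotected, so a rational user protects whenever the forecast probability exceeds COST/LOSS. The forecasts, outcomes, and the values 1 and 4 below are all made up. G() is the negative total expense, and the models are then ordered from worst to best.

```python
# Hypothetical forecasts: each model's probability that some event occurs
# (say, X exceeding a threshold) at each of six times. All numbers are made up.
FORECASTS = {
    "M_1": [0.9, 0.8, 0.2, 0.1, 0.7, 0.3],
    "M_2": [0.6, 0.6, 0.4, 0.4, 0.6, 0.4],
    "M_3": [0.1, 0.2, 0.9, 0.8, 0.3, 0.6],
}
OUTCOMES = [1, 1, 0, 0, 1, 0]  # 1 = the event happened

COST = 1.0   # assumed cost of protecting against the event
LOSS = 4.0   # assumed loss if the event happens while unprotected

def g_cost_loss(probs, outcomes, cost=COST, loss=LOSS):
    """G() as negative total expense for a user who protects whenever p > cost/loss."""
    total = 0.0
    for p, happened in zip(probs, outcomes):
        if p > cost / loss:
            total += cost   # protected: pay the cost whether or not the event occurs
        elif happened:
            total += loss   # unprotected and the event occurred
    return -total

def order_models(forecasts, outcomes):
    """Model names sorted from worst to best by G()."""
    return sorted(forecasts, key=lambda m: g_cost_loss(forecasts[m], outcomes))

print(order_models(FORECASTS, OUTCOMES))  # ['M_3', 'M_2', 'M_1']
```

In general a different COST or LOSS can reorder the models, which is why the G() has to come from how the forecasts will actually be used.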

Included in the list of models under consideration are the probability models we discussed last time. Isn’t that strange? If it isn’t, it ought to be. After all, we’re mixing physical with probability models. Actually, it’s physical scientists who mix up, not mix, these models. To make things easy, suppose we are considering only two models, M_phys and M_prob. That is, suppose a “consensus” develops among all physical scientists that M_phys is the one and only physical model that anybody should use, and that M_prob is some probability model, like the regression model used last time. It needn’t be a regression model: imagine instead that M_prob is the grandest probability model you can think of.

With M_phys and M_prob in hand, we can compute (21) for each. We then wait until the X come in and calculate (22). Either M_phys or M_prob will be better; there is only a small chance that (22) will be equal for both. Statisticians have an automatic edge because they often do not reveal their M_prob until after the X have revealed themselves (this gives them the chance to “massage” things a bit: a perquisite of office).

Suppose that, given G() and X, M_prob is better. What does this mean? Well, that M_prob was better at describing the uncertainty in X than was M_phys. Does this mean that M_prob was therefore true and that M_phys false? No. Does it mean, as most scientists oddly believe, that M_phys is still probably true but that M_prob is merely some kind of “helper” in understanding uncertainty? Not only no, but, well, just no.

We can calculate, if we want,

(23) Pr( M_prob | X ) = 1 – Pr( M_phys | X ).

Usually, if G(M_prob) is better than G(M_phys), then Pr( M_prob | X ) > Pr( M_phys | X ). (G() might be “strange” enough to invert this relationship, but such situations are degenerate.) If this is so, that is, if the probability of M_prob being true given the data is higher than the probability of M_phys being true given the data, would anybody believe it?
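If G() happens to be the log score, log Pr(X | M), the link between G() and (23) can be sketched directly. This assumes, as (23) does, that M_prob and M_phys are the only two models under consideration, and additionally (my assumption) that neither is favoured before seeing X, i.e. a prior of 0.5 each. The log-odds of M_prob over M_phys is then just the difference in log scores. The two log scores below are hypothetical numbers.

```python
import math

def logistic(t):
    """1 / (1 + exp(-t)), computed without overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def pr_first_model(log_score_a, log_score_b, prior_a=0.5):
    """Pr(M_a | X) when M_a and M_b are the only two models under consideration.

    log_score_a, log_score_b: log Pr(X | M) for each model.
    prior_a: probability of M_a before seeing X (0.5 = no preference).
    """
    log_odds = (log_score_a - log_score_b) + math.log(prior_a / (1.0 - prior_a))
    return logistic(log_odds)

# Hypothetical log scores: M_prob beats M_phys by 5 log units.
g_prob, g_phys = -120.0, -125.0
p_prob = pr_first_model(g_prob, g_phys)
print(round(p_prob, 3), round(1 - p_prob, 3))  # 0.993 0.007
```

Under these assumptions the model with the better log score always gets the larger probability, which is the “usual” case described above.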

No, and neither would I, at least not for temperature. It might be, and for some physical/contingent phenomena it is true, that probability models really are better than any (known) physical model at describing the uncertainty in the observable. But for temperatures, who would believe that a statistical model is better than, say, a sophisticated global climate model? As I said, not I. But this is because (23) is the wrong equation. (23) does not account for any prior understanding we have on the two models under consideration. We really want

(23′) Pr( M_prob | X & E) = 1 – Pr( M_phys | X & E).

where E is background information pertinent to the X, including our prior probabilities that M_phys and M_prob are true.
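(23′) only adds the prior, Pr(M | E), to the same calculation. Here is a self-contained sketch with hypothetical numbers throughout: the same made-up 5-log-unit advantage for M_prob, but now E is assumed to say that M_phys is 999 times as likely as M_prob beforehand.

```python
import math

def pr_given_x_and_e(log_score_a, log_score_b, prior_a):
    """Pr(M_a | X & E) for two exhaustive models; prior_a = Pr(M_a | E)."""
    log_odds = (log_score_a - log_score_b) + math.log(prior_a / (1.0 - prior_a))
    # Logistic transform of the log-odds (fine for the moderate values used here).
    return 1.0 / (1.0 + math.exp(-log_odds))

g_prob, g_phys = -120.0, -125.0   # hypothetical log scores: M_prob fits better
prior_prob = 0.001                # hypothetical Pr(M_prob | E)

p_prob = pr_given_x_and_e(g_prob, g_phys, prior_prob)
print(round(p_prob, 2), round(1 - p_prob, 2))  # 0.13 0.87

# How much better M_prob's log score must be just to reach even odds with M_phys.
break_even = math.log((1.0 - prior_prob) / prior_prob)
print(round(break_even, 2))  # 6.91
```

With a prior this strong, M_prob would need to beat M_phys by about 6.9 log units just to draw even, so the better G() and the larger posterior probability no longer go together.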

But now if we believe that M_phys is more likely true, even after we’ve seen X, and even if M_prob is better than M_phys with respect to G(), then just what are we saying? Recall that it is still true that (13) says temperatures decreased (because they did). It may be that either or both of M_phys and M_prob said it was improbable that X would decrease, but decrease it still did. As we saw last time, you cannot refute the observation that X really did decrease (over times 1 – 156) by claiming that M_phys or M_prob really have to say something about X “over longer periods.” You’re stuck with the observations no matter what.

Why are we using M_prob anyway? Don’t we believe that M_phys is much more likely to be true? Well, maybe. We believe that some physical model is better than the statistical one, but how do we know that the physical model before us is it? Before answering that, consider how strange it is to abandon the physical model we currently hold in order to entertain statistical evidence of temperature change. For even if the probability model did in fact show a temperature increase (recall that (13) still holds), this does not mean that the physical model did. That is, the statistical model saying one thing or another is not, in any way, proof that the physical model is true.

I’ll repeat that, because it’s important. No matter what the probability model says, it is not proof for or against the physical model. Even if G(M_prob) is wonderful, this does not imply that G(M_phys) is any good. And if you claim that, because G(M_prob) is good, some physical model must be true (if not our particular M_phys, then another with the same basic theory), your claim is unwarranted.

In short, M_phys must be judged on its own. If you consider M_prob as a replacement for M_phys, then it is all very well to talk of G(M_prob) besting G(M_phys) or of (23) being “large.” But it is no salvation for M_phys that G(M_prob) is “good”. If G(M_phys) is “bad”, then it is “bad”, period. (The inverse is also true.)

Next time: we’re finally ready to handle X measured with error, i.e. “predictive” statistics.


Last updated on February 9, 2012

Briggs

Briggs is an internationally reviled thoughtcriminal, listed as One Of The Top 7 Dangerous Minds by the Hague.


18 Comments

Will

February 8, 2012, 11:56 am

Reading what you said of multiple models and which is true makes me think of a machine learning method called Genetic Programming (Holland, Koza).

The general idea is simple: start with a bunch of random equations, rate each equation on how well it works against observations, and then apply ‘survival of the fittest’ to determine which equations get randomly perturbed copies and which get removed from the population. Repeat a few hundred thousand times and you’re done (in theory).

It’s an interesting, but very slow, way of demonstrating just how many different solutions there can be to the same problem.
DAV

February 8, 2012, 3:38 pm

Will,

A quicker way is to get a bunch of people together (say, on the Internet) and ask their opinion. It’s pretty much the same thing. You don’t necessarily get a better answer. A lot depends on the selection criteria. Not to mention that local minima/maxima abound in many problems.
Briggs

February 8, 2012, 3:48 pm

DAV,

RE: Chinese Emperor’s nose problem. Same thing.

All,

Incidentally, where have all the comments gone? This is the meatiest part yet, but everybody seems to have disappeared. How could this have happened? Tamino, where are you?
Big Mike

February 8, 2012, 4:48 pm

With respect to comments: I’m certainly no statistician, but you have explained the concepts in a way that makes them very plausible to a naive observer such as me.

My apologies for refreshing rather frequently — I’m looking forward to the counter-arguments.
Briggs

February 8, 2012, 6:17 pm

Big Mike,

Refresh away!

I too was looking forward to counter-arguments: there are some to be made, some subtle ones, too. If nobody else asks, I’ll ask them myself.
Doug M

February 8, 2012, 7:06 pm

As this series has been progressing, I have been thinking about how we discuss time series in financial and economic data.

In the financial world we describe an uptrend as one in which both the most recent high is higher than the previous high and the most recent low is higher than the previous low.

It is always assumed that the current trend will eventually break.
Big Mike

February 8, 2012, 8:07 pm

@Doug M: I know little of finance and economics — how is your definition of “up-trend” (and I presume its counterpart “down-trend”) used in practice? Do you find it’s a useful indication of anything? Do you limit “previous” to a specific time period? In examining a time series of the price of a particular asset, do you apply any “E” (in Dr. Briggs’ terminology) to judge heteroscedasticity? For example, a given “asset” — say a share in a company — surely adopts different price-over-time characteristics as, say, financial reports are anticipated and ultimately released.

Thanks!
Will

February 8, 2012, 8:31 pm

This is probably my sixth refresh today… I was really hoping for some debate. Looks like your title as statistician to the stars remains uncontested, Mr. Briggs.

Mr. Briggs: if I understand you correctly, you are saying that for any given set of observations there exist many explanations.

Is there a way to know if one model is closer to the true mechanics of a system than another? For example: I could use something like a neural network to help me model highway traffic congestion, or I could build a model of a “driver”, complete with a 3D physics simulation for the cars, and fill in all of the different things that I think are relevant to driving and highways. Which model is closer to the true model (reality), and does it even matter?
Alan Cooper

February 8, 2012, 8:38 pm

You ask for comments but you don’t respond to mine.
I haven’t had time to read this yet, but my first comment is to question how you can say that “considered as a whole (the entire time), only one model can be best.” Doesn’t identifying the “best” model depend on what discounting method you use to get convergence of the infinite tails?
Briggs

February 8, 2012, 9:32 pm

Alan C,

Ask again after you have time to read.

Will,

Yes to your first observation, and yes to your first question: by calculating (23), for instance, or by calculating a G() that is meaningful to you. We can go over specific examples later.
Will

February 8, 2012, 10:37 pm

Hopefully I’ve interpreted (23) correctly. Assuming I have:

By saying that the model which performs “better” is a truer model, are we not also saying that we just don’t know what reality is? I mean this in a fundamental umpteen dimensional string theory kind of way.

Sorry if that seems metaphysical... it’s just that recent advances in machine learning and processing power have made it so that letting a computer decide the model type doesn’t seem like a bad idea. An inexpensive desktop can crunch a few billion numbers in no time. But I feel uncomfortable believing that some of these models are anything at all like reality. Boosted regression trees and support vectors don’t feel as solid as f=ma.
Alan Cooper

February 8, 2012, 10:56 pm

Ah! I see that I shouldn’t have let you “guilt” me into commenting prematurely as your reference to different “goodness” functions giving different “best” models clearly qualifies your reference in the first paragraph to only one model being best.
Big Mike

February 8, 2012, 11:21 pm

Hey, Will — I don’t pretend to have anywhere near Dr. Briggs’ level of expertise in this matter, but I would say: of course the need to apply a model means we don’t know what reality is. A model, in my understanding, is little more than a way of guessing how the underlying process may react under certain circumstances — it isn’t the underlying process itself.

f=ma is, in fact, a model. It’s known to hold quite well over a very useful range of values of “m” and “a”. But the problem is, assuming that the values “m” and “a” take are real (as opposed to rational) numbers, we can’t ever know them with complete precision, and so it’s possible to get several values of “f” for what we perceive to be the same values of “m” and “a” in an experiment where we accelerate a constant mass a number of times. Presumably, if we repeat the experiment very carefully a sufficient number of times, the measured values of “f” are likely to cluster around the “true” value — the existence of which we can only postulate, and certainly never know for sure. We may be able to isolate it to a smaller and smaller region by improving our experiment, and by taking more and more measurements, but the notion of an “exact” value of “f” is essentially meaningless in practice.

As far as I recall, this was the state of affairs (or at least the zenith of my understanding) when I studied 1st year physics so very long ago…
Alan Cooper

February 8, 2012, 11:29 pm

Thanks for responding to my previous (premature) question.
I now have another couple.

1) Did I miss something, or is this your first reference to quantities of the form P(M|X)?
(My reservations about your use of conditional probability notation for probabilities computed under various model assumptions are exceeded by those I have about the reverse conditionals in the absence of any specified Bayesian context.)

2) Your equation (23) (or even (23′)) seems to imply that in the presence of observations X (and maybe other information E) one or the other of Mphys or Mprob *has* to be exactly true, but perhaps by P(M|X) you just mean some kind of “probability” that G(M) is the bigger of the two G values (scare quotes because I still don’t know what kind of probability you mean in such a statement). Am I interpreting you correctly here and, either way on that, what *do* you mean by P(M|X)?
Uncle Mike

February 8, 2012, 11:40 pm

But what if M(prob) was accurate and precise at predicting future X, and M(phys) was a total bust? Wouldn’t that tell us something about M(phys)? For instance, if M(prob) was a regression line from 2000 to 2005, and the post-2005 data fell right on that (decreasing) line? If M(phys) of 2005 predicted an increase which didn’t happen, that’s a black eye of some degree, right?

What if there were some physical models — let’s call them M(skeptic) — that were accurate and precise, but the establishment refuses to consider them, or derides them as poor science? Such as solar (sunspot) models, cosmic ray models, Jupiter models, etc., which predicted the observed decrease, whereas the establishment (CO2) models got the sign wrong and shot off in the wrong direction?
Bill S

February 9, 2012, 2:16 am

No constructive critique from me.
I think you are doing a very good job in this series.
Do not tell me you are deliberately misleading me when I finally think I begin to understand.
Please.
Sander van der Wal

February 9, 2012, 4:29 am

@Will

There is a difference between a theory and a model. Theories are by definition meant to explain and predict the observable world. Models are built to explore the world. So some models are also theories, and all theories are models.
George

February 10, 2012, 9:38 am

I don’t understand the fundamental difference between Mphys and Mprob. Aren’t they both arbitrary ways to predict the observables? Why should we be depressed if one type of model does a better job than another – doesn’t it just mean we should probably trust that model more?
