We’ve gone on and on about how to think about time series, but we are having trouble grasping some very simple ideas. The discussion here, and on other blogs, demonstrates there is a lot of confusion and plenty of misunderstanding. Also a complete lack of humor. Who would have guessed that something as banal as statistics could get so many people so excited?
The true test of an honest mind is how seriously it considers arguments that produce uncomfortable conclusions. This is not to say that uncomfortable conclusions are always right; clearly they are not. But in the case of how to think about time series, I am right and my enemies are wrong.
I rarely ask this, but you’d be doing us all a favor if you passed this series on to those in need of it. I’ll answer questions after the series is completed. I’ll LaTeX this up when it’s finished, sans asides, for easy and portable reading. Remember: be nice.
Below is pictured a time series. Imagine it is something to do with climate, say, monthly temperature anomalies. Let’s first suppose that each of the points in the picture is measured without error. That is, we are 100% sure that each point is what it is. The first value is X1 = 0.43. Given our observations, what is the probability that X1 = 0.43? It is 1, or 100%. And so on for each data point. If you find yourself disagreeing with me at this point, well, there is nothing I can do for you: we must remain forever at odds.
Now, something caused that data to take the values it did. Call this cause T. (Something causes every observation to take the values it does.) You must agree with this, too, or all is lost. T will be more or less complicated depending on what X is. If X is, say, global average temperature (anomaly), then T will contain everything that can change the temperature, even down to butterflies flapping their wings. T is the earth and sun, etc.
In real life, we rarely (if ever) know exactly, precisely, down to each photon, what T is. But suppose we did. Then we can answer questions like this: what is
(1) Pr(X1 = 0.43 | T)?
It is 1, or 100%. T says, after all, exactly what causes each X; therefore, if we know T, we know with certainty what each X will be before taking any observations.[1] Equation (1) is different from
(2) Pr(X1 = 0.43 | Observations),
which also equals 1, or 100%. In other words, we know (1) before we take observations, but we know (2) after. This is an important distinction. Okay so far?
Again, we hardly ever know T precisely in real life; we surely do not know it if X is any kind of atmospheric or oceanic temperature. We might guess, or use evidence compiled from various sources, to say that, although we cannot know T exactly, we can approximate it, i.e. we can model it. To be clear: no scientist claims to know T precisely, but all believe (I do, too) that we can approximate T by a model.
One person will say that the best model is M1, another will claim that it is M2, and so on. It will usually be the case that
(3) Pr(Xi = x | Mj) n.e. Pr(Xi = x | Mk)
where “n.e.” means “not equal”, x is some value, i is for the i-th value of X, and j and k are indexes over our collection of posited models. This should be no surprise: if instead in (3) there were equality for all i, j, and k, then there would be no difference in the models.
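To make (3) concrete, here is a minimal sketch in Python. The two models are invented purely for illustration: M1 and M2 are hypothetical normal distributions with made-up parameters, not models anyone has proposed for this series. Because a continuous model gives zero probability to any exact value, the sketch also assumes that a reported 0.43 stands for the rounding interval from 0.425 to 0.435.

```python
from statistics import NormalDist

# Hypothetical models, invented for illustration only.
M1 = NormalDist(mu=0.0, sigma=0.5)
M2 = NormalDist(mu=0.4, sigma=0.2)

def pr_value(model, x, half_width=0.005):
    """Pr(x - half_width <= X < x + half_width | model)."""
    return model.cdf(x + half_width) - model.cdf(x - half_width)

p1 = pr_value(M1, 0.43)
p2 = pr_value(M2, 0.43)
print(f"Pr(X1 = 0.43 | M1) = {p1:.4f}")
print(f"Pr(X1 = 0.43 | M2) = {p2:.4f}")
```

The two printed numbers differ, which is all (3) says: change what sits to the right of the bar and the probability changes with it.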
A sticking point for you might be using the language of probability to speak of physical models. It shouldn’t be. For one, probability is the language of uncertainty, and don’t forget that we don’t know T, we only guess that M is a good approximation of T, so we have to speak not in terms of certainty, but uncertainty.
Let’s take a fully deterministic model as an example:
(4) M = “Y_{i+1} = Y_i + 2.”
From this we can ask this (or any other question about the Ys):
(5) Pr( Y17 > Y12 | M )
which is 1, or 100%. There is no problem, therefore, using probability even though M itself has no probabilistic components. Again, if you fail to agree with this, we must part ways.
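For those who like to check such things by machine, here is a small Python sketch of (4) and (5). It reads the model as "each Y is the previous Y plus 2". M does not say where the series starts, so the starting values below are arbitrary, hypothetical choices; the function names are mine as well. The loop is a check, not a proof. The proof is the arithmetic: under M, Y17 is Y12 plus five steps of 2, that is, Y12 + 10.

```python
def y_sequence(y1, n):
    """Return [Y_1, ..., Y_n] under M: each term is the previous term plus 2."""
    ys = [y1]
    for _ in range(n - 1):
        ys.append(ys[-1] + 2)
    return ys

def pr_y17_gt_y12(starting_values):
    """Fraction of starting values for which Y_17 > Y_12 under M."""
    hits = 0
    for y1 in starting_values:
        ys = y_sequence(y1, 17)
        if ys[16] > ys[11]:  # zero-based positions of Y_17 and Y_12
            hits += 1
    return hits / len(starting_values)

starts = [-100, -1.5, 0, 0.43, 7, 1000]  # arbitrary, hypothetical Y_1 values
print(pr_y17_gt_y12(starts))  # prints 1.0
```

Whatever the starting value, the answer is the same, which is why the probability in (5) is 1 even though nothing in M is "random".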
Let’s get back to X, which we are imagining has something to do with temperature. What we cannot ask is this: what is
(6) Pr(X1 = 0.43)?
There is no answer because we are not considering how X1 came about. We’re missing the stuff that comes after the vertical bar “|”. If we say X was caused as T said it was, then we have eq. (1). If we mean (6) to implicitly incorporate the observations, i.e. given we have already seen X1, then we have eq. (2). We must first supply a “premise” of how X1 came about: eq. (6) is therefore incomplete. We can ask, for instance, this:
(7) Pr(X1 = 0.43 | M1),
or similar questions for every different model we are considering.
The only other point you must understand, before we move on, is that usually
(8) Pr(Xi = x | Observations) n.e. Pr(Xi = x | M),
for most (or even all) i and for any M which is not T. That is, once we have seen what Xi is, the probability of Xi taking the value it took given we see Xi is 1 or 100%, but the probability the model predicted this value is in general something less than 100%. With me? I hope so, else we will have troubles with what follows.
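The contrast in (8) can be sketched the same way. The only number taken from the series is X1 = 0.43; the model M below is a hypothetical normal distribution with made-up parameters, standing in for any M which is not T, and a reported value is again assumed to be rounded to two decimals.

```python
from statistics import NormalDist

observed_x1 = 0.43  # the one value quoted in the text

# Hypothetical model M, invented for illustration; it is not T.
M = NormalDist(mu=0.3, sigma=0.25)

def pr_given_observations(x, observed):
    """Once X1 has been seen, the probability is 1 for that value and 0 otherwise."""
    return 1.0 if x == observed else 0.0

def pr_given_model(x, model, half_width=0.005):
    """Pr(X1 rounds to x | model), treating x as reported to two decimals."""
    return model.cdf(x + half_width) - model.cdf(x - half_width)

for x in (0.33, 0.43, 0.53):
    print(x, pr_given_observations(x, observed_x1), round(pr_given_model(x, M), 4))
```

Given the observations, all of the probability sits on the value we saw. Given the model, it is spread over many values, so the value we saw gets something less than 100%.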
[1] If your objection is that T might contain “randomness” (quantum or “normal”), wait until Part IV.
Update: Link to the data (CSV file), for those who like to touch.