# Explanation Vs Prediction

**Introduction**

There isn’t as much space between explanation and prediction as you’d think; both are derived from the same elements of the problem at hand.

Here’s how it all works. I’ll illustrate with a statistical (or probability) model, though there really is no such separate thing; which is to say, there is no difference in meaning or interpretation between a probability model and a physical or other kind of mathematical model. There is a practical difference: probability models express uncertainty natively, while physical models (oftentimes) do not mention it, though it is there, lurking below the equations.

Let’s use regression, because it is ubiquitous and easy. But remember, everything said goes for all other models, probability or physical. Plus, I’m discussing how things *should* work, not how they’re actually done (which is very often badly; not your models, Dear Reader: of course, not yours).

We start by wanting to quantify the uncertainty in some observable y, and believe we have collected some “variables” x which are probative of y. Suppose y is (some operationally defined) global average temperature. The x may be anything we like: CO_{2} levels, population size, solar insolation, grant dollars awarded, whatever. The choice is entirely up to us.

Now regression, like any model, has a certain form. It says the *central parameter of the normal distribution* representing uncertainty in y is a linear function of the x (y and x may be plural, i.e. vectors). This model structure is almost never *deduced* (in the strict sense of the word) but is *assumed* as a premise. This is not necessarily a bad thing. All models have a list of premises which describe the structure of the model. Indeed, that is what being a model means.
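The premise can be written down directly: uncertainty in y is represented by a normal distribution whose central parameter is b_{0} + b_{1}x. A minimal sketch, where every number below is an invented illustration, not an estimate:

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The regression premise: the central parameter (mu) of the normal
# distribution representing uncertainty in y is a linear function of x.
def mu(x):
    b0, b1 = 14.0, 0.001  # invented coefficients, purely illustrative
    return b0 + b1 * x

sigma = 0.5  # invented spread parameter

# Pr(y < 14.5 | x = 400, model premises)
p = normal_cdf((14.5 - mu(400.0)) / sigma)
```

The point is only that, once the premises are fixed, probabilities of propositions about y follow by computation.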

Another set of premises are the data we observe. Premises? Yes, sir: premises. The x we pick and then observe take the form of propositions, e.g. “The CO_{2} observed at time 1 was c_{1}”, “The CO_{2} observed at time 2 was c_{2},” etc.

Observed data are premises because it is *we* who pick them. Data are not Heaven sent. They are chosen and characterized by us. Yes, the amount of—let us call it—*cherishing* that takes place over data is astonishing. Skip it. Data are premises, no different in character than other assumptions.

**Explanation**

Here is what explanation is (read: *should be*). Given the model-building premises (which, here, specify regression) and the observed data (both y and x), we specify some proposition of interest about y and then specify propositions about the (already observed) x. Explanation is how much the probability of the proposition about y (call it Y) changes.

That’s too telegraphic, so here’s an example. Pick a level for each of the observed x: “The CO_{2} observed is c_{1}”, “The population is p”, “The grant dollars are g”, etc. Then compute the probability that Y is true given this x and given the model and other observed-data premises.

Step two: pick another level for each of the x. This may be exactly the same everywhere, except for just one component, say, “The CO_{2} observed is c_{2}”. Recompute the probability of Y, given the new x and the other premises.

Step three: compare how much the probability of Y (given the stated premises) changed. If not at all, then, given the other values of x and the model and data premises, CO_{2} has little, and maybe even nothing, to do with y.
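The three steps can be sketched in code. Everything numeric below, the coefficients and the scenarios alike, is invented for illustration; only the procedure matters:

```python
import math

def pr_y_exceeds(threshold, co2, pop, grants, sigma=0.5):
    """Pr(Y: y > threshold | x, model premises), under an assumed
    regression whose coefficients below are pure inventions."""
    mu = 13.0 + 0.002 * co2 + 1e-10 * pop + 0.0 * grants
    z = (threshold - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # normal survival function

# Step 1: Pr(Y) at a baseline scenario for the x.
p1 = pr_y_exceeds(14.0, co2=350.0, pop=7e9, grants=1e9)
# Step 2: the same scenario, except CO2 moved to a new level.
p2 = pr_y_exceeds(14.0, co2=400.0, pop=7e9, grants=1e9)
# Step 3: the change in probability is the explanation CO2 offers
# at these settings of the other x.
change = p2 - p1
```

Repeat over the scenarios and propositions Y you care about: that repetition is the hard labor mentioned below.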

Of course, there are other values of the other x that might be important, in conjunction with CO_{2} and y, so we can’t dismiss CO_{2} yet. We have a lot of hard work to do to step through how all the other x, and this x (CO_{2}), change *this* proposition (Y) about y. And then there are other propositions about y that might be of more interest. CO_{2} might be important for them. Who knows?

Hey, how much change in the probability of any Y is “enough”? I have no idea. It depends. It depends on what you want to use the model for, what decisions you want to make with it, what costs await incorrect decisions, what rewards await correct ones, all of which might be unquantifiable. There is and should be *NO* preset level which says “Probability changes by at least p are ‘important’ explanations.” Lord forbid it.

A word about causality: *none*. There is no causality in a regression model. It is a model of how changing CO_{2} changes our *UNCERTAINTY* in various propositions of y, and *NOT* in changes in y itself.^{1}

Explanation is brutal hard labor.

**Prediction**

Here is what prediction is (should be). Same as explanation. *Except* we wait to see whether Y is true or false. The (conditional) prediction gave us its probability, and we can compare this probability to the eventual truth or falsity of Y to see how good the model is (using proper scores).

Details. We have the previously observed y and x, and the model premises. We condition on these and then suppose new x (call them w) and ask for the probability of new propositions about y (call them Z). Notationally, Pr( Z | w, y, x, M), where M are the model-form premises. These probabilities are compared against the eventual observations, which reveal whether each Z is true.

“Close” predictions mean good models. “Distant” ones mean bad models. There are formal ways of defining these terms, of course. But what we’d hate is for any one measure of distance to become standard. The best scores to use are those tied intimately to the decisions made with the models.
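One common proper score is the Brier score, the mean squared difference between each forecast probability and the eventual truth (1) or falsity (0) of its Z. A sketch with invented forecasts and outcomes:

```python
def brier_score(probs, outcomes):
    # Mean squared difference between forecast probability and the
    # eventual truth (1) or falsity (0) of each Z. Lower is better.
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Invented forecast probabilities for a series of propositions Z,
# and (also invented) records of what actually happened.
probs    = [0.9, 0.8, 0.3, 0.6, 0.1]
outcomes = [1,   1,   0,   0,   0]

score = brier_score(probs, outcomes)  # ≈ 0.102 here; 0 would be perfect
```

The logarithmic score is another proper choice; which to use depends, as always, on the decisions at stake.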

And there is also the idea of skill. The simplest regression is a “null x”, i.e. no x. All that remains is the premise that the uncertainty in y is represented by some normal distribution (where the central parameter is *not* a function of anything). Now if your expert model, loaded with x, cannot beat this naive or null model, your model has no skill. Skill is thus a relative measure.
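Skill can be made concrete as one minus the ratio of the expert model’s score to the null model’s score. A sketch using the Brier score, with invented probabilities and outcomes:

```python
def brier(probs, outcomes):
    # Mean squared difference between forecast and outcome; lower is better.
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

outcomes = [1, 1, 0, 0, 1, 0]  # invented eventual truths of the Z

# Null model: no x at all; it issues the same probability every time
# (an invented 0.5 here, for illustration).
null_probs  = [0.5] * len(outcomes)
# "Expert" model loaded with x (invented forecast probabilities).
model_probs = [0.8, 0.7, 0.2, 0.4, 0.9, 0.3]

# Skill score: 1 is perfect, 0 is no better than the null,
# negative is worse than the null.
skill = 1.0 - brier(model_probs, outcomes) / brier(null_probs, outcomes)
```

A model with skill ≤ 0 has failed to justify its x.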

For time series models, e.g. GCMs, one natural “null” model is the null regression, which is also called “climate” (akin to long-term averages, but taking into account the full uncertainty of those averages). Another is “persistence”, which is the causal-like model y_{t+1} = y_{t} + fuzz. Again, sophisticated models which cannot “beat” persistence have no skill and should not be used. Like GCMs.
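Both null models are trivial to compute, which is part of their charm. A sketch on an invented series, comparing their mean squared errors (a sophisticated model would need to beat both):

```python
# Invented y_t values for illustration only.
series = [14.0, 14.2, 14.1, 14.3, 14.5, 14.4, 14.6]

def mse(forecasts, actuals):
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)

actuals     = series[1:]
# Persistence: the forecast of y_{t+1} is just y_t.
persistence = series[:-1]
# "Climate": the long-run mean issued as the forecast every time.
climate     = [sum(series) / len(series)] * len(actuals)

persistence_err = mse(persistence, actuals)
climate_err     = mse(climate, actuals)
```

On a trending series like this one, persistence is typically the harder null to beat.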

**More…**

This is only a sketch. Books have been written on these subjects. I’ve compressed them all into 1,100 words.

———————————————————————————-

^{1}Simple causal model: y = x. It says y *will* be the value of x, that x makes y what it is. But even these models, though written mathematically like causality, are not treated that way. Fuzz is added to them mentally, so that if x = 7 and y = 9, the model won’t be abandoned.

I prefer to define “prediction” for consistency with information theory. In particular, Y is a state-space each element of which is a state of nature. Likewise, X is a state-space. A “predictive inference” aka “conditional prediction” is an extrapolation from an unspecified state in X to an unspecified state in Y. A “prediction” is an unconditional predictive inference.

> Now regression, like any model, has a certain form. It says the central parameter of the normal distribution representing uncertainty in y is a linear function of the x.

The normal distribution is often assumed, but is the normal distribution actually required, or can any convex distribution with at least (and hopefully) one mode be used (e.g., Beta with a = 2, b = 5)?

Just to be clear, the purpose of the regression would be to determine the alpha and beta parameters if a beta distribution were used.

To compress still further: the difference between explanation and prediction is timing.

To compress still further: even I understand…..

And even further: 1.

An Engineer,

Hooray for me!

DAV,

The ordinary, as-she-is-practiced purpose of regression is as you say. The correct, I-wish-it-were-so goal is as I outlined above.