The True Meaning Of Statistical Models

By Briggs, October 2, 2014

It’s catching. (Image source.)

This came up yesterday (again, as it does often), so I figure one more stab is in order. Because the answer isn’t simple, I had to write a lot, which means it won’t get read, which means I’ll have to write about it again in the future.

Trust your eyes

You’re a doctor (your mother is proud) and have invented a new pill, profitizol, said to cure the screaming willies. You give this pill to 100 volunteer sufferers, and to another 100 you give an identical-looking placebo.

Here are the facts, doc: 71 folks in the profiterol group got better, whereas only 60 in the placebo group did.

Now here is what I swear is not a trick question. If you can answer it, you’ll have grasped the true essence of statistical modeling. In which group was there a greater proportion of recoverers?

This is the same question that was asked yesterday, but with respect to the global temperature values. Once we decided what was meant by a “trend”—itself no easy task—the question was: Was there a trend?

May I have a drum roll, please! The answer to today’s question is—isn’t the tension unbearable?—more people in the profitizol group got better. The answer to yesterday’s question was (accepting the definition of trend therein): no.

These answers cause tremendous angst, because people figure it can’t be that easy. It doesn’t sound sciency enough. Well, it is that easy. You can go on national television and trumpet to the world the indisputable inarguable obvious absolute truths that more people in the drug group got better, and that (given our definition of trend) there hasn’t been a trend these twenty years.
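For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of the comparison. It only counts; no model is involved. The function name `recovery_rate` is made up for this illustration.

```python
from fractions import Fraction

def recovery_rate(recovered: int, total: int) -> Fraction:
    """Observed proportion of patients who got better: a count, not a model."""
    return Fraction(recovered, total)

drug_rate = recovery_rate(71, 100)     # profitizol group
placebo_rate = recovery_rate(60, 100)  # placebo group

print(f"drug: {float(drug_rate):.2f}, placebo: {float(placebo_rate):.2f}")
print("greater proportion in drug group:", drug_rate > placebo_rate)
```

Exact fractions are used so the comparison carries no rounding at all; the observed difference is simply 11 patients in 100.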

Question two: what caused the difference in observed recovery rates? And what caused the temperature to do what it did?

My answer for both: I don’t know. But I do know that some thing or things caused each person in each group to get better or not. And I know that some thing or things caused temperature to take the values it did. I also know that “chance” or “randomness” weren’t the causes. They can’t be, because they are measures of ignorance and not physical objects. Lack of an allele of a certain gene can cause non-recovery, and the sun can cause the temperature to increase, but “chance” is without any power whatsoever.

Results are never due to chance, they are due to real causes, which we may or may not know.

The IPCC claims to know why temperature did what it did. We know the IPCC is wrong, because their model predicted things which did not happen. That means the causes it identified are wrong in some way, either by omission or commission. That’s for them to figure out.

Clever readers will have noticed that, thus far, there was no need for statistical models. If our goal was only to state which group got better at the greater rate, or whether there was a trend, no model was needed. Why replace perfectly good reality with a model? That is to commit the Deadly Sin of Reification (alas, an all too common failing).

Enter the models

The classically trained (Bayesian or frequentist) statistician will still want to model, because that is what statisticians do. In the drug trial they will invent for themselves a “null hypothesis”, which is the proposition, “Profitizol and the placebo cause the exact same biological effects”, which they ask us to “accept” or “reject”.

That means, in each patient, profitizol or a placebo would do the same exact thing, i.e. interact with the relevant biological pathways associated with the screaming willies such that no measurement on any system would reveal any difference. But given you are a doctor, aware of biochemistry, genetics, and the various biological manifestations of the screaming willies, it is highly unlikely this “null” proposition holds. Indeed, to insist it does is to abandon or willfully ignore all this knowledge and cast all your attention on only that which can be quantified (the Sin of Scientism).

Of course, you might have made a mistake and created a substance which was (relative to the SW) identical with the placebo. Mistakes happen. How do we tell? Do we have any evidence that profitizol works? That’s the real question, the question everybody wants answered. Well, what does “works” mean?

Uh oh. Now we’re into causality. If by “works” we mean, “Every patient that eats profitizol is cured of the SW” then profitizol does not work, because why? Because not every patient got better. If by “works” we mean, “Some patients that eat profitizol are cured of the SW” then profitizol works, and so does the placebo, because, of course, some patients in each group got better. Properly defining what “works” means is not an easy job, as this series of essays on a famous statistical experiment proves. Here we’re stuck with the mixed evidence that patients in both groups got better. Clearly, something other than just interacting with a drug or placebo is going on.

What to do?

Remember the old saw about how the sale of ice cream cones was “correlated” with drownings? Everybody loves to cite—and to scoff at—this example because it is obviously missing any direct causal connection. But it’s a perfectly valid statistical model. Why?

Because a statistical model is only interested in quantifying the uncertainty in some observable, given clearly stated evidence. Thus if we know that ice cream sales are up, it’s a good bet that drownings will rise. We haven’t said why, but this model makes good predictions! (I’m hand-waving, but you (had better) get the idea.)
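To make the hand-waving a little more concrete, here is a small Python sketch on invented data. Every number in it is an assumption made up for illustration: a hidden common cause (call it temperature) drives both ice cream sales and drownings, and neither touches the other. The helper names `simulate_days`, `pearson`, and `fit_line` are hypothetical.

```python
import random

def simulate_days(n_days: int, seed: int = 0):
    """Invented data: a hidden common cause (temperature) drives both series."""
    rng = random.Random(seed)
    ice_creams, drownings = [], []
    for _ in range(n_days):
        temperature = rng.uniform(15.0, 35.0)
        ice_creams.append(20.0 * temperature + rng.gauss(0.0, 40.0))
        drownings.append(0.3 * temperature + rng.gauss(0.0, 1.0))
    return ice_creams, drownings

def pearson(xs, ys):
    """Sample correlation between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

ice_creams, drownings = simulate_days(500)
r = pearson(ice_creams, drownings)
slope, intercept = fit_line(ice_creams, drownings)
print(f"correlation: {r:.2f}")
print(f"predicted drownings when 700 cones are sold: {slope * 700 + intercept:.1f}")
```

The fitted line never sees temperature, yet knowing ice cream sales still narrows the uncertainty about drownings. And because we wrote the simulation ourselves, we know for certain that cones cause no drownings in it.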

Statistical models do not say anything about causality. We’re not learning why people are drowning, or why people are getting better on profitizol, or why the temperature is doing what it’s doing. We are instead quantifying our uncertainty given changes in certain conditions—and that is it.

If we knew all about the causes of a thing, we would not need statistics. We would feed the initial and observed conditions into our causal model, and out would pop what would happen. If we don’t know the causes, or can’t learn them, but still want to quantify uncertainty, we can use a statistical model. But it’s always a mistake to infer (without error; logically infer) causality because some statistical model passes some arbitrary test about the already observed data. The ice cream-drowning model (we assume) would pass the standard tests. But there is no causality.

Penultimate fact: To any given set of data, any number of statistical or causal or combination models can be fit, any number of which fit that observed data arbitrarily well. I can have a model and you can have a rival one, both of which “fit” the data. How do we tell which model is better?

Last fact: Every model (causal or statistical or combination) implies (logically implies) a prediction. Since models say what values, or with what probability what values, some observable will take given some conditions, all we do is supply those conditions which indicate new circumstances (usually the future)—voilà! A prediction!
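These last two facts can be sketched in a few lines of Python. The observations and both rival models below are invented for illustration, not taken from any real data: the two models agree on every observed point, so fit alone cannot separate them, but they disagree about a new condition.

```python
import math

# Hypothetical observations, invented for this sketch: y = 2x + 1 at x = 0..3.
observed = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]

def model_a(x: float) -> float:
    """A straight line."""
    return 2.0 * x + 1.0

def model_b(x: float) -> float:
    """The same line plus a wiggle that vanishes (up to rounding) at whole-number x."""
    return 2.0 * x + 1.0 + 5.0 * math.sin(math.pi * x)

def worst_miss(model, data) -> float:
    """Largest absolute gap between the model and the observed values."""
    return max(abs(model(x) - y) for x, y in data)

print("worst miss, model A:", worst_miss(model_a, observed))
print("worst miss, model B:", worst_miss(model_b, observed))

new_x = 3.5  # a condition not among the observations
print("prediction A at 3.5:", model_a(new_x))
print("prediction B at 3.5:", model_b(new_x))
```

Both models fit the old data essentially perfectly, so measures of fit say nothing about which to prefer. Only the observation that eventually arrives at the new condition can decide between them.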

It’s true most people who use statistical models have no idea of this implication (they were likely not taught it). Still, it is true, and even obvious once you give it some thought.
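As a sketch of what such a prediction might look like for the drug trial, suppose (purely as an assumption for this example) that before the trial every recovery chance between 0 and 1 was judged equally plausible for each group. Under that one assumption the probability that the next patient recovers is given by Laplace's rule of succession, (recovered + 1) / (total + 2). This is one simple choice among the many models that could be fit, and the function name is hypothetical.

```python
from fractions import Fraction

def predictive_probability(recovered: int, total: int) -> Fraction:
    """Probability that the next patient recovers, assuming a uniform prior
    on the unknown recovery chance (Laplace's rule of succession)."""
    return Fraction(recovered + 1, total + 2)

drug = predictive_probability(71, 100)
placebo = predictive_probability(60, 100)

print(f"next patient on profitizol recovers: {float(drug):.3f}")
print(f"next patient on placebo recovers:    {float(placebo):.3f}")
```

The output is a statement about patients not yet seen, which means it can be checked against them. It says nothing about why any patient recovers.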

Not knowing this implication is why so many statistical models are meager, petty things. At least the IPCC stuck around and waited to see whether the model they proposed worked. Most users of statistics are content to fit their model to data, announce measures of that fit (and since any number of models will fit as well, this is dull information), and then they run away winking and nudging about the causality which is “obvious.”

Not recognizing this is why we are going through our “reproducibility crisis”, which, you will notice, hits just those fields which rely primarily on statistics.

Last updated on October 2, 2014

Briggs

Briggs is an internationally reviled thoughtcriminal, listed as One Of The Top 7 Dangerous Minds by the Hague.

42 Comments

DAV

October 2, 2014, 9:19 am

“At least the IPCC stuck around and waited to see whether the model they proposed worked.”

That’s being generous. More accurate to say they unfortunately lasted long enough for others to see if they worked. They (well, not the IPCC per se) keep changing the parameters which produces a new model every time and coincidentally pushes the determination farther into the future.

You can bet the effort to come up with better parameters has been going on for the last 18 years. The effort must not have been successful or it would have been trumpeted far and wide and a new determination period would have been established.

It’s the same tactics employed by any long-lived oracle: never make a prediction which can be verified. Someone screwed up.
Terry Oldberg

October 2, 2014, 10:06 am

This is an important and often misunderstood topic presented by Dr. Briggs with his usual accuracy and wit. For completeness it should be pointed out that “model” and “predict” are polysemic (have several meanings) being thereby capable of supporting applications of the equivocation fallacy that make a pseudoscience like, for example, most of global warming climatology appear to be a science (see https://www.wmbriggs.com/blog/?p=7923 for argument). Consequently a “model” may fail to infer the outcomes of events of the future yet be said to “predict.” Such a “model” provides us with no information about the outcomes of the future yet seems to provide us with this information. It is incapable of supporting the control of the associated system but appears to be capable of doing so. “Models” having this characteristic underlie repeated attempts by governments to control Earth’s global surface temperature.
Ken

October 2, 2014, 10:11 am

You keep picking on the IPCC, etc. … but the matter, the science, is “settled”–there’s a “consensus” that “settles” it … a PhD [among many others] with a blog & who shows up on sciency shows says so: http://www.slate.com/blogs/bad_astronomy/2013/05/17/global_warming_climate_scientists_overwhelmingly_agree_it_s_real_and_is.html

Of course, “settled science” has an unsettling way of getting “unsettled,” as history shows, a summary, for example: http://www.deftnews.com/2014/a-brief-history-of-settled-science/

About such history, consider what an early Leftist authority Karl Marx observed: “History repeats itself, first as tragedy, second as farce.”

That [“farce”] seems applicable given that what’s observed to be not happening (continued warming) as predicted kinda unsettles the “settled” aspect.

Though, technically, history doesn’t “repeat” precisely exactly, as ole Marx (& others) have said, it is similar: “History doesn’t repeat itself, but it does rhyme.” Mark Twain.

I.E., rhyming…it’s the “Same Old Song & Dance”, not exactly on point (but then how much “point” is really necessary*), but still a great tune: http://www.youtube.com/watch?v=4rScnRMbaF4

However, here’s a good read by Michael Crichton that is “on point” describing how this IPCC topic is, effectively, the “Same Old Song & Dance” in a conceptually rhyming sort of way: “Why Politicized Science is Dangerous” at: https://www.msu.edu/course/lbs/332/bellon/R0124b.pdf
Terry Oldberg

October 2, 2014, 10:25 am

Ken:

Among the words which, in global warming climatology, are polysemic is “science.” Often the word means “pseudo-science.” Thus the ambiguously worded phrase “the science is settled” disambiguates to “the pseudo-science is settled.”
Sheri

October 2, 2014, 10:31 am

Profitizol works if we can sell it to the public!
VftS

October 2, 2014, 10:41 am

“… 71 folks in the profiterol group got better …”
Cream puffs are good for you! Are your enemies in league with Big Baker?
Chip Knappenberger

October 2, 2014, 11:28 am

Matt,

Your discussion of randomness is insufficient. While it is true that “‘chance’ or ‘randomness’ weren’t the causes” of individual outcomes, they do play a role in the assignment of individuals to the groups. A statistical model gives you some indication as to how big an influence such “randomness” has on the outcome.

Also your null hypothesis is incorrect.

“Profitizol and the placebo cause the exact same biological effects”, which they ask us to “accept” or “reject”.

I don’t think that is how you set up your original experiment. Your original experiment was set up to test the null hypothesis “Profitizol and the placebo produce that same outcome measured in terms of recovery from the screaming willies.” (it didn’t test for biological pathways).

Consequently, your statement “Indeed, to insist it does is to abandon or willfully ignore all this knowledge and cast all your attention on only that which can be quantified (the Sin of Scientism)” is going too far.

Statistical models, in this case, should be designed to test the outcome of whether or not someone is “cured” of the screaming willies…isn’t that what your experiment was designed to reveal…that someone “got better”? If “getting better” is subjective, then this is a poorly set up experiment and I think you’d have to redesign your statistics.

Statistical tests help to provide more information than that gathered by our eyes alone.

But, it is probably worth keeping in mind that both can be deceiving.

-Chip
Briggs

October 2, 2014, 11:52 am

Chip.

Nope. Nope everywhere, I mean.

It is a well known proof (to Bayesians, anyway) that randomization does nothing and is not needed. Control does. Experiments must be controlled. Randomization is in fact a form of mysticism, a form of “blessing” which must be present for an experiment to be valid. It, in the form of unpredictability, is only possibly useful in preventing people from cheating. This is why coin flips, which are perfectly predictable if one knows the initial conditions and the equations of motion, are used to make some decisions: decisions (like those in football games) that must be made here-and-now, and where people do not know the initial conditions etc.

See the links under Randomness on the Classic Posts page for more.

Your statements about the “null” are also wrong. I said the “null” was “Profitizol and the placebo cause the exact same biological effects” and you say it is “Profitizol and the placebo produce that same outcome measured in terms of recovery from the screaming willies.” These are equivalent. I think you failed to see this because of the word (you used) produce, which implies cause (the word I used). Obviously, or rather implicitly, profitizol (and possibly the placebo) are involved in the causal biological pathway of curing the screaming willies. How else would it work except by causing something in the body?

As I have written many times, we statisticians teach or think about causality very badly. Think about the ice cream example more and you’ll see what I mean.

You’re also distracting yourself with the “getting better.” Make it as hard and quantitative as you want, and everything I wrote holds (plus, I tacitly assumed “getting better” was rigorously defined).

And I was also right about the ubiquity of the Sin of Scientism (I’ve also written about this many times; see the Classic Posts page). Since we can’t put numbers on all that knowledge the doctor has (not only about the drug, but disease in general), it plays no part in quantitative statistical models. This is why statistical models are over-confident. They are only a small part of the picture, over-emphasized because they are quantitative and numbers feel more sciency.
Terry Oldberg

October 2, 2014, 12:12 pm

Briggs:

Randomization serves the purpose of ensuring approximate homogeneity of the composition of the sample with the composition of the underlying population. As I understand the matter, it is this homogeneity that makes it possible for one to generalize from observations of the past to observations of the future thus creating knowledge. Do you disagree?
Briggs

October 2, 2014, 12:15 pm

Terry,

Yes, sir. I do disagree, because what you said was false. See the articles on “random.”

Too often we only think of experiments in terms of statistics, where we turn a blind eye to causality. Statistics is full of mysticism. We should, as I have said repeatedly, think more like physicists. Control is what is important.
Tom Scharf

October 2, 2014, 1:06 pm

One example I have seen a couple times in actual released products is the mistake of over-fitting calibration data to elaborate statistical models to get the fit just so good. 5th order polynomial!

Then release the product and someone tries to measure something out of band with the original calibration (unit was calibrated to 20F to 70F, user takes a measurement at 90F). This can result in some wildly wrong answers. A very, very, bad mistake in a medical device.

Failing to expand anything beyond a linear fit in forward and reverse directions can be a major mistake.
Sheri

October 2, 2014, 1:09 pm

Terry: In drug trials, I thought that matching the characteristics of the control and drug groups was most important, not randomization. Sometimes the assumption is made that if one randomizes enough, you can overcome the need for this, but matching the two groups is what I was taught was the proper way to set up the trial.
Chip Knappenberger

October 2, 2014, 2:06 pm

Matt,

You wouldn’t need statistics if all models were perfect. But since we aren’t omnipotent, i.e., our ‘control’ isn’t perfect, we resort to random trials and then apply statistics.

I must continue to disagree about your null hypothesis, it does not accurately reflect your experiment design as described.

The null is that there is no difference in the *outcome* as measured by the experiment between the two groups. It implies nothing about the manner that the outcome came to pass, i.e., which of the many various biological (or otherwise) pathways that may be involved either through the treatment or through the placebo. The measurement, as you set up the experiment, was that the patient “got better” – it was not “the details of how they got better.”
It is only because you mis-specified the “null” in your experimental set up, that you are able to support the following… “But given you are a doctor, aware of biochemistry, genetics, and the various biological manifestations of the screaming willies, it is highly unlikely this “null” proposition holds. Indeed, to insist it does is to abandon or willfully ignore all this knowledge and cast all your attention on only that which can be quantified (the Sin of Scientism).”

You set up the experiment to simply “quantify” whether the patient “got better” or not. Why is there a Sin of Scientism to focus on such results? If you want to investigate something else (like the details of why they got better), set up the experiment differently!

-Chip
Terry Oldberg

October 2, 2014, 2:23 pm

Briggs:

Thanks for sharing. I searched your site on “random” and came up with several of your essays but not enough material to be able to understand your basis for claiming I made a false statement. Does it work for you to substitute the phrase “probability sample” for the phrase “random sample” or do you propose an alternative to probability sampling that accomplishes the same end? If the latter, what is this alternative?
Briggs

October 2, 2014, 2:43 pm

Chip,

No, sir, your “null” here is incorrect (it was right last time, because it was logically equivalent to mine), though the misconception is common enough (so I hope other people are paying attention).

If the null was the proposition “there is no difference in the *outcome* as measured by the experiment between the two groups” then the null here is false, as it would be false for any other combination of outcomes except an equal number of recovered per group. Thus if 51 drug and 50 placebo patients recovered, the null would be proven—certainly proven—false. And we’re done.

Plus, there is no avoiding talk of causality, as it looks like you want to do. Some thing or things caused people to get better. Was it profitizol? Something else? If it was the drug, it had to do it on the body, of course, so we naturally speak of causing changes in just those systems associated with the screaming willies.

Everybody: make sure you understand the ice cream-drowning example!

(Incidentally, there were no details of “getting better”, just as there were no details in the “screaming willies.” I hope we can agree the example still works even though we are premising on fictional diseases.)

The Sin of Scientism, as I have written about many times, and evinced here, is to only incorporate the quantitative knowledge (the 71 & 60 recovered) into the statistical model and to eschew all other knowledge. Or rather, since statistical models as normally envisioned can only accommodate quantitative data, the Sin comes in assuming that the pronouncements of the model should receive the sole or vast majority of the weight of evidence. This especially happens in “soft” fields, like sociology or education.

What we’re really after here is to quantify, or rather express the uncertainty of, the proposition “If I give this patient profitizol, he gets better.” The statistical model can provide this (this is a prediction or forecast), but even if one uses, as it were, the best equipment, the answer will still probably be too sure of itself, because the models (as we normally have them) can’t incorporate unquantifiable information.

Of course, in real life, doctors go beyond simplistic models and, if wise, rarely give a precise number to a prediction. How can you quantify what can’t be quantified? Too bad the IPCC can’t follow suit.

Again (Terry, you too), I urge you to go to the Classic Posts page and search for “random”. Don’t just search the site, because it’ll turn up hundreds of articles.
Terry Oldberg

October 2, 2014, 3:08 pm

Sheri:

Thanks for sharing!

To assign equal numbers of participants to drug or placebo is OK. To assign equal numbers to groupings that are based on age, gender or other characteristics as well has a shortcoming. This is that different groupings than the ones assumed may provide greater information about the outcomes of the events of the future, e.g. whether a person will survive five years after his diagnosis of pancreatic cancer. It is information that is needed in controlling a system.
Sheri

October 2, 2014, 4:22 pm

Terry: What I was taught for setting up drug trials and in various social science-type studies, one wants the drug and the control groups to match as much as possible so the only difference is the drug/variable. If you use random assignments, there is a possibility that the difference between the groups is something other than the drug. When setting up the groups, one tries to get as much variety as possible. I know that in the past drug companies used mostly males and mostly adults in studies, then found that women and children should have been included because their reactions were different. Probably the best way to find if the drug is the difference is to run more than one study, using as many different groupings as you can. In the absence of that, you can “randomly” choose people for one group and match them with similar people in the other group. When researchers look for participants in drug trials, the ones I have seen have very strict requirements involving years one has had the disease, sex, age, medications used in the past, current medications, etc. It’s not at all random.

Your comment about pancreatic cancer made me think you may be talking about metastudies here and not actual experiments with control and experimental groups. In that case, randomization in theory may improve the usefulness of the study. (I say in theory because in practice, that’s not what always happens.)
Ye Olde Statisician

October 2, 2014, 5:12 pm

My old boss, Ed Schrock, used to say, “Trouble comes in bunches, so take your sample in bunches.” That is, stratified samples rather than simple random samples. For example, in sampling wax “pillows” (they look like raviolis) he recommended taking pillows from the top, middle, and bottom of the shipper carton, from the left side and right, and so on. In manufacturing, defects tend to be concentrated in certain subgroups of the population, so you want to make sure you look at all the subgroups.

Random allocation is sometimes useful as an antidote for judgment sampling. This is where you select pieces after you have known something about them. For example, you might identify deeds in the county deed book by randomly selecting page numbers and pulling the deed if the random page is the first page of a deed. (Deeds varied in number of pages, so page number alone would over-represent longer deeds.) This avoided the issue of the sampler “favoring” some deeds over others.

But not always: Cans to be inspected for damage were taken on a loading table where they ran onto a pallet. The inspector could see the can before pulling it, and would see if it is obviously damaged. The inspector then had to make a conscious decision whether to include the can or not and would inevitably impose his own judgment onto the sample. In effect, he would be telling the sample what the process is doing, rather than vice versa. The solution was to temporarily block the flow of cans coming from the necker-flanger. This left ten cans visible in the tracks. The block was removed, the inspector counted off ten cans, then took the next 24 in sequence. Since the flanger had 12 flanging heads, this meant a sample stratified at 2 cans/head which reflected the proportion in the overall population.

These sorts of techniques are generally preferable to the simple random sample.

But it is quite true that when a difference is found to be “statistically significant” there is no thought of causation. For example, a statistically significant difference in the failure rate of polymer batches among four reactors, does not tell us why reactor C accounts for half the failures instead of the expected 25%. But it does indicate where to look for causes. Something is different about reactor C. Similarly, two groups of women who differed in cancer rates: one worked with CRT screens (remember them?), the other did not. But there were other differences between the two groups: smoking, diet, exercise, and so forth. So the discovered difference did not mean that CRT monitors were “causing” cancer.

Part of the problem stems from the abandonment of a good, manly Aristotelian metaphysic of causation for the girly, Humean metaphysic of correlation. A recent discussion with a fellow who denied causation outright — everything is correlation — revealed how deep the rot has gone, and why people confuse the failure to reject the null hypothesis with “confirming” the null hypothesis.
Briggs

October 2, 2014, 5:50 pm

YOS,

“Part of the problem stems from the abandonment of a good, manly Aristotelian metaphysic of causation for the girly, Humean metaphysic of correlation.”

Amen to that, brother. Amen.
Sheri

October 2, 2014, 6:00 pm

YOS and Briggs: Guess that tells me where my “you don’t know whether the sun will rise tomorrow just because it did today” professor stands on the “girly” scale. And that was back in the 70’s.
Terry Oldberg

October 2, 2014, 6:05 pm

Sheri:

To control the outcomes of events one needs a model. A model is a procedure for making inferences. In building a model the builder must decide upon the inferences that will be made by this model. This decision may be made by optimization or by the method of heuristics. A heuristic is an intuitive rule of thumb that delivers a solution to the problem of selecting an inference but is unlikely to find the optimal solution to this problem.

I gather that you were taught to use the method of heuristics. That’s traditional. Sometimes one gets much better results by using optimization.
Terry Oldberg

October 2, 2014, 6:23 pm

Hume famously argued that the problem of induction had no solution. Today we have one in information theoretic optimization. It takes us as far as possible along the road toward causation under incomplete information. Usually we are forced to make decisions under incomplete information even though this is undesirable. Consequently rather than being true the conclusions of our arguments have only probabilities of being true.
Sheri

October 2, 2014, 7:23 pm

Terry: Rather than continue here, I will read the blog posts and comments you have elsewhere concerning the problem of induction, information theoretic optimization, and reasoning. I fear I’m just wandering about here trying to understand something based on a couple of comment exchanges and I don’t want to have you spend time answering things that I might understand reading Climate, etc and others.
Alan Cooper

October 2, 2014, 8:16 pm

Two comments:

1. re “Results are never due to chance, they are due to real causes, which we may or may not know.” Although I would not say that anything is “caused by” chance, there are many things which, to the best of our knowledge, happen by chance. By this I mean that our best available physical theories assert the existence of events for which there is not even the possibility of finding prior events or circumstances which determine their timing or occurrence. The location where an electron passing through a narrow slit will be observed, or the time of decay of an excited atom are matters for which we have no good reason to expect any “cause” – unless of course, you have a different definition of “cause” than I do. What definition of “cause” allows you to make the claim quoted above?

2. You seem to think that the reason people suspect that CO2 emissions cause global warming is because of statistical modelling of some kind or other. This is false. The reason for being concerned about CO2 emissions is the basic thermodynamic fact, first noted by Fourier and analysed in more detail by Arrhenius, that increasing the atmospheric concentration of CO2 should (in the absence of any known countervailing effect) raise the energy of the Earth’s surface environment – most likely by warming, but also possibly by some combination of phase changes and/or chemical effects – all of which have potentially significant adverse environmental consequences.
Terry Oldberg

October 2, 2014, 8:18 pm

Thanks Sheri. That sounds like a good plan.
Sheri

October 2, 2014, 8:54 pm

Alan: You’ve been reading the latest climate change advocate talking points, haven’t you? Pretend models have nothing to do with this. CO2 raising the temperature is based on thermodynamics, but without models, forcings, feedbacks and numerous other assumptions, you can’t possibly show how much influence CO2 has on the planet. You have to create a full-blown model. If you just plug in CO2, you don’t even get the current average temperature of the earth. If this were straight thermodynamics, it would be very, very simple. (Your parenthetical “in the absence of any known countervailing effect” would indicate you know full well it’s not straightforward thermodynamics and you are trying for a CYA on the whole deal.)

Why is it that people jump into quantum mechanics in the middle of a discussion of the macro world? (I see this a great deal of the time on pseudoscience sites.) Are you really that uninformed or just being a pain in the neck? Cite an example in the macro world if you’re going to make claims about chance when we are discussing the macro world. Surely you know the difference between the two.
Gary

October 3, 2014, 8:04 am

Then there’s propensity score matching https://en.wikipedia.org/wiki/Propensity_score_matching, but Briggs is sure not to like it.
Milton Hathaway

October 3, 2014, 3:11 pm

Professor Briggs, I’m left a bit confused about the implications of the use of the concept of “random”. I’m wondering if this is one of those things like the proper usage of versus that my computer is so concerned about, or if there are deeper implications, like getting the wrong answer.

Specifically, should I stop using the Monte Carlo Method? If so, what should I use instead?
Milton Hathaway

October 3, 2014, 3:14 pm

Make that . . . the proper usage of (its) versus (it’s) . . .
Briggs

October 3, 2014, 5:09 pm

Milton,

You’re not using “randomness” with MCMC. You’re using a deterministic set of numbers to feed an algorithm to simplify, or rather approximate, (say) computing an integral. The answers MCMC gives are, as well known, fairly accurate. Yet the theological basis (for lack of a better phrase) underpinning it is wrong. It’s not the “randomness” which ensures the accuracy. It’s that the deterministic algorithm is well constructed.

I have to do a post on this with examples. If I forget, remind me.
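A minimal sketch of the determinism point, using plain Monte Carlo integration rather than full MCMC. The integrand, the seed, and the function names are all illustrative assumptions, not anything from a promised post. A seeded generator produces the same "random" numbers every run, and a plain evenly spaced grid (no randomness claimed at all) approximates the same integral:

```python
import random

def f(x):
    # Illustrative integrand; its exact integral over [0, 1] is 1/3.
    return x * x

def mc_estimate(n, seed):
    # A seeded generator yields a fixed, fully reproducible sequence.
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n

def grid_estimate(n):
    # Midpoint rule on an evenly spaced grid: deterministic by construction.
    return sum(f((i + 0.5) / n) for i in range(n)) / n

first_run = mc_estimate(100_000, seed=42)
second_run = mc_estimate(100_000, seed=42)
grid_run = grid_estimate(100_000)

print(first_run == second_run)   # same seed, same inputs, same answer
print(first_run, grid_run, 1 / 3)
```

Both estimates land near 1/3; the seeded run is exactly repeatable, which is the sense in which the numbers fed to the algorithm are deterministic.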
Sheri

October 3, 2014, 5:44 pm

Briggs: Please do a post on this (MCMC). It would be very helpful. I will definitely remind you.
Milton Hathaway

October 3, 2014, 8:05 pm

MCMC? Markov Chain Monte Carlo?
Alan Cooper

October 3, 2014, 9:32 pm

Sheri, your response to my second point is like saying that one shouldn’t be concerned about having a gun pointed at you without having a “full-blown model” of exactly how the bullet will actually harm you.

And as for my first point, it was not intended to be about the climate issue in particular but just about the general claim that everything has a “cause” – which Briggs seems to make quite frequently in a wide range of contexts.
Sheri

October 3, 2014, 10:20 pm

No, Alan, it is NOT in any way like having a gun pointed at me and saying I need a full-blown model. It’s like having a few cells in one’s body that look kind of like cancer but we really can’t tell, so we rush in and do a full three months of chemo and cut out anything near the cells. After all, it’s odd looking cells that cause cancer so it must be dangerous to have said cells in your body. It’s just as straight-line as your CO2 model. Why is it that people always choose a concrete model that we understand fully and then try to BS that it’s somehow “like” playing with a complex climate system and claiming ONE element is the problem? Not very scientific nor rational as far as I can see.

Throwing in quantum mechanics when discussing cause on the macro level is doing exactly what the pseudoscience people do all the time–use something totally irrelevant to the discussion. I must say, it may be true you don’t know the difference. Can’t tell you how many times I was told quantum mechanics explains ESP, telekinesis, teleportation, and so forth. Maybe we need a fallacy for this type of thing: “Argument using Quantum Mechanics in the Macro world” (Quantum confusience?).
Terry Oldberg

October 3, 2014, 11:40 pm

Alan Cooper (2 Oct. at 8:26 PM) accurately states that, other things being equal, increasing the atmospheric CO2 concentration increases the energy of Earth’s surface environment. Should we base public policy upon an “other things being equal” model? Not in my opinion.

A model is a procedure for making inferences. Each time an inference is made there are several alternatives for being made. Logic contains the rules by which the one correct inference may be distinguished from the many incorrect ones.

“Other things being equal” is a heuristic, that is, an intuitive rule of thumb that selects for being made an inference that is unlikely to be the optimal one. The selected inference should be the optimal inference.
Brandon Gates

October 4, 2014, 2:09 am

Terry,

It is incapable of supporting the control of the associated system but appears to be capable of doing so. “Models” having this characteristic underlie repeated attempts by governments to control Earth’s global surface temperature.

It’s a pretty pathetic worldwide conspiracy which fails to publish observations that convincingly match predictions.
Terry Oldberg

October 4, 2014, 10:33 am

Thanks for taking the time to respond and giving me a chance to amplify.

With reference to the climate models that were used in making current public policy on CO2 emissions the appearance that they predict at all is a consequence from an application of the equivocation fallacy that exploits the polysemic nature of “predict” in the literature of climatology. The count of the events underlying the models that made these “predictions” was nil with the result that the predicted relative frequencies of the outcomes of events could not have been compared to the observed relative frequencies. For these two relative frequencies to match is required for validation of the model.

Also, a conclusion from cybernetics theory is that a system is insusceptible to being controlled if the mutual information between state-spaces containing the observed and the inferred states is nil. For the above referenced models the mutual information is nil. Experience suggests that the count of observed independent events must exceed about 150 for construction of a validated predictive model. Here the count is nil.
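For readers unfamiliar with the term, here is a hedged sketch of how mutual information between inferred and observed states could be computed from a table of event counts. The two count tables are invented for illustration and are not climate data; `mutual_information` is a hypothetical helper name.

```python
from math import log2

def mutual_information(joint_counts):
    # joint_counts[i][j] = number of events with inferred state i and observed state j.
    total = sum(sum(row) for row in joint_counts)
    row_totals = [sum(row) for row in joint_counts]
    col_totals = [sum(col) for col in zip(*joint_counts)]
    mi = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count == 0:
                continue  # zero cells contribute nothing to the sum
            p_xy = count / total
            p_x = row_totals[i] / total
            p_y = col_totals[j] / total
            mi += p_xy * log2(p_xy / (p_x * p_y))
    return mi  # in bits

# Invented tables: inferred state tells us nothing vs. everything about the observed state.
uninformative = [[25, 25], [25, 25]]
perfectly_informative = [[50, 0], [0, 50]]

mi_none = mutual_information(uninformative)
mi_full = mutual_information(perfectly_informative)
print(mi_none, mi_full)
```

With no counted events at all there is no table to fill in, so the quantity cannot even be estimated.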

This situation may be changing. For the most part, arguments made in AR5 are based upon old-style models with their uselessness in controlling the climate obscured through applications of the equivocation fallacy. However, in 1 1/2 pages of Chapter 11 of its report, IPCC Working Group I presents the view that to be predictive a model needs the underlying events and for a predictive model to be validated the predicted relative frequencies must match the observed relative frequencies. That “predict” is used here in a properly disambiguated way provides hope for the future. Meanwhile, it would be well if statistically learned colleagues of ours, including Briggs, were to use “predict” in this same disambiguated way.
Terry Oldberg

October 4, 2014, 10:34 am

Whoops. I meant to address my previous post to Brandon.
Sheri

October 4, 2014, 12:53 pm

Brandon: It may be a hint, but hints are not sufficient for science and certainly not sufficient to reduce society to preindustrial status (unless we go nuclear, which appears unlikely). It only suggests an hypothesis; it in no way provides any proof whatsoever. The only thing that is “proof” is lack of correlation–it shows you missed the cause entirely.

My professor stated that the parts may work but not the whole. He said nothing about what happens when one part of the theory is clearly wrong. If part of the theory is wrong, you have to discard the theory. You can try taking that part out and see if the theory then works (indicating you included a variable that was not correct, so to speak). But it’s now a DIFFERENT theory. You have to discard the original as being wrong.

The numbers are not impressive, but maybe I’m missing something. CO2 is NOT a straw man. Really, if it’s not CO2, then we really can keep right on burning fossil fuels and who cares? If CO2 is irrelevant, as global warming advocates are now apparently trying to say or imply (I’ve seen it elsewhere too), then we have nothing to fear. So can I go back to burning coal for electricity and driving my SUV knowing CO2 is my friend or not? If not, then it’s not a strawman. Every theory of AGW I read says CO2 is the factor that is causing the warming. If you have a paper that says there’s another cause, I’d love to see the link.

I don’t think objects dispersing energy do so omni-directionally. There’s something called Rayleigh scattering and then stretch, asymmetrical and in sync molecular vibrations. These do not appear to be omni-directional in their energy release.

The thermometers are fine as far as I can see. Could be a few malfunctioning, but I don’t really have access to said instruments, so I am going with they are okay for now. Are they a representative sample–seems unlikely. Should the data be “homogenized” and “adjusted”–as little as possible. Of course, if the thermometers are fine, they wouldn’t be adjusting temperatures, so the scientists do not apparently think they are fine.

(Concerning your comment to Terry: All conspiracies are not successful. Some are very poorly put together and difficult to maintain. Assuming they exist at all. The lack of success is not proof there is no conspiracy, however.)
Brandon Gates

October 5, 2014, 11:07 pm

Sheri,

The only thing that is “proof” is lack of correlation–it shows you missed the cause entirely.

What it means is that the null hypothesis cannot be rejected. Stats 101.

But it’s now a DIFFERENT theory. You have to discard the original as being wrong.

I agree with that, especially the part about it being a different theory.

The numbers are not impressive, but maybe I’m missing something. CO2 is NOT a straw man.

“[Recent] observations falsify CO2’s contribution, full stop” is the strawman.

If CO2 is irrelevant, as global warming advocates are now apparently trying to say or imply (I’ve seen it elsewhere too) …

What non-climatologist advocates have to say on the science is not relevant to me when discussing what the climatologists themselves publish about their observations.

If you have a paper that says there’s another cause, I’d love to see the link.

http://data.giss.nasa.gov/modelforce/ There are three papers to read on that page alone, all available as non-paywalled .pdfs. Here’s another one from the same group, a bit older (2002): http://pubs.giss.nasa.gov/docs/2000/2000_Hansen_etal_2.pdf The money quote from the abstract:

A common view is that the current global warming rate will continue or accelerate. But we argue that rapid warming in recent decades has been driven mainly by non-CO2 greenhouse gases (GHGs), such as chlorofluorocarbons, CH4, and N2O, not by the products of fossil fuel burning, CO2 and aerosols, the positive and negative climate forcings of which are partially offsetting.

Those links cover radiative forcings, but that’s only the beginning of the story. The following 2013 paper is much discussed, but there are many others: http://scholarspace.manoa.hawaii.edu/bitstream/handle/10125/33072/Kosaka&Xie2013.pdf?sequence=1 From the abstract:

Here we show that accounting for recent cooling in the eastern equatorial Pacific reconciles climate simulations and observations. We present a novel method to unravel mechanisms for global temperature change by prescribing the observed history of sea surface temperature over the deep tropical Pacific in a climate model, in addition to radiative forcing. Although the surface temperature prescription is limited to only 8.2% of the global surface, our model reproduces the annual-mean global temperature remarkably well with r = 0.97 for 1970-2012 (a period including the current hiatus and an accelerated global warming).

Only 8.2% global coverage is a weakness of this particular study. The authors properly and explicitly state that their correlations are fubared elsewhere: "The model fails to simulate the SAT and sea-level pressure (SLP) changes over Eurasia, suggesting that they are due to internal variability unrelated to tropical forcing … " Not "proving", suggesting. Elsewhere in the same paper:

While radiative-forced response will become increasingly important, deviations from the forced response are substantial at any given time, especially on regional scales. Quantitative tools like our POGA-H are crucial to attribute the causes of regional climate anomalies. The current hiatus illustrates the global influence of tropical Pacific SST, and a dependency of climate sensitivity on the spatial pattern of tropical ocean warming, which itself is uncertain in observations and among models. This highlights the need to develop predictive pattern dynamics constrained by observations.

Emphasis added. None of this is any news, just incremental advances along what has already been on the table since … well … forever.

There’s something called Rayleigh scattering and then stretch, asymmetrical and in sync molecular vibrations. These do not appear to be omni-directional in their energy release.

Asymmetry affects the directional distribution of the emissivity (and absorptivity), but it does not make them uni-directional. GHGs do not all decide to adopt the same orientation. A low, thick cloud layer will be directionally biased compared to a high thin one. But a small homogeneous chunk of atmosphere radiates all directions evenly.

Rayleigh scattering is a different animal entirely, mostly dependent on the celestial orientation of the sun. Here's a 1993 paper on how that can screw up atmospheric radiative models when Rayleigh scattering is ignored in favor of computational simplicity: http://pubs.giss.nasa.gov/docs/1994/1994_Mishchenko_etal_1.pdf Note the first paragraph on pg. 493 (pg. 3 of the document): " … for an entirely gaseous atmosphere consisting of randomly oriented anisotropic molecules … "

Are [thermometers] a representative sample–seems unlikely.

No sample is ever perfectly representative. We’ve talked about all this before though, and the usual answer I get about what resolution of coverage would be sufficient is a vague, “more than we’ve got now”. Bob Kurland’s standard is the distance between his driveway and rose garden, which will never happen. What’s yours?

Should the data be “homogenized” and “adjusted”–as little as possible.

But as much as considered necessary. Which are judgment calls subject to bias whenever they’re done, and subject to change in light of newer evidence, a novel technique or discovery that someone screwed up. That’s what re-analysis is about.

Of course, if the thermometers are fine, they wouldn’t be adjusting temperatures, so the scientists do not apparently think they are fine.

Of course they don't think they're fine. They want more of them and better ones at least as much as you do. When they're not writing grants to get more of them, they write papers about how to handle what data they've already got to work with. Nothing would ever get done if everyone sat on their hands waiting for "more than we've got now".

All conspiracies are not successful. Some are very poorly put together and difficult to maintain.

The bigger they are, and the more people involved, the higher the probability of detection.

Assuming they exist at all.

That's a popular one, particularly among the types of folk who think that well-established physics is a poor assumption.

The lack of success is not proof there is no conspiracy, however.

lol, you really walked into this one: The lack of skillful prediction is not proof that there is no CO2 effect on surface temperature, however.
Brandon Gates

October 6, 2014, 2:24 am

Terry,

With reference to the climate models that were used in making current public policy on CO2 emissions the appearance that they predict at all is a consequence from an application of the equivocation fallacy that exploits the polysemic nature of “predict” in the literature of climatology.

I re-read your guest post on this blog, and as I see it, the crux of your argument is this: A predictive inference is made by a model but not a modèle. On the other hand, a modèle is capable of making projections while a model is incapable of making them. The “projection” of global warming climatology is a mathematical function that maps the time to the projected global average surface air temperature. And elsewhere, this: As it has no underlying statistical population, a modèle is insusceptible to being validated.

For sake of discussion, I accept your definitions as being good as any. What I'm concerned about in my response is whether the … I need a neutral term … tools (and techniques) used by climatologists fit your definition of modèle.

You lead with this claim: No statistical population underlies the models by which climatologists project the amount, if any, of global warming from greenhouse gas emissions we’ll have to endure in the future.

Right off the bat, I’m at a total loss because “statistical population” has not been defined, so I’ll submit one via the Wikipedia article: In statistics, a population is a complete set of items that share at least one property in common that is the subject of a statistical analysis. For example, the population of German people share a common geographic origin, language, literature, and genetic heritage, among other traits, that distinguish them from people of different nationalities.

One implication of that definition is that defining a statistical population is an arbitrary exercise … it can be whatever one chooses. Which I think is fine so long as the population is adequately and unambiguously defined. A statistical population can be the set of jelly beans within the borders of Zimbabwe, from which we could take a (hopefully) representative sampling and then make statistical inferences about whatever chosen attributes of all jelly beans in Zimbabwe.

I don’t understand how a particular layer of atmosphere fails any reasonable definition of a statistical population so long as somewhere the parameters of that layer of air are explicitly described. Are you saying that they aren’t?

Now you say: A model is said to be “validated” when the predicted relative frequencies of the outcomes of events are compared to the observed relative frequencies in a sample that is randomly drawn from the underlying statistical population, without a significant difference being found between them.

Sure. In my jelly bean example, we could predict that another random sampling a year from now would yield the same relative frequencies we recorded in our original descriptive analysis. If I'm understanding you correctly it's only at that point that the descriptive stats gleaned from our initial sampling become a model.

So we propose a null hypothesis of change, an experimental hypothesis of no change and do our next random sampling, and do the descriptive stats again. Now we do a statistical test and find that the relative frequencies have changed such that they fail our predetermined significance threshold.

We must conclude that our experimental hypothesis of no change in relative frequencies of jelly bean attributes in Zimbabwe isn’t supported by observation, or in other words, we can’t reject the null hypothesis of change.
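A rough sketch of the comparison described above, with invented counts of red jelly beans in two samples taken a year apart. It uses a pooled two-proportion z-test, which is one common choice and an assumption here; note that it follows the textbook convention in which "no change" is the hypothesis being tested, and that the function name and the 0.05 threshold are likewise illustrative.

```python
from math import erf, sqrt

def two_proportion_z_test(x1, n1, x2, n2):
    # x1 of n1 items had the attribute in the first sample, x2 of n2 in the second.
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal CDF.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return z, p_value

# Invented data: 50 red beans in 1000 this year, 80 in 1000 next year.
z_changed, p_changed = two_proportion_z_test(50, 1000, 80, 1000)
# Invented data: identical proportions in both years.
z_same, p_same = two_proportion_z_test(50, 1000, 50, 1000)

threshold = 0.05  # predetermined significance threshold (assumed)
print(z_changed, p_changed, p_changed < threshold)
print(z_same, p_same, p_same < threshold)
```

In the first case the difference in relative frequencies falls below the threshold and "no change" is not supported; in the second the test finds no difference at all.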

The great debate in this forum is whether that means our model is falsified. I’d hedge because I recall Type II errors from stats for dummies. In the jelly bean case, however, I’d recognize that our prediction offered no plausible explanatory mechanism by which relative frequencies should remain static over time, so after a cautionary note that it might be sampling or measurement error or other kind of screwup, I’d call the model falsified.

In the real world, I’d not build a model which primarily relied on past performance as an indicator of future results for any non-trivial purpose. But that isn’t the point of this exercise.

You go on: In an evaluation, projected global average surface air temperatures are compared to observed global average surface air temperatures in a selected time series.

Just as we did with the jelly beans. Do you disagree with me calling what we constructed a model?

More from you: A “prediction” is an unconditional predictive inference. For example, “The probability of rain in the next 24 hours is thirty percent.” Notice there is no condition.

I stumbled over that one for a minute until I consulted Wikipedia again: Predictive inference is an approach to statistical inference that emphasizes the prediction of future observations based on past observations. Which is what we did with our jelly bean experiment.

A predictive inference is made by a model but not a modèle. On the other hand, a modèle is capable of making projections while a model is incapable of making them. The “projection” of global warming climatology is a mathematical function that maps the time to the projected global average surface air temperature.

The first two sentences are again definitional. Fine. I think the final sentence is problematic. It is unequivocally true that we see graphs of time plotted against global SAT, but so what? Other than your claim that global SAT has no underlying statistical population, I see no material differences in the following statements:

1) Given that it is cloudy: the probability of rain in the next 24 hours is thirty percent.

2) Given that for the past 10 years, red jelly beans constituted 5% of all jelly beans in Zimbabwe, the probability is 95% that next year the percentage of red jelly beans will be 5+/-0.04%.

3) Given that for the past 100 years the global average SAT change has been 0.0418C/yr, the probability is 95% that global SAT anomaly (1950-1981 baseline) will be 1.8645+/-0.5C in 2034.

You also imply that global SAT projections are a function of time and only time, lacking any other conditions, which is patently incorrect. Not much credible literature would make statement (3) because the outputs of state-of-the-art GCMs are largely not generated probabilistically. I'm going to end here because already some of my arguments could be based on misunderstandings of your definitions, and I'd like to give you a chance to clarify. However, I believe you've made a number of factual misstatements that might render your main argument moot according to your own definitions. We shall see.

Actually I lied; I do have one ending note. There are some parameters in various physical simulations which are inherently non-observable, i.e., they don’t represent directly measurable quantities or phenomena.
Sheri

October 6, 2014, 9:07 am

Brandon: I don't remember my saying anything about what you put in quotes on CO2. You have me confused with someone else. Reading further down your comment, there are more examples of what I did not say. I'll try to answer what I did say:

You are either deliberately misinterpreting my question or you are not understanding. I thank you for the papers on various other views. So I think you are saying that at least some papers certainly say my SUV and propane furnace are okay and we're fine with modern society. Actually, I have a pretty extensive collection of papers that say just that, though most put them in the skeptic camp….

My use of the term "global warming advocates" refers to all those who believe AGW is real and a problem. Check my blog for the etiology of the term. It includes scientists. (Hansen is out there saying the oceans will boil and he is a scientist, yet you insist on quoting him and pretending he doesn't believe what he clearly tells everyone who will listen and writes books on. I don't understand why you pull 12 year old papers out and try to claim Hansen says there are other drivers. Maybe I just need to say "climate change scientists say we must get rid of all fossil fuels, modern life and live in caves"? Really, Brandon, you make no sense here.)

You really don’t understand the difference between a primary cause and a secondary cause, do you? What you now seem to be saying is what skeptics say–lots of things affect climate and none are really the driver. So you are no longer worried about CO2 and modern life? I really can’t tell.

Homogenized as "much as necessary". That's like saying "tax as much as necessary". It's pretty much purely subjective and not actually quantifiable except person to person. If scientist A is biased to colder, he adjusts downward. If B wants warm, he adjusts upward. Except most "adjustments" reported in temperature data make the AGW theory stronger (revising downward in the past, upward now), a clear sign the data is being manipulated, not adjusted. I would have been flunked out of chemistry if I had tried the same thing, but today that is somehow supposed to be "science". No, if the majority of the "adjustments" go in the direction of supporting the theory, that is what was quaintly called "CHEATING" in the days of real science.

You believe in conspiracies? I can’t tell by your physics comment.

No, I didn’t “walk into this one”. As stated before, lack of skillful prediction is not proof there is no CO2 effect. What you have is an Hypothesis that CO2 affects the temperature and no proof thereof. Which is a nice science project, but nothing to announce to the press or shout from the mountain tops. There are hypotheses on the sun, the ocean, atmospheric pressure, etc all out there. The lack of proof is not indicative that these are not the primary drivers either, if there is a primary driver. Which is why currently our understanding of climate sits at a level of “scientific guess”, something to continue studying but nothing to claim we actually understand.
