
Category: Philosophy

The philosophy of science, empiricism, a priori reasoning, epistemology, and so on.

November 22, 2009 | 22 Comments

Climategate Peer Review: Science red in tooth and claw

See also this story on proxies

I am a scientist and I have lived around fellow scientists for many years and I know their feeding habits well. I therefore know that the members of our secular priesthood are ordinary folk. But civilians were blind to this fact because our public relations department has labored hard to tell the world of our sanctity. “Scientists use peer review which is scientific and allows ex cathedra utterances. Amen.”

But the CRU “climategate” emails have revealed the truth that scientists are just people and that peer review is saturated with favoritism, and this has shocked many civilians. It has shaken their faith and left them sputtering. They awoke to the horrible truth: Scientists are just people!

Now all the world can see that scientists, like their civilian brothers, are nasty, brutish, and short-tempered. They are prejudiced, spiteful, and just downright unfriendly. They are catty, vindictive, scornful, manipulative, narrow-minded, and nearly incapable of admitting to a mistake. And they are cliquey.

Thus, we see that the CRU crew define a “good scientist” as one who agrees with them, a “bad scientist” or “no scientist” as one who does not agree with them, and a “mediocre scientist” as somebody who mostly agrees with them. Further, these judgments are carried to the peer-review process.

Claiming lack of peer review was once a reasonable weapon in scientists’ argument armamentarium. After climategate, all can see that this line of logic is as effective as a paper sword.

[Image: Alfred's Global Warming Poem]

For example: the CRU crew publicly cry, “If our skeptics had anything to say, let them do it through peer review, otherwise their claims don’t count.” Never mind that this parry is a logical fallacy—an argument is not refuted because it was uttered outside a members-only journal. Pay attention to what they say privately:

Proving bad behavior [about peer review] is very difficult. If you think that [Geophysical Research Letters editor] Saiers is in the greenhouse skeptics camp, then, if we can find documentary evidence of this, we could go through official AGU channels to get him ousted.[1]

They say that this journal or that one, because it dared publish peer-reviewed work that did not agree with the CRU consensus, should be banished from the fold, and that its editors should resign or be booted, and that everybody should agree not to cite papers from those journals, and so on.

In other words, use muscle and not mind if you don’t like the results. Get rid of the editor and put an agreeable apparatchik in his place.

Another popular thrust: claim that it wasn’t real, genuine, honest-to-goodness peer review that led to skeptical findings being published. Something must have gone horribly wrong for those papers to have seen the light of day! Peer review is thus implicitly defined as that process which publishes only those views that agree with prior convictions.

Sensing that that tactic could fail, some said, “Aha! Let’s see if we can disparage the authors of those skeptical papers: if we can successfully savage and malign them, then their findings are wrong.”

Yes, sir, dear reader, you guessed it. Another logical fallacy. It is absolutely no argument whatsoever to say a finding is wrong because its purveyor is “not a real climatologist” or “has not published much” or that he “has few citations from previous papers.”

It is also a fallacy to say that because a skeptical argument has appeared on a website—and could not pass through the gauntlet of the good-old-boy peer review system—that it need not be answered.

Here’s some advice to my fellow scientists: If an argument appears on a website, or on Fox News, or in a newspaper, or even on the back of a t-shirt, and that argument fails, then simply say so and say why. And then be done with it. Do not make an ass of yourself by claiming that answering criticisms that do not come from your circle of friends is beneath you.

If an argument is old and has been well refuted elsewhere, say so, and say where a reliable refutation may be found. It makes you look desperate and foolish to say that the argument came from a blogger and is therefore suspect. And it makes people believe the blogger.

Anyway, do not cry foul over skeptical blogs and then simultaneously publish your own blog to disseminate your own beliefs. “They can’t publish a blog but we can.” That just looks stupid.

But don’t let’s get too carried away, everybody. These kinds of behind-the-scenes activities, perhaps more heated in some respects, are the same in every field. Climate scientists are people and so are scientists in other areas. Bad behavior is nothing new and will never change, because people will always be people.


[1] I wrote to the author of those words and asked, “I can understand that you feel strongly about the matter, but does your conviction run to harming the career of a fellow scientist merely because he disagrees with you?” I’ll let you know if I receive an answer.


August 1, 2009 | 21 Comments

Models, theories, consistency, and truth

Ready? Put on your best straight face, recall that global temperatures have not increased for a decade, and that it’s actually been getting cooler, then repeat with Brenda Ekwurzel, of the Union of Concerned Scientists—what they’re concerned about, heaven knows; perhaps replace “concerned” with “perpetually nervous”—repeat, I say, the following words: “global warming made it less cool.”

Did you snicker? Smile? Titter? Roll your eyes, scoff, execrate, deprecate, or otherwise excoriate? Then to the back of the class with you! Because what Ekwurzel said was not wrong, because it is true that the theory of global warming is consistent with cooler temperatures. The magic happens in the word consistent.

To explain.

While there might be plenty of practical shades of use and definition, there is no logical difference between a theory and a model. The only distinction that can be drawn with certainty is between mathematical and empirical theories. In math, axioms—which are propositions assumed without evidence to be true—enable strings of deductions to follow. Mathematical theories are these deductions; they are tautologies and, thus, are true.

Empirical theories, while they might use math, are not math, and instead say something about contingent events, which are events that depend on the universe being a certain way, outcomes which are not necessary, like temperature in global warming theory. Other examples: quantum mechanics, genetics, proteomics, sociology, and all statistical models: all models that are of practical interest to humans.

Just like with math, empirical models start with a list of beliefs or premises, again, some of which might be mathematical, but most are not. Many premises are matters of observation, even humble ones like “The temperature in 1880 was cooler than in 1998.” The premises in empirical models might be true or uncertain but taken to be certain; except in pedantic examples, they are never known to be false.

It is obvious that the predictions of statistical models are probabilistic: these say events happen with a probability different than 0 or 1, between certainly false and certainly true. Suppose the event X happens, where X is a stand-in for some proposition like “The temperature in 2009 will be less than in 2008.” Also suppose a statistical theory of which we have an interest has previously made the prediction, “The probability of X is very, very small.” An event which was extraordinarily improbable with respect to our theory has occurred. Do we have a conflict?

Global warming cools things off

No, we do not. The existence of X is consistent—logically compatible—with our theory because our theory did not say that X was impossible, merely improbable. So, again, any theory that makes probabilistic predictions will be consistent with any eventual observations.

Global warming is a statistical theory. Of course, nowhere is written down a strict definition of global warming; two people will envision two different theories, typically at the edges of the models. And this is not unusual: many empirical theories are amorphous and malleable in exactly the same way. This looseness is partly what makes global warming a statistical theory. For example, nobody I know holds that “Global warming says it is impossible that the temperature in any year will fall.” The theory may, depending on its version, say that the probability of falling temperatures is low, and as low as you like without being exactly 0; but then any temperatures that are eventually observed—even dramatically cold ones—are not inconsistent with the theory. That is, the theory cannot be falsified[1] by observing falling temperatures.

It is worth mentioning that global warming, and many other theories, incorporate statistical models that give positive probability to events that are known to be impossible given other evidence. For example, given the standard model in physics, temperature can fall no lower than absolute zero. The statistical global warming model gives positive probability to events lower than absolute zero (because it uses normal distributions as bases; more on this at a later date). But even so, the probabilistic predictions made by the model are obviously never inconsistent with whatever temperatures are observed.
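To see how small, yet still positive, such a probability can be, here is a minimal sketch. The mean and standard deviation are hypothetical numbers chosen only for illustration; they come from no actual climate model. Because the probability is too small to store as an ordinary floating-point number, the code works with its base-10 logarithm, using the standard leading-order approximation for the far tail of a normal distribution.

```python
import math

def log10_normal_lower_tail(x, mean, sd):
    """Approximate log10 of P(X < x) for X ~ Normal(mean, sd), far in the lower tail.

    Uses the leading-order tail approximation P(Z > z) ~ pdf(z) / z,
    which is only intended for large z (here required to be at least 5).
    """
    z = (mean - x) / sd  # how many standard deviations x lies below the mean
    if z < 5:
        raise ValueError("approximation is intended for z >= 5")
    ln_p = -0.5 * z * z - math.log(z) - 0.5 * math.log(2 * math.pi)
    return ln_p / math.log(10)

ABSOLUTE_ZERO_C = -273.15
HYPOTHETICAL_MEAN_C = 14.0  # assumed for illustration only
HYPOTHETICAL_SD_C = 0.5     # assumed for illustration only

log10_p = log10_normal_lower_tail(ABSOLUTE_ZERO_C, HYPOTHETICAL_MEAN_C, HYPOTHETICAL_SD_C)
print(f"log10 of P(temperature < absolute zero) is about {log10_p:.0f}")
```

The logarithm is a finite negative number, so the probability itself is greater than 0, however absurdly small: the normal model never rules the impossible event out.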

Incidentally, even strong theories, like, say, those used to track collisions at the Large Hadron Collider, which are far less malleable than many empirical models, are probabilistic because a certain amount of measurement error is expected; this ensures their statistical nature (space is too short to prove this).

Now, since, for nearly all models, any observations realized are never inconsistent with the models’ predictions, how can we separate good models from bad ones? Only one way: the models’ usefulness in making decisions. “Usefulness” can mean, and probably will mean, different things to different people—it might be measured in terms of money, or of emotion, or by a combination of the two, or by how the model fits in with another model, or by anything. If somebody makes a decision based on the prediction of a model, then they have some “usefulness” or “utility” in mind. To determine goodness, all we can do is to see how our decisions would have been affected if the model had made better predictions (better in the sense that its predictions gave higher probability to the events that actually occurred).
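One rough way to make "gave higher probability to the events that actually occurred" concrete is a logarithmic score. This is a sketch under assumptions: the post names no scoring rule, the two models and the five outcomes below are invented for illustration, and a real comparison would weigh the decision-maker's own utility rather than a generic score.

```python
import math

def probs_given_to_outcomes(forecasts, outcomes):
    """Probability each forecast assigned to what actually happened."""
    return [p if happened else 1.0 - p for p, happened in zip(forecasts, outcomes)]

def log_score(probs_for_observed):
    """Sum of log-probabilities; closer to zero means the model did better."""
    return sum(math.log(p) for p in probs_for_observed)

# Hypothetical record: True means "this year was cooler than the last".
outcomes = [True, False, True, True, False]

# Hypothetical forecasts of the probability of "cooler than last year".
model_a = [0.10, 0.10, 0.10, 0.10, 0.10]  # cooling judged unlikely every year
model_b = [0.50, 0.50, 0.50, 0.50, 0.50]  # noncommittal

score_a = log_score(probs_given_to_outcomes(model_a, outcomes))
score_b = log_score(probs_given_to_outcomes(model_b, outcomes))
print(f"model_a: {score_a:.3f}  model_b: {score_b:.3f}")
```

Neither hypothetical model is falsified by this record, since neither gave probability 0 to anything that happened. But model_b gets the higher (less negative) score, which is the only sense in which it can be called the better of the two here.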

Unfortunately for Ekwurzel, while she’s not wrong in her odd claim, global warming theory has not been especially useful for most decision makers (those that make their utility on the basis of temperature and not on the model’s political implications). It is trivial to say that the theory might be eventually useful, and then again it might not. So far, the safe bet has been on not.


[1] Please, God, no more discussions of Popper and his “irrational” (to quote Searle) philosophy. This means you, PG!

July 15, 2009 | 24 Comments

The strange insignificance of statistical significance

Who is more likely to support the death penalty: college undergraduates from a “nationally-ranked Midwestern university with an enrollment of slightly more than 20,000” majoring in social work, or those majoring in something else?

This question was asked by Sudershan Pasupuleti, Eric Lambert, and Terry Cluse-Tolar at the University of Toledo to 406 students, 234 of whom were social work undergraduates. The answer was published in the Journal of Social Work Values and Ethics.

“58% of the non-social work majors favored to some degree capital punishment” and only “36% of social work students” did. They report that these percentages (58% vs. 36%) represent a statistically “significant difference in death penalty support between social work and non-social work majors.” The p-value (see below) was 0.001.

What does statistically significant mean? Before I tell you, let me ask you a non-trick question. What is the probability that, for this study, a greater percentage of non-social work majors favored the death penalty? The probability is 1: it is certain that a greater percentage of non-social work majors favored the death penalty, because 58% is greater than 36%. The answer would be the same if the observed percentages were 37% and 36%, right? The size of the difference does not matter: different is different. Significance is not a measure of the size of the difference. Further, the data we observed tells us nothing directly about other groups of students (who were not polled and whose opinions remain unknown). Neither does significance say anything about new data: significance is not a prediction.

Since significance is not a direct statement about data we observed nor is it a statement about new data, it must measure something that cannot be observed about our current data. This occultism of significance begins with a mathematical model of the students’ opinions; a formalism that we say explains the opinions we observed: not how the students formed their opinions, only what they would be. Attached to the model are unobservable objects called parameters, and attached to them are notions of infinity which are so peculiar that we’ll ignore them (for now).

A thing that cannot be observed is metaphysical. Be careful! I use these words in their strict, logical sense. By saying that some thing “cannot be observed”, I mean just that. It is impossible—not just unlikely—to measure its value or verify its existence. We need, however, to specify values for the unobservable parameters or the models won’t work, but we can never justify the values we specify because the parameters cannot be seen. This predicament is sidestepped—not solved—by saying, in a sense, we don’t care what values the parameters take, but they are equal in all parts of our model. For this data, there are two parts: one for each of non-social work majors and social work majors.

Significance takes as its starting point the model and the claim of equality of its parameters. It then calculates a probability statement about data we could have seen but did not, assuming that the model is true and its parametric equalities are certain: this probability is the p-value (see above) which has to be less than 0.05 to be declared “significant.”
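A minimal sketch of such a calculation follows, using a pooled two-proportion z-test. Everything specific here is an assumption: the post does not say which test Pasupuleti et al. ran, and the counts are reconstructed from the rounded percentages (172 non-social work students, 234 social work students), so the result need not match the 0.001 they reported.

```python
import math

def two_proportion_p_value(successes_1, n_1, successes_2, n_2):
    """Two-sided p-value for a pooled two-proportion z-test.

    Assumes the model is true and both groups share one parameter
    (the "equality of parameters" step), then asks how probable a
    difference at least this large would be in data we did not see.
    """
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal
    return z, p_value

# Counts reconstructed from rounded percentages: an assumption, not the paper's data.
n_social = 234
n_other = 406 - n_social
favor_other = round(0.58 * n_other)
favor_social = round(0.36 * n_social)

z, p_value = two_proportion_p_value(favor_other, n_other, favor_social, n_social)
print(f"z = {z:.2f}, p-value = {p_value:.2g}, significant = {p_value < 0.05}")
```

Note what goes into the function: only the observed counts plus the assumed model. Nothing in it refers to why the students hold their opinions, or to any students who were not polled.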

Remember! This probability says nothing directly about the actual, observed data (nor does it need to, because we have complete knowledge of that data), nor does it say anything about data we have not yet seen. It is a statement about and conditional on metaphysical assumptions—such as that the model we picked is true and its parameters equal—assumptions which, because they are metaphysical, can never be checked.

Pasupuleti et al. intimated they expected their results, which implies they were thinking causally about their findings, about what causes a person to be for or against capital punishment. But significance cannot answer the causal question: is it because the student was a social work major that she is against capital punishment? Significance cannot say why there was a difference in the data, even though that is the central question, and is why the study was conducted. We do not need significance to say if there was a difference in the data, because one was directly observed. And significance cannot say if we’d see a difference (or calculate the probability of a difference) in a new group of students.

There was nothing special about Pasupuleti’s study (except that it was easy to understand): any research that invokes statistical significance suffers from the same limitations, the biggest of which is that significance cannot do what people want it to, which is to give assurance that the differences observed will persist when measured in new data.

Statistical significance, then, is insignificant, or insufficient, for use in answering any practical question or in making any real decision. Why, then, is significance used? What can be done instead? Stick around.

Update: Dan Hughes sends this story, a criticism of a study that purports the “statistical significance” of increased NIH funding and decreased death rates.

July 7, 2009 | 45 Comments

Randomness isn’t in charge of anything: the “hot hand” in basketball

The Wall Street Journal is helping Leonard Mlodinow tout his book The Drunkard’s Walk: How Randomness Rules Our Lives. Among other things, Mlodinow, like academics Tversky, Kahneman, and Gilovich before him, wants to show that streaks in games like basketball don’t exist. Or, rather, they do exist, but they can be “explained by randomness.”

Listen: randomness can’t explain anything.

Statisticians imagine—I choose this word carefully—a basketball player has an ineffable probability of making a free throw, and they try to guess the probability’s value through modeling. Suppose a guess is 80% for a particular player and then suppose our player has just made his last 10 shots. A fan might say our man has a hot hand. Mlodinow:

If a person tossing a coin weighted to land on heads 80% of the time produces a streak of 10 heads in a row, few people would see that as a sign of increased skill. Yet when an 80% free throw shooter in the NBA has that level of success people have a hard time accepting that it isn’t. [Tversky and others] showed that despite appearances, the “hot hand” is a mirage. Such hot and cold streaks are identical to those you would obtain from a properly weighted coin.

This statement is confused. Each time a “properly weighted coin” is tossed something makes it fall heads or tails, some physical cause. “Randomness” does not make the coin choose a side. Spin and momentum cause it to land on one side or the other. There is nothing “random” in a coin toss: there is only physics. If you knew the amount of force propelling the coin upwards, and the amount of spin imparted, you could predict with certainty the outcome of the flip. (Persi Diaconis and Ed Jaynes—both non-traditional statisticians—have written multiple papers on this subject.)

“Randomness” is not a physical property; it does not exist inside the coin. Mlodinow acknowledges this in the words “weighted coin” used to describe his thought experiment. He is aware implicitly that modifying a physical property of a coin like the weight changes whether it shows heads or tails. But he fails to realize that there is no difference in philosophy between changing the weight or modifying the spin or the momentum. Like Nelson reading the signal flags, he has turned a blind eye to the physics and has taken refuge in “randomness” to explain how the coin behaves.

Similarly, something, some physical—and biological and mental—process is causing the basketball player to make his shot. Again, the spin, the momentum, the aim, and the mental pathways that give rise to those properties are what determine whether the shot falls through the hoop or misses.

Our man has made his last 10 and is setting up for the 11th. Now the fans behind the basket distract him, or maybe he starts thinking too much about the shot, or there is excess sweat on his finger, or whatever, but he misses his shot and his streak ends. Randomness does not explain why the streak ends: physics and biology do.

Random means unknown and nothing more. Before the player takes the shot, or Mlodinow flips his weighted coin, we do not know what the outcome will be because we do not know what the values of the physical properties that determine the outcome are: it is these properties that change from shot to shot and from flip to flip that cause the different endings. If we did know the physics—like we can if we practice with coins—we could predict the outcome. That is, the outcome becomes certain, or known, and is therefore not random.

Our knowledge of any outcome depends on what information we condition on. What might be random to you might not be random to me if I have different information than you. For example, right now my cell phone is either in my left- or right-hand pocket. To you, the outcome (finding out which pocket) is random because your conditional information consists solely of the knowledge that it might be in one or other pocket. The information I condition on allows me to know with certainty the outcome.

Same thing in basketball: if we knew what amount of force a player uses, etc., we could predict whether his shot will go in. But that sort of information is hard to measure, so we look for proxies, like statistical models of the player’s past performance. Conditional only on those models, we can say “There is an 80% chance the next shot will go in.”
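To show what "conditional only on those models" buys, here is a sketch of what the 80% model alone implies about streaks. The assumptions are the model's, not the player's: every shot is treated as independent with a fixed 0.8 chance, and the function name and the count of shots are made up for illustration.

```python
def prob_streak_at_least(n_shots, streak_len, p_make):
    """Probability of at least one run of `streak_len` makes in `n_shots` shots,
    assuming independent shots that each go in with probability `p_make`."""
    # state[k]: probability the current run of makes is exactly k long
    # and no full streak has been completed so far.
    state = [0.0] * streak_len
    state[0] = 1.0
    hit = 0.0  # probability a full streak has already occurred
    for _ in range(n_shots):
        new_state = [0.0] * streak_len
        for k, mass in enumerate(state):
            new_state[0] += mass * (1 - p_make)  # a miss resets the run
            if k + 1 == streak_len:
                hit += mass * p_make             # this make completes the streak
            else:
                new_state[k + 1] += mass * p_make
        state = new_state
    return hit

# Ten makes in ten shots: under the model this is just 0.8 ** 10, about 0.107.
print(prob_streak_at_least(10, 10, 0.8))

# A hypothetical longer run of 100 free throws gives a streak many more chances.
print(prob_streak_at_least(100, 10, 0.8))
```

The numbers describe our uncertainty given the model and nothing else. They say what to expect if all we know is "80%"; they are silent on what caused any particular shot to fall.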

If the 11th shot is sunk, our man’s streak continues. The mental state of the player certainly played a part in that shot and so did his “hot hands.” That we cannot predict who will have a hot hand or when does not mean that hot-hand streaks do not exist. We should not mistake our models for reality.