Feb 04 2012

On Corrections In Science

Somebody attributed to Max Planck, a constant[1] source of wisdom, the saying that science advances funeral by funeral. This is a pithy condensation of his more famous quotation:

A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it.

Every scientist laughs along with Planck, thinking to himself how silly were those people of yore who refused to believe what was now so obvious. When scientists gather they often tell each other cautionary tales about the simpleminded stubbornness of their forefathers. Things were different then, always then. Not for a moment does it cross their minds that Planck’s wisdom could possibly apply to them.

Yet Planck was wrong: at least, if he meant that nobody ever changes his mind. Some do. But only very, very few. Einstein famously did not change his. And it is no refutation to say that perhaps Einstein will be right after all, because that would imply that Einstein’s intellectual enemies were wrong not to have changed their minds.

It is also clear that Planck had in mind foundational questions. The more a new idea conforms to whatever the current consensus in science is, the more likely it will be accepted. The new idea says, “What you believe is indeed so”, which is comforting. But the stronger a novel philosophy thumps the base of the Old Way, the more vociferous the opposition. It is saying, “You are wrong,” words few can stomach. The Wegeners and Semmelweises who arise occasionally must expect their thrashings.

Once more we have Planck:

The man who cannot occasionally imagine events and conditions of existence that are contrary to the causal principle as he knows it will never enrich his science by the addition of a new idea.

It is true that new foundational ideas are radical departures, as Planck suggests, but then so is the flood of crank theories that washes over science. An idea’s novelty is thus not an argument in its favor. To think it is is to employ what philosopher David Stove called the Columbus argument. They did all laugh at old Christopher, and laugh wrongly, but they were right to dismiss the vast majority of novel thought.

We often hear—it is part of the standard propaganda folder—that science is self-correcting. Is it? Well, this statement is either always true, or it is always false, or somewhere in between. If we claim it is always true, we claim too much, because it is to claim all wrong ideas will always be corrected, and where is the proof for that? If you believe this, you do so on faith and in opposition to history. After all, science has at times not progressed but actively regressed. So how do we know that some of our beliefs will never be challenged successfully? It is logically possible that we hold certain ideas that are false but we can never prove false.

Then it cannot always be false that science is self-correcting, because, as is obvious, science has often progressed. So it must be somewhere in between: science often, but not always and not in all places, self-corrects. And this says nothing about the rate at which science self-corrects. For trivial, small facts, the correction is quick, as any working scientist will tell you. Yet as Planck told us, self-correction is painfully, even fatally, slow for foundational ideas.

The test for regression, the opposite of self-correction, appears to be how closely aligned a science is to politics. Trofim Lysenko leaps to mind as the man who halted biology and ordered it to about-face, and then marched it along the path dictated by his socialist masters. On a smaller scale, there was that infamous bill proposed (but not passed) in Indiana which would make it the law of the land that the circle could be squared (and thus the value of pi should be changed). In recent years, we have had a spate of frauds which otherwise would have been caught had the results the frauds put forth not been what their audience wanted to hear.

Science is like the branch of a tree that twists and grows in the direction of the strongest sunlight and nutrients, i.e. money. But not only that. Scientists are just people and they like to get along with others, especially colleagues. They will thus often hold an idea more strongly just because others hold it, too. Which brings us right back to Planck.

——————————————————————–

[1] There’s got to be a better way to put these in.


Feb 03 2012

Should Scientists Be Held Legally Responsible for Their Results?

That title is lifted from Popular Science’s brief article. The idea is that scientists should put their money where their pronouncements are, just as doctors and accountants and businessmen and engineers and everybody else who offers opinions for consideration already do (as philosopher Christopher Essex reminded me). Sue them.

Pop Sci reminds us of how Bernardo De Bernardinis, the vice-director of Italy’s Department of Civil Protection

told reporters that citizens should not worry, and even agreed with a journalist who suggested that people should relax with a glass of wine.

Six days later, a major earthquake struck L’Aquila, a city in Abruzzo, killing more than 300 people. Soon after, citizens requested an investigation into the panelists’ findings, and the public prosecutor obliged. De Bernardinis and the panelists were charged with manslaughter and now face up to 15 years in prison. The L’Aquila judge who determined that the case could go to court said the defendants provided “imprecise, incomplete and contradictory information” and effectively “thwarted the activities designed to protect the public.”

That’s a lotta years! All for a busted forecast. (We covered the story here; it’s more complicated than the quotation suggests.)

And then, “In 1989 a scientific advisory group reported that it was unlikely that BSE could be transmitted to humans. Through the early 1990s, government ministries reassured the public that it was safe to eat homegrown beef.” It wasn’t. People died, grief ensued, but suing lawyers were not unleashed. (Government bureaucrats were instead.)

South African lawmakers are proposing to make forecasting the weather illegal unless one has a license to do so (link). Easy to scoff at this one, since as Mark Twain said, etc. Then, think how seriously weather forecasts are taken in, say, Oklahoma. Somebody there says a tornado’s a comin’, and people take action. Expensive action, too. So the forecaster better be damn sure of himself. And woe betide the weatherman who says all is well when it isn’t.

But sue him for a mistake? Well, why not? Much as we hate to encourage unslakeable lawyers and their symbiotic gelatinous bottom feeders like JG Wentworth who will buy out (among other things) structured settlements won in law suits, these scavengers (sometimes) provide the useful function of eating the necrotic flesh of capitalism.

Instilling the fear of buzzards into scientists might sharpen their wits. It might for instance stem the flow of purple, hyperbolic prose and Chicken Little-ing from the environmentalist crowd if they knew that their words were going to be checked against the facts.

But how do we decide who gets sued? Should we sue those guys who a few months ago predicted, via statistical modeling, that the neutrino had no mass? Should we pull to the bar the group who got us all excited, via statistical modeling, that the Higgs was finally found, but who were (probably) premature? What about all the champagne that was bought and probably consumed after the first happy but ultimately wrong announcement? Should the crew who manned the accelerator be legally responsible for the bill?

Can we sue those fellows who swore that brief exposure to a 72 × 45 pixel American flag turned people into Republicans? They learned this “fact” via statistical modeling. And what about all those especially earnest folks at the EPA who will protect us no matter what, who create statistical models aplenty proving that exposure to some barely detectable chemical will increase our risk of cancer from 20% to 20.001%? Can we haul them off to jail after it proves that their fretting was false?

How about the climatologists who swore, by golly and by gum, that the temperatures now should have been warmer than they are? Tar and feather them, litigationally speaking? After all, lots of money was spent believing these forecasts were accurate. Who should pay now that we have learned that they weren’t? We needn’t arrest James Hansen, incidentally, because he’s developed the habit of hauling himself off to jail from time to time.

Why, if we were allowed to sue scientists for failed predictions, the courts would have to run twenty-four hours a day, every day of the year, even Christmas, and that would be just for the sociologists, like those guys who claimed, via statistical modeling, that brief exposure to a 4th of July parade turned people into Republicans. We’d have to build special holding pens for the climatologists.

Scientists have a special right to be wrong, don’t they? They’re better than ordinary people somehow, aren’t they? If we held them accountable, they might be too scared to think of new theories. And the world always needs new theories. And, hey!, somebody might even sue me!

————————————————-

Thanks to an anonymous reader for suggesting this topic.

See this real-life possibility of a meteorologist being sued from Brazil, a case where lots of money was involved.


Feb 02 2012

On Global Warming Apoplexy: Temperature Trends

Published under Politics,Statistics,Wx & Climate

It is a sure sign that Sanity has packed her bags and headed for the door when otherwise sober scientists begin slinging around terms like “denier” and “denialist.” Language like this displays willful, pretended, or real ignorance of the historical context of these words. Anybody who talks like this makes himself an ass. They’s fightin’ words which start any discussion on an angry footing, their presence a certain indication we are dealing with zealotry, not science.

Let’s look again at the claim made by the scientists at the Wall Street Journal, over which many have popped their corks:

The lack of warming for more than a decade—indeed, the smaller-than-predicted warming over the 22 years since the U.N.’s Intergovernmental Panel on Climate Change (IPCC) began issuing projections—suggests that computer models have greatly exaggerated how much warming additional CO2 can cause.

There are two claims made here. Given the observational evidence we have, both claims appear true. The first (A) is that for the last ten years it has not grown warmer. Since it has grown warmer in some places and colder in others, this is evidently a claim about some global average and not any individual station. The second claim (B) says that the IPCC forecasts have been systematically too large: it is also concerned with some global average.

Both of these claims are quantitative and subject to easy verification. A person’s politics surely has no bearing on whether they are true or false claims. Now, the “global average” referenced is not a static thing, in the sense that, say, measurements from identical (and identically situated) thermometers at fixed locations are averaged together and called (arbitrarily, of course), the global average. Instead, the global average as it is operationally defined mixes sources and locations freely each year (and even within years). Therefore, when the “average” is computed there will be some uncertainty in it. Further, the uncertainty is larger in times historical than in times present. (There is even some uncertainty at individual locations, because no measurement apparatus is perfect, but this is generally small, though not always, especially in the past or when using proxies: see this series.)

The BEST people, for instance, recognized this and attempted to account for measurement uncertainty by speaking not just of averages, but of averages plus-or-minus. We can, and I did, argue over the better way to calculate and display this uncertainty. All we need to understand here is that some techniques underestimate this uncertainty. Actually, we don’t even need to agree about that: but we do need to see that some uncertainty is present, however small.

This is necessary because if we make claim (A), as the WSJ fellows did, we need to take uncertainty over the global average into account or we cannot know whether the claim is true or false. It is at this point when a lack of understanding of statistics can become a real hindrance. Sloppy language also hurts immeasurably. Let’s work through this slowly.

Suppose we have ten years of uncertainty-free global average temperature measurements. We can line them up and ask questions of this series. Was the temperature ten years ago warmer or colder than the temperature this year? All we have to do is look: it will be true or false at a glance. Was the temperature nine years ago warmer or colder than this year? True or false at a glance. And so on.

What does this mean in the context of claim (A)? Well, (A) says that temperatures have not gone up over the last decade. To verify this, all we need do is look to see if any of the temperatures of the last decade are lower than they are this year. If any are, the claim is false. If none are, the claim is true.

Maybe. Because claim (A) can also be taken to mean that at no time over the last decade have the temperatures increased (they could have stayed constant from year-to-year). Again, we can verify this claim with a glance at the data.

Which of these definitions is right? Evidently neither, because we all understand that the temperatures have some uncertainty in them. Because of that, we cannot just look at the data to say whether it has gone up or down; we instead have to speak of changes in probabilistic terms. And that means hauling in some kind of model.

The simplest (but not so good) model is to imagine each year’s data is irrelevant to knowing any other year’s data. That is, we take this year’s data and display it as an average with, say, a plus-or-minus attached to indicate our uncertainty in it. That plus-or-minus can only come from some kind of probability model, meaning that the range of uncertainty will change when the model changes. Which is the best and most proper model? Nobody knows. But let’s imagine we all agree on one, such that displayed before us is a temperature series of averages and plus-or-minuses.

Now, if claim (A) means that temperatures this year are less than or equal to temperatures ten years ago, then we can make a comparison as before, but our comparison will be accompanied by a measure of uncertainty. Using predictive techniques (yes, this is the proper word: see this series), we can ask questions like, “Given the data and assuming our model is true, what is the probability this year’s temperature is less than or equal to temperatures ten (or nine, etc.) years ago?” Notice that this is not the same as a “t-test” or any other kind of statement about parameters of probability models: it is a statement about observable temperatures.
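To make that question concrete, here is a minimal sketch under the simplest model above. It leans on one loudly stated assumption: each year’s uncertainty is treated as an independent normal distribution centered on the reported average, with the plus-or-minus as its standard deviation. The numbers are invented for illustration and the function names are mine; any other agreed-upon probability model would give a different answer.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_now_not_warmer(mean_now, sd_now, mean_then, sd_then):
    """P(T_now <= T_then), ASSUMING the two years are independent normals."""
    # The difference T_now - T_then is itself normal with these moments.
    diff_mean = mean_now - mean_then
    diff_sd = math.sqrt(sd_now ** 2 + sd_then ** 2)
    # Probability that the difference is at or below zero.
    return normal_cdf((0.0 - diff_mean) / diff_sd)


# Hypothetical averages and plus-or-minuses (degrees C), not real measurements.
p = prob_now_not_warmer(mean_now=0.45, sd_now=0.2, mean_then=0.40, sd_then=0.2)
print(f"P(this year <= ten years ago) = {p:.2f}")
```

Note that the answer is a probability about the observable temperatures themselves, not a statement about any model parameter.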

Or, if claim (A) means that temperatures did not increase even once over ten years, then we can get the probability of this just as simply. In support of either version of claim (A), I said that we cannot know with probability greater than 90% that temperatures have increased (over this last decade). In other words, it is likely that claim (A) is true.
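For the stricter reading (no increase even once), a simulation is the easy route. The sketch below again assumes independent normal uncertainty around each year’s average, which is my simplification and not necessarily the model behind the 90% figure; the ten averages are made up.

```python
import random


def prob_never_increased(means, sds, draws=20000, seed=1):
    """Estimate P(no year-to-year increase), ASSUMING independent normal years."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        # One plausible "true" temperature series given the stated uncertainty.
        temps = [rng.gauss(m, s) for m, s in zip(means, sds)]
        # Count the draw only if every year is at or below the year before it.
        if all(later <= earlier for earlier, later in zip(temps, temps[1:])):
            hits += 1
    return hits / draws


# Hypothetical ten-year series of averages (degrees C) with a common plus-or-minus.
means = [0.50, 0.49, 0.48, 0.47, 0.46, 0.45, 0.44, 0.43, 0.42, 0.41]
sds = [0.1] * len(means)
print(prob_never_increased(means, sds))
```

Changing the plus-or-minus, or the model that produced it, changes the probability, which is the point.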

This is so using the probability model I indicated. But what if we instead change the model to a linear regression—i.e. a straight line—drawn through the data? Well, we could go through the same steps and ascertain claim (A) in light of this model. But before we can begin we have several things to decide. Why a straight line? Just because it’s easy? Lazy, that. From what year do we start? See this post for the ways that choice can lead you wrong. Do we start with a date (as I joked) in the Jurassic? Or, for fun, in 1973? Every different start date will give a different answer. I will repeat that: every different start date will give a different answer. It is also a stretch, to say the least, to assume temperature always has been increasing in a straight line from whatever start date we pick. (Before the politicization of this subject, every physical scientist would have agreed with that last statement.)
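The start-date problem is easy to see for yourself. The sketch below fits an ordinary least-squares line to the same made-up series from several different starting years; the data are invented for illustration and the helper names are mine.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


# Made-up yearly averages (degrees C); NOT real temperature data.
years = list(range(2000, 2012))
temps = [0.30, 0.42, 0.48, 0.47, 0.41, 0.50, 0.44, 0.46, 0.37, 0.47, 0.52, 0.41]

slopes = {}
for start in (2000, 2002, 2005, 2008):
    i = years.index(start)
    # Same data, same method; only the starting year changes.
    slopes[start] = ols_slope(years[i:], temps[i:])
    print(start, round(slopes[start], 4))
```

With this invented series the slope even changes sign depending on where the line begins, and none of these lines yet accounts for the uncertainty in the points themselves.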

But suppose we do agree on a date: 1964, say, a very fine year. Are we done? No, because we cannot forget that the data that goes into the straight-line model is still measured with uncertainty. We must, just as we did in the first model, account for this uncertainty. That means drawing any kind of naive line (even bold red ones) guarantees over-certainty.

Even if we were to agree on a date—in real life we do not—we could use a model of the measurement error, incorporate that into the model of straight-line change, and then assess claim (A): it is still probably true.

The best thing to do is to model the data in an intelligent way, taking into account the correlations of year-to-year (both auto-regressive and moving average), the measurement error, etc., etc. Hard work! As Doug Keenan has pointed out (often), it’s too much like work for anybody to do. I’d do it myself, but my check from Big Oil hasn’t yet arrived.

Whatever else you do in life, you must not, you must never, look at the pretty red (or blue, etc.) straight line you have just drawn and claim it is, or think of it as, the real data. (It is only in climatology where I have seen scientists forget error bars, and then pitch a fit when somebody points out the omission. You at least have to put predictive, and not parameters-based, error bars on the line, even ignoring measurement uncertainty of the data.)

What about claim (B)? Also likely true, as is generally recognized. We still have to incorporate the uncertainty in the global temperature measurements—there is no or little uncertainty in the forecasts—but this is no different than before.

What about the counter-claim (C) that the 2000s were the “warmest years on record” or the like? It is trivially false. The 2000s simply were not the warmest. Four billion years ago, Earth was much hotter. “Wait! It’s obvious we weren’t talking about billions of years ago. Cheater! Denier!” Well, it isn’t obvious. What years did you have in mind as comparators? Ah, that’s the real question, isn’t it.

Did we mean just the last century? The last 1000 years? The last 10,000? What? You must supply a starting year. To make the claim (C) that it’s hotter now than before, you must tell us what you mean by before. If you say “before” means the last ten years, then claim (C) is identical with claim (A). If you say the last 200 years, then you have to do what BEST tried and incorporate the non-parameter error bars, otherwise there is no way to compare what happened a century ago with what happened last year. Obviously, the further you go back, the larger those uncertainty bars become, therefore the more difficult it becomes to claim (with any certainty) that now was hotter than then.

As I often say, over-certainty abounds in this field. People speak of models (statistical and physical) as if they were truth, as if the data that goes into them were granted some kind of special immunity from ordinary criticism. And when the critiques come, that’s when the asinine language breaks out. All sense of humor evaporates.

You would think that because both claims (A) and (B) are likely true (and claim (C) is unproved or likely false) that we have found a reason to celebrate! Perhaps our worst fears won’t be realized after all. This is good news! Wouldn’t it be great if we really did over-emphasize feedback in climate models and that whatever changes we do make to the climate are easily mitigated and not as horrific as posited?

Why so glum that things are so good?

Update See this cartoon which shows that the IPCC has been known to employ the technique of variable start dates.


Feb 02 2012

How To Maximize The Chance Of Winning The Office Super Bowl Pool

Published under Fun,Statistics

Forget climatology. It’s time for something really controversial. How to fill in those grid squares on the Super Bowl office pool. An example of one is shown below.

The full details are over at Edgehogs. Be sure to also make your picks for the game itself. The most especially prescient personage will pull home an electronic gizmo gratis.

Super Bowl Grid


Sometimes you get to see the labels on the grid rows and columns first, but sometimes you don’t and those labels are written in after everybody has bought their box. If you get to see them first, then use the strategy outlined. If you don’t see them until after, then at least you will be able to help figure your chances before the game.

Not surprisingly, it turns out some squares were seen more often than others. The green X’s show the best locations, and the red O’s show the worst.
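If you want to tally squares yourself, one simple way is to count how often each pair of last digits has shown up in past scores. The sketch below is my own illustration and may not match the procedure used at Edgehogs; the score list is made up, so substitute whatever historical scores you trust.

```python
from collections import Counter


def square_frequencies(scores):
    """Relative frequency of each (last digit, last digit) square."""
    # A square is identified by the final digit of each team's score.
    counts = Counter((a % 10, b % 10) for a, b in scores)
    total = sum(counts.values())
    return {square: c / total for square, c in counts.items()}


# Made-up scores for illustration only; NOT real game results.
scores = [(21, 17), (24, 21), (31, 17), (27, 24), (17, 14), (20, 17)]
freqs = square_frequencies(scores)

# List the squares from most to least frequently seen.
for square, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(square, round(freq, 3))
```

Many pools pay out at the end of each quarter, in which case the same tally would be run on quarter-end scores as well.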

Perhaps we should have some sort of poll: how many will watch the game? how many watch more for the commercials? how many genuinely follow the game? how many remember that spring training, and therefore a return to real sports, is only a month away?

Or maybe we should have a game-day temperature forecast contest?


Feb 01 2012

Bad Astronomer Does Bad Statistics: That Wall Street Journal Editorial

Published under Statistics,Wx & Climate

Remember when I said how you shouldn’t draw straight lines in time series and then speak of the line as if the line was the data itself? About how the starting point made a big difference in the slope of the line, and how not accounting for uncertainty in the starting date translates into over-certainty in the results?

If you can’t recall, refresh your memory: How To Cheat, Or Fool Yourself, With Time Series: Climate Example.

Well, not everybody read those warnings. As an example of somebody who didn’t do his homework, I give you Phil Plait, a fellow who prides himself on exposing bad astronomy and blogs at Discover magazine. Well, Phil, old boy, I am the Statistician to the Stars (get it? get it?[1]) and I’m here to set you right.

The Wall Street Journal on 27 January 2012 published a letter from sixteen scientists entitled, No Need to Panic About Global Warming, the punchline of which was:

Every candidate should support rational measures to protect and improve our environment, but it makes no sense at all to back expensive programs that divert resources from real needs and are based on alarming but untenable claims of “incontrovertible” evidence.

Plait in response to these seemingly ho-hum words took the approach apoplectic, and fretted that “denialists” were reaching lower. Reaching where he never said. He never did say what a “denialist” was, either; but we can guess it is defined as “Whoever disagrees with Phil Plait.”

The WSJ‘s crew said, “Perhaps the most inconvenient fact is the lack of global warming for well over 10 years now.” This allowed Plait to break out the italics and respond, “What the what?” I would’ve guessed that the scientists’ statement was fairly clear and even true. But Plait said, “That statement, to put it bluntly, is dead wrong.” Was it?

Plait then slipped in a picture, one which he thought was a devastating touché. He was so exercised by his effort that he broke out into triumphal clichés like “crushed to dust” and “scraping the bottom of the barrel.” You know what they say about astronomers. Anyway, here’s the picture:

Global warming

See that red line? It’s drawn on a time series—wait! No it isn’t. Those dots are not what Plait thinks they are. They are not—they most certainly are not—global temperatures. Each dot instead is an estimate of global temperature: worse, most dots are also different kinds of estimates from each other. That is, the first dot was estimated using data X and method A, and the second dot was estimated using data Y and method B, and so forth. Well, maybe the first and second dot were the same, but older dots are different than the newer ones.

With me so far? All you have to remember is these dots are estimates, results from statistical models. The dots are not raw data. That means the dots are uncertain. At the least, Plait should have shown us some “error bars” around those dots; some kind of measure of uncertainty.

Now—here’s the real tricky part—we do not want the error bars from the estimates, but from the predictions. Remember, the models that gave these dots tried to predict what the global temperature was. When we do see error bars, researchers often make the mistake of showing us the uncertainty of the model parameters, about which we do not care, we cannot see, and are not verifiable. Since the models were supposed to predict temperature, show us the error of the predictions.

I’ve done this (on different but similar data) and I find that the parameter uncertainty is plus or minus a tenth of a degree or less. But the prediction uncertainty is (in data like this) anywhere from 0.1 to 0.5 degrees, plus or minus. That is, prediction uncertainty can be up to about five times larger.
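The gap between the two kinds of uncertainty can be seen even in the plainest straight-line fit. The sketch below uses the classical least-squares formulas as a stand-in (an assumption on my part; it is not the Bayesian predictive calculation behind the numbers quoted above), and the data are invented.

```python
import math


def fit_line(xs, ys):
    """Least-squares line; returns intercept, slope, residual sd, x mean, Sxx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return intercept, slope, s, x_bar, sxx


def standard_errors(xs, ys, x0):
    """Standard error of the fitted line at x0 versus that of a new observation at x0."""
    _, _, s, x_bar, sxx = fit_line(xs, ys)
    leverage = 1.0 / len(xs) + (x0 - x_bar) ** 2 / sxx
    se_line = s * math.sqrt(leverage)        # uncertainty in the line (parameters)
    se_new = s * math.sqrt(1.0 + leverage)   # uncertainty in an actual new value
    return se_line, se_new


# Made-up data for illustration only.
xs = list(range(10))
ys = [0.31, 0.25, 0.40, 0.36, 0.52, 0.41, 0.47, 0.60, 0.49, 0.58]
se_line, se_new = standard_errors(xs, ys, x0=9)
print(round(se_line, 3), round(se_new, 3))
```

The predictive standard error always exceeds the parameter-based one, because it adds the scatter of the observations around the line on top of the uncertainty in the line itself.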

I don’t know what the prediction uncertainty is for Plait’s picture. Neither does he. I’d be willing to bet it’s large enough so that we can’t tell with certainty greater than 90% whether temperatures in the 1940s were cooler than in the 2000s. And also such that, just as the WSJ‘s scientists claim, we can’t say with any certainty that the temperatures have been increasing this past decade.

In other words, the scientists were right and Plait was wrong. Or, as he might phrase it, he blatantly misinterpreted long term trends. Notice old Phil (his source, actually) starts, quite arbitrarily, with 1973, a point which is lower than the years preceding this date. If he had read the post linked above, he would have known this is a common way that cheaters cheat. Not saying you cheated, Phil, old thing. But you didn’t do yourself any favors.

Somewhat amusingly, Plait ends his semi-random venting by telling us that Michael Mann has been “tweeting furiously” about this. Good grief! This isn’t helping his case. Mann’s understanding of statistics may be likened to an overly enthusiastic undergraduate who left the lecture early.

———————————————————————————

[1] I’m here all week.

P.S. Hey, Phil. Since you brought it up: the total consideration I’ve received for my work in global warming from Big Oil (or anybody) is a number so small that dividing by it is forbidden. How much do you get for your blog or other environmental work, including government funds?

P.P.S. I didn’t forget about that “warmest years on record” stuff. Those “warmest years” are still estimates and have to be compared to the old data, which itself must be accompanied by uncertainty measures. And anyway, it has been much hotter in the past than it is now. Jurassic anybody?

Update Thanks for all the comments, everybody! 100+ and no signs of slowing. I will read them all, in time, but for now, since many of them repeat odd claims and misunderstandings of statistical methods, let me point you to the BEST project posts (here and here). BEST had parameter-based error bars, but not predictive ones. But some acknowledgment of uncertainty is better than none! Also look under the Start Here tab and pay attention to the smoothing time series posts, the homogenization of temperature series posts, and read this week’s All of Statistics series. You may also read, inter alia, the Probability Leakage post which describes the Bayesian predictive approach I am using. A lot of confusion and frank unfamiliarity for some of you.

Update to the Update See this brand-spanking new post that clarifies some of the statistics some of you couldn’t be troubled to look up.

Update See this cartoon which shows that the IPCC has been known to employ the technique of variable start dates.


Jan 31 2012

All Of Statistics: Part III

(B) New data

It might surprise you, but in classical (both frequentist and Bayesian) practice, if we expect to see new X, the procedure is almost always no different than the procedure when we expected no new data. That is, an M = M_θ is proposed, calculations are done, certain θ are set to 0, and M_θ’ is then said to describe X, finis. In the vast majority of cases of statistical analyses, M_θ’ is just assumed true; discussion centers around the parameters, and uncertainty all but disappears.

Contrast this with the procedure physics, chemistry, or even mathematics usually follows. Some evidence E is used to propose a limited set of M: usually a historical M_0 and one or more new theories, M_1, M_2, etc. These are all, as in classical statistical practice, assessed in light of the historical X. These M also sometimes have unobservable parameters (think of Planck’s constant, etc.) which are guessed using statistical methods. Discussion occurs over these parameters, but only when M has been verified (to some extent) by its “closeness” to historical X.

In many of the physical sciences, the analysis does not stop at discussing the models’ closeness to historical data, nor is the focus just on the parameters (usually). These sciences instead use the models to predict new data: these predictions will say that new X, given each M, will take certain values at such-and-such probability. It is usually the case that the probabilities of these new observations differ for each model (if they did not, the models cannot be distinguished).

Time passes, new data is collected, and the models are assessed in light of the predictions which were made. The models are then ordered by how well they predicted this new data. “How well” is a subjective measure: it can and does differ, meaning that models might be useful to some but not to others. Verification can be done formally, as in statistics, by calculating the probability each model in the set is true, in light of the new X, old X, and the given E. But usually, this ordering is done informally (this informality does not invalidate the findings; when I opened I claimed not all probabilities can—nor should—be quantified).
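The formal version of that verification is just Bayes’s theorem applied to the set of models. Here is a minimal sketch, assuming each model’s prediction for the new observation is a normal distribution and that prior probabilities for the models have been agreed on; the model names, numbers, and function names are all hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def model_posteriors(priors, predictions, new_x):
    """Probability each model is true given the new observation.

    priors:      {model name: prior probability given old X and E}
    predictions: {model name: (predicted mean, predicted sd) for new X}
    ASSUMES the true model is one of those in the set.
    """
    # Weight each model by how probable it said the new observation would be.
    weights = {
        name: priors[name] * normal_pdf(new_x, *predictions[name])
        for name in priors
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical models and numbers, for illustration only.
priors = {"M0": 0.5, "M1": 0.5}
predictions = {"M0": (10.0, 1.0), "M1": (12.0, 1.0)}
posteriors = model_posteriors(priors, predictions, new_x=11.6)
print(posteriors)
```

The model that gave the new observation the higher probability gains ground, and the ordering can flip again when still newer data arrive.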

These new models are not always accepted; often they are rejected (even mathematical proofs are sometimes found to have flaws). Perhaps newer still models arise from the ashes of these rejects, but these phoenixes are subject to the same pitiless confirmation process. This procedure has worked out rather well for these fields (excepting climatology, for its lack of verified forecasts). We are not certain sure each physical model is true, but most of them are very probably sure.

Now consider the so-called softer sciences like sociology where the situation is markedly different; classical statistical procedure (both frequentist and Bayesian) is used as if no new data were expected, as explained. Because the models are never tested to make predictions, the models proposed by individuals are taken as true. The data is used, at best, to say something about the unobservable parameters of M. Over-certainty abounds.

The conjectures in these fields are rarely put to the test of verification. When new data is anticipated or is collected, the statistical procedure begins anew, as if the old data did not exist. The form of the model is the same, and discussion again centers on parameters. Worst of all, the certainty that is felt to lie in the parameters is said to lie in any new data that is expected. If new data is sought it is often collected only to confirm the M. This search is usually rewarded, not necessarily because the M assumed true are true, but more because of the wisdom in the saying, “Seek and ye shall find.” Confirmation bias creeps in and sticks to everything.

Contrast again the situation in the physical sciences. New data is sought that will confirm the M, but also sought is data that would disconfirm or invalidate the M. I need only say the words “cold fusion” to show how rigorous and routine this process is. This search does not happen, or happens rarely, in the soft sciences: people there are comfortable sticking with their preconceptions. Because they expose their models to new data, the physical sciences are usually (a word which implies “but not always”) trustworthy: ships float, cameras take pictures, lasers cut, and so on. The soft sciences do not have such a fund of success to point to.

The one area of statistics in which future data is considered is time series, where it is acknowledged from the start that X is part of a stream of data. Unfortunately, the procedure differs little from ordinary statistics except that it is acknowledged that the models belong to a more limited class than in ordinary statistics. Discussion still centers on (and ends with) the parameters (see this post for what can happen). The models can be, and are to a greater extent than usual, put to the test, but still not often. The models are just assumed true; the parameters are said to be “it.”

All statistical procedure should be seen as “time series;” at least, when new data is expected, but in the way the physical sciences treat old and new data. Models should be put to rigorous, unforgiving tests of validation. Except when absolutely necessary (which will be rare times indeed), discussion should move away from parameters and focus on uncertainty of actual observables (or testable conjectures). This is the only way to eliminate over-certainty.
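One way to picture the shift from parameters to observables is a sketch in which the parameter is integrated out and only a statement about the next observation remains. The Beta-Bernoulli setup below is an assumption made for illustration (it is not taken from the post), and the counts are made up.

```python
from fractions import Fraction

def predictive_prob_success(successes, failures, a=1, b=1):
    """Pr(next observation is a success | data and model).

    Assumes a Bernoulli model with a Beta(a, b) prior on its unobservable
    parameter; the parameter is integrated out, so the answer is a
    statement about an observable only.
    """
    return Fraction(a + successes, a + b + successes + failures)

# Hypothetical old data: 7 successes and 3 failures, with a flat Beta(1, 1) prior.
p_next = predictive_prob_success(7, 3)
print(p_next)
```

A statement like this can be checked against the new data when it arrives, which a statement about the parameter never can.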


Jan 30 2012

All Of Statistics: Part II

Published under Philosophy,Statistics

(A) No new data (cont.)

If we want to know how that data arose, and we are not satisfied by X itself, we need to propose a model M, which may be anywhere from fully causal to fully probabilistic. This puts us in a jam because, for any X, there will not exist a unique model which explains X. That is, for any X, we can always create any number of M which explain X; we can always invent an explanation M (from fully to partially causal) for why X took the values it did. It matters not how fanciful M is compared to evidence not in X (that is, in relation to some E not used to infer M); it only matters that such M exist (you could always say M = “Venusians caused X”, which to many E is absurd).

Anyway, in classic (frequentist and Bayesian) statistics, an M is proposed. We now have a problem, because if our model is indexed by parameters, M = Mθ, we have to supply a guess for the θ (possibly multidimensional). We usually provide this guess by using the X itself; but this is not necessary and a guess can be supplied via external evidence or subjectively.

Frequentist theory often begins (and ends) with a “plug in” guess of the parameters. The truth of the model is assumed, and inference about X is made indirectly by discussing the parameter guesses as if the guesses were certain. More often, a subset of the parameters is set to a subjectively chosen predetermined value; usually at least one of the θ is set to 0, but any number besides 0 may be (subjectively) chosen. The theory then computes

     (2) Pr( T(X) | Mθ[0] ),

where Mθ[0] indicates the model with the predetermined value of the parameter(s) supplied and T() is any function of the data (T(X) is also a proposition). The function T() is subjectively chosen and is not unique; for any given Mθ and X, there are any number of T() that can be used, with each T() giving different answers to (2). This equation is called the “p-value”; thus p-values are not unique and are a function of the base model M, the values substituted into the parameters, and the “statistic” T().
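The non-uniqueness can be seen in a small sketch. Assume M says every sequence of twelve fair coin flips is equally likely, and compute the usual tail-area version of (2), the probability of a statistic at least as large as the one observed, for two different choices of T(). The data, the model, and both statistics are hypothetical choices made for illustration.

```python
from itertools import product

def n_heads(seq):
    """T1: the number of 1s in the sequence."""
    return sum(seq)

def longest_run(seq):
    """T2: the length of the longest run of identical symbols."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def exact_p_value(observed, statistic):
    """Pr(T(X') >= T(observed)) when every 0/1 sequence is equally likely."""
    t_obs = statistic(observed)
    n = len(observed)
    # Enumerate all 2**n sequences the model allows and count the extreme ones.
    hits = sum(1 for seq in product((0, 1), repeat=n) if statistic(seq) >= t_obs)
    return hits / 2 ** n

observed = (1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1)  # made-up data
p_heads = exact_p_value(observed, n_heads)
p_run = exact_p_value(observed, longest_run)
print(p_heads, p_run)
```

Same X, same M, same fixed parameter value, yet two different p-values, because the choice of T() is free.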

Now, if (2) is (subjectively) thought “too large”, the guess of θ is then “confirmed” and formally substituted into Mθ. Usually this means setting the relevant θ = 0 (but again, any number may be used). Surprisingly, this setting of parameters (in the fixed M) to the pre-chosen values is the end result or goal of frequentist analysis. The result of this operation is said to explain X; that is, the discussion focuses on whether the unobservable θ were set to 0 or not.

Bayesian statistics inverts (1) and computes

     (3) Pr(Θ | X & M),

where the M is taken to be fixed except for the value of the parameters, and Θ = “θ takes a specified value.” This is called the “posterior” and it may be derived in a formal way.

It is at this point that the typical Bayesian analysis matches frequentist procedure. That is, if in (3) some of the Pr ( |θ| > c | X & M) (where c = 0, typically) are “small”, then these θ are set to some (subjectively chosen) predefined level c (0 usually). Needless to say, what is “small” is subjectively chosen.

Once again, the M is taken as fixed and the goal is to say which of the θ should be set to their predefined levels (usually 0). The slight advantages the Bayesian analysis enjoys are two: (one) it eliminates the arbitrary step of choosing a T(); and (two) it allows probability language in discussing the parameters. But, in practice, at least for common problems, the Bayesian and frequentist end result is the same or similar, an Mθ whittled down to some Mθ’ where cardinality(θ) > cardinality(θ’).

To clean up loose ends, both theories will sometimes “tack on” a guess of the remaining θ, but this is usually a half-hearted effort, probably because these guesses can never be checked (parameters cannot be observed). Anyway, X is said to be “explained” by Mθ’.

Recall that we are still in the case that we expect that no new X will obtain. We are using M to say how the only X we will have arose. We subjectively pick an M and then, if it is indexed by parameters, we go through a procedure to set some of these parameters to predefined levels, usually 0. We then announce to ourselves that our theory of how X arose is true or false depending on whether certain θ are set to 0 or not. Again, the Bayesian theory enjoys a slight advantage because it allows us to say with what probability these θ are near the predefined levels. Frequentist theory just states they are zero, period.

These analyses both assume the truth of M, which you might recall was what we wanted to know in the first place. Remember we already knew X and we were after the “best” M which explains X. But since there is no unique M that is “best”, we just have to (subjectively) pick some M, and we are left playing with its parameters. We picked an M and set some of its parameters to 0. The Mθ’ we are left with is said to be true. Since we will see no new data, we will never be able to confirm this.

Now this conclusion would be the same if we had started with a different model (necessarily with different parameters). This new model with a reduced set of parameters would also be claimed as the true explanation of X. There would be no way to check this claim, either.

We could go on ad infinitum, claiming each new model is the “true” explanation of X. Remember: we can’t use how well any M from this inexhaustible list explains X, because we can always find many M which explain X perfectly, or to any level of closeness we desire.
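A small sketch of why fit to X cannot pick a winner: take any curve through the observed points and add a term that is zero at every observed point. Each rival then reproduces X exactly, and they differ only where no data exists. The data and the constants below are made up for illustration.

```python
def lagrange(ts, xs):
    """Return the polynomial that passes exactly through the points (ts[i], xs[i])."""
    def f(t):
        total = 0.0
        for i, (ti, xi) in enumerate(zip(ts, xs)):
            term = xi
            for j, tj in enumerate(ts):
                if j != i:
                    term *= (t - tj) / (ti - tj)
            total += term
        return total
    return f

def with_vanishing_term(base, ts, c):
    """Add c * prod(t - t_i), which equals zero at every observed t_i."""
    def g(t):
        bump = c
        for ti in ts:
            bump *= (t - ti)
        return base(t) + bump
    return g

ts = [0, 1, 2, 3]
xs = [2.0, 3.0, 5.0, 4.0]  # made-up X
m_a = lagrange(ts, xs)
rivals = [m_a] + [with_vanishing_term(m_a, ts, c) for c in (1.0, -2.5, 10.0)]

# Every rival "explains" X perfectly...
fits = [all(abs(m(t) - x) < 1e-9 for t, x in zip(ts, xs)) for m in rivals]
# ...but they disagree about a value that was never observed.
predictions_at_4 = [m(4) for m in rivals]
print(fits, predictions_at_4)
```

Any value of the constant gives another perfect explanation, so the list of rivals is inexhaustible.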

So unless we are in a “jury trial”-type situation, where we have a strong E which delineates the set of rival models in advance, if we do not expect new data, there is no solution to finding “the” model which best explains X. Or, rather, the solution is to fix E (independently of X) so that the set of models is fixed in advance. But even then, unless we coalesce on one model which, given X, is true “beyond a reasonable doubt” there will always exist, well, reasonable doubt about which model is true.

Next time: new data.


Jan 29 2012

All Of Statistics: Part I

Published under Philosophy,Statistics

Statistics is the collection and modeling of data. By “modeling” I mean using probability to describe our uncertainty in values that data may take. Statistics, then, is applied probability. Probability is the quantitative branch of epistemology. Data are propositions of the sort, “We observe X to take the value x,” where X is usually some tangible, real-world object. We use probability to quantify the chance these propositions are true, i.e. that X takes the values x.

When we observe data, we assume that something caused this data to take the values it did. Our knowledge of this causality ranges from none to full, depending on the circumstance. We call this knowledge a model, which may be anywhere from purely mathematical-logical to completely probabilistic. If our model is purely mathematical-logical, the values the data will take are rigidly determined; there is no uncertainty. If our model is completely probabilistic, the values the data will take are unknown to a specified extent. Most models are somewhere in between. This is general and applies to models of everything from electrons to electorates.

Call the model for your data (X) at hand M, where X = “The data takes a specified value x.” Probability is used to say things like this:

     (1) Pr(X | M);

that is, given the model, this is the probability the data takes certain values. If a rival model is proposed, it is not guaranteed that Pr(X | M1) equals Pr(X | M2) for all the possible values X can take, but even if these probabilities do match it could be that M1 and M2 are not logically equivalent.

It is extremely important to understand that the choice of the model is subjective. That is, there may be external evidence about X (call it E; evidence which is not X) which dictates a form or partial form of M, but in practice people are free to choose whatever M they wish. This is because the E that (supposedly) gives credence to M is also subjectively chosen. That is, we usually reason Pr(M | E) = 1. Nevertheless, however M is decided, the probability statements (1) are fixed, true, and are not subjective.

Now, it is the case in formal statistics that models are usually indexed by unobservable parameters. Values have to be supplied for these parameters before equations like (1) can be calculated. That is, for a fixed M, indexed by parameters, equation (1) takes different values for every different value of the parameters.

There are now two situations: (A) no new data will ever be taken, and (B) new data will be taken. By “new” I do not necessarily mean data that will arise in the future, though this is the most usual case; “new” is data that was not used before.

(A) No new data

If no new data will ever be taken, we again have two possibilities. We might want to know how the data arose, or we might have competing models that we want to assess in light of X.

Now it might make sense to ask how X arose. But it might not, either. After all, if no new data is coming, then everything we need to know about X is in X itself. If we want to know how many of the X are less than this number, or greater than that, all we need do is look. Was X increasing? Just look. Was it decreasing? Just look. This approach has been greatly underused.

It is often the case that we have to decide which of a set of competing models is most likely given X. For example, a jury trial. We have two competing models, M0 = “The guy didn’t do it” and M1 = “The guy did it”. We use X (the trial data, evidence, and arguments) to compute

     (2) Pr( M1 | X ),

with the assumption that Pr( M1 | X ) + Pr( M0 | X ) = 1 (probabilities of models sum to one over any set of models). Notice that the set of M is chosen by us in advance, supplied by external evidence E (such as “There is a man in the dock who is on trial, and either he did the deed or did not”).
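For the cases in which every piece can be quantified, (2) follows from Bayes’s theorem over the two-model set. This is a sketch in the notation above, with the external evidence E written out explicitly; nothing in it says such numbers are always available.

```latex
\Pr(M_1 \mid X, E) =
  \frac{\Pr(X \mid M_1, E)\,\Pr(M_1 \mid E)}
       {\Pr(X \mid M_1, E)\,\Pr(M_1 \mid E) + \Pr(X \mid M_0, E)\,\Pr(M_0 \mid E)}
```

The denominator is what enforces the sum-to-one assumption: it runs over exactly the models that E admits, and no others.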

The usual mistake here (in science, not courtrooms) is to assume that exact quantifications of (2) always exist. They surely sometimes do exist, but not always. Now, if there are an infinite number of Mi, then it is possible that (2) will equal 0 (the usual case). Thus, in order to make sense of the world, we need to impose finiteness and select from a limited number of explanations for any X.

Next: classical statistical procedure; clarifications, because I’m not entirely happy with the language.


Jan 28 2012

To Be, Or Not To Be…Free: Sam Harris & Jerry Coyne On Free Will — Guest Post by Mariano Grinbank

Published under Culture,Philosophy

Mariano Grinbank is a Judeo-Christian apologist who knows when to say he’s not sorry. (See this video.)

Lack of free will is a trope that is growing in importance in scientific circles. Men like Sam Harris travel to lectures and announce, “I have no choice but to tell you that you have no choice. If you don’t believe this, you’re foolish.” Grinbank identifies what he sees are flaws in arguments like this.

Skeptics of free will are welcome to submit rebuttals (if they choose to do so).

In March of 2012 AD Sam Harris will publish a new book titled, Free Will. He and Jerry Coyne have been stoking the fires of polemics in anticipation.

Sam Harris is known for his Atheist activism and is also a biased neuroscientist[1]. Jerry Coyne is also an Atheist activist and professor of ecology and evolution at the University of Chicago. These gentlemen represent the deleterious effects of turning Darwinism, and science in general, into world-views. Darwinism is supposed to be a theory about biology and science is a tool with which we explore the material realm. However, some turn these into world-views and thereby construct blinders.

The effect of these blinders is a restriction of thought: the opposite of freethinking. This is because anything that goes against, or is outside of the world-view parameters, is simply a priori ruled out. Thus, Sam Harris and Jerry Coyne take an Atheistic, materialistic, mechanistic, reductionist (by any other name) view of life, the universe and everything. On their collective views we and our brains can do none but blindly follow the dictates of the laws of thermodynamics.

Jerry Coyne refers to our brains as “meat computers” and in this case, you do not get a choice as to whether you get yours rare, well done, or anything in between. Now, there is a notable distinction between Coyne and Harris. They both claim that we do not have free will but Coyne claims that we do not make choices whilst Harris claims that we do.

Coyne emphasizes that even though we do not make free will choices, we are still morally culpable and judicially accountable. He notes that we already make provision for personages who commit crimes whilst being categorized as temporarily insane or having some such mental incapacity (or, as per a Seinfeld episode, “differently advantaged”). Yet, the most interesting, and potentially troubling, conclusion is that while we, ourselves, cannot change our minds, as it were, outside influences can change them.

Jerry Coyne claims that we cannot “step outside of our brain’s structure and modify how it works” because “‘we’ are simply constructs of our brain” and that “We can’t impose a nebulous ‘will’ on the inputs to our brain that can affect its output of decisions and actions.”

However, environment can accomplish it. The incarceration of criminals, for instance, “makes it less likely you’ll behave badly in the future.” So, we cannot change ourselves but environment, other people a.k.a. society and/or the government, can change us. How other meat computers can change our meat computers when we cannot change our own meat computer is something which does not compute.

How does incarceration make it “less likely you’ll behave badly in the future”? It would still come down to the individual, who would have to instigate or otherwise elicit some change in the meat computer that resides within his own cranium (perhaps a touch of “Mrs. Dash” would do the trick).

Sam Harris concludes that our subconscious brain merely spits out “determined” data (determined, or predetermined, by the laws of thermodynamics) about which we then make choices via our conscious brain. But how is making conscious choices about unconscious data not free will?

We may experience an instinct to do this or that but we then choose the course of action. But what about instincts, in and of themselves, and/or reflexes? Well, some of these are learned, such as when you burn yourself and seek to not do it again. Others appear to be more foisted upon us, such as recoiling from a hot object. Just where is the line between the rapid reaction of a reflex and a forced action? After all, you can keep your hand over a flame—if you so choose.

There seems to be a vast difference between a reaction such as a reflex, on the one hand, and purposing to ponder a course of action, on the other hand. We may sift through options but we have the experience of choosing between them. The Coyne/Harris retort of claiming that this perceived choice is a mere illusion is merely begging the question. As Sam Harris puts it when referring to the idea of lacking free will, “Most people find that idea intolerable, so powerful is our illusion that we really do make choices.”

This is tantamount to asking someone whether they have ever been abducted by aliens. If they respond that they have not, you then tell them that of course they have but the aliens erased their memory.

The concept of lacking free will is certainly not new nor is it exclusive to the Atheists who hold to it. However, Harris and Coyne are appealing to “science”; particularly, neuroscience. Not surprisingly, they conclude that what we think of as free will is brain stuff because we can see how segments of the brain light up in MRIs when we engage in what we think of as making decisions.

Well, neuroscience is a soft enough science so as to allow for malleable interpretations such as:

  • Portion X of the brain lights up when…
  • Portion X pertains to…
  • Therefore, the lighting up of portion X means that…

As William Briggs rightly noted whilst reviewing Sam Harris’ research “The Neural Correlates of Religious and Nonreligious Belief”:

Ignore religion and answer this: do the brains of the affronted and angry operate differently in those heightened states of emotion than in those who are placid, smug, or contented?

Could it not be that the “emotion centers” of the brain light up for Christians in this experiment not because they are Christians but because they have just been repeatedly poked by a sharp rhetorical stick?

The “sharp rhetorical stick” refers to the questions posed during Harris’ pseudo-experiment.

Now, of course, Harris’ and Coyne’s neuroscientific conclusions are tantamount to concluding that color and shape are merely brain stuff that does not exist out there in the real world because segments of our brains light up when we view color and shape.

What reason, really, is there to deny our common knowledge, our common experience and, well, our common sense conclusion that we have free will? In this case, it is that some Atheists are interpreting lights flashing on a screen. Moreover, their interpretations are based upon materialism, mechanism, reductionism; in short, upon their particular, and peculiar, Atheistic world-views. But why should we believe that their world-view is accurate? After all, they claim that it cannot be proven and since they are making extraordinary claims they must provide evidence that is more extraordinary than expecting us to believe their personal interpretations of “data.”

———————————————————–

[1] He is referred to as biased because, before becoming a neuroscientist, he was asked by Edge – The World Question Center, “What do you believe is true even though you cannot prove it?” His response was:

What I believe, though cannot yet prove, is that belief is a content-independent process. Which is to say that beliefs about God—to the degree that they are really believed—are the same as beliefs about numbers, penguins, tofu, or anything else…

What I do believe, however, is that the neural processes that govern the final acceptance of a statement as ‘true’ rely on more fundamental, reward-related circuitry in our frontal lobes—probably the same regions that judge the pleasantness of tastes and odors…

Once the neurology of belief becomes clear, and it stands revealed as an all-purpose emotion arising in a wide variety of contexts (often without warrant), religious faith will be exposed for what it is: a humble species of terrestrial credulity. We will then have additional, scientific reasons to declare that mere feelings of conviction are not enough when it comes time to talk about the way the world is.

The only thing that guarantees that (sufficiently complex) beliefs actually represent the world, are chains of evidence and argument linking them to the world…Understanding belief at the level of the brain may hold the key to new insights into the nature of our minds, to new rules of discourse, and to new frontiers of human cooperation.

Thus, he comes into science already believing that which he seeks to prove, “What I believe, though cannot yet prove.”


Jan 27 2012

Low IQ & Liberal Beliefs Linked To Poor Research?

Published under Culture,Statistics

Watch out, Sam Harris: Gordon Hodson and Michael A. Busseri of Brock University are giving you competition for the worst use of statistics in an original paper.

Their “Bright Minds and Dark Attitudes: Lower Cognitive Ability Predicts Greater Prejudice Through Right-Wing Ideology and Low Intergroup Contact”, published in Psychological Science[1] (headlined in the press as Low IQ & Conservative Beliefs Linked to Prejudice), is a textbook example of confused data, unrecognized bias, and ignorance of statistics.

Hodson and Busseri are on track to beat out Harris’s magnificent effort, and they might also triumph over the paper which “proved” brief exposure to the American flag turns one into a Republican and the peer-reviewed work “proving” exposure to a 4th of July parade turns one into a Republican.

Let’s see how they did it.

The authors intimate that “individuals with lower cognitive abilities may gravitate toward more socially conservative right-wing ideologies that maintain the status quo and provide psychological stability and a sense of order”. They say that this “is consistent with findings that less intelligent children come to endorse more socially conservative ideologies as adults”.

How did they prove that idiots and conservatives are racists? They gathered two large data sets from the UK, one started in 1958 (NCDS), the other in 1970 (BCS); about 16,000 individuals in total, roughly equal numbers of males and females. They quizzed the groups when they reached 11 and 10 years old on their “intelligence”; they then came back to these individuals when they were 33 and 30 and asked them about their “socially conservative ideology and racism.”

The authors do not say how many people they used in their analysis; how many individuals were lost in the 20 years between surveys is not noted in their paper. My read of the NCDS website (pdf) makes the loss about 30%. That leaves about 11,000.

Intelligence was defined in one database as scoring well on matching the similarity between 40 pairs of words, and on matching the similarity between 40 pairs of shapes and symbols. On the other database, this changed to drawing 28 missing shapes, recalling digits from 34 number series, identifying the definitions of 37 words, and “generating words that are semantically consistent with presented words” 42 times.

Thus the two samples measure similar but different abilities. The NCDS (pdf) also had available the Peabody Individual Achievement Test Math and Reading sub-scales which were not used as intelligence measures. Why?

When the kids became 33 and 30 year olds, they were asked whether they agreed with 13 or 16 questions like, “Schools should teach children to obey authority”, “Family life suffers if mum is working full-time.”

Another was, “People who break the law should be rehabilitated.” Just kidding! It’s actually, “People who break the law should be given stiffer sentences.” The bias in the question wording is ignored.

Another question was, “None of the political parties would do anything to benefit me.” Is agreeing or disagreeing with that a “conservative” position? What would the Occupy people say? Another, “Being single provides more time to experience life and find out about yourself.” Conservative or liberal?

According to the NCDS (pdf), there were about 50 questions, of which only 13 were used. A “conservative”, then, is whatever Hodson and Busseri say it is. The same thing goes for what a “racist” is.

For these questions “reliabilities ranged from .63 to .68.” This means the questions are imprecise and imperfect, so that if you use the raw results in subsequent analysis, you must “carry forward” the uncertainty in reliability. Did Hodson and Busseri do this? No.

One would have guessed from the title that the authors looked at how the scores on the intelligence questions correlated with the scores on the attitude and racism questions, taking into account the uncertainty in the reliability. One would be wrong.

They first modeled the intelligence questions to create one “latent” (unobserved) measure, called “g”. The uncertainty in creating “g” is then ignored in all subsequent analysis. They did the same for the attitude questions, creating a “latent” (actually unobserved) variable called “conservative ideology.” Uncertainty in its creation is also ignored. Then the individuals’ education and socioeconomic status and separately their parents’ socioeconomic status (which again were the results of models) were put into a model with “g” and “conservative ideology” to predict “racism” (the uncertainty of which, as was already said, was ignored). The picture below summarizes their findings.

[Figure: hodson.jpg]

Lo, they found small p-values. The authors appear unaware that samples of this size are practically guaranteed to spit out small p-values.

What makes the study ludicrous, even ignoring the biases, manipulations, and qualifications just outlined, is that by the authors’ own admission the direct effect size for “g” on “racism” is only -0.01 for men and 0.02 for women. Utterly trivial; close enough to no effect to be no effect, their results statistically “significant” only because of the massive sample size.
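The dependence of “significance” on sample size can be sketched with a generic calculation. This is not the authors’ structural equation model: it is the textbook test of a simple correlation, using a normal approximation, and the value 0.02 is used only because it echoes the order of magnitude quoted above.

```python
import math

def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for testing 'correlation = 0',
    given a sample correlation r from n observations."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Normal approximation to the t distribution; reasonable for large n.
    return math.erfc(abs(t) / math.sqrt(2.0))

effect = 0.02  # hypothetical, deliberately trivial correlation
sizes = [100, 1_000, 10_000, 100_000, 1_000_000]
p_values = {n: approx_two_sided_p(effect, n) for n in sizes}
for n in sizes:
    print(f"n = {n:>9,d}   p ~ {p_values[n]:.3g}")
```

The effect never changes; only n does. The p-value says more about how much data was collected than about whether the effect matters.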

The effect size for “conservative ideology” directly predicting “racism” is higher (0.69 and 0.51). But all that means is that the questions the authors picked for these two attitudes are roughly correlated with one another. In other words, “None of the political parties would do anything to benefit me” is crudely correlated with “I wouldn’t mind working with people from other races” and so forth.

Yet the authors have the temerity to conclude, “These results from large, nationally representative data sets provide converging evidence that lower g in childhood predicts greater prejudice in adulthood and, furthermore, that socially conservative ideology mediates much of this effect.”

Truly, statistics can “prove” anything.

—————————————————————————

[1] doi:10.1177/0956797611421206

Thanks to reader Jonathan Woolley who suggested this study.

Update: I saw, on one website which linked to my criticism, a criticism of my criticism (get it?): “The subjects in the test were given a fifty question questionnaire and only 13 questions are used, and this jackass is complaining about that?” I am the “jackass.”

This articulate person (language warning on the link) says that social scientists mix in red herring questions with “real” ones so that interviewees can’t figure out what’s going on. This person also says that I was unaware of this. Not true. But even if I was, it would have been irrelevant.

The point I made was we do not know how the questions the authors did use—it doesn’t matter how many others were rejected and why these were chosen—were used to create “conservative” and “racist” indexes. I have given examples of two questions which are at least ambiguous; there are more. “Conservative” and “racist” are defined as how the authors see them, and not necessarily how civilians and other scientists would see them.

See also my comments below: the models fit by the authors result in very small effects. These effects mostly have small p-values, but as I said above, small p-values are practically guaranteed in large samples (> 1000). And remember, none of the uncertainty in creating the latent “g” and other indexes is carried forward in their models: if it were, the effect sizes would decrease further (and p-values would increase).

And for the real kicker, if we then “integrated out” the parameters (the βs) and tried to predict whether a person with a low “g” would be “racist”—the reason given for the study—the effects would be lower still, probably negligible. The “direct effect” was already trivial, the “total effect” barely marginal.

Incidentally, if you don’t know, “latent” means unobservable (and uncheckable). Social scientists love using these kinds of models (structural equation models, factor analysis, etc.) because they are so fertile. Sprinkle a little data on them and publishable p-values aplenty will sprout instantly.
