Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

May 8, 2008 | 17 Comments

The Sean Bell shooting and probability

Yesterday, there were several protests in New York City. The participants were “outraged” over the recent acquittal of two black cops and one Lebanese cop who shot and killed Sean Bell, who was black.

Much was made about the fact that the three cops shot at Bell’s car 50 times. This number was touted repeatedly by some as evidence that the cops had used excessive force.

Let’s look at this from the probabilistic viewpoint. It turns out that when a cop fires his weapon at a person, he only hits his target about 30% of the time. Anybody who has ever fired a weapon before, especially in an altercation, will know that this is a pretty good rate, but of course not good enough to guarantee that just one shot will be enough to stop a target.

So about how many times must a cop fire so that he is at least 99.9% sure of hitting his target?

Well, if he fired just once, he has a 30% chance of hitting, or a 70% chance of missing. If he fired twice, what is the chance of hitting at least once? Hitting at least once can happen in three ways: hitting with the first bullet and missing with the second; missing with the first and hitting with the second; or hitting with both. The only other possibility is missing on both. The probability of all these scenarios is 1 (something has to happen). So the chance of hitting at least once is 1 minus the chance of missing both. Or `1 - (0.7)(0.7) = 1 - 0.49 = 0.51`.

This means that firing only two shots gives the officer roughly a 50/50 chance of hitting his target. Not very good odds. He must fire more times to increase them.

It turns out that the same formula can be used for any number of shots. The probability of hitting at least once in three shots is `1 - (0.7)^3 = 1 - 0.343 = 0.66` (rounding). The probability of hitting at least once in `n` shots is then `1 - (0.7)^n`.

We want `1 - (0.7)^n` to be at least 0.999. Or, written mathematically, `1 - (0.7)^n > 0.999`. Now we have to recall high school algebra and solve for `n`. Subtract 1 from both sides, then multiply through by -1, remembering that multiplying an inequality by a negative number flips its direction. This gives `(0.7)^n < 0.001`.

Now the hard part. If you don’t remember, just take my word for it, but now we use logarithms. Taking logs of both sides gives `n log(0.7) < log(0.001)`. Since `log(0.7)` is negative, dividing by it flips the inequality once more: `n > log(0.001)/log(0.7) = 19.4`, so `n = 20` (rounding up to the nearest shot).
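A few lines of Python confirm the arithmetic (the 30% per-shot hit rate is the figure assumed above):

```python
import math

# Per-shot hit probability, as assumed in the text.
p_hit = 0.30
p_miss = 1 - p_hit

# Chance of at least one hit in n shots: 1 - 0.7^n.
for n in (1, 2, 3):
    print(n, round(1 - p_miss ** n, 3))

# Smallest n with at least a 99.9% chance of one hit:
# 0.7^n < 0.001  =>  n > log(0.001)/log(0.7)
n_needed = math.ceil(math.log(0.001) / math.log(p_miss))
print(n_needed)  # 20
```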

That’s right. In order for the cop to be pretty sure of hitting his target (and therefore ensuring his target does not hit him), a cop has to shoot at least 20 times.

Thus, given that three cops were firing, 50 total shots does not seem that unusual.

Note: one cop shot 31 times, another 11, and the third 8. Of course, the above analysis ignores all external evidence, such as how the probability of hitting decreases when aiming at a moving target, awareness by one cop of shots fired by another, whether the cops were well motivated, etc.

April 21, 2008 | 73 Comments

CO2 and Temperature: which predicts which?

Parts of this analysis were suggested by Allan MacRae, who kindly offered comments on the exposition of this article which greatly improved its readability. The article is incomplete, but I wanted to present the style of analysis, which I feel is important, as the method I use eliminates many common errors found in CO2/Temperature studies. Any errors are, of course, entirely my own.

It is an understatement to say that there has been a lot of attention to the relationship of temperature and CO2. Two broad hypotheses are advanced: (Hypothesis 1) As more CO2 is added to the air, through radiative effects, the temperature later rises; and (Hypothesis 2) As temperature increases, through ocean-chemical and biological effects, CO2 is later added to the atmosphere. The two hypotheses have, of course, different consequences which are so well known that I do not repeat them here. Before we begin, however, it is important to emphasize that both or even neither of these hypotheses might be true. More on this below.

The source of monthly temperature data is from The University of Alabama in Huntsville, which starts in January 1980. Temperature is available at different regions: global, Northern Hemisphere, etc. The monthly global CO2 is from NOAA ERSL.

We want to examine the CO2/temperature processes at the finest level allowed by the data, which here is monthly in time, and the Northern Hemisphere, Southern Hemisphere, and tropics in space. The reason for doing this, and not looking at just yearly global average temperature and CO2, is that any processes that occur at time scales of less than a year, or occur only or differently in specific geographic regions, would be lost to us. In particular, the CO2/temperature process within a year differs between the Northern and Southern hemispheres, because, of course, of the difference in the timing of the seasons and in land mass. It is also not a priori clear that the CO2/temperature process is the same, even at the yearly scale, across all regions. It will turn out, however, that the differences between the regional and global processes are minimal.

The question we hope to answer is, given the limitations of these data sets, with this small number of years, and ignoring the measurement error of all involved (which might be substantial), does (Hypothesis 1) increasing CO2 now predict positive temperature change later, or does (Hypothesis 2) increasing temperatures now predict positive CO2 change later? Again, this ignores the very real possibility that both of these hypotheses are true (e.g., there is a positive feedback).

During the course of an ordinary year, both Hypotheses 1 and 2 are true at different times, and sometimes neither is true: in the Northern Hemisphere, the temperature and CO2 both increase until about May, after which CO2 falls, though temperature continues to rise. In the Southern Hemisphere, temperature falls in the early months, while CO2 rises, and so on. These well known differences are due to combinations of respiration and changes in orbital forcing.

There are, then, obvious correlations of CO2 and temperature at different monthly lags and in different geographic regions (I use the word “correlation” in its plain English meaning and not in any statistical sense). We are not specifically interested in these correlations, which are well known and expected, and whose role in long-term climate change is minimal. The existence of these correlations presents us with a dilemma, however. It might be that, for either Hypothesis 1 or 2, the time at which either CO2 or temperature changes in response to changes in forcing is less than one year, but disentangling this climate forcing from the expected changes due to seasonality is, while possible, difficult, and would require dynamical modeling of some sort (in the language of time series, the seasonal and long-term signals are possibly confounded at time scales of less than 1 year).

Therefore, instead of looking at intra-year correlations, we will instead look at inter-year correlations. This introduces a significant limitation: any real, non-seasonal correlations at lags of less than 1 year (or at other non-integer yearly time points) will be lost, and it is possible that we will be misled in our conclusions (in the language of time series, the “power” on these non-integer-year lags will be aliased onto the 1 year lag). What is gained by this approach, however, is that there is no chance of misinterpreting lags less than one year as being due to a process other than seasonality. However, the main purpose of this article is not to identify the exact dynamical and physical CO2/temperature relationship, nor to identify the lag that best describes it; we just want to know whether Hypothesis 1 or Hypothesis 2 is more likely on time scales greater than 1 year.

Most of us have seen pictures like this one, which shows the monthly CO2 for 1980-1984; also shown is the Northern Hemisphere (NH) temperature anomaly (suitably normalized to fit on the same picture).

You can immediately see the intra-year CO2 “sawtooth”. This sawtooth makes it difficult to find a functional relationship of CO2 and temperature. I do not want to model this sawtooth, because I worry that whatever model I pick will be inadequate, and I do not immediately know how to carry the uncertainty I have in the model through to the final conclusion about our Hypotheses. I also do not want to smooth the sawtooth, or perform any other mathematical operation on the observed CO2 values within a year, because that tends to inflate measures of association.

Instead, let’s look at CO2 in a different way:

This is yearly CO2 measured within each month: each of the 12 months has its own curve through time. It doesn’t really matter which is which, though the two lowest curves are from the winter months (for those in the NH). What’s going on is still obvious: CO2 is increasing year by year and the rate at which it is doing so is roughly constant regardless of which month we examine.

Looking at the data this way shows that the sawtooth has effectively been eliminated, as long as we examine year-to-year changes within each month through time.

Suppose we were only interested in Decembers and in no other months. Let us plot the actual December temperature from 1980 to 2006 on the x-axis and on the y-axis plot the increase in CO2 for the years 1981 to 2007. Shown in the thumbnail below is this plot: with black dots for the Southern Hemisphere (SH), red dots for the NH, and green dots for the tropics (redoing the analyses with global or sea surface temperatures instead of separating hemispheres produces nearly indistinguishable results). For example, in one year, the NH temperature anomaly was -0.6: this was followed in the next year by an increase of about 1.5 ppm of CO2 (this is the left-most point on the figure).
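The pairing itself is simple to sketch. Here is a minimal version using made-up numbers in place of the UAH temperature and NOAA ESRL CO2 records, so the particular correlation value means nothing; the point is only the construction of the lagged pairs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical December series, 1980..2007 (28 years).
years = np.arange(1980, 2008)
dec_temp = rng.normal(0.0, 0.3, len(years))    # anomaly, deg C (invented)
dec_co2 = 338 + 1.7 * np.arange(len(years)) \
          + rng.normal(0, 0.3, len(years))     # ppm (invented trend)

# x: December temperature 1980-2006.
# y: the following year's CO2 increase, 1981-2007.
x = dec_temp[:-1]
y = np.diff(dec_co2)

# Plain linear correlation between this year's temperature and
# next year's dCO2/dt (loess would allow a non-straight fit).
r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))
```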

The solid lines are loess estimates of the relationship between temperature and the change in CO2 (the dCO2/dt on the graph). If the loess lines were perfectly straight (and pointed in any direction), we would say the two measures are linearly correlated. The lines aren’t that straight, so the data do not appear to be that well correlated, linearly or otherwise.

Click on the figure (do this!) to see the same plot for each of the 12 months (right click on it and open it in a new window so you can follow the discussion). Notice anything? Generally, when temperature increases this year CO2 tends to increase in the following year. Hypothesis 2 is more likely to be true given this picture.

The loess lines are not always straight, which means that a straight-line model, i.e. ordinary correlation, is not always the best model. For example, in Januaries, until the temperature anomalies get to 0 or above, temperature and change in CO2 have almost no relationship; after this point, the relationship becomes positive, i.e., increasing temperatures lead to increases in the change of CO2. The strength of the relationship also depends on the month: the first six months of the year show a strong signal, but the latter six show a weakening in the relationship, regardless of where in the world we are.

Coincidence? Now plot the actual December CO2 from 1980 to 2006 on the x-axis and on the y-axis plot the change (increase or decrease) in temperature for the years 1981 to 2007. For example, in one year, the NH CO2 was 340 ppm: this was followed in the next year by a temperature decrease of about 0.5 degrees (this is the bottom left-most point on the figure). No real signal here:

Again, click on the figure (do this!) to see all twelve months. There does not appear to be any relationship in any month between CO2 and change in temperature, which weakens our belief in Hypothesis 1.

It may be that it takes two years for a change in CO2 or temperature to force a change in the other. Click here for the two-year lag between temperature and change in CO2; and here for the two-year lag between CO2 and change in temperature. No signals are apparent in either scenario.

As mentioned above, what we did not check are all the other possibilities: CO2 might lead or lag temperature by 9.27, or 18.4 months, for example; or, what is more likely, the two variables might describe a non-linear dynamic relationship with each other. All I am confident of saying is, conditional on this data and its limitations etc., that Hypothesis 2 is more probable than Hypothesis 1, but I won’t say how much more probable.

It is also true that, over this period of time and using this data, CO2 always increased. The cause of this increase sometimes was related to temperature increases (rising temperatures led to more CO2 being released) and sometimes not. We cannot say, using only this data, why else CO2 increased, although we know from other sources that CO2 obviously increased because of human-caused activities.

April 8, 2008 | 120 Comments

Why multiple climate model agreement is not that exciting

There are several global climate models (GCMs) produced by many different groups. There are a half dozen from the USA, some from the UK Met Office, a well known one from Australia, and so on. GCMs are a truly global effort. These GCMs are of course referenced by the IPCC, and each version is known to the creators of the other versions.

Much is made of the fact that these various GCMs show rough agreement with each other. People have the sense that, since so many “different” GCMs agree, we should have more confidence that what they say is true. Today I will discuss why this view is false. This is not an easy subject, so we will take it slowly.

Suppose first that you and I want to predict tomorrow’s high temperature in Central Park in New York City (this example naturally works for anything we want to predict, from stock prices to the number of people who will vote for a certain USA presidential candidate). I have a weather model called `MMatt`. I run this model on my computer and it predicts 66 degrees F. I then give you this model so that you can run it on your computer, but you are vain and rename the model to `MMe`. You make the change, run the model, and announce that `MMe` predicts 66 degrees F.

Are we now more confident that tomorrow’s high temperature will be 66 because two different models predicted that number?

Obviously not.

The reason is that changing the name does not change the model. Simply running the model twice, or a dozen, or a hundred times, does not give us any additional evidence than if we only ran it just once. We reach the same conclusion if instead of predicting tomorrow’s high temperature, we use GCMs to predict next year’s global mean temperature: no matter how many times we run the model, or how many different places in the world we run it, we are no more confident of the final prediction than if we only ran the model once.

So Point One of why multiple GCMs agreeing is not that exciting is that if all the different GCMs are really the same model but each just has a different name, then we have not gained new information by running the models many times. And we might suspect that if somebody keeps telling us that “all the models agree” to imply there is greater certainty, he either might not understand this simple point or he has ulterior motives.

Are all the many GCMs touted by the IPCC the same except for name? No, so we might hope to gain much new information from examining all of them. Unfortunately, they are not, and cannot be, that different either. We cannot here go into detail on each component of each model (books are written on these subjects), but we can make some broad conclusions.

The atmosphere, like the ocean, is a fluid and it flows like one. The fundamental equations of motion that govern this flow are known. They cannot differ from model to model; or to state this positively, they will be the same in each model. On paper, anyway, because those equations have to be approximated in a computer, and there is not universal agreement, nor is there a proof, of the best way to do this. So the manner each GCM implements this approximation might be different, and these differences might cause the outputs to differ (though this is not guaranteed).

The equations describing the physics of a photon of sunlight interacting with our atmosphere are also known, but these interactions happen on a scale too small to model, so the effects of sunlight must be parameterized, which is a semi-statistical semi-physical guess of how the small scale effects accumulate to the large scale used in GCMs. Parameterization schemes can differ from model to model and these differences almost certainly will cause the outputs to differ.

And so on for the other components of the models. Already, then, it begins to look like there might be a lot of different information available from the many GCMs, so we would be right to make something of the cases where these models agree. Not quite.

The groups that build the GCMs do not work independently of one another (nor should they). They read and write for the same journals, attend the same conferences, and are familiar with each other’s work. In fact, many of the components used in the different GCMs are the same, even exactly the same, in more than one model. The same person or persons may be responsible, through some line of research, for a particular parameterization used in all the models. Computer code is shared. Thus, while there are some reasons for differing output (and we haven’t covered all of them yet), there are many more reasons that the output should agree.

Results from different GCMs are thus not independent, so our enthusiasm generated because they all roughly agree should at least be tempered, until we understand how dependent the models are.
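A toy simulation, with invented error sizes (this is not a GCM, just an illustration of dependence), shows why this matters. Give each of eight “models” a shared error, standing in for the common equations, parameterizations, and code, plus a small independent error of its own:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 12.0
n_models, n_sims = 8, 20000

# Error common to all models (shared physics, shared code)...
shared = rng.normal(0, 1.0, (n_sims, 1))
# ...plus a small error unique to each model.
own = rng.normal(0, 0.3, (n_sims, n_models))
forecasts = truth + shared + own

ens_mean = forecasts.mean(axis=1)
ens_spread = forecasts.std(axis=1)

# If the eight models were independent, averaging them would cut the
# error of the mean well below 1.0; the shared component prevents that.
print(round(ens_mean.std(), 2))     # stays close to 1.0, not 1/sqrt(8)
# The spread between models reflects only the independent part, so the
# models "agree" with each other far more than any agrees with the truth.
print(round(ens_spread.mean(), 2))
```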

This next part is tricky, so stay with me. The models differ in more ways than just the physical representations previously noted. They also differ in strictly computational ways and through different hypotheses of how, for example, CO2 should be treated. Some models use a coarse grid point representation of the earth and others use a finer grid: the first method generally attempts to do better with the physics but sacrifices resolution, the second method attempts to provide a finer look at the world, while typically sacrificing accuracy in other parts of the model. While the positive feedback in temperature caused by increasing CO2 is the same in spirit for all models, the exact way it is implemented in each can differ.

Now, each climate model, as a result of the many approximations that must be made, has, if you like, hundreds (even thousands) of knobs that can be dialed to and fro. Each twist of a dial produces a difference in the output. Tweaking these dials, then, is a necessary part of the model-building process. Much time is spent tuning the models so that they reproduce, as closely as possible, the past, already-observed climate. Thus, the fact that all the GCMs can roughly represent past climate is again not as interesting as it first seemed. They had better, or nobody would seriously consider the model a contender.

Reproducing past data is a necessary but not a sufficient condition for a model to be able to predict future data. It is also not at all clear how these tweakings affect accuracy in predicting new data, which is data that was not used in any way to build the models, that is, future data. Predicting future data has several components.

It might be that one of the models, say GCM1, is the best of the bunch in the sense that it most closely matches future data. If this is always the case, if GCM1 is always closest (using some proper measure of skill), then the other models are not as good, they are wrong in some way, and thus they should be ignored when making predictions. The fact that they come close to GCM1 should not give us more reason to believe the predictions made by GCM1: the other models are not providing new information in this case. This argument, which is admittedly subtle, also holds if a certain group of GCMs is always better than the remainder of models. Only the models within that close group can be considered independent evidence.

Even if you don’t follow—or believe—that argument, there is also the problem of how to quantify the certainty of the GCM predictions. I often see pictures like this:

Each horizontal line represents the output of a GCM, say predicting next year’s average global temperature. It is often thought that the spread of the outputs can be used to describe a probability distribution over the possible future temperatures. The probability distribution is the black curve drawn over the predictions, and neatly captures the range of possibilities. This particular picture looks to say that there is about a 90% chance that the temperature will be between 10 and 14 degrees. It is at this point that people fool themselves, probably because the uncertainty in the forecast has become prettily quantified by some sophisticated statistical routines. But the probability estimate is just plain wrong.

How do I know this? Suppose that each of the eight GCMs predicted that the temperature will be 12 degrees. Would we then say, would anybody say, that we are now 100% certain in the prediction?

Again, obviously not. Nobody would believe that if all GCMs agreed exactly (or nearly so) that we would be 100% certain of the outcome. Why? Because everybody knows that these models are not perfect.

The exact same situation was met by meteorologists when they tried this trick with weather forecasts (this is called ensemble forecasting). They found two things. First, the probability forecasts made by this averaging process were far too sure: the probabilities, like our black curve, were too tight and had to be made much wider. Second, the averages were usually biased, meaning that the individual forecasts should all be shifted upwards or downwards by some amount.
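That calibration step is easy to sketch. The following uses invented forecast/observation pairs (the bias of 0.8 degrees and the spread values are made up for illustration): estimate the bias from past errors, shift the forecasts by it, and widen the raw ensemble spread until it matches the size of the remaining errors.

```python
import numpy as np

rng = np.random.default_rng(2)

# 200 invented past cases: the forecasts run warm and the raw
# ensemble spread is narrower than the actual forecast errors.
obs = rng.normal(12.0, 1.0, 200)                 # observed temperatures
ens_mean = obs + 0.8 + rng.normal(0, 0.5, 200)   # ensemble means, biased warm
ens_spread = np.full(200, 0.4)                   # raw spread: too tight

# Shift: the average past error is the bias correction.
bias = np.mean(ens_mean - obs)

# Widen: compare the remaining errors to the raw spread.
errors = ens_mean - bias - obs
inflate = errors.std() / ens_spread.mean()

print(round(bias, 1))      # near the 0.8 built into the fake data
print(round(inflate, 2))   # > 1: the raw spread had to be widened
```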

This should also be true for GCMs, but the fact has not yet been widely recognized. The amount of certainty we have in future predictions should be less, but we also have to consider the bias. Right now, all GCMs are predicting warmer temperatures than are actually occurring. That means the GCMs are wrong, or biased, or both. The GCM forecasts should be shifted lower, and our certainty in their predictions should be decreased.

All of this implies that we should take the agreement of GCMs far less seriously than is often supposed. And if anything, the fact that the GCMs routinely over-predict is positive evidence of something: that some of the suppositions of the models are wrong.

April 5, 2008 | 11 Comments

Spanish Expedition

I have returned from Madrid, where the conference went moderately well. My part was acceptable, but I could have done a better job, which I’ll explain in a moment.

Iberia Airlines is reasonable, but the seats in steerage were even smaller than I thought. On the way there, I sat next to a lady whose head kept lolling over onto me as she slept. The trip back was better, because I was able to commandeer two seats. Plus, there was a large, boisterous group of young Portuguese men who apparently had never been to New York City before. They were in high spirits for most of the trip, which made the journey seem shorter. About an hour before landing they started to practice some English phrases which they thought would be useful for picking up American women: “Would you go out with me?”, “I like you”, and “You are a fucking sweetheart.”

My talk was simultaneously translated into Spanish, and I wish I had been more coherent and had talked more slowly. The translator told me afterwards that I talked “rather fast.” I know I left a lot of people wondering.

The audience was mostly scientists (of all kinds) and journalists. My subject was rather technical and new, and while I do think it is a useful approach, it is not the best talk to present to non-specialists. My biggest fault was my failure to recognize and speak about the evidence that others found convincing. I could have offered a more reasonable comparison if I had done so.

I’ll write about these topics in more depth later, but briefly: people weigh heavily the fact that many different climate models agree in closely simulating past observations. There are two main, and very simple, problems with this evidence, which I could have, at the time, done a better job of pointing out. For example, I could have asked this question: why are there any differences between climate models? The point being that eight climate models agreeing is not eight independent pieces of evidence. All of these models, for instance, use the same equations of motion. We should be surprised that there are any differences between them.

The second problem I did point out, but I do not think I was convincing. So far, climate models over-predict independent data: that is, they all forecast higher temperatures than are actually observed. This is for data that was not used to fit the models. This means, this can only mean, that the climate models are wrong. They might not be very wrong, but they are wrong just the same. So we should be asking: why are they wrong?

There was a press conference, conducted in Spanish. I can read Spanish much better than I can hear it, which is a fault I should work harder to correct, but it meant that I could not follow most of the comments or questions well. I was the critical representative, and a Professor Moreno was my foil. The most pertinent question to me was (something like) “Do I think it is time for new laws to be passed to combat global warming?” I said no. Professor Moreno vehemently disagreed, incorrectly using as an example the unfortunate heat wave in Spain that was responsible for a large number of deaths. Incorrect, because it is impossible to say that this particular heat wave was caused by humans (in the form of anthropogenic global warming). But the press there, like here (like everywhere), enjoyed the conflict between us, so this is what was reported.

Here, for the sake of vanity, are some links (in Spanish) for the news coverage. We were also on the Spanish national television news on the first night of the conference, but I didn’t see it because we were out. Some of these links may, of course, expire.

Madrid itself was wonderful, and my hosts Francisco García Novo and Antonio Cembrero were absolute gentlemen, and I met many lovely people. I was introduced to several excellent restaurants and cervecerías. The food was better than I can write about; I nearly wept at the Museo del Jamon. I felt thoroughly spoiled. Dr Novo introduced me to La Grita, a subtle sherry that is a perfect foil for olives. I managed to find some in the duty free shop, and I recommend that if you see some, snatch it up.

Come back over the next few days. By then, I hope to have written something on the agreement of climate models.