Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

March 20, 2008 | 4 Comments

Homework #1: Answer part I

A couple of days ago I gave out homework. I asked my loyal readers to count how many people walked by them and to keep track of how many of those people wore a thinking-suppression device, such as an iPod. Like every teacher’s, my heart soared like a hawk when some of the students actually completed the task. Visit the original thread’s comments to see the “raw” data.

The project was obviously to recreate a survey of the kind which we see daily: e.g. What percent of Americans favor a carbon tax? What fraction of the voters want “change”? How many prefer Brand A? And so on.

Here is how a newspaper might present the results from our survey:

More consumers are endangering their hearing than ever before, according to new research by WMBriggs.com. Over 20% of consumers now never leave the house without an iPod or iPod-like device.

“Music is very popular” said Dr Briggs, “And now it’s easier than ever before to listen to it.” This might help explain the rise in tinnitus reports, according to some sources. Dr So Undzo of the Send Us Money to Battle Tinnitus Foundation was quoted as saying, “Blah blah blah.” He also said, “Blah blah blah blah blah.” &tc. &tc.

Despite its farcical nature, this “news” report is no different from the dozens that show up on TV, on the radio, and everywhere else. In order to tell a newsworthy story, it extrapolates wildly from the data at hand; it gives you no idea who collected the original data, or why (for money? for notoriety?), or how (by observation? by interview?), or of any of the statistical methods used to manipulate the data. In short: it is very nearly worthless. The only advantage a story like this has is that it can be written before any data is actually taken, saving the news organization issuing it time and money.

But you already knew all that. So let’s talk about the real problem with statistics. Beware, however, that some of this is dull labor, requiring attention to detail, and probably too much work for too little content. However, that’s how they get you: they hope you pass by quickly and say “close enough.”

We have had five or six responses to the homework so far, but we’ll start with the first one, from Steve Hempell. He saw n = 41 people and counted m = 1 wearing a thinking-suppression device (TSD). He sat on a bench in a small town during spring break to watch citizens pass by.
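Just to fix the arithmetic in mind before the real worries begin, here is a minimal sketch in Python of what a naive analyst might do with Steve’s numbers. The flat-prior Beta-Binomial model here is one conventional choice for illustration, not necessarily the method we will use in part II:

```python
from scipy.stats import beta

n, m = 41, 1  # people observed, number wearing a TSD (Steve Hempell's data)

print(f"Raw fraction: {m / n:.3f}")  # about 0.024, i.e. 2.4%

# One conventional way to quantify uncertainty in the unknown fraction:
# a flat Beta(1,1) prior updated by the binomial count of m out of n
# gives a Beta(m+1, n-m+1) posterior.
posterior = beta(m + 1, n - m + 1)
lo, hi = posterior.ppf([0.05, 0.95])
print(f"90% credible interval: {lo:.3f} to {hi:.3f}")
```

Notice two things: the interval is wide with only 41 observations, and nothing in the arithmetic says a word about the definitional questions that follow.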

The first thing we need to have securely in our minds is what question we want to answer with this data. The obvious one is “How many people regularly wear a TSD?” This innocent query begins our troubles.

What do we mean by “people”? All people? There are a little over 6 billion humans now. Do we want an estimate for that group? What about historical, i.e. dead, people, or those yet to be born? How far into the future or back into the past do we want to go? Are we talking of people “now”? Maybe, but we still have to define “now”: does it mean in a year or two, or just the day the survey was taken, or a few days afterward? Trivial details? Well, we’ll see. Let’s settle on the week after the survey was taken, so that our question becomes “How many people in the week after our survey was taken regularly wear a TSD?”

We’re still not done with “people” and haven’t decided whether it was all humans or some subset. The most common subset is “U.S. Americans” (as Miss Teen South Carolina would have phrased it). But all U.S. citizens? Presumably, infants do not wear TSDs, nor do many in nursing homes or in other incarcerations. Were infants even counted in the survey? Older people in general, experience tells us, do not often wear TSDs. As I think about this question, I find myself unable to rigorously quantify the subset of interest. If I say “All U.S. citizens” then my eventual estimate would probably be too high, given this small sample. If I say, “U.S. citizens between the ages of 15 and 55” then I might do better, but the survey is of less interest.

To pick something concrete, we’ll go with “All U.S. citizens” which modifies our question to “How many U.S. citizens in the week after our survey was taken regularly wear a TSD?”

Sigh. Not done yet. We still have to tackle “regularly” and the bigger question of whether or not our sample fairly represents the population we have in mind; and that would still leave the largest, most error-prone area: what exactly is a TSD? iPods were identified, but how about cell phones or Blackberries, and on and on? Frankly, however, I am bored.

Like I said, though, boredom is the point. No one wants to invest as much time as we have spent on this simple survey in each survey they meet. No matter how concrete the appropriate population in a survey seems to you, it can mean something entirely different to somebody else; each person can take away their own definition. This ambiguity, while frustrating to me, is gold to marketers, pollsters, and “researchers.” So vaguely worded are surveys that readers can supply any meaning they want to their results. Although they are usually not consciously aware of it, people read surveys like they read horoscopes or psychic readings: the results always seem accurate, or to confirm people’s worst fears or hopes.

An objection might have occurred to you. “Sure, these complex surveys are ambiguous. But there are simple polls that are easy to understand. The best example is ‘Who will you vote for, Candidate A or B?’ Not much to confuse there.”

You mean, besides the fact that a poll is a prediction of ballot results, so that you must trust that the pollster found a sample representative of the people who will actually vote on election day? That no event will occur between the time the poll was taken and the election that causes people to change their minds? And—pay attention here—that nobody lied to the pollster?

“Oh, too few people lie to make a difference.” Yeah? Well, I live in New York City, and I like to tell the story of the exit polls taken for the presidential race between Kerry and Bush. Those polls had Kerry ahead by about 10 to 1, an unsurprising result, and one which confirmed people’s prior beliefs. The pollsters asked tons of voters and were spread throughout the city in an attempt to obtain the most representative sample they could. Not everybody would answer them, of course, and that is still another problem, one which is impossible to tackle.

But when the actual results were tallied, Kerry won by a margin of only a little under 5 to 1. Sure, he still won, but the real shocker is that so many people lied to the pollsters. And why? Well, this is New York City, and in Manhattan particularly, you just cannot easily admit to being a Bush supporter (then or now). At the least, doing so invites ridicule, and who needs that? Simpler just to lie and say, “I voted for Kerry.”

We have done a lot and we still haven’t answered the question of how to handle the actual data!

Here are the answers to part I of the homework.

  1. The applicability of all surveys is conditional on a population which must be, though rarely is, rigorously defined.
  2. All surveys have significant measurement error that has nothing to do with the actual numerical data.
  3. Because of this, people are too certain when reading or interpreting the results of surveys.

In part II, if we are not already worn down, we will learn how to—finally!—handle the data.

March 14, 2008 | 10 Comments

CRITICAL ASSESSMENT OF CLIMATE CHANGE PREDICTIONS FROM A SCIENTIFIC PERSPECTIVE

Here is the link to the symposium which I mentioned a few weeks back. It is being sponsored by the Ramón Areces Foundation and the Royal Academy of Sciences of Spain, and will be held in Madrid on the 2nd and 3rd of April. Part of the introduction says:

The Royal Academy of Sciences of Spain and the Ramón Areces Foundation wish to contribute to the creation of an informed public opinion on global change in the country. To this end, they are organising a two-day symposium aimed at scientists from different fields, decision makers and general public. Existing facts and analysis tools will be discussed, and the robustness and uncertainties of predictions made on the basis of the former, critically assessed. The meeting will provide a scientific view of existing knowledge on climate change and its expected consequences. Existing physical, chemical and mathematical tools will be discussed and climate effects will be analysed together with other concurrent changes, which tend to be overlooked in the climate change scenarios.

Presentations by the different contributors will emphasise existing scientific evidence as well as the strengths and weaknesses of predictions made on the basis of available data and modelling tools. Contributors are encouraged to express their opinions on the most relevant problems concerning the topics they will present, including scientific issues, main threats and possible mitigation or adaptation strategies.

The program is now online. My talk is entitled “Robustness and uncertainties of climate change predictions”. The deadline for me to turn it in is today. I am still working on it and not at all satisfied that I have done a good job with my topic. I am simultaneously writing a paper and the talk, and I will post both of them here, not un-coincidentally, on 1 April.

The gist of my talk I have summarized:

Global warming is not important by itself: it becomes significant only when its effects are consequential to humans. The distinction between questions like “Will it warm?” and “What will happen if it warms?” is under-appreciated, and the two are often conflated. For example, when asking how likely are the results of a study of global warming’s effects, we are apt to confuse the likelihood of global warming as a phenomenon with the likelihood of what might happen because of global warming. Of course, the two kinds of questions and likelihoods are entirely separate.

Because of the frequency of this confusion, I want to follow the path to the conclusion of one particular study whose results state A = “There will be more kidney and liver disease, ambulance trips, etc. because of global warming.” I start from first principles, untangle and carefully focus on the chain of causation leading up to this central claim, and quantify the uncertainty of the steps along the way.

In short, I will estimate the probability that AGW is real, the probability that some claim of global warming’s effects is true given global warming is true, and the unconditional probability that the effect is true. That’s not too much to tackle, is it?
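Written out, this is just the law of total probability. A minimal sketch, with made-up placeholder numbers that are not the estimates from the talk:

```python
# All numbers here are hypothetical placeholders, for illustration only.
p_agw = 0.5                # Pr(AGW is real)
p_effect_given_agw = 0.3   # Pr(effect | AGW)
p_effect_given_not = 0.05  # Pr(effect | no AGW)

# Law of total probability: the unconditional chance of the effect.
p_effect = p_effect_given_agw * p_agw + p_effect_given_not * (1 - p_agw)
print(p_effect)  # 0.175
```

Whenever the effect is less likely without warming than with it, the unconditional probability is smaller than Pr(effect | AGW), which is exactly why conflating the two questions inflates certainty.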

Thank God there will be simultaneous translation at the conference, because my Spanish gets worse and worse the more I think about it. If I were going to play soccer, I’d be on more familiar ground. I do know how to ask that a ball be passed to me because I am alone and unguarded, and how to offer constructive criticism to a teammate for not recognizing this fact and for taking a ridiculous shot at goal himself. But I am not sure how this language would apply to global warming.

March 13, 2008 | 1 Comment

Another reason to leave academia

1. Repeat after me: “There are no innate biological differences between men and women…except, well, women are of course better nurturers, sympathizers, empathizers, and a score of other things.”

2. Now use the law (Title IX), designed to enforce equal numbers of girls and boys playing sports, to mandate equal numbers of women and men in physics and math departments at universities that receive federal funding (which is all universities except one or two).

3. Then try applying for a grant with a male as PI.

Full story here. With full props to Arts & Letters Daily.

Some hilarity from the article:

For one thing, the Title IX compliance reviews are already underway. In the spring of 2007, the Department of Education evaluated the Columbia University physics department. Cosmology professor Amber Miller, talking to Science magazine, described the process as a “waste of time.” She was required to make an inventory of all the equipment in the lab and indicate whether women were allowed to use various items. “I wanted to say, leave me alone, and let me get my work done.” But Miller and her fellow scientists are not going to be left alone.

“Say, are women allowed to use this slide rule?”

All this is fair enough, of course, because as we certainly must believe, “There are no innate biological…”. As for me, I cannot wait, if this law is passed, for the comedic opportunities when the first male sues a women’s studies department, or an English department, etc., to force them to hire more men. And naturally, lawyers will be brought in to judge the merits of promotions. Who better than a lawyer to judge differences in papers on string theory?

March 10, 2008 | 12 Comments

It depends what the meaning of mean means.

Yesterday’s post was entitled, “You cannot measure a mean”, which is both true and false depending—thanks to Bill Clinton for the never-ending stream of satire—on what the meaning of mean means.

The plot I used was a numerical average at each point. This implies that at each year there were several direct measures that were averaged together and then plotted. This numerical average is called, among other things, a mean.

In this sense of the word, a mean is obviously observable, and so yesterday’s title was false. You can see a mean, they do exist in the world, they are just (possibly weighted) functions of other observable data. We can obviously make predictions of average values, too.

However, there is another sense of the word mean that is used as a technical concept in statistics, and an unfortunate sense, one that leads to confusion. I was hoping some people would call me on this, and some of you did, which makes me very proud.

The technical sense of mean is as an expected value, which is a probabilistic concept, and is itself another poorly chosen term, for you can never expect, and often cannot even see, an expected value. A stock example is a throw of a die, which has an expected value of 3.5.
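The die example takes two lines to check:

```python
faces = [1, 2, 3, 4, 5, 6]
expected = sum(faces) / len(faces)  # equal weight on each face
print(expected)           # 3.5
print(expected in faces)  # False: no throw can ever show the "expected" value
```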

Yesterday’s model B was this

B: y = a + b*t + OS

I now have to explain what I passed over yesterday, the OS. Recall that OS stood for “Other Stuff”; it consisted of mystery numbers we had to add to the straight line so that model B reproduced the observed data. We never know what OS is in advance, so we call it random. Since we quantify our uncertainty in the unknown using probability, we assign a probability distribution to OS.

For lots of reasons (not all of them creditable), the distribution is nearly always a normal (the bell-shaped curve), which itself has two unobservable parameters, typically labeled μ and σ^2. We set μ=0 and guess σ^2. Doing this implies—via some simple math which I’ll skip—that the unknown observed data is itself described by a normal distribution, with two parameters μ = a + b*t and the same σ^2 that OS has.
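To make the parameter/observable distinction concrete, here is a minimal simulation of model B. The parameter values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented parameter values, for illustration only.
a, b, sigma = 10.0, 0.5, 2.0
t = np.arange(20)

mu = a + b * t                         # the unobservable parameter at each t
y = mu + rng.normal(0, sigma, t.size)  # the observables: mu plus the OS

# Even knowing a, b, and sigma exactly, y never equals mu:
print(np.round(y - mu, 2))  # the OS itself, unknown in advance
```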

Unfortunately, that μ parameter is often called “the mean”. It is, however, just a parameter, an unobservable index used for the normal distribution. As I stressed yesterday (as I always stress), this “mean” cannot be seen or measured or experienced. It is a mathematical crutch used to help in the real work of explaining what we really want to know: how to quantify our uncertainty in the observables.

You cannot forecast this “mean” either, and you don’t need any math to prove this. The parameter μ is just some fixed number, after all, so any “forecast” for it would just say what that value is. Like I said yesterday, even if you knew the exact value of μ you still do not know the value of future observables, because OS is always unknown (or random).

We usually do not know the value of μ exactly. It is unknown—and here we depart the world of classical statistics where statements like I am about to make are taboo—or “random”, so we have to quantify our uncertainty in its value, which we do using a probability distribution. We take some data and modify this probability distribution to sharpen our knowledge of μ. We then present this sharpened information and consider ourselves done (these were the blue dashed lines on the plot yesterday).

The unfortunate thing is that the bulk of statistics was developed to make more and more precise statements about μ: how to avoid bias in its measurement, what happens (actually, what never can happen) when we take an infinite amount of data, how estimates of it are ruled by the central limit theorem, and on and on. All good, quality mathematics, but mostly beside the point. Why? Again, because even if we knew the value of μ we still would not know the value of future observables. And because people tend to confuse their certainty in μ with their certainty in the observables, which, as we saw yesterday, usually leads to vast overconfidence.
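Here is a sketch of that overconfidence in numbers. Under the textbook normal model with known σ (an assumption for illustration), uncertainty about μ shrinks like σ/√n, but uncertainty about the next observable never drops below σ:

```python
import math

sigma = 2.0
for n in (10, 100, 10_000):
    se_mu = sigma / math.sqrt(n)              # uncertainty about the parameter mu
    sd_next = math.sqrt(sigma**2 + se_mu**2)  # uncertainty about a new observable
    print(n, round(se_mu, 3), round(sd_next, 3))

# se_mu -> 0 as n grows, but sd_next -> sigma: sharpening your knowledge of
# the parameter does almost nothing to sharpen your knowledge of what you
# will actually see next.
```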

From now on, I will not make the mistake of calling a parameter a “mean”, and you won’t either.