
Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

July 25, 2008 | 33 Comments

On the difference in mathematical ability between boys and girls

Today’s headlines mostly got it wrong:

  • The New York Sun said “Study Shatters Myth That Boys Are Better At Math.”
  • The New York Post said “Girls = boys in math skills.”
  • The New York Daily News said “Math gender differences erased.”
  • The New York Times said “Math Scores Show No Gap for Girls, Study Finds.”

Only Keith Winstein at the Wall Street Journal got it right:

This is, of course, a political topic. This is evidenced by the Times beginning their take on the story by recalling the fate of Larry Summers, ex-president of Harvard, who dared to publicly wonder whether males and females have similar mathematical ability. In case you don’t recall, he surmised that they did not, and he was crucified for uttering such politically-incorrect heresy.

Janet Hyde, who is a professor at the University of Wisconsin, Madison, and who led the study, said the idea that boys might be better at math is a “stereotype.” Well, let’s see.

Hyde’s study, which is wholly statistical, is typical. And none of the headlines, save the WSJ, correctly describe what Hyde actually did. To explain it, I have to get a bit technical, but stay with me because this is very important.

Hyde fit a probability model to her data and then made an indirect statement about the value of that model’s parameters. What does this mean? She first assumed that the approximate uncertainty in math scores could be modeled by a normal distribution. Normal distributions have two parameters which must be specified. The first is usually (and mistakenly) called the “mean” and it describes where the peak or center of the normal distribution lies. The second is usually (and mistakenly) called the “variance” and it describes the spread of the distribution: larger variances mean that the data is more uncertain.

A statistical test is then run, asking “Are the mean parameters for boys and girls equal or unequal?” If the mean for the boys is larger than the mean for the girls, the implication is that boys are better at math than are girls. If the means are roughly equal, then people conclude, sometimes incorrectly, that the performance of boys and girls is “the same.”

It is important to emphasize that the study as reported in most newspapers only said something about the mean parameters for the boys and girls. These parameters were roughly equal, and this implied, all other things being equal, that boys’ and girls’ ability is equal.
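A minimal sketch of the kind of comparison described above. Welch's t statistic is used here purely as an illustration; it is an assumption, not necessarily the test run in the study, and the two samples are made-up numbers, not Hyde's data.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for a difference in means (unequal variances allowed)."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical normalized scores, invented for illustration only.
girls = [-1.0, -0.5, 0.0, 0.5, 1.0]
boys = [-1.4, -0.7, 0.0, 0.7, 1.4]

t_statistic = welch_t(boys, girls)
print("t statistic for difference in means:", t_statistic)
print("sample variances (boys, girls):", variance(boys), variance(girls))
```

The statistic is built from the difference in sample means; the variances enter only through the standard error, so a test of this kind says nothing by itself about whether the spreads differ.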

But all things are not equal.

What all the news reports, except the WSJ, forgot was the variance. The following picture will help explain what I mean.

[Figure: Boys and girls math ability]

The top picture shows the normal distributions of what might be normalized math test scores for girls and boys: scores greater than 0 are better than average, scores less than 0 are worse than average (these data are just an illustration; I don’t have Hyde’s study data, but the point is the same). The girls are the solid line, the boys are the dashed. You can see that both have a peak in exactly the same place. This implies that the mean performance for both boys and girls is the same, that is, on average, their performance is the same.

But notice that the boys’ line is a little (only just a tiny) bit more spread out than the girls’. This is because the variance for the boys is larger than for the girls, but just a little larger. Can this make any difference to the performance on math tests? Yes, a huge difference.

The lower-left picture is just like the larger picture, but it blows up the area of high test scores (those greater than 3.5). The dashed line (the boys) is everywhere on top of the solid line (the girls), which means it is more likely for boys to outscore the girls at the highest levels of the test.

The picture on the lower-right shows how much more likely. For example, for test scores of 5 or higher, boys are over 9 times more likely to do better than the girls! This is not to say that there will not be any girls at the very top: there will be.

What this all means is that you will see many more boys than girls at the very top of the test scores. But it also means that you will see many more boys than girls at the very bottom of the test scores! We could draw a similar picture to the lower-right which shows those who do very badly at the math tests: boys outnumber girls here, too.
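A minimal sketch of the tail calculation, assuming two normal distributions with the same mean of 0 and standard deviations of 1.0 (girls) and 1.1 (boys). Those two values are assumptions chosen for illustration; they are not taken from Hyde's data, and they are not necessarily the values behind the picture.

```python
import math


def upper_tail(threshold, mean=0.0, sd=1.0):
    """P(X > threshold) for a normal distribution with the given mean and sd."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lower_tail(threshold, mean=0.0, sd=1.0):
    """P(X < threshold) for a normal distribution with the given mean and sd."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


GIRLS_SD = 1.0  # assumed spread for the girls' scores
BOYS_SD = 1.1   # assumed, slightly larger spread for the boys' scores


def tail_ratio(threshold):
    """How many times more likely a boy is than a girl to score above threshold."""
    return upper_tail(threshold, sd=BOYS_SD) / upper_tail(threshold, sd=GIRLS_SD)


def bottom_ratio(threshold):
    """How many times more likely a boy is than a girl to score below threshold."""
    return lower_tail(threshold, sd=BOYS_SD) / lower_tail(threshold, sd=GIRLS_SD)


for score in (2.0, 3.5, 5.0):
    print(f"above {score:+.1f}: ratio {tail_ratio(score):.2f}   "
          f"below {-score:+.1f}: ratio {bottom_ratio(-score):.2f}")
```

Because both curves are centered at the same place, the ratio at the bottom mirrors the ratio at the top, and both grow as the cutoff moves further from the average.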

As the WSJ said “Girls and boys have roughly the same average scores on state math tests, but boys more often excelled or failed”. This is all because, for every grade and in every state, the mean of the boys and the girls is the same, but the boys are always more variable.

Now, if this difference (for it is a difference) persists at the college and post-graduate level, and if math professors are chosen by their ability, then males will outnumber females. Which is exactly what is found at actual colleges and universities.

Why the difference in variance exists is unknown, but it is again a political question. We could surmise, with Mr Summers, that the difference is due to innate tendencies, but to admit that is to admit that, at the top, men are better than women. But this also admits that, at the bottom, men are worse than women. The difference might be due to education: teachers could be singling out the best (and worst) boys and then treating them differently than the best and worst girls. But this is unlikely at the college level, and does not account for post-graduate performance either (number and quality of papers published, etc.).

It is more plausible that males and females are different in their abilities. Just don’t say this very loudly, or you will get yourself into some serious trouble, like Mr Summers, who, as the philosopher David Stove often said, “quickly rediscovered the definition of the word sacred.”

July 16, 2008 | 8 Comments

Shifting Baselines Syndrome

If you doubt any claim made about man-made global warming, Jennifer Jacquet thinks you are a “miscreant” and on par with those who deny that “smoking causes cancer.” She also draws the conclusion that since one “denialist” sports the same upper-lip fuzz as Snidely P. Whiplash, the rest of them are somewhere twirling their moustaches and up to no good.

Well, these kinds of childish insults are common by now. Sticks and stones, etc. What makes Jacquet’s edge-of-sanity ramblings interesting is her involvement in work called “Shifting Baselines.” Jacquet, and others at her web site, call “Shifting Baselines” a syndrome.

In case you weren’t paying attention, I said s-y-n-d-r-o-m-e. These are scary things and ordinarily require medical treatment, or even psychological counseling, so we are talking of serious things here. What are the signs of this dread malady? From Jacquet’s link to Wikipedia:

Shifting baseline (also known as sliding baseline) is a term used to describe the way significant changes to a system are measured against past baselines, which themselves may represent significant changes from the original state of the system.

For example (also on Wikipedia)

A cup of coffee may have only cost a $0.05 in the 1950’s, but in the 1980’s the cost shifted to $1.00 (ignoring inflation). The current (21st century) coffee prices are based on the 1980s model, rather than the 1950s model. The point of reference moved.

In other words, the term “shifting baselines” is based on the trivial observation that statements you make about the state of a system will change depending on what you reference them to. Originally applied to fish stocks, an example might be that you could have said “Since 1950, fish species A stocks (in area B etc.) have decreased by 32%” or you could have said “Since 1930, fish stocks have decreased by 36%.” Both statements are true (we are supposing), because both have different baselines.

This wouldn’t be in the least interesting, except that people like Jacquet and her advisor Daniel Pauly at the University of British Columbia are worried that some are not picking environmentally-approved baselines. They are suspicious that the non-Enlightened are picking incorrect baselines so that they can downplay the decline of, say, certain fish stocks.

And they might be right. But they have an easy rejoinder to any that pick a suspicious baseline: just pick a different one and justify it. Voila! If their opponents are trying to goose the statistics disingenuously by cherry picking a baseline, they can point that out too.

None of this behavior yet qualifies as a “syndrome”, however. What makes it one—in their minds—is the idea that there is one, fixed, Platonic, pre-human baseline where the stock of fish species A, and every other plant and animal, was pure and unadulterated. By refusing to recognize this undefiled baseline, we are refusing to admit the obvious: that the world is growing worse and worse through human behavior.

We can then say that the syndrome is “the tendency for each new generation to accept a degraded environment as normal/natural.”

It is a simple biological observation, however, that through all of Earth’s history the quantity of any species has never been fixed or static. There have been the genesis of new species and the extinction of others long before humans came on the scene. And there isn’t any species on this planet that isn’t food for some other species. Life is in constant flux. All this, I imagine, Jacquet and Pauly would admit.

The period of time they are aiming at, as their shifted baseline, is the one right before people got here, say a couple of hundred thousand years ago. But even Jacquet and Pauly must admit that humans have to have some influence on the number of each species alive. Jacquet, as I gather from the picture on her web site, certainly looks healthy and well fed, so I conclude she is eating some of the other species. I presume Pauly is, too.

Thus, since humans must have some influence, and Jacquet and Pauly and the rest of us are currently influencing the environment, it becomes only a question of how much. How much is too much? How much is enough? What is permissible? What is not? And so on.

In other words, the baseline to use as a reference is a moving—or shifting—target that can only be defined by reference to the whole range of human behavior. This means, of course, the best baseline to use in any situation is a political and scientific question. Disagreements about baselines would then appear to be normal human behavior.

But Jacquet and Pauly want to medicalize these disagreements. Any that disagree with “experts”—which I suppose are people so designated by them—are not just making a mistake, they are exhibiting abnormal behavior.

In one way, of course, labeling disagreements a “syndrome” is cheering because it implies that being in the state of disagreement is not the sufferer’s fault. Some thing or somebody is responsible for leading the patient astray. This implies there is a cure, which is simple: remove the thing or body responsible for causing the dissension.

We can count ourselves lucky that they have not yet called for re-education camps or for medicating those afflicted.

July 15, 2008 | 36 Comments

What’s Wrong with the Sun? (Nothing)

The headline comes from this article at NASA, sent in by reader “Mike D.”

The gist of the article is “that there’s nothing to report.” Says David Hathaway:

“There have been some reports lately that Solar Minimum is lasting longer than it should. That’s not true. The ongoing lull in sunspot number is well within historic norms for the solar cycle.”…Although minima are a normal aspect of the solar cycle, some observers are questioning the length of the ongoing minimum, now slogging through its 3rd year…Hathaway has studied international sunspot counts stretching all the way back to 1749 and he offers these statistics: “The average period of a solar cycle is 131 months with a standard deviation of 14 months. Decaying solar cycle 23 (the one we are experiencing now) has so far lasted 142 months–well within the first standard deviation and thus not at all abnormal. The last available 13-month smoothed sunspot number was 5.70. This is bigger than 12 of the last 23 solar minimum values.”

In summary, “the current minimum is not abnormally low or long.”

Let’s take a look at the actual data and see if the statements about the “normalness” of the sunspot number are accurate. And let’s keep in mind the real reason NASA made this press release, the purpose of which is never directly stated (can you see it?). I’ll come back to this later.

Here is a picture from NASA showing the “Yearly Averaged Sunspot Numbers 1610-2007.”
[Figure: Sunspots through time]

Solar cycle “number 1” peaked around 1760; the cycles and other behaviors before this time are ignored in the official counting. Well, that’s neither here nor there (the labels do not matter), but we should always remember that the sun’s sunspot activity has been taking place for at least 4 to 5 billion years, and we only have measurements on the last 400. Thus we are in a very poor position to say what is “normal” and what is not. We can, however, make statements conditional on the data observed so far.

Hathaway’s analysis starts with cycle number “1” and ignores the previous data, which, given the extended period of low to no sunspots from 1650 to 1700, actually weakens his case. This is because, conditional on all the available evidence, periods of time with no or low sunspots are not that unusual. These quiescent periods are more likely given all the evidence than they are just using the data from 1749. This is true based on the simple observation that all the data has more quiescent periods than does the later half. It is true regardless of the periodicities or other structures present. That we have seen periods in the past with few or no sunspots is excellent evidence, after all, that we will see these periods in the future.

So why would he purposely ignore evidence that would have strengthened his case? Part of the reason is that there is the possibility that the data before 1749 is measured with error, and so should be discounted somewhat. However, this error is not especially large. The real reason has to do with the “Maunder Minimum” (shown on the graph), the period with few or no sunspots. This period does not fit the probability model Hathaway has in mind, so it is ignored. NASA says this about the Maunder:

For reasons no one understands, the sunspot cycle revived itself in the early 18th century and has carried on since with the familiar 11-year period. Because solar physicists do not understand what triggered the Maunder Minimum or exactly how it influenced Earth’s climate, they are always on the look-out for signs that it might be happening again.

But Hathaway thinks the “quiet of 2008 is not the second coming of the Maunder Minimum.”

Thus we have gone from “For reasons no one understands” to “the solar cycle is progressing normally.” The path from one statement to the other is indeed rocky. This is why I believe Hathaway’s statements are too certain. I believe that periods of low to no sunspots are more likely than his analysis allows. I am not, however, disagreeing with Hathaway in the sense that it does not appear that we are in another Maunder: there is only scant, at best, evidence for this.

As a technical note: It is not clear that the uncertainty in length of time in months that the cycles last is best represented by a normal distribution, as used by Hathaway. Ignoring the Maunder makes his approximation a better one, but there is never a good reason to ignore part of the data because it does not fit your expectations.
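For reference, the arithmetic behind “well within the first standard deviation” can be sketched as follows. It takes the figures quoted from Hathaway above (mean 131 months, standard deviation 14 months, 142 months so far) and, as an assumption made only for illustration, treats cycle length as normally distributed, which is the very approximation questioned in this note.

```python
from statistics import NormalDist

MEAN_MONTHS = 131.0      # average cycle length quoted by Hathaway
SD_MONTHS = 14.0         # standard deviation quoted by Hathaway
CYCLE_23_MONTHS = 142.0  # length of cycle 23 so far, as quoted

# Normal model for cycle length: an assumption, not an established fact.
cycle_lengths = NormalDist(mu=MEAN_MONTHS, sigma=SD_MONTHS)

# Number of standard deviations cycle 23 sits above the average.
z_score = (CYCLE_23_MONTHS - MEAN_MONTHS) / SD_MONTHS

# Probability, under the normal model, of a cycle at least this long.
prob_at_least_this_long = 1.0 - cycle_lengths.cdf(CYCLE_23_MONTHS)

print(f"z-score: {z_score:.2f}")
print(f"P(length >= {CYCLE_23_MONTHS:.0f} months): {prob_at_least_this_long:.3f}")
```

Both numbers depend entirely on the quoted mean and standard deviation, which were computed from the data starting in 1749; including the earlier record would change them.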

Anyway, back to the real purpose of this press release. Why are people so interested in the length of the solar cycle? Easy. Because for years, most climatologists insisted that the role the sun plays in the climate was minimal. That is to say, changes in the behavior of the sun were not thought to be related to changes in the Earth’s climate. The sun, which alone supplies all the energy that goes into creating the climate, was thought not to be important. Obviously, this attitude is starting to change. This press release is a tacit admission that some now admit some role of the sun in climatology.

I do not have time to talk here of the actual methods to predict sunspot number, which is an important activity in space weather. But take a look at the first picture in the press release and see if you can note anything odd.

July 13, 2008 | No comments

Actual footage

UPDATE: Christian Toto, over at Pajamas Media, has seen HBO’s Generation Kill and says “The new HBO miniseries on Iraq is well-executed, but its anti-war bias is clear.” Make sure to also read the comments.

This tip in from Kyle Smith, from today’s New York Post. Since the subject came up yesterday about fictional accounts of military action, we have here, at, hundreds of actual scenes filmed by the soldiers themselves. Smith’s story is called Wartube.

Some examples. One:


I had no idea of this site before today. But I would imagine that whatever Hollywood offers, no matter how “gritty and realistic”, cannot compare to the actual real reality as delivered directly by the soldiers. Of course, the soldiers’ own story suffers only one flaw when compared to fictionalized accounts: no slow motion (joking, just joking).