Suppose we have collected data on some measure deemed important to society. Examples are fireman entrance exams, standardized test scores in mathematics, income, IQ, and so forth. Higher measures are considered better. The raw statistics (whether as a whole or by “cuts” such as age) indicate that the distributions of, let us call them, “scores” are shifted higher for whites/males than for blacks/females.

This rightward shift we can call a “gap.” Now, this being the universe in which we live, something caused this gap to be. This something cannot be “chance.” Chance isn’t a thing and thus cannot be a cause. Neither can “randomness” be a cause, and for the same reason. Instead, some real thing or, more likely, things caused each individual to have the score he did. Thus, the gap between collectives is a necessary outcome of the scores of those collectives’ individuals, and, speaking exactly, nothing can be said to have caused the gap. Only the scores of the individuals themselves had causes.

It is logically possible that each individual had the same cause for his score. If we want to be fanciful, we can suppose that sunspots (via some mechanism) caused the scores to take the values they did. However, given our experience with these things in our own lives, it is more plausible that each individual’s score was brought about by a different combination of causes, some of which were different and some of which were the same across many or all individuals, but of varying degrees in each.

What can statistics tell us about these causes? Nothing. At least, nothing much and nothing directly.

But that hasn’t stopped people from claiming that the observed gap supports some theory. The two leading theories are that the gap is there because of some innate difference in ability between whites/males and blacks/females, or that the gap is there because of racism/sexism.

Now, innate ability makes sense as a cause: it could be that, all other things equal, whites/males are just better than blacks/females on standardized mathematical tests, say, where by “better” I mean that the probability that any white/male scores higher than any black/female is greater than fifty percent. The “all other things equal” is somewhat problematic, because in that phrase can lie plenty of indirect causes. But consider the analogy that innate ability favors those over six feet being better basketball “dunkers” than those under six feet. Even here we must speak of “all other things equal”, but we have no difficulty seeing that tallness is the major cause of being a better dunker.
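
To make this definition of “better” concrete, here is a minimal Python sketch that estimates, from two samples, the probability that a randomly chosen member of one group outscores a randomly chosen member of the other. The function name `prob_higher` and the scores are made up for illustration, and counting ties as “not higher” is my assumption. The number describes the size of a gap only; it says nothing about what caused it.

```python
from itertools import product


def prob_higher(group_a, group_b):
    """Fraction of all (a, b) pairs in which a is strictly greater than b."""
    pairs = list(product(group_a, group_b))
    higher = sum(1 for a, b in pairs if a > b)
    return higher / len(pairs)


# Hypothetical scores, invented purely for illustration.
scores_a = [52, 61, 70, 75, 88]
scores_b = [48, 55, 66, 72, 80]

p = prob_higher(scores_a, scores_b)
print(f"Estimated P(A scores higher than B) = {p:.2f}")
```

A value above one half is what “better” means in the sense used here; a value of exactly one half would mean no gap by this measure.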

But racism/sexism cannot be a direct cause; only actions that result from these attitudes, and that can affect the score, can be causes. For example, a person might despise whites/males, but as long as she does not let this emotion influence her actions (along pathways that influence the score), then her racism/sexism is harmless (with respect to the score). The racism/sexism first has to “become active”; for example, by a teacher being more disapproving towards those in the group she disfavors.

We can envision many different ways racism/sexism may become active, each of differing strength, while the number of reasons for innate differences appears smaller (say, certain genetic combinations). Whether or not this is true, the data in front of us say nothing about it.

Again, we have collected data which shows a “gap”. Both theories, innate differences and racism/sexism, say that we should see a gap. Thus, the gap we see confirms both theories—as it would confirm any theory which predicted the gap. For example, the theory of cultural differences also predicts gaps. Even stronger, the data contain no information on which theory is confirmed to a greater degree.
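
The point can be put in terms of odds. As a sketch, and assuming (my assumption, not something the data supply) that both theories assign the same probability to seeing a gap, Bayes’s rule leaves the odds between them exactly where they started. The function name `posterior_odds` and every probability below are hypothetical.

```python
def posterior_odds(prior_odds, p_data_given_t1, p_data_given_t2):
    """Odds of theory 1 over theory 2 after seeing the data (Bayes's rule)."""
    likelihood_ratio = p_data_given_t1 / p_data_given_t2
    return prior_odds * likelihood_ratio


# Hypothetical prior odds of theory 1 against theory 2.
prior = 1.0

# Both theories predict the gap equally well (assumed), so the gap
# cannot move the odds in either direction.
odds_after_gap = posterior_odds(prior, 0.9, 0.9)

# Only evidence on which the theories disagree can shift the odds.
odds_after_external = posterior_odds(prior, 0.8, 0.2)

print(odds_after_gap, odds_after_external)
```

The second call stands in for external evidence, information beyond the scores themselves, on which the two theories make different predictions.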

This is important because the matter has long been political. The most common, even de facto, belief is that any gap is prima facie evidence of racism/sexism. Indeed, the burden of proof is on the organization that awards the scores to show that it, the organization, is not racist/sexist. Yet the observed gap could have also been caused by innate or cultural differences (or something else).

It is practically impossible that any organization can prove its innocence. This is because of what we noted above: the ways racism/sexism could have influenced an individual are legion. Further, these ways are often undefined, or if defined, they are unquantified. And even if the ways were defined and quantified, it is extraordinarily unlikely that these quantifications can be had for any individual (since racism/sexism operates over a long period of time and in many instances). Even worse, the organization’s inability to exculpate itself will be (incorrectly) taken as further evidence of racism/sexism.

Perhaps worst of all, if it is thought the innate difference theory is false, yet it is at least partly causative, then it will be impossible—not just unlikely, but impossible—to eliminate “racism/sexism”, because there will always be gaps caused by innate differences, and these will be ascribed falsely to racism/sexism.

Even though they are often used for the purpose, statistics cannot say which of many competing theories caused a gap in some socially important measure. At least, not without information that is external to the scores themselves. The best form of external evidence would be a controlled trial, in which all possible avenues of racism/sexism/cultural differences among individuals have been controlled (or eliminated). All experience of human nature argues that such a trial will never happen.


  1. I frequently read news articles that say that population A has a higher incidence of X than population B. After adjusting for income, education, and geography, the gap persists.

    It seems to me that there is a heck of a lot of information that gets buried in these adjustments. Any insights?

  2. Interesting and another blow against crude empiricism, non-operational variables and unspecified models.
    20 years ago, I recall doing some survey work for a large company that was concerned that it was losing its high-potential females because they declined to move. We had some pretty extensive biographical and survey data and a large sample. Sure enough, women high potentials were much more likely to leave than male high potentials. I was frustrated by this finding since it appeared that the research did not allow for any possible means of addressing this issue. Then I had a thought – what would make anyone decline a career-enhancing move and consider leaving a company? Notice, I said “anyone” as in male or female – since gender in itself seems an unlikely operational factor in this decision. Well, it is pitifully obvious. If my partner is earning far more than me, then I am likely to make decisions based on my partner’s career prospects and vice versa. We created spousal income as a variable and wouldn’t you know we got a better fit (and for obvious and perhaps spurious reasons gender was no longer a relevant variable) – plus we could now recommend programs for ensuring that the spouses of high potentials – male or female – received proactive help in securing new employment.

    The bottom line is that nominal groups and categories like gender and race that reveal large gaps in some variable of interest do not necessarily have explanatory value and, moreover, may mask the variables that do.

  3. Bernie,

    Harvard Business School made the observation that a high percentage of its female graduates were out of the labor force 10 years after graduation. Their explanation was that they were married to HBS grads. HBS grads make gobs of money, and one income was sufficient.

  4. Doug M
    I am not surprised, but gender may not be the key explanatory variable. For example, if the focus was on the assumption of roles based on overall career/income prospects, this could explain why male or female HBS grads might leave the job market. After all, female HBS grads can make gobs and gobs of money.

  5. At the risk of being accused of stereotyping I can think of a possible explanation. When I was a young man between the ages of 8 and 12 I read almost every book in the local branch of my city library. I continued this habit at the main branch but the sheer number of books prevented me from reading them all. As a result I did very well in school and can usually ace a test. But I was less successful at basketball and other sports. I had good friends who spent their time playing sports but they struggled in school and often failed tests. Don’t misunderstand, in later years I played sports and enjoyed it immensely and I made sure my own children participated in sports. The point I am trying to make is there are people and groups who value sports and do not value education. Given our public school system’s desire to graduate anyone who stays in the system without regard to having learned anything, it is not surprising that many seemingly qualified people might be unable to pass simple tests.

  6. Briggs:

    “All experience of human nature argues that such a trial will never happen.”

    Nor would such a trial be desired – or allowed. That the gap exists is convenient for supporters of “the narrative”; any effort to quantify the factors involved would be either turned down or ignored. We have our story. We don’t need any facts.

    What do we want to do? Destroy another industry? Throw thousands more out of work? There are too many professional gender and race-advocates earning a living now, making frivolous accusations against others, to upset the societal apple cart now with actual facts. Go away.

  7. What would be even worse is that if the innate difference theory is true, just mentioning it would be considered racist/sexist.

    See Larry Summers.

  8. mt:
    While I agree that certain classes of variables are viewed as taboo – Murray’s Bell Curve, for instance – the bigger issue raised by Matt is that we blithely report and act on supposed research findings that are not really indicative of anything because the variables are so poorly defined/measured. In many instances, researchers operate like drunks looking for their car keys under the streetlights because that is where they can see.
    A good example is the recent piece in the WSJ on child-rearing behaviors of Asian/Chinese parents: It is hard to imagine that this kind of parental involvement, coaching and motivation would not have an impact on SAT scores and vice versa! Note that the operational variables of interest are not ethnicity but involvement, coaching and motivation, which translate into preparation and effort.

  9. The solution would seem to be to stop measuring things.

    Mankind got a long way without measuring gaps or -isms.

  10. Of course, it’s not just done in the social sciences. See, for example, the Spencer / Dessler debate regarding clouds and temperature.


    “I’m sorry, but finding some statistical relationship with a near-zero correlation in BOTH the satellite data AND in the climate model behavior is (in my opinion) nowhere near proving that climate models are useful for long-term predictions of the climate system.”

  11. Re a test in which “…all possible avenues of racism/sexism/cultural differences among individuals has been controlled (or eliminated).”

    I’m probably just being dense:

    – Many of these tests, including specifically, the firefighter’s test of recent memory, and several IQ tests, have been created with explicit input and approval from people of the races/sex/cultures that are ‘disproportionately affected’, precisely to eliminate in advance and to their satisfaction, any racism/sexism/cultural bias in the test. Yet when individuals take these tests, their scores still cluster to reveal race/sex differences, not only in the same direction but to the same degree as other similar tests.

    – IQ tests in particular have been subjected to the scientific rough-and-tumble of these and similar objections for decades, and over and over have successfully been able to account for these objections, strongly and repeatedly enough so that working researchers in psychometrics judge these types of objections to have been refuted. Which is to say, when the same old stuff that’s been conscientiously considered and debunked over and over gets thrown up against the wall yet again, it’s not making an Argument From Authority to judge that it won’t stick this time, either.

    – Strongly g-correlated tests such as the Armed Forces Qualification Test (AFQT) have repeatedly been shown by the Armed Services to correlate with real-world military performance in categories from tank driving to aircraft flying, irrespective of race.

    – Moreover, researcher Ronald Ferguson, an African American who believes that poor schooling or other external factors cause the difference, (I used to think that, too, but trans-national studies and other evidence have caused me to consider that my former thinking was really wishful thinking) nonetheless has shown that the black/white difference in earning power virtually disappears when adjusted for AFQT score.

    – In other words, Dr. Ferguson has shown that in our real US of A right now, there might be little or no economic racism remaining — racism might cause little to none of the black/white income difference — because differences that are proxied by an AFQT score might be causing the bulk of it.

    As Casey Stengel used to say, you could look it up.

    As a devout Catholic, I know with all my heart that every person’s dignity is given directly by God immediately and irrevocably, and is irreducible by any act of Man. And constitutionally, I’m not much up for the Clever, Sensitive People reigning beneficently over the rest of us. If our dignity depends on the actions, thinking, and moral soundness of the Clever, Sensitive People, I think we’re sunk.

    But facts is facts. So, knowing what I’ve just laid out above, my only defense when I ask the question is that I’m simply being dense: What more can there be out there to do, in order to control or eliminate “…all possible avenues of racism/sexism/cultural differences among individuals” on current good tests?

