
Manzi: What Social Science Does—and Doesn’t—Know

This article is nothing but an extended link to a must-read piece in City Journal. Internet access is still only once daily. Thanks to reader I. for suggesting this topic.

If you haven’t already, you must read Jim Manzi’s City Journal article, What Social Science Does—and Doesn’t—Know: Our scientific ignorance of the human condition remains profound.

This man is my brother. Except for the common mistake in describing the powers of “randomization”, Manzi expounds a view with which regular readers will be in agreement:

[I]t is certain that we do not have anything remotely approaching a scientific understanding of human society. And the methods of experimental social science are not close to providing one within the foreseeable future.

His article will—I hope—dampen the spirits of the most ardent sociologist, economist, or clinician. For example, I cannot think of a better way of describing our uncertainty in the outcome of any experiment on humans than this:

In medicine, for example, what we really know from a given clinical trial is that this particular list of patients who received this exact treatment delivered in these specific clinics on these dates by these doctors had these outcomes, as compared with a specific control group. But when we want to use the trial’s results to guide future action, we must generalize them into a reliable predictive rule for as-yet-unseen situations. Even if the experiment was correctly executed, how do we know that our generalization is correct?

Amen and amen.

Manzi did a sort of meta-analysis, in which he examined the outcomes of 122 sociologist-driven experiments. Twenty percent of these had “statistically significant” outcomes; that is, they had p-values that were publishable, meaning less than the magic 0.05 level.

Only four of the twenty percent were replicated, and none provided joy. That is, there were no more magic p-values.

This is the problem with classical statistics: it lets in too much riffraff, wolves in statistically significant clothing. It is far too easy to claim “statistical significance.”
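
A rough back-of-the-envelope calculation shows how the riffraff gets in. The numbers below, the fraction of tested hypotheses that are actually true and the power of a typical study, are assumptions chosen only for illustration (in the spirit of the Ioannidis paper cited in the comments), not figures from Manzi’s review:

```python
# Illustrative sketch only: how many "significant" findings reflect real effects?
# The base rate and power below are assumed values, not data from Manzi's review.

alpha = 0.05        # the magic significance threshold
power = 0.5         # assumed chance a study detects a real effect when one exists
base_rate = 0.10    # assumed fraction of tested hypotheses that are actually true

true_positives = base_rate * power          # real effects declared "significant"
false_positives = (1 - base_rate) * alpha   # null effects declared "significant"

share_real = true_positives / (true_positives + false_positives)
print(f"Share of 'significant' results that are real: {share_real:.0%}")
# With these assumptions only about half of the "successes" are genuine,
# before multiple testing and publication bias make matters worse.
```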

Classical statistics has the idea of “Type I” and “Type II” errors. These names were not chosen for their memorability. Anyway, they have something to do with the decisions you make about the p-values (which I’ll assume you know how to calculate).

Suppose you have a non-publishable p-value, i.e., one that is (dismally) above the acceptable level required by a journal editor. You would then, in the tangled syntax of frequentism, “fail to reject the null hypothesis.” (Never, thanks to Popper and Fisher, will you “accept” it!)

The “null hypothesis” is a statement which equates one or more of the parameters of the probability models of the observable responses in the different groups (for Manzi, an experimental and a control group).

Now, you could “fail to reject” the hypothesis that they are equal when you should have rejected it; that is, when they truly are unequal. That’s the “Type II” error. Re-read this small section until that sinks in.
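
A small simulation makes the Type II error concrete. Everything in the sketch below is an illustrative assumption of mine (the effect size, the sample size, the number of simulated experiments); none of it comes from Manzi:

```python
# Sketch: Type II errors -- truly different groups, yet we "fail to reject".
# Effect size and sample size are arbitrary illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, effect, trials = 20, 0.3, 10_000   # small samples, modest real difference

misses = 0
for _ in range(trials):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(effect, 1.0, n)      # the groups really are unequal
    _, p = stats.ttest_ind(treated, control)
    if p >= 0.05:                              # non-publishable p-value
        misses += 1                            # a Type II error

print(f"Missed a real effect in {misses / trials:.0%} of experiments")
# With n = 20 per group and a real difference of 0.3 SD, roughly 85% of runs
# fail to reject, even though the null hypothesis is false.
```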

But you could also see a publishable p-value—joy!—“by chance”. That is, merely because of good luck (for your publishing prospects), a small p-value comes trippingly out of your statistical software. This is when you declare “statistical significance.”

However, just because you see a small p-value does not mean that null hypothesis is false. It could be true and yet you incorrectly reject it. When you do, this is a Type I error.

Theory says that these Type I errors should come at you at the rate at which you set the highest allowable p-value, which is everywhere 0.05. That is, on average, 1 in every 20 experiments will be declared a success falsely.
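
A companion simulation, again purely illustrative, shows the machinery delivering that advertised rate: with no real effect anywhere, about one run in twenty still produces a publishable p-value. The sample size and number of runs are arbitrary choices of mine:

```python
# Sketch: Type I errors at the advertised 5% rate.
# There is no effect at all here -- both "groups" come from the same distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, trials = 50, 10_000

false_alarms = 0
for _ in range(trials):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(0.0, 1.0, n)          # identical populations: null is true
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:                               # "statistical significance"!
        false_alarms += 1

print(f"Declared success falsely in {false_alarms / trials:.1%} of experiments")
# Expect roughly 5%, i.e. about 1 in every 20 experiments, just as theory says.
```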

Manzi found that the 122 experiments represented about 40 “program concepts”, and of these, only 22 had more than one trial. And only one of these had repeated success: “nuisance abatement”, i.e. the “broken windows” theory. Which, it must be added, hardly needed experimental proof, its truth being known to anybody who is a parent.

The problem, as I have said, is that statistical “significance” is such a weak criterion of success that practically any experiment can claim it. Statistical software is now so easy to use that only a lazy person cannot find a small p-value somewhere in his data.
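
The sketch below is a toy illustration of how easy the hunt is, not anything Manzi analyzed: a single “experiment” measures twenty unrelated, pure-noise outcomes, and we ask how often at least one of them comes out “significant”. The sample sizes and counts are arbitrary assumptions:

```python
# Sketch: hunt through enough outcomes and a small p-value nearly always turns up.
# Twenty pure-noise outcomes per "experiment" -- an arbitrary illustrative choice.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, outcomes, trials = 50, 20, 2_000

lucky = 0
for _ in range(trials):
    found = False
    for _ in range(outcomes):
        control = rng.normal(0.0, 1.0, n)
        treated = rng.normal(0.0, 1.0, n)       # no effect on any outcome
        _, p = stats.ttest_ind(treated, control)
        if p < 0.05:
            found = True
            break                                # stop at the first "significant" result
    lucky += found

print(f"'Significant' something found in {lucky / trials:.0%} of experiments")
# With 20 independent looks, roughly 1 - 0.95**20, about 64%, of pure-noise
# studies can still report a publishable p-value somewhere.
```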

The solution is that there is no solution: there will always be uncertainty. But we can do a better job quantifying uncertainty by stating our models in terms of their predictive ability, and not their success in fitting data.
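
One way to act on that prescription is to judge a model by data it has not yet seen rather than by how snugly it fits the data in hand. The sketch below is a generic, made-up illustration of the distinction (the data and the models are my inventions, not anyone’s research): a wiggly model fits the sample beautifully and then predicts fresh observations worse than a plain straight line.

```python
# Sketch: in-sample fit versus out-of-sample prediction, on made-up data.
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 30)
truth = 2.0 * x                                   # the "truth": a plain straight line
y = truth + rng.normal(0.0, 0.5, x.size)          # the data we get to see

for degree in (1, 10):                            # a simple model and a wiggly one
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    fit_err = np.mean((fitted - y) ** 2)
    # Prediction error against many fresh draws from the same process
    pred_err = np.mean([
        np.mean((fitted - (truth + rng.normal(0.0, 0.5, x.size))) ** 2)
        for _ in range(1_000)
    ])
    print(f"degree {degree:2d}: fit error {fit_err:.2f}, prediction error {pred_err:.2f}")

# The wiggly fit "succeeds" at describing the data in hand but predicts
# as-yet-unseen observations worse than the straight line: the airplane
# either stays up or it does not.
```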

This is echoed in Manzi:

How do we know that our physical theories concerning the wing are true? In the end, not because of equations on blackboards or compelling speeches by famous physicists but because airplanes stay up.

Categories: Statistics


  1. Related to this blog essay are:

    – “Why Most Published Research Findings Are False,” by John P. A. Ioannidis

    – “Most Published Research Findings Are False—But a Little Replication Goes a Long Way,” by Ramal Moonesinghe, Muin J. Khoury, A. Cecile J. W. Janssens

    – “When Should Potentially False Research Findings Be Considered Acceptable?” by Benjamin Djulbegovic, Iztok Hozo

    – Editorial from PLoS: “Minimizing Mistakes and Embracing Uncertainty,” by The PLoS Medicine Editors (which addresses the issues raised by Ioannidis, etc.).

    I believe all of the above are on the PLoS website, http://www.plosmedicine.org, and/or are readily found with a keyword search on the titles & authors (I copied & pasted directly from my electronic copies of the articles into this blog entry).

  2. Just finished the article. Is anyone else a little unnerved by this statement at the end?

    Science may someday allow us to predict human behavior comprehensively and reliably. Until then, we need to keep stumbling forward with trial-and-error learning as best we can.

    It seems to me that being able to predict human behavior comprehensively and reliably would easily lead to many abuses. At some point you have to wonder: is this so different from mind control?

  3. Matt: I am not sure whether you recommended Freedman’s book or I just happened across it. However, I do recommend Wrong: Why Experts Keep Failing Us–and How to Know When Not to Trust Them. It starts with a quick, readable review of Ioannidis and goes on from there. Further evidence that, as you say, “too many people are too certain of too many things”, with the additional lemma that “too many experts are even more certain of even more things.”

    As for social sciences, my thesis advisor 35 years ago was Chris Argyris, a very famous organizational psychologist. In the 70s he wrote a little book titled The Applicability of Organizational Sociology. Its real title should have been The Non-Applicability of Organizational Sociology. Chris makes the same points as Manzi: sociologists do not actually test their theories, or they test them in ways that are essentially unrealistic. In large measure it boils down to the notion that sociologists focus on units of analysis (e.g., family, community, organization, work group, class, etc.) that tend to exclude the primary actor, i.e., the individual, and that their implicit and, more rarely, explicit model of the individual is grossly underspecified. As a consequence, any experiment is by definition insufficiently specified: key variables are ignored. My own pet illustration of this comes from Milgram’s famous authoritarian experiments (or Zimbardo’s grotesque prisoner experiment), where he found that people placed in the role of teacher with apparent access to a corrective stimulus (electric shocks) acted coercively, with roughly 65% willing to administer 400-volt shocks to the “learner”, played by an actor (i.e., no horses were killed in the making of this movie). It always mystified me why they would not explore and account for those subjects who did not act in a coercive manner. Surely those individuals who refused from the beginning represent some critical research factors? Surely this is by far the more interesting way to explore the roots of authoritarianism? But to do so, one must change one’s unit of analysis, the nature of the data collected, and the role of the subject in such experiments.

    Thanks to you and Manzi for triggering these old memories.

  4. As to the prediction of human behaviour:

    The brain has 10^8 neurones. Each neurone is connected to 1000 other neurones.
    A brain state can be defined by N connected neurones ON and the remaining OFF. The connections matter.
    How many different brain states are there?
    You don’t need to calculate; it’s more than there are atoms in the universe.
    If one assumes, a reasonable assumption, that the brain states causally determine macroscopically observable emergent behaviours (you yawn, you zap the TV to channel 23, you move the pawn E2-E4, you kick Al Gore in the nuts, etc.), then a study of the causes of the observable behaviours of a GIVEN individual involves handling statistically quasi-infinite sets for quasi-infinite times.

    Suppose that somehow, miraculously, you became quasi-immortal and came up with a deterministic theory showing some predictive skill that is clearly statistically significant.
    Well, having done that for Mr Müller (who was necessarily also quasi-immortal), you have to try to generalise to a population of N quasi-immortal persons.
    As the topology of the N brains will be different, you cannot of course generalise the results obtained on Mr Müller. Also, as your purpose is no longer to explain an isolated brain but N interacting brains, the complexity is no longer quasi-infinite but quasi-infinite to the power of quasi-infinite.
    This is a very big number. Of course you will try to reduce it by grouping neurones into big areas, neglecting things, dramatically reducing the complexity and the number of tested interactions, and so on.
    Several billion years later you will discover some partial statistical laws like “If area X is connected to the parietal subarea of Y, and if at least 2 of the 12 frumungal conditions are realised, then the temporal subarea of Z may be activated with a probability of 80% and …”,
    which translates to “If Bob tries to kick you and you see that Bob is bigger than you, then you will try to run in 80% of cases.”

    It follows that sociology is not a science, and it is not even a good attempt at one.
    It is at best a union of trivial observations (like “people generally prefer to be living rather than dead”) and vague pseudo-statistical correlations (like “the criminality rate is likely to be higher among a poor population than among a rich population if both are put in contact”).
    From my personal experience I can also come up with such a correlation: “People who would like to be a scientist but who have little to no skills in physics and mathematics are likely to try sociology. Or psychology.”

    P.S. William, it seems that your blog has a problem again. I am almost sure that my comment yesterday was eaten by it.

  5. Tom:
    Your hypothesis
    “People who would like to be a scientist but who have little to no skills in physics and mathematics are likely to try sociology. Or psychology.”

    is certainly worth exploring, but it seems a bit oversimple. Sometimes what intrigues the individual, their big question so to speak, drives their choice of discipline, which in turn drives the tools and analytical procedures needed. Once chosen, going back tends to be very difficult. That said, I was surprised and impressed when I heard an interview this week with Brian May, guitarist of the rock band Queen. Evidently, at the age of 60, he finished his PhD in astrophysics! His knowledge of physics apparently came into play when he wrote the now-famous football anthem, “We Will Rock You.”
    It is also puzzling that in psychology, for example, mathematical tools are extremely primitive even though the phenomena would suggest that calculus and differential equations are necessary for any realistic model of human behavior.
    Interestingly, while the phenomena you are interested in drive the tools you need, one must be careful that the tools (or the absence thereof) do not limit the phenomena explored. I recall one very accomplished mathematical economist saying of another economist something to the effect of, “He was a fine economist until he fell in love with his mathematics.”

    Interesting piece at WUWT!

  6. I am delighted to find that I have been able to think this through a little more, which may be of interest to those who wonder whether ‘social’ science is ever able to rise to the level of ‘real’ science.

    Several hours after I referred readers to Steve Sailer’s take on Mr. Manzi’s piece by quoting Mr. Sailer (“the social sciences have come up with a vast amount of knowledge that is useful, reliable, and nonobvious, at least to our elites”), it occurred to me that while Mr. Manzi’s critique is spot on, his contrasting of social science to ‘real’ science is a move that some philosophers have called an ‘imagination pump’ – it suggests to our minds more than it can prove.

    Take a typical example, which Mr. Manzi himself uses: given the right conditions, air moving over an airplane wing induces lift and the airplane flies, and we quite literally trust our lives to that science (and engineering). But note that these truths never even remotely imply that every individual air molecule moving over the wing will behave in such a way as to induce lift. Rather, the real science (and engineering) say that a relevant preponderance of air molecules will behave in a way that will predictably induce a quantifiable amount of lift.

    But notice that these are precisely the lineaments of what Steve Sailer argues about the real findings of social science in the piece I cited previously. These real findings of social science do not even remotely imply that a particular individual will do such-and-such, or meet such-and-such a criterion. But they do say that a relevant preponderance of individuals will predictably create a quantifiable amount of [behavior of a certain sort].

    To say it plainly: a wing predictably inducing a quantifiable amount of lift is a precise analog to the social science results Mr. Sailer (implicitly) points to. This doesn’t remove one ounce of the relevance of Mr. Manzi’s critique. It simply removes the ‘imagination pump’ from his argument, which goes like this: I can (correctly) point to a ton of social ‘science’ that does not give relevant, predictable, quantifiable results. Here are some examples of ‘real’ science that do give relevant, predictable, quantifiable results. [Insert imagination pump here].

    Thus, as we free ourselves from the ‘imagination pump’ tendency of Mr. Manzi’s example, we find that Mr. Sailer has identified scads of social science, already performed, that precisely meet Mr. Manzi’s own conditions for ‘real’ science.

    Of course, Mr. Sailer goes on to complain that (analogously) no matter how often our betters see a (social science) airplane fly — he is too kind to remind us that we ourselves frequently make the same error — our betters remain completely surprised by this phenomenon, may become deeply upset, even infuriated, by it, and will continue to insist that airplanes don’t fly, and will further continue to insist that people who say that airplanes do in fact fly are bad people.

    It does behoove us to be highly cautious about ‘scientific’ results, even when they are really scientific. As our host reminds us continually (and I hope he is paid handsomely to do this in real life, and listened to), it is appallingly easy to over-generalize a result, even results we will stake our lives on.

    We forget at our peril that the ‘scientification’ of life has an appallingly long history, among thinkers both great and weak. To stick to the great: for example, centuries ago, Thomas Hobbes (he of “nasty, brutish, and short,” etc.) intended his political philosophy to be a rigorous extension of the Galileo/Newton New Physics into social life. That is, Isaac Asimov had warrant to have made Hobbes out as Hari Seldon’s (intellectual) great-grandad.

    Still, as Mr. Sailer reminds us, we also lose, and risk, a great deal not only when we remain surprised that airplanes fly, and behave as if they don’t, but also when we treat people who say that yes, airplanes do in fact fly, as bad people who should not be allowed to speak in polite circles.

  7. JohnK:
    At the risk of not rigorously analyzing your last point, and having been (or having pretended to be) a social scientist, I would say there is a world of difference between the science involved in safe human flight and applying findings from the social sciences to make things happen that do not already happen of their own accord. In the social sciences, to extend the metaphor, I fear we are still at the stage where we make and strap on wings similar to those that birds have and jump off of tall places. I would be interested in an example from the social sciences that matches the creation of safe flight.

  8. There’s a saying in sales: “People say ‘no’ because they don’t ‘know’”, which indicates that part of a salesperson’s job is to educate the prospect, at least enough to make a sale.

    In the social sciences it seems that as soon as “we” (at least some of us) think we “know” something, we must invariably reach a conclusion and take some particular action, undertaken with blissful confidence that what we [think we] “know” is sufficient & correct.

    Michael Crichton wrote a nice essay, using an example as analogy, that addresses this tendency & its inherent risks: “Why Politicized Science is Dangerous”: http://www.crichton-official.com/essay-stateoffear-whypoliticizedscienceisdangerous.html

    Unfortunately, that inclination seems to be intrinsic to the “human condition.”

  9. Manzi could make the same complaint about physicists as he does about social scientists: How come those physicists haven’t invented time travel, faster-than-light travel, and anti-gravity?
