The Philosophy of Data

Vladimir Putin demonstrates a new theory

Writes David Brooks (from whom I filched the title):

[T]he rising philosophy of the day…is data-ism. We now have the ability to gather huge amounts of data. This ability seems to carry with it certain cultural assumptions — that everything that can be measured should be measured; that data is a transparent and reliable lens that allows us to filter out emotionalism and ideology; that data will help us do remarkable things — like foretell the future…

What kinds of events are predictable using statistical analysis and what sorts of events are not?

The tacit recognition that not everything can be measured should be savored, even if Brooks himself didn’t see it. The central conceit of empiricists is that everything can be known if only we’d invent larger telescopes, finer microscopes, and tighter and longer questionnaires, or hire more survey takers, install more spy cams, or enter everything into databases and track it all more assiduously. Which is to say, it is our conceit, for we all subscribe to this false but promising philosophy.

Especially in trying to discern man’s behavior via “scientific instruments”, i.e. questionnaires, we have forgotten or never heard Wittgenstein’s Warning, “Whereof one cannot speak, thereof one must be silent.” Most people don’t know what’s on their own minds.

Whatever we have measured isn’t enough, unless it is for the most simplistic systems on the crudest scales. The more closely we peer into anything, the more layers we uncover; we learn, but relative precision is lost or is forever receding. This recognition does not imply futility, only caution and humility.

Regarding what is predictable, there’s no stopping an academic with a theory. The more “data” that is collected, the easier it will be to find things in it, including those things which support or generate new theories. The larger a dataset grows, the closer a theory (almost any theory) can be matched, especially using frequentist methods with their cult of the p-value. Scientists are adept at uncovering supportive evidence, and nearly anything can be said to have been the result of a theory.

Scientists are poor at finding contradictory evidence, however, or when finding it, they are slow to acknowledge it, or when they acknowledge it, they are clever at showing how the uncooperative data is an aberration or the exception which proves the rule.

Though many genuine new things will be learned, more data will lead to more false positives. These are more costly than false negatives. Proof? Well, mankind has lived its entire existence without whatever it is that has not yet been discovered; that is, he survived and flourished; he got to this point in time. So whatever isn’t yet known isn’t necessary for survival. But false beliefs are often deadly to body or mind. Hello, Twentieth Century. Greetings, Nudging Nanny State.

Yes, yes. We may gaze into the heavens and see a rogue wormhole veering our direction, which could only be distracted by applying vast wads of radioactive Knowledge, which we’d better hurry and discover. But barring cataclysmic interventions, the argument stands.
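The claim that more data breeds more false positives can be made concrete with a toy simulation. What follows is a minimal sketch, not anything taken from Brooks or his sources: the function names (`pearson_r`, `count_spurious_hits`), the sample size of 50, and the 5% cutoff are all assumptions chosen for illustration. It measures nothing but noise, then counts how many noise variables pass a conventional significance test against an equally noisy outcome.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_hits(n_predictors, n_subjects=50, seed=1):
    """Count pure-noise predictors that look 'significant' at the 5% level.

    Hypothetical helper for illustration. It uses the Fisher z
    approximation: when there is no true correlation,
    atanh(r) * sqrt(n - 3) is roughly standard normal.
    """
    rng = random.Random(seed)
    # The outcome is noise, and so is every predictor: there is nothing to find.
    outcome = [rng.gauss(0, 1) for _ in range(n_subjects)]
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_subjects)]
        z = math.atanh(pearson_r(predictor, outcome)) * math.sqrt(n_subjects - 3)
        if abs(z) > 1.96:  # two-sided test, nominal p < 0.05
            hits += 1
    return hits


if __name__ == "__main__":
    for k in (10, 100, 1000):
        print(k, "noise variables ->", count_spurious_hits(k), "'discoveries'")
```

On average about one in twenty of the noise variables should clear the bar, so the number of false “discoveries” grows roughly in proportion to how much gets measured, even though no real relationship exists anywhere in the data.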

Another item: many theories of human behavior, even though true (more or less), only hold transiently, or for far smaller groups than intimated. For example, Brooks thought science via data-mining learned this:

But as James Pennebaker of the University of Texas notes in his book, “The Secret Life of Pronouns,” when people are feeling confident, they are focused on the task at hand, not on themselves. High status, confident people use fewer “I” words, not more.

Another word for confident is cocky, but let that pass. In our culture, here and now, this finding might be roughly true, but that doesn’t make it so for all humans in all societies at all times. There is no expiration date built into theories.

Because all articles need an ending, Brooks had to have one. He chose this:

We think of John Lennon as the most intellectual of the Beatles, but, in fact, Paul McCartney’s lyrics had more flexible and diverse structures and George Harrison’s were more cognitively complex.

This proves my contention that all theories have short shelf lives. This part of “we” can’t differentiate between any of these three gentlemen, finding their work uniformly horrible and a scourge to mankind.


  1. I routinely hear comments like this: “OK, so you’ve seen some phenomenon in numerous realistic situations and experienced it yourself. But has it been confirmed by data from a completely artificial gross oversimplification?”

  2. I read the news today, oh boy.
    Four thousand holes in Blackburn, Lancashire.
    And though the holes were rather small,
    they had to count them all.
    Now they know how many holes it takes
    to fill the Albert Hall.

  3. Formal Empiricism is filled with fallacies. This is one of them. Knowledge generation and propagation is much more complicated… and utterly escapes the New York Times.

  4. Noblesse Oblige,

    “This is one of them. Knowledge generation and propagation is much more complicated… and utterly escapes the New York Times.”

    Nonsense, the NYT understands knowledge generation quite well. They frequently generate knowledge from whole cloth. :)

  5. “… finding their work uniformly horrible and a scourge to mankind.”

    Let it go, Briggs. (Or should I say let it be?)

    Even you must have some appreciation for this bit of Harrison doggerel:

    “Let me tell you how it will be
    There’s one for you, nineteen for me
    ‘Cause I’m the taxman, yeah, I’m the taxman

    Should five per cent appear too small
    Be thankful I don’t take it all
    ‘Cause I’m the taxman, yeah I’m the taxman

    If you drive a car, I’ll tax the street,
    If you try to sit, I’ll tax your seat.
    If you get too cold I’ll tax the heat,
    If you take a walk, I’ll tax your feet.

    Don’t ask me what I want it for
    If you don’t want to pay some more
    ‘Cause I’m the taxman, yeah, I’m the taxman

    Now my advice for those who die
    Declare the pennies on your eyes
    ‘Cause I’m the taxman, yeah, I’m the taxman
    And you’re working for no one but me.”

    (OK, so there are four “yeahs”.)

    But anyway, your post reminds me of a quote usually (but perhaps erroneously) attributed to Albert Einstein:

    “Not everything that can be counted counts, and not everything that counts can be counted.”