William M. Briggs

Statistician to the Stars!

Wired’s theory: the end of theory

Chris Anderson, over at Wired magazine, has written an article called The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.

Anderson, whose thesis is that we no longer need to think because computers filled with petabytes of data will do that for us, doesn’t appear to be arguing seriously; he’s merely jerking people’s chains to see if he can get a rise out of them. It worked in my case.

Most of the paper was written, I am supposing, with the assistance of Google’s PR department. For example:

Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required.

He also quotes Peter Norvig, Google’s research director, who said, “All models are wrong, and increasingly you can succeed without them.”

Lastly,

The scientific method is built around testable hypotheses…. The models are then tested, and experiments confirm or falsify theoretical models of how the world works… But faced with massive data, this approach to science – hypothesize, model, test – is becoming obsolete.

Part of what is wrong with this argument is a simple misconception of what the word “model” means. Google’s use of page links as indicators of popularity is a model. Somebody thought of it, tested it, and found it made reasonable predictions (as judged by us visitors who repeatedly return to Google because we find its link suggestions useful), and thus it became ensconced as the backbone of Google’s rating model. It did not spring into existence simply by collecting a massive amount of data. A human still had to interact with that data and make sense of it.
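To make the point concrete, here is a toy sketch of “rank pages by incoming links” written out as the model it is. The page names and the link list are invented for illustration, and counting raw inbound links is a deliberate simplification, not a description of what Google actually runs.

```python
from collections import Counter

# Hypothetical (source, target) links; every name here is made up.
LINKS = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("d.example", "c.example"),
    ("a.example", "b.example"),
    ("c.example", "b.example"),
    ("b.example", "d.example"),
]


def rank_by_incoming_links(links):
    """Order pages by how many other pages link to them (most first)."""
    inbound = Counter(target for _source, target in links)
    # Pages that only link out still get a place, with a count of zero.
    for source, _target in links:
        inbound.setdefault(source, 0)
    # Sort by descending count, breaking ties alphabetically.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_incoming_links(LINKS)
print(ranking)
```

Every line of that function is a decision some guy made: that links count, that they all count equally, that ties break alphabetically. The data chose none of it.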

Norvig’s statement, which is false, is typical of the sort of hyperbole commonly found among computer scientists. Whatever they are currently working on is just what is needed to save the world. For example, probability theory was relabeled “fuzzy logic” when computer scientists discovered that some things are more certain than others, and nonlinear regression was recast as mysterious “neural networks,” which aren’t merely “fit” with data, as happens in statistical models; instead, they learn (cue the spooky music).

I will admit, though, that their marketing department is the best among the sciences. “Fuzzy logic” is absolutely a cool-sounding name which beats the hell out of anything other fields have come up with. But maybe they do too well, because computer scientists often fall into the trap of believing their own press. They seem to believe, along with most civilians, that because a prediction is made by a computer it is somehow better than if some guy made it. They are always forgetting that some guy had to first tell the computer what to say.

Telling the computer what to say, my dear readers, is called—drum roll—modeling. In other words, you cannot mix together data to find unknown relationships without creating some sort of scheme or algorithm, which are just fancy names for models.

Very well: there will always be models, and some will be useful. But blind reliance on “sophisticated and powerful” algorithms is certain to lead to trouble. This is because these models are based upon classical statistical methods, like correlation (not always linear), where it is easy to show that you become certain to find spurious relationships as the size of the data grows. It is also true that the number of these false signals grows at a fast clip. In other words, the more data you have, the easier it becomes to fool yourself.
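A minimal simulation shows how this happens. It is a sketch under stated assumptions: “more data” is taken to mean more variables measured on the same 30 observations, every variable is pure noise, and the cutoff of 0.36 is roughly the conventional 5% threshold for a correlation at that sample size. The function name, sizes, and seed are all mine.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spurious_pair_counts(n_obs=30, sizes=(5, 10, 20, 40), cutoff=0.36, seed=1):
    """Count 'interesting' correlations among the first k noise variables.

    All columns are independent random draws, so every pair that clears
    the cutoff is a false signal by construction.
    """
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n_obs)]
               for _ in range(max(sizes))]
    counts = []
    for k in sizes:
        hits = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if abs(pearson(columns[i], columns[j])) > cutoff
        )
        counts.append(hits)
    return counts


counts = spurious_pair_counts()
for k, hits in zip((5, 10, 20, 40), counts):
    print(f"{k:>2} noise variables: {hits} pairs look 'correlated'")
```

The number of pairs to check is k(k-1)/2, so doubling the variables roughly quadruples the opportunities to be fooled. Nothing in the data is related to anything else, yet the list of “discoveries” keeps growing.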

Modern statistical methods, no matter how clever the algorithm, will not bring salvation either. The simple fact is that increasing the size of the data increases the chance of making a mistake. No matter what, then, a human will always have to judge the result, not only in and of itself, but how it fits in with what is known in other areas.

Incidentally, Anderson begins his article with the hackneyed, and false, paraphrase from George Box: “All models are wrong, but some are useful.” It is easy to see that this statement is false. Suppose I give you only this evidence: I will throw a die which has six sides, just one of which is labeled ‘6’. Given that evidence, the probability I see a ‘6’ is 1/6. That probability is a model of the outcome. Further, it is the correct model.
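Written out, the die model is about as small as a model can get. This sketch assumes the logical reading of probability: the evidence favors no side over another, so each side gets equal weight. The labels on the other five sides are placeholders of mine, since the evidence says nothing about them.

```python
from fractions import Fraction

# The stated evidence: six sides, exactly one of them labeled '6'.
# "not 6" is a placeholder label for the five sides we know nothing about.
SIDES = ["6"] + ["not 6"] * 5


def probability_of(label, sides):
    """Probability of seeing `label`, given only the list of sides."""
    return Fraction(sides.count(label), len(sides))


p_six = probability_of("6", SIDES)
print(p_six)  # 1/6
```

Change the evidence (say, two sides labeled ‘6’) and the model changes with it. Relative to the evidence given, though, 1/6 is not an approximation of anything.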

9 Comments

  1. I always thought the doctors had the best marketing.

    For instance, performing a Kaplan-Meier survival analysis sounds a lot sexier than calculating a conditional probability. Probably pays better too.

  2. I just like using the word “stochastic”. It’s great in every-day conversation with those who don’t know what it means.

    Say it aloud a few times. Stochastic. It’s like “fantastic” but inductively better.

    Mmmmmm….Stochastic.

  3. Mostly love this, as it saves me from needing to write the necessary takedown. Remember that Norvig is an AI guy, and so has an investment in the idea that if you don’t know how to be intelligent, you probably didn’t need it anyway.

    You’re off track on fuzzy logic, though — as an old fuzzy logic guy I’ve had this argument a fair number of times — because while both probability logic and fuzzy logic can be viewed as infinite-valued logics on the [0..1] interval, the interpretation is different: in a probability logic, you interpret that characteristic function in terms of membership in some set; in fuzzy logic, you interpret the characteristic function in terms of the match between some statement of a property and the real property involved. So, for example, you might make the statement “houses on Staten Island are white” and assign it some probability p based on a population and a frequency; on the other hand, you can’t state based on that “the house at such and such an address is white” — you can only say what you’d be willing to bet.

    On the other hand, in the fuzzy world, you can look at a titanium-white house with titanium-white trim and say that the value of the characteristic function “This house is white” has a value of 1, but that a titanium white house with blue trim is white with a value x

  4. Briggs

    July 1, 2008 at 3:54 pm

    Charlie,

    Mostly I agree with you, but where I disagree is over the meaning of words again.

    First, I do not hold with the frequency interpretation of probability, but with the logical view. And in that view, I can see no difference at all between “fuzzy” logic and probability.

    Martin Gardner has an excellent review article on the supposed differences between the two theories.

    In that paper he says “Michael Arbib suggests that the cult would never have arisen if Zadeh had named his logic ‘set theory with degrees of membership.'” Incidentally, I have often thought the same about the mysticism associated with “quantum mechanics”—if it had instead been called “discrete movement physics”, nobody would have ever thought it had magical powers. Nobody, for instance, would have thought of writing a book called “Discrete movement physics healing.”

    Gardner also says, “The line between fuzzy logic and probability logics is blurry. To assign ‘very tall’ a fuzzy value of n is the same as saying that, in a given culture, a person over six feet will be called tall with a probability of n. Is fuzzy, detractors ask, merely probability theory in disguise?” Good question.

  5. Aha, you have a length limit. Well, durn. The rest of my comment was even more incisive and convincing.

  6. The line between fuzzy logic and probability logics is blurry. To assign ‘very tall’ a fuzzy value of n is the same as saying that, in a given culture, a person over six feet will be called tall with a probability of n.

    I wish the rest of my comment hadn’t fallen off. Anyway, the short answer is that this doesn’t model the concept well. Consider a sorites: is this man bald? We don’t evaluate the phrase “is he bald” to mean “balder than average”. Similarly, when we compare titanium white with ecru, we don’t make a statement about any probability. Now, if what you mean is that “fuzzy logic” and “probability logic” are both examples of infinite-valued logics where the characteristic function has the closed interval [0,1] as the range, instead of the set {0,1}, I’d have to agree that they’re (probably, I don’t have a proof to hand) reducible one to the other. But they’re different models. (Which wraps us right around.)

  7. Briggs

    July 1, 2008 at 5:08 pm

    Charlie,

    Thanks for the heads up. I had no idea I had any length limit set up. I’ll check into it.

  8. Markov Chain Monte Carlo evolutionary or genetic algorithms.

    “But faced with massive data, this approach to science – hypothesize, model, test – is becoming obsolete.”

    This terrifying sentence is becoming true in the sense that published articles often don’t include reproducible results when developed with huge data sets. And by this I don’t mean that the results are dishonest. I mean the results simply cannot be reproduced by anyone without the big data set required to develop those results. I am increasingly alarmed at how the convenient conventions and shortcuts that arise in science are mistaken for the proper practice.

  9. Well, hasn’t this argument gone round in a complete circle? Remember some time back (long, long ago, I recall) when the Final Theory of Everything was just around the corner and physics was to be reduced to simply gathering data?

    When was that exactly?


© 2017 William M. Briggs
