William M. Briggs

Statistician to the Stars!

A Brief Explanation Of Occam’s Razor


We’re interested in whether some proposition Y is true. One explanation, perhaps causal or deterministic, or even only probabilistic, is X. Thus, Pr(Y | X) is high or equals 1.

A second explanation, again causal, deterministic, or only probabilistic, is the joint proposition W = W1 & W2 & … & Wm. Again, Pr(Y | W) is high, maybe even higher than Pr(Y | X), or equals 1.

X is simple, in the sense that the proposition is brief and grammatically of less content than W, which is long and grammatically complex.

As a rule of thumb, Occam’s razor says to prefer X as the better explanation of Y because X is simpler than W. Why does this work? Or rather, does it always work?

Suppose Y = “The table card is an Ace of Clubs.” A fellow shows you three blank cards and one Ace of Clubs. He shuffles them around, showing them to you at various points, lays one down on the table, and asks you to identify the card. It is the Ace of Clubs, which is somewhat objectionable, because the way he has been handling the cards made it appear certain that the Ace would be one of those still in the fellow’s hand.

Let X = “Magic”, which is the fellow’s causal explanation of Y. Then Pr(Y | X) = 1.

Now let W = “The fellow first holds the cards in such a way as to disguise their identity, pretending there are three blanks and one Ace, but where there are really three Aces and one blank, and let the delicate handling be such that the thumb of the right hand blocks the Ace mark on the bottom card while allowing you what appears to be a surreptitious peek, building your confidence you know where the so-called long Ace is, and let etc., etc., etc.”

W is very complex, takes much practice, and is most difficult to explain (a good 800 words is necessary). Nevertheless, Pr(Y | W) = 1. Since X is much simpler than W, via Occam’s razor, X is the preferred explanation.

Well, this is absurd, because W is the true explanation (when I do the trick). Occam’s razor has failed. But Occam’s razor was not meant to be more than a rule of thumb. It was never meant to be taken as an authoritative argument.

Occam’s razor starts with this premise: of all the times complex and simple explanations were put forth for a proposition, more of the simple than complex turned out to be true. This start is true upon common observation: nobody disagrees with it. Here’s the finish of Occam: X is a simpler explanation than W, therefore it’s more likely that X is true and W false. That conclusion also follows from the first premise. The conclusion would not follow if it were re-written like this: X is a simpler explanation than W, therefore X is true and W false. This is a fallacy. The “more likely” qualification is what makes Occam work.

Why does it work; that is, why is the first premise true? We want to know about Y and have X and W in hand. We all come equipped with more information than just Y, X & W. We also know Z, which itself is a very complex proposition about the way the world works.

In order to believe X, we have to have Pr(X | Z) be high or 1; again, in order to believe W, Pr(W | Z) should be high or 1. Now X is simple, so all we have is Pr(X | Z). But W is complex, and as a rough guide, treating the parts of W as roughly independent of one another given Z, the following equation approximately holds:

Pr(W | Z) = Pr(W1 & W2 & … & Wm | Z) ≈ Pr(W1 | Z) × Pr(W2 | Z) × … × Pr(Wm | Z).

All those probabilities on the right-hand side are numbers equal to 1 or less, and when they are multiplied the result is usually a number much less than 1. Thus, usually, Pr(X | Z) > Pr(W | Z).
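To make the arithmetic concrete, here is a minimal Python sketch of the product on the right-hand side. It assumes, purely for illustration, that every conjunct Wi has the same probability 0.9 given Z and that the rough independence behind the approximation holds; the function name conjunction_prob and the numbers are hypothetical, not taken from the card trick.

```python
import math


def conjunction_prob(probs):
    """Approximate Pr(W1 & ... & Wm | Z) by the product of the Pr(Wi | Z).

    Assumes (hypothetically) that the conjuncts are roughly independent
    given the background knowledge Z.
    """
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(probs)


# Hypothetical: each of m conjuncts has probability 0.9 given Z.
for m in (1, 5, 10, 20):
    print(m, round(conjunction_prob([0.9] * m), 3))
# 1 0.9
# 5 0.59
# 10 0.349
# 20 0.122
```

Even when each step is individually quite plausible, twenty of them multiplied together fall to about 0.12. If every Pr(Wi | Z) equals 1, as in the card trick, the product stays at 1 and the shrinkage disappears.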

Of course, in our example, and using my background knowledge Z, Pr(W1 | Z) = 1, Pr(W2 | Z) = 1, …, Pr(Wm | Z) = 1, so Pr(W | Z) = 1; also, Pr(X | Z) = ε (any number near 0 but not equal to it; I cannot prove magic wasn’t used). For you, Z is probably something like Z’ = “I know this guy is tricking me but I don’t know how, and there is no way he is using actual magic.” Again, Pr(W | Z’) = 1 (or near enough) and Pr(X | Z’) = 0 (or near enough).
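A short Python sketch of the comparison, showing that the razor’s usual verdict depends entirely on the background knowledge Z. The helper preferred, the value chosen for ε, and the second, ordinary background are hypothetical illustrations; Pr(W | Z) is computed with the same rough product approximation as the equation above.

```python
import math


def preferred(p_x, p_w_parts):
    """Return 'X' if Pr(X|Z) > Pr(W|Z), 'W' if the reverse, else 'tie'.

    Pr(W|Z) is approximated by the product of the Pr(Wi|Z), i.e. the
    conjuncts are assumed (hypothetically) to be roughly independent.
    """
    p_w = math.prod(p_w_parts)
    if p_x > p_w:
        return "X"
    if p_w > p_x:
        return "W"
    return "tie"


EPSILON = 1e-6  # hypothetical stand-in for "near 0 but not equal to 0"

# Card-trick background Z: every step of W is known to hold,
# while magic (X) is all but ruled out.
print(preferred(EPSILON, [1.0] * 20))  # W

# Hypothetical ordinary background: each Wi is merely plausible (0.9)
# and X is a coin flip, giving the usual Occam verdict.
print(preferred(0.5, [0.9] * 20))  # X
```

Nothing about the simplicity of X changes between the two calls; only the probabilities supplied by the background knowledge do.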

That’s it. That’s all Occam’s razor is. Nothing more than common sense boiled down.

21 Comments

  1. Sure.

    And: Occam’s razor is really just a corollary to a more general statement: the likelihood that the majority will be wrong about any issue or question varies exponentially with the complexity of that issue or question in the context of the general social knowledge of the period or culture.

  2. Or cutting through all the alphabet soup, the complex explanation is more likely to include incorrect (low probability for being true) information and assumptions which makes the chances of its conclusion being true smaller than those of the simple explanation. IOW, parsimony generally reduces error better than over-zealousness.

  3. From explorables.com concerning Occam’s Razor:
    “The ‘correct’ interpretation is that entities should not be multiplied needlessly.

    Researchers should avoid ‘stacking’ information to prove a theory if a simpler explanation fits the observations.”

    Sounds like a vote against scientism, doesn’t it?

    Guess I always thought Occam’s Razor was a suggestion (not a law or a rule) that did indeed say to use the simplest explanation when all explanations were nigh unto equal.

  4. Occam’s Butterknife!

  5. I’m not sure that “X” in your example is the simplest, given its relative lack of intelligibility. What is “magic,” or what does the term “magic” imply? While “W” is propositionally more complex than “X,” the confused nature of our knowledge of “X” qua “magic” (not qua “X”) might, following Occam’s razor, make our knowledge of “W” simpler (less confused). So, I think one would have to take into account the terms of the proposition when applying the razor.

    My critique of the example notwithstanding, your main point is absolutely correct. Probabilistic certainty ends when we know the cause of a thing, no matter how complex the cause may be.

  6. For the big jobs I prefer Occam’s chain saw.

  7. Suppose I have two models X and X’ which differ only in that X’ includes Occam’s razor as a part of its rationale. By Occam’s razor, I should reject that model in favor of the simpler one.

  8. Ye Olde Statistician

    June 7, 2016 at 12:56 pm

    When in doubt, consult Ockham.

    In medieval times, writers used razor blades to scrape the ink off a parchment so that something different could be written there. Our word is “eraser” rather than “razor,” but the purpose was to “clear the decks.”

    Ockham famously attempted to radically simplify the scholastic model of cognition by replacing it with a model employing fewer terms.

    The form of the Razor one usually hears, “entia non sunt multiplicanda sine necessitate,” does not actually appear in his writings. What he did write in Summa totius logicae, I.12 was “Frustra fit per plura quod potest fieri per pauciora.” (It is vain to make through many [factors] what could be made through fewer.)

    The reason was not that “the simpler explanation is more likely to be correct.” He told us his reason: when there are too many factors in your model, you won’t understand your own model. If Ŷ = f(X1, X2, …, X7) gives an adequate fit to Y, there is no need to throw in X8. It was an epistemological rule, not an ontological rule. The physical world, he wrote, could be as complex as God wished, but our understanding must be of simplified models.

  9. From Wikipedia, an even briefer summary arguably much more informative:

    “William of Ockham, a Franciscan friar who studied logic in the 14th century, first made this principle well known. In Latin it is sometimes called lex parsimoniae, or “the law of briefness”. William of Ockham supposedly wrote it in Latin as:

    “Entia non sunt multiplicanda praeter necessitatem.

    This translates roughly as: ‘More things should not be used than are necessary.’

    “This means that if there are several possible ways that something might have happened, the way that uses the fewest guesses is probably the right one. However, Occam’s razor only applies when the simple explanation and complex explanation both work equally well. If a more complex explanation does a better job than a simpler one, then you should use the complex one.

    “Occam’s razor is a principle, not an actual razor: the word ‘razor’ is a metaphor. OCCAM’s RAZOR GETS RID OF UNNECESSARY EXPLANATIONS just like a razor shaves off extra hair.” [EMPHASIS added]

    That’s from Wikipedia — seems “brief” enough with more than enough info to make the point.

    SPEAKING OF UNNECESSARY EXPLANATIONS, CONSIDER from the [so-called] “brief explanation”:

    “As a rule of thumb … does it always work?”

    Does the audience to which this blog is being addressed, or anybody who’s heard the expression “rule of thumb” really need a mini-treatise explaining that a “rule of thumb” doesn’t always work? Really?? That’s a tad bit more than the verbose hair-splitting over semantics routinely seen here, but still…some things really do go without saying…

    That illustrates what’s really needed: the application of Occam’s Razor TO this blog’s writing; less is better. Or, to put it in a closely related way, about belaboring the obvious and so forth there is the metaphorical gem, “Don’t pole vault over mouse turds.”

  10. Ye Olde Statistician

    June 7, 2016 at 2:20 pm

    You noted the errors in the Wikipedia article, right?

    “We may assume the superiority all else being equal of that demonstration which derives from fewer postulates or hypotheses.”
    — Aristotle, The Posterior Analytics

  11. However, Ken, if the dictionary were used, there’s probably an even shorter definition than the one you offered.
    I have a quote on my trashed PC somewhere which explains it in about two lines.
    I think the point is to show the case where and why it isn’t a hard and fast rule.
    Also an example using probability.

    It works the other way round as well. For someone who doesn’t think with alphabet soup, starting with a thing you already understand in order to illustrate how the language of X, y, etc works is useful when it comes to more complicated examples.

  12. A good example here: http://onhech.blogspot.com/2011/07/occams-razor-and-global-climate-change.html

    This individual takes a simplistic explanation of a complex theory, then ignores all the complexities thereof, then uses Occam’s Razor to justify what was apparently his belief all along. And he seems to think that he’s really smart and can bet in the direction of a simplistic version of reality. I see that a lot: use of a tiny portion of a theory and then applying Occam’s Razor to justify an already held belief while ignoring everything that adds complexity.

  13. Well, this is absurd, because W is the true explanation (when I do the trick).

    W is the true explanation only when you do the trick. In this case, I don’t see why you would consider Occam’s razor principle.

    So consider that we don’t know the true explanation. You put forth your explanation W. Let me offer an explanation Q = “The three blank cards are rectangular, and the Ace of Clubs card is trapezoidal but almost rectangular, which can easily be distinguished by the fellow’s hands but not by eye.”

    I believe Q is simple, and is a simpler explanation than W. Does this imply it is more likely that Q is true and W is false? No, not according to my brain.

  14. Occam’s razor starts with this premise: of all the times complex and simple explanations were put forth for a proposition, more of the simple than complex turned out to be true. This start is true upon common observation: nobody disagrees with it. Here’s the finish of Occam: X is a simpler explanation than W, therefore it’s more likely that X is true and W is false?

    What common observation makes the start TRUE? How is simplicity related to truth and to the probability of being true?

    The premise that “X is a simpler explanation than W” doesn’t imply it’s more likely that X is true and W is false. One would need at least to assume that the true explanation is simple; even with this assumption, the justification of simplicity becomes a problem.

  15. Following on from YoS, I suggest that Occam’s razor isn’t even referring to truth, but utility. We should prefer a simpler model over a complex one not because it is necessarily more true but because it’s easier to work with and evaluate.

    Though a consequence of this is that we are more likely to be able to have confidence in a simpler model, because it’s easier to test.

  16. David L. Hagen

    June 7, 2016 at 6:57 pm

    Thus the benefit of Einstein’s Razor: “Everything Should Be Made as Simple as Possible, But Not Simpler” (per Louis Zukofsky’s 1950 poem.)
    http://quoteinvestigator.com/2011/05/13/einstein-simple/

  17. Can you show me how you “partition” your W into W1 & W2 & … & Wm such that

    Pr(W | Z) = Pr(W1 & W2 & … & Wm | Z) ≈ Pr(W1 | Z) × Pr(W2 | Z) × … × Pr(Wm | Z)?

    Note that, in general, Pr(W1 & W2 & … & Wm | Z) is not equal to, and cannot be approximated by, Pr(W1 | Z) × Pr(W2 | Z) × … × Pr(Wm | Z). Why? Because
    P(A & B | C) = P(A | B, C) × P(B | C).

  18. A thoroughly abused term these days.

    JMJ

  19. Sander van der Wal

    June 8, 2016 at 1:46 pm

    Wasn’t the notion that the simplest model was preferred if you had *no* or *almost no* idea which of the reasonable models was best?

  20. swordfishtrombone

    June 8, 2016 at 4:55 pm

    1. Universe.

    2. Universe + God.

  21. swordfishtrombone: There needs to be an hypothesis. What is your hypothesis: that the complex universe described by science is simpler without God? What are you trying to prove? I don’t think Occam’s Razor actually applies to religion versus science, since it can give some pretty weird outcomes. I suppose if this misuse of the Razor is appealing……
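The objection in comment 17 can be checked with a small worked example. The Python sketch below uses a hypothetical joint distribution over two propositions A and B (everything implicitly conditional on C), deliberately chosen so that A and B are dependent. It compares the exact Pr(A & B | C), the chain rule Pr(A | B, C) × Pr(B | C), and the independence product Pr(A | C) × Pr(B | C) used in the post’s rough guide.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over (A, B), all conditional on C.
# A and B tend to be true together, so they are NOT independent.
joint = {
    (True, True): F(4, 10),
    (True, False): F(1, 10),
    (False, True): F(1, 10),
    (False, False): F(4, 10),
}
assert sum(joint.values()) == 1


def pr(event):
    """Total probability of the outcomes (a, b) for which event(a, b) holds."""
    return sum(p for (a, b), p in joint.items() if event(a, b))


p_a = pr(lambda a, b: a)         # Pr(A | C)
p_b = pr(lambda a, b: b)         # Pr(B | C)
p_ab = pr(lambda a, b: a and b)  # Pr(A & B | C), exact
p_a_given_b = p_ab / p_b         # Pr(A | B, C)

chain_rule = p_a_given_b * p_b   # always equals p_ab
independent_product = p_a * p_b  # equals p_ab only under independence

print(p_ab, chain_rule, independent_product)  # 2/5 2/5 1/4
```

The chain rule always reproduces the joint probability; the plain product matches it only when the conjuncts are (close to) independent given the background, so the product form is a rough guide that is only as good as that independence assumption.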


© 2016 William M. Briggs
