Objective Bayes Vs. Logical Probability (Vs. Frequentism)

Wait…a Reverend?
Reader and colleague JH has rightly taken me to task (via email) for incorrectly calling, or rather misleadingly labeling, logical probability “objective Bayes.”

She pointed to this set of lecture slides (pdf) as examples of what most people think of when they hear “objective Bayes.” Its basic idea is to use (these are technical terms) “non-informative”, “reference”, or “ignorance” “priors” on unobservable parameters to get the math to work out in Bayes’s theorem.

I accept this.

For somebody (me) who’s always carping about precision in technical language, I also admit I earned my spanking for any confusion I have caused by this mix-up.

There is overlap between objective Bayes and logical probability. Both use Bayes’s theorem. But then frequentists use it too, when the mood strikes. Still, I can see now that it is improper (there’s a joke in that word) to call logical probability “LP Bayes” as I have been doing.

Saying “Objective Bayes” puts one in mind of Jeffreys priors; sequences of priors; improper, proper, and conjugate priors; information matrices; Markov chain Monte Carlo and “drawing” observations; invariant measures; Radon-Nikodym derivatives; “with probability 1”s; lemmas, theorems, proofs; unobservable parameters; parameters, parameters, and more parameters, parameters galore; a deeply mathematical subject which sometimes is and sometimes isn’t interpretable.

Of course, some of this stuff is useful to LP. Some isn’t. None of it is wrong mathematically; but then nothing is wrong mathematically with frequentism, either. Logical probability is not a branch of mathematics, though math is useful to it. Objective Bayes, at least academically, is math twenty-four hours a day.

Not that there’s anything wrong with that! Why, some of my best friends are mathematicians. Please don’t sic Anthony Kennedy on me: I have not formed an “improper animus” toward our calculating friends. God undoubtedly made them that way, and who am I to judge?

But I do have plenty of animus toward the reification of the mathematics. Take for example “improper priors.” These are government-defined “probability distributions”, which means probability distributions which aren’t probability distributions but are called probability distributions by those anxious to get on with the math. If you don’t already know, I can tell you that they are needed when jumping the infinity shark. Frequentists are rightly suspicious of them, and improper priors are one reason some frequentists have not yet come over from the Dark Side.

By a curious coincidence, which ought to make both OBs and Freqs (shorthand is easier, isn’t it?) nervous, use of improper “flat” priors often produces identical mathematical results in both theories. Normal-distribution linear regression is the most prominent example. The interpretations still differ of course—which proves that probability in any flavor is not a mathematical subject.
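To see the coincidence in miniature, here is a sketch in Python (my illustration, with made-up data and variable names): under a flat improper prior on the coefficients of a normal linear regression, the posterior is proportional to the likelihood, so the posterior mode lands exactly on the frequentist least-squares estimate.

    import numpy as np

    # Synthetic data for an ordinary linear regression.
    rng = np.random.default_rng(42)
    n = 100
    x = rng.normal(size=n)
    y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=n)
    X = np.column_stack([np.ones(n), x])

    # Frequentist answer: ordinary least squares.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Objective-Bayes answer with a flat prior: the posterior is
    # proportional to the likelihood, so its mode solves the same
    # normal equations.
    beta_flat_prior = np.linalg.solve(X.T @ X, X.T @ y)

    print(beta_ols)         # roughly [2, 3]
    print(beta_flat_prior)  # identical to numerical precision

The numbers agree to machine precision; only the story told about them differs.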

As advertised by its name, logical probability is a branch of epistemology just as classical logic itself is. The biggest difference between LP and all the others is admitting what everybody already knew to be true: that not all probability is quantifiable and that probability is always a measure of knowledge, even in its mathematical disguise (the “always” carries much weight).

There is an enormous literature in this field that lies undiscovered and unread by nearly all statisticians. The best and most widely known work bridging the gap between logical and mathematical probability is E.T. Jaynes’s Probability Theory: The Logic of Science. But this is not to say that Jaynes’s book is the best on logical probability.

Another classic is A Treatise on Probability by John Maynard Keynes (yes, that Keynes), which opens with this definition:

Part of our knowledge we obtain direct; and part by argument. The Theory of Probability is concerned with that part which we obtain by argument, and it treats of the different degrees in which the results so obtained are conclusive or inconclusive…

The terms certain and probable describe the various degrees of rational belief about a proposition which different amounts of knowledge authorise us to entertain. All propositions are true or false, but the knowledge we have of them depends on our circumstances; and while it is often convenient to speak of propositions as certain or probable, this expresses strictly a relationship in which they stand to a corpus of knowledge, actual or hypothetical, and not a characteristic of the propositions in themselves.

For years, I’ve been trying to get statisticians to read David Stove’s The Rationality of Induction (especially its second half), but so far I have convinced just one Named Person (a prominence who resides in a university and in the blogosphere) to scan it, and he immediately proceeded to misinterpret it. (Those few statisticians who think about induction love the so-called problem of induction; like all pseudo-philosophical problems, it’s a guarantor of papers written to be read only by the writers of these kinds of papers.)

The shortest introduction is here, here, and here. When you read these posts, understand that I was using “objective” in the sense of deduced or true (even if you don’t want it to be), and not in its mathematical sense. I know I also have to clean these up and collect them under one heading. Coming soon!


9 Comments

  1. Cees de Valk

    I have no specific knowledge on this, but it seems to me that artificial intelligence would be the field where probability is applied purely as a form of logic. If only because machines are often uncomfortable with expert elicitation and infinities.

  2. Terry Oldberg:

    Thanks for giving us this stimulating discussion topic!

    The process by which I like to sort logical concepts into named categories starts with clarification of what is meant by “logic.” Logic contains the rules by which correct inferences can be distinguished from incorrect ones. In the classical logic of Aristotle, every proposition has a “truth-value.” The set of possible truth-values is {true, false}.

    That each proposition has a truth-value implies that the information for a deductive conclusion from an argument is not missing. In practice, information is often missing. This circumstance may be covered via a generalization of classical logic. Under this generalization, the rule that every proposition has a truth-value is replaced by the rule that every proposition has a probability of being true.

    This replacement creates “probabilistic logic.” By this argument, I conclude that the adjective “probabilistic” logically applies to the noun “logic.” The adjective “logical” does not, however, apply to the noun “probability.” For this reason, the term “logical probability” is a misnomer.
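    To make the generalization concrete, a minimal sketch of my own (not part of the original comment): classical modus ponens needs full truth-values, while the probabilistic generalization assigns each proposition a probability and recovers P(B) by total probability.

        def classical_modus_ponens(a, a_implies_b):
            # Full information: from A and (A -> B), B is established.
            return a and a_implies_b

        def probabilistic_modus_ponens(p_a, p_b_given_a, p_b_given_not_a):
            # Missing information: every proposition carries a probability,
            # and P(B) follows by the law of total probability.
            return p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

        print(classical_modus_ponens(True, True))         # True
        print(probabilistic_modus_ponens(0.9, 0.8, 0.1))  # 0.73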

    One of the arenas in which information is missing is scientific research. This happens to be the arena in which statisticians make their livings. Statisticians build models. Models make inferences. On each occasion in which an inference is made, there is more than one candidate inference. How should the builder of a model discriminate the one correct inference from the many incorrect inferences?

    Philosophers call this problem the “problem of induction.” It is this problem that has separated the ranks of statisticians into warring camps with some statisticians passionately subscribing to the frequentist doctrine, others to the subjective Bayesian doctrine and so forth.

    The problem of induction proved to be a tough problem to solve. Many philosophers believe the problem to be unsolved to this day. These philosophers believe that the conclusion that is reached when a model builder discriminates the one correct inference from the many incorrect inferences is illogical.

    These philosophers are nearly right, for almost every model builder in modern practice uses an intuitive rule of thumb called a “heuristic” in discriminating the one correct inference from the many incorrect ones. In each case in which a particular heuristic selects a particular inference as the one correct inference, a different heuristic selects a different inference as the one correct inference. In this way the method of heuristics violates the law of non-contradiction. Non-contradiction is, however, a principle of logic. Thus, the arguments of model builders belonging to the various camps are usually illogical. After these people build their models in this illogical way, the models are published under peer review. Professionals such as engineers and physicians then use these models in making decisions on behalf of their clients. Today, while it is unacceptable to publish an illogical argument in a mathematical journal, it is acceptable to publish an illogical argument in a scientific journal.

    An alternative to the method of heuristics did not arise until 1963. In that year, a student in the theoretical physics PhD program of the University of California, Berkeley got an idea that solved the problem. His name was Ronald Christensen.

    In probabilistic logic, an inference has a unique measure called its “entropy.” The entropy of an inference is the missing information for a deductive conclusion per statistical event. The existence and uniqueness of this measure supports a solution of the problem of induction through optimization. Under this optimization, the correct inference is the one that minimizes the entropy or (depending upon the type of inference) maximizes the entropy under constraints expressing the available information.
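    A minimal sketch of the maximize-under-constraints step (my illustration, not Christensen’s full entropy-minimax machinery): Jaynes’s die problem, where the maximum-entropy distribution on faces 1 through 6 with a prescribed mean is exponential in the face value, and the Lagrange multiplier is found numerically.

        import numpy as np
        from scipy.optimize import brentq

        faces = np.arange(1, 7)
        target_mean = 4.5

        def mean_at(lam):
            # Maximum-entropy solution under a mean constraint:
            # p_k proportional to exp(lam * k).
            w = np.exp(lam * faces)
            return (w / w.sum()) @ faces

        # Solve for the multiplier that satisfies the constraint.
        lam = brentq(lambda l: mean_at(l) - target_mean, -5.0, 5.0)
        w = np.exp(lam * faces)
        p = w / w.sum()
        print(np.round(p, 4))  # probabilities tilted toward the high faces
        print(p @ faces)       # ~4.5, the constrained mean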

    By 1980, Christensen and his colleagues had worked the kinks out of this approach and published their work. Most statisticians and philosophers ignored or misunderstood these publications. Among them were those academic statisticians and philosophers who were training the next generation of scientific researchers. By this mechanism, illogic maintained its hold on the selection of the inferences that were made by models. Note that it was not the probabilities that were illogical but rather the doctrines of the various statistical camps that were illogical.

  3. Terry Oldberg:

    I erred. In paragraph 7 of my recent post, please strike the first instance of the phrase “a different inference” and replace it with “a different heuristic.” I apologize for the inconvenience.

  4. Nullius in Verba

    “I have no specific knowledge on this, but it seems to me that artificial intelligence would be the field where probability is applied purely as a form of logic.”

    It is! In computer science it’s called Bayesian Belief (to distinguish it from Bayesian Probability) and is one of the main ways belief or knowledge is represented in AI software. (Other methods include Dempster-Shafer belief and fuzzy logic.) The Bayesian Belief Network (BBN) is a powerful technique, and fairly intuitive for human experts as well.
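    A two-node sketch of how such a network computes (my addition; the numbers are invented): the joint distribution factorizes along the graph, and observing evidence updates belief by Bayes’s theorem.

        # Network: Rain -> WetGrass, with local conditional tables.
        p_rain = 0.2             # prior belief P(Rain)
        p_wet_given_rain = 0.9   # P(WetGrass | Rain)
        p_wet_given_dry = 0.1    # P(WetGrass | not Rain)

        # Marginal P(WetGrass) from the factorized joint.
        p_wet = p_rain * p_wet_given_rain + (1 - p_rain) * p_wet_given_dry

        # Belief update on observing wet grass.
        p_rain_given_wet = p_rain * p_wet_given_rain / p_wet
        print(round(p_rain_given_wet, 3))  # 0.692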

    “If only because machines are often uncomfortable with expert elicitation and infinities.”

    Machines are probably more comfortable with infinities than humans are. You program it with the rules for its manipulation, and then to the machine it’s just another number. There’s too much mysticism about infinity. It just follows a different set of rules.

    The IEEE-754 standard for floating point computer arithmetic has had infinities since 1985. It’s just the human programmers who either don’t implement them properly or are nervous of using them.

  5. @Terry Oldberg

    That’s very interesting to a physics/mathematics dilettante like me. But, while you say “the correct inference…”, I don’t see how you achieve a better connection to the “true” inference. I have a decent grasp of both thermodynamic and informational entropy, and it’s fairly straightforward (intuitively) to see the connection to probability and inference; but, in the end, how does it enable us to have a higher degree of confidence in the correctness of our knowledge? It seems the best that can be said is that there’s no better way to decide between inferences.

  6. DAV

    “The IEEE-754 standard for floating point computer arithmetic has had infinities since 1985. It’s just the human programmers who either don’t implement them properly or are nervous of using them.”

    Mostly because you can’t do much with them. They are intended to allow indicating undefined or inaccurate results in a return value. Under IEEE 754-2008, infinity is just a special form of NaN, the primary difference being that Inf has ordering where NaN does not (NaN won’t even compare equal to itself). Inf is useful for setting boundaries but not much else. The IEEE standard also has a negative zero, which is the result of dividing a positive number by -Inf. If you’re getting infinities from your program, you’re likely doing something wrong.
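    Each of those properties can be checked in any IEEE-754 language; a quick Python demonstration (my addition):

        import math

        nan = float('nan')
        inf = float('inf')

        print(nan == nan)    # False: NaN never compares equal, even to itself
        print(inf > 1e308)   # True: Inf is ordered, unlike NaN
        print(1.0 / -inf)    # -0.0: positive number divided by -Inf
        print(math.copysign(1.0, 1.0 / -inf))  # -1.0: the zero's sign is real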

  7. Sander van der Wal

    @Nullius in Verba

    Why would a machine be more comfortable with infinities than humans? Humans will follow the same rules while manipulating infinities as machines. Humans invented these rules after all, and “taught” them to the machines.

    Programmers are nervous about using them not because of the infinity concept itself, but because when a program describing a physical model (the kind of program that uses IEEE floating point) gets too close to infinity, the computer model is no longer a good representation of the symbolic model.

  8. Nullius in Verba

    “Mostly because you can’t do much with them. They are intended to allow indicating undefined or inaccurate results in a return value.”

    On the contrary. You can do quite a lot with them, if you design your algorithm that way. NaNs are intended to indicate undefined results, and representing inaccurate results requires interval arithmetic or something equivalent. The purpose of an explicit representation for infinity is to avoid the need to have checks or exception handlers for calculations yielding infinity, and to allow a calculation to continue and yield a result when the infinities are harmless or correct.

    For example, if you calculate atan(y/x) when x = 0 and y = 1, you get atan(1/0) = atan(+infinity) = Pi/2. That’s perfectly legal and correct, and saves a lot of inefficient mucking about with special cases.
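    That example checks out (a small Python verification of my own; note that Python traps literal float division by zero, so the infinity is constructed directly, as C or numpy would produce it from 1.0 / 0.0):

        import math

        print(math.atan(math.inf))   # 1.5707963... = pi/2
        print(math.atan2(1.0, 0.0))  # pi/2 again, with no special-case code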

    “Why would a machine be more comfortable with infinities than humans? Humans will follow the same rules while manipulating infinities as machines. Humans invented these rules after all, and “taught” them to the machines.”

    Mathematicians taught them, and are comfortable with infinities. Most professional maths software (Matlab, Mathematica, Maple, R, etc.) will calculate with and output infinity quite happily.

    But a lot of programmers are not mathematicians, and tend to follow the common intuition of assuming that infinity is an error, primarily because a lot of the less sophisticated programming languages they learned with would stop if a calculation yielded an infinity. And comparatively few actually know what the rules for infinities are, which can lead to coding errors.

    Humans don’t just follow rules blindly – they want/need to know what the numbers mean. That’s where the problems arise.
