Subjective Versus Objective Bayes (Versus Frequentism): Part IV

Just review and clarification this time, folks. Dirty hard work. But necessary given the confusion from last post. Time to pump some neurons! Next time we—finally!—get to parameters, models, and all that.

(All the stuff in this series is, in a fuller form, in my new upcoming book, which is tentatively called Logical Probability and Statistics—but I’ve only changed the name 342 times, so don’t count on this one sticking.)

Recapitulation

Probability is the measure of uncertainty; matters of certainty and uncertainty speak of our knowledge and how we know what we know or of what we are unsure.
Given a fixed set of premises and a conclusion, it follows that conditionally the conclusion (a proposition) is either true, false, or somewhere in between, i.e. uncertain. A proposition can be true given one set of premises, false given another, or uncertain given a third.
A proposition is necessarily true when it (a) validly follows a chain of true propositions back to a bedrock set of propositions which are accepted as true because (b) they are axiomatically true, i.e. just plain true. We cannot explain why or how these fundamental propositions are true: we accept that they are because of their obviousness, or because they are revealed to us (Socrates would say we remember them), i.e. by faith.
A proposition is contingently true when it validly follows from a chain of propositions which are accepted as true. If these accepted propositions (and their sires, grandsires, etc. if present) are themselves true then the proposition at hand is necessarily true as above, and it is misleading to say it is contingently true. It can be that a proposition is accepted as contingently true because it is not known to be necessarily true.
An objectivist takes an argument as it is and neither adds to nor subtracts from it. The truth, falsity, or in-betweenness of the conclusion follows only from the evidence stipulated.
A subjectivist takes an argument and adds to or subtracts from it, either in the premises or by modifying the conclusion. This is acceptable only if these modifications are manifest. They usually are not. Logic and probability do not guarantee an absence of confusion.
A frequentist often acts like a subjectivist unaware of her subjectivism; but even if not, she makes other errors. See the previous post for a sample—and only a sample—of these errors.
Probability need not be a number: it can be a range, or it can be a unique value. A probability of 1 implies truth, of 0 falsity. There is no such thing as probability; it is not a physical thing; neither are numbers.
An easily seen result of probability is that adding a truth to a list of premises does not change the argument. If, for example, a conclusion follows (or doesn’t) from some set of premises, then saying, “Accepting these premises and this truth” is equivalent to saying, “Accepting these premises.”
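
For the case where the probabilities happen to be numbers, the point about adding a truth can be sketched in symbols. This is only an illustration under that assumption; the letters E (the premises), C (the conclusion), and T (any truth, so that Pr(T | E) = 1) are labels introduced for this sketch and do not appear elsewhere in the series.

```latex
% T is certain, so "C and T" holds exactly when C holds.
\Pr(C \mid E \wedge T)
  = \frac{\Pr(C \wedge T \mid E)}{\Pr(T \mid E)}
  = \frac{\Pr(C \mid E)}{1}
  = \Pr(C \mid E).
```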

Clarifications

Given all these, there was confusion last time about exactly how evidence in premises allows us to deduce probabilities.
Suppose the proposition (conclusion) is “A Q will emerge”. What does knowing only the premise, “I have no idea how things emerge from this process” do for us? That is, what is the objectivist probability the conclusion is true given this and only this premise?

The answer “I don’t know” floats to mind. After all, we admitted ignorance and ignorance says nothing about Q. The logical nothing means no thing, incidentally, and not just a little thing, nor mostly nothing, nor uniformity, nor anything else. The probability doesn’t exist if we know nothing about Q. But this is only so if the premise itself truly has nothing to say about Q. Is there anything that can be deduced from the premise which allows us to uncover occult evidence about Q?

Well, it might be argued that “I have no idea how things emerge from this process” taken in conjunction with the conclusion “A Q will emerge” implies that Q is possible. But this is to act like a subjectivist and to change the premise to “I really don’t know, but since the guy is asking about Q, it seems as if Q is at least possible.” This inference is false because it cheats. We cannot go from “I know nothing about Q” to “Q is possible.” At best, this argument is circular because it takes information in the conclusion and places it in the premise (subjectively). Thus “I don’t know” is the probability; i.e. the probability doesn’t exist.

Switch the premise to “A Q might emerge from this process.” The conclusion is still “A Q will emerge.” The argument is invalid, but we feel on firmer probabilistic ground. The answer which appears first might be, “Since a Q might emerge, but we have no other idea about Q, then the probability is some number between 0 and up to and including 1.”

Is there anything that can be deduced from the premise this time? Yes. Taken one way, from “A Q might emerge from this process” it follows that “A Q might not emerge from this process.” Written tersely, this is “Either a Q will emerge or it won’t,” which is evidently a tautology, a statement which is always true, i.e. it is a truth.

From above we know that adding a truth to a list of premises does not change the argument. And any truth in a list of premises may be swapped for another truth. For example, “It will rain July 4th, 2561 in New York City or it won’t.” Making this substitution and keeping the same fixed conclusion, it does not follow that its probability is the interval (0, 1]. Instead, it is admitting we know nothing about Q, and nothing cannot imply a probability.

But, given the fluidity of English, there is another way we can interpret the premise: “A Q is one of the possibilities of this process.” We can still derive the same tautology from this, but now there is ever-so-slightly more information about Q, and with that we can claim our original answer, which is some number between 0 and up to and including 1. Which still isn’t saying much. All we can infer from this premise is that Q is not impossible. That is why the interval is (0, 1] and not [0, 1]. Small comfort!

Incidentally, we could say that the probability is [0, 1] for the tautological or ignorance premises, but since this interval is everything there is—truth, falsity, and in-betweenness—it really says nothing, which is our answer.

Back to the statistical syllogism. Premise: “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge.” Same conclusion. The statistical syllogism allows us to deduce the probability 1/n from this premise, an answer which confused some who insisted that no probability can be deduced.

If that is so—if no probability whatsoever can be derived from this new premise—then we necessarily are in the logically equivalent situation of the tautological or ignorance premises. We have seen it was only from these (or other logically equivalent propositions) that we can deduce the interval [0,1], which is to say no probability at all.

Is it true that “I know nothing about Q” is logically equivalent to “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge”? Obviously not. We could, of course, deduce the tautology “Q emerges or not”, but that is because this tautology is always true even if there is no process in the universe which produces Qs.

Is “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge” logically equivalent to “A Q is one of the possibilities of this process”? If so, then the answer is the interval (0,1], which says something but very little, and which may have been in the mind of some commenters. Well, at least from our new premise we can derive that “A Q is one of the possibilities of this process”, which was the old premise. But if there is no more information than that in the new premise, then we are done and the critics are right.

But what are we to make of the other words in the new premise? In the old premise, we do not know how many different possibilities exist: the number could be infinite. But in the new premise we know, in addition to the fact that Q can be a state (the old premise), that there are n-1 other states besides Q and that one of these states (Q or another) must emerge. There are only these n states and no others. We have certain evidence which says n different things can happen, and that there is a distinction between them (somehow). It is from this other information that the statistical syllogism works its magic. Let’s see why.

Notice that we have a “variable” in the premise, which we can replace with actual values. Try n = 1. It is doubtful a critic would object to the statistical syllogism in this case and claim we can say “nothing” about whether a Q will emerge.

Now let n be greater than 1 (and an integer). What of the other n-1 states we know are possibilities? These also have probabilities. Switch the conclusion to “A Q will not emerge.” The statistical syllogism would give (n-1)/n for the probability. Well, if that doesn’t seem true to you, then I can offer no proof, just as I can offer no proof that the probability for “A Q will emerge” is 1/n. Yes, the statistical syllogism is axiomatic (in part).
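
The two numbers the statistical syllogism hands us can be tabulated in a few lines. What follows is a minimal sketch, not a proof (none is on offer, as just said): the function name `statistical_syllogism` is made up for illustration, and the code simply encodes the premise that exactly one of n states, one of them called Q, must emerge.

```python
from fractions import Fraction


def statistical_syllogism(n: int) -> tuple[Fraction, Fraction]:
    """Return (Pr(Q emerges), Pr(Q does not emerge)) under the premise:
    'There are n states, just one of which is called Q, and just one must emerge.'

    Hypothetical helper for illustration; it assumes the syllogism rather than deriving it.
    """
    if n < 1:
        raise ValueError("the premise requires at least one state")
    p_q = Fraction(1, n)          # probability of "A Q will emerge"
    p_not_q = Fraction(n - 1, n)  # probability of "A Q will not emerge"
    return p_q, p_not_q


for n in (1, 2, 6):
    p_q, p_not_q = statistical_syllogism(n)
    # The two conclusions exhaust the possibilities, so they must sum to 1.
    assert p_q + p_not_q == 1
    print(n, p_q, p_not_q)
```

With n = 1 the sketch returns probability 1 for “A Q will emerge”, which is the case no critic is likely to dispute.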

Technical Mumbo Jumbo

Some people—Jaynes, notably, Stove, Diaconis, others—have thought there was a proof of the statistical syllogism. All these attempts fail. The proofs rest on certain principles, like the “Principle of Indifference”, “Principle of Maximum Entropy”, or “Principle of Symmetry”, etc. etc. All of them reach a point where they make claims like “Pr(State i | Premise) = Pr(State j | Premise), i,j = 1,…,n” and where “Premise” is our new premise. Propositions like this are certainly true, and if you accept them (and have some mathematical training) you can easily see how acceptance leads to the statistical syllogism.
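
The step from equal probabilities to 1/n can be sketched as follows, assuming the probabilities are numbers and that the premise makes the n states mutually exclusive and exhaustive. The symbols S_i (for “State i emerges”), E (for the premise), and p (for the common value) are shorthand introduced only for this sketch.

```latex
\Pr(S_1 \mid E) = \cdots = \Pr(S_n \mid E) = p,
\qquad
\sum_{i=1}^{n} \Pr(S_i \mid E) = 1
\;\Longrightarrow\;
n\,p = 1
\;\Longrightarrow\;
\Pr(S_i \mid E) = \frac{1}{n}.
```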

The difficulty comes in accepting “Pr(State i | Premise) = Pr(State j | Premise) etc.” Why should we? Where does this truth come from? Well (as I show in this paper), it must be axiomatic. Adding all the various principles of “indifference”, “symmetry” and so on only serves to make the arguments circular.

It is also to act subjectively, because we no longer have “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge” as our premise, but “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge and the principle of indifference” (or another principle). Now that principle and its brothers just say, “It is equally likely that each state should emerge.” Well, if we take as a premise that it is equally likely that each state should emerge, then it necessarily follows each state is equally likely. But the argument is circular. Instead, it is from the statistical syllogism that we infer these principles.

We cannot say we have no reason to believe the probability of the conclusion. We have definite reasons, and these are the knowledge that there are n possible states of the process and that one of these is Q and that one of these (and none other) must show. This is a heckofa lot of information to have, and hence sufficient reason to believe the probability.

This is worth emphasizing because phrases like “no reason”, “ignorance”, and the like are often tossed around, especially when it comes to formal models. It is good to see up front what little epistemic value these have or how their use can be misleading.

End result? The objectivist must accept the statistical syllogism. The subjectivist may do what he likes. The frequentist must still sit patiently and wait for an infinite number of trials of this process to complete before telling us the probability.

More examples

Your homework (in the comments) is to give more good examples, or if you’re clever, examples which seem good but which aren’t.

Read Part V (Last).

12 Comments

DAV

May 20, 2013, 9:32 am

The difficulty comes in accepting “Pr(State i | Premise) = Pr(State j | Premise) etc.” Why should we? Where does this truth come from? Well, (as I show in this paper) it must be axiomatic.

Well, if the only thing one knows about an n-sided object is the number of sides and that it’s about to be tossed, and if probability is a measure of knowledge level, then, because the level of knowledge is the same for all states, the probability (as knowledge level) has to be equal.

Why can’t you simply say this? Why does it have to be named Principle? Maybe you think you are but it’s hardly explicit. “It must be axiomatic” immediately following the “Why should we?” question is like answering with “because it just is”.
Briggs

May 20, 2013, 9:38 am

DAV,

Nothing about “objects” or “sides” or “tossing” in the premise. The other criticisms and the language for them I discuss in the linked paper; too mathematical for the blog. But if we’re okay with the symmetry of individual constants (the name for the math in today’s post), then we don’t need the gritty details.

Problem with axiomatic beliefs is they all answer “because it just is.” Why? Because it just is! (Self-referential humor.)

Next time it gets more fun!
Jeremy

May 20, 2013, 9:48 am

What makes “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge *and the principle of indifference*” inferior to or even different from “There are n states which could emerge from this process, just one of which is called Q, and just one must emerge *and the principle of statistical syllogism*”?
Briggs

May 20, 2013, 9:52 am

Jeremy,

Good question. The “principle” of the syllogism is derived too, or at least just given a name from the recognition of equally likely states from the given information. Its slight superiority (in name) is that it doesn’t call on physical words to imply “symmetry” or “equal sides” or any of that. I talk more about this in the paper, too.

We could just eliminate it, and maybe that’s best to eliminate another source of confusion. The SS is just the name of the process: it is not used in deriving the process.
Sander van der Wal

May 20, 2013, 9:56 am

So Descartes’ “I think, therefore I am” is a faith in this discussion. And the belief in God is not a faith.
Terry Oldberg

May 20, 2013, 10:25 am

The statistical syllogism is a first cousin of the problem philosophers call the “problem of induction.” The problem is of how to justify the principles by which correct inferences are discriminated from incorrect ones in building a model. This problem was solved by Ronald Christensen circa 1970.

Christensen’s idea stems from that generalization of the classical logic in which the rule that every proposition has a truth value is replaced by the rule that every proposition has a probability of being true. This produces the “probabilistic logic.” The probabilistic logic extends logic from the classical logic and through the inductive logic.

In the probabilistic logic, it can be shown, an inference has a unique measure. The measure of an inference is its entropy. In view of the existence and uniqueness of the measure of an inference, the problem of induction is solved by an optimization in which that inference is correct which minimizes the entropy or (dependent upon the type of inference) maximizes the entropy under constraints expressing the available information. Maximizing or minimizing the entropy yields a model that expresses all of the available information but no more.

This solution has been tried on many occasions. As expected, optimization of each of its inferences produces a model that excels. Nonetheless, model builders persist in the tradition of selecting the inferences that are made by their models using the intuitive rules of thumb that I call “heuristics.” This practice has the logical shortcoming that on each occasion in which a particular heuristic selects a particular inference as the one correct inference, a different heuristic selects a different inference as the one correct inference. In this way, the method of heuristics violates the law of non-contradiction. Non-contradiction is a principle of logic. Thus, it has come to pass that the vast majority of models used in practical decision making are fundamentally illogical.
Briggs

May 20, 2013, 10:42 am

Terry,

Ron’s a good guy, but Donald Williams beat him to it in the 1950s, as did David Stove (early 70s) and a few others, notably ET Jaynes (same and beyond). The “problem” wasn’t so much solved, as shown to have not been a problem in the first place. Where Hume went wrong and why was best explicated by Stove in his two books on the subject, which are must reading.

In future posts, we’ll see that some heuristics are fine, and some not so fine. The maximum entropy principle itself can only come in after an argument is posited, which we’ll also see (and is one of the reasons I say it is a principle which is deduced, not assumed).

But these are mere quibbles.

Incidentally, statisticians are not well used to these concepts, simply because they’re not part of our regular training. Time for frequentism to be taken out behind the pissoir and quietly put to death.

Oh, for those who don’t know the difference between inductive and non-deductive: ordinary statistical inferences are non-deductive; they are not inductive (though they are sometimes called so). Inductive inferences are of the type (to use Hume’s example) “All the flames I have observed before have been hot. This is a flame before me. Therefore this flame will be hot.” Inductive inferences are not valid, but they are not probable, either, in the sense that the answer is like a dice toss. We take the essence of flames and say all will be hot. Etc.
Doug Ransom

May 20, 2013, 1:23 pm

When you launch the ebook, I’d like to buy it on Kobo, which is popular in Canada.
Andrew Kennett

May 20, 2013, 5:47 pm

OT sorry I know this is OT but it is a topic much discussed here abouts — Same Sex Marriage. Our host’s fellow (to use an Aussie expression) god-botherer and former opposer to SSM, ex-Aussie Prime Minister Kevin Rudd has had his Saul on the Road to Damascus moment on SSM, see his blog for Monday May 20 at:
http://www.kevinruddmp.com/
Jonathan D

May 20, 2013, 8:17 pm

Very well written, Briggs.

Personally, I think I am in a similar boat to Jeremy, in that I (still) don’t think there is a real difference between your argument, and what I would mean by a principle of symmetry. To me the word ‘symmetry’ describes the derivation of the statistical syllogism, and doesn’t need to refer to anything physical.

@Andrew, I’m not sure Briggs would ever have been too keen on Rudd.
JH

May 21, 2013, 1:41 am

A frequentist often acts like a subjectivist unaware of her subjectivism; but even if not, she makes other errors. See the previous post for a sample, and only a sample, of these errors.
Pingback: Subjective Versus Objective Bayes (Versus Frequentism): Part Final: Parameters! | William M. Briggs
