HOMEWORK: Find a contentious event, some P and the disputed evidence Q, and describe (in words; we haven’t done numbers yet) Pr(PQ|E) = Pr(P|QE)Pr(Q|E). For fun, pick P = “Joe Biden wins the 2024 Presidential election.” Find the Q that has a high Pr(Q|E).
Lecture
Note that the correct formula at the end of the video is: Pr(PQ|E) = Pr(P|QE)Pr(Q|E).
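For those who want to see the formula with numbers attached, here is a minimal sketch. It is only an illustration under stated assumptions, not part of the lecture: E is taken to be "a standard 52-card deck, one card drawn", Q = "the card is a heart", P = "the card is a face card", and each probability is deduced by counting the cards the premises allow. The helper name `pr` and the card labels are made up for this example.

```python
from fractions import Fraction
from itertools import product

# Assumed evidence E: a standard 52-card deck from which one card is drawn.
ranks = ["ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "jack", "queen", "king"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))


def pr(prop, given):
    """Pr(prop | given, E): share of the cards allowed by `given` that satisfy `prop`."""
    allowed = [card for card in deck if given(card)]
    hits = sum(1 for card in allowed if prop(card))
    return Fraction(hits, len(allowed))


def is_face(card):      # P = "the card is a face card"
    return card[0] in {"jack", "queen", "king"}


def is_heart(card):     # Q = "the card is a heart"
    return card[1] == "hearts"


def only_e(card):       # no premise beyond E itself
    return True


pr_pq_given_e = pr(lambda c: is_face(c) and is_heart(c), only_e)  # Pr(PQ|E)
pr_p_given_qe = pr(is_face, is_heart)                             # Pr(P|QE)
pr_q_given_e = pr(is_heart, only_e)                               # Pr(Q|E)

# The product rule: Pr(PQ|E) = Pr(P|QE) Pr(Q|E)
assert pr_pq_given_e == pr_p_given_qe * pr_q_given_e
```

Notice that `pr` cannot be called without a `given`: even the "plain" probabilities still carry E, which is baked into `deck`.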
Last week’s homework was to see if anybody could propose, and justify, a Pr(A); that is, an unconditional probability.
As far as I know, nobody tried this. Or I have already convinced everybody. This surprises me since it is one of the most contentious claims in probability. Many believe things, physical objects, have probability. They possess it like a thing has mass or velocity or dimension.
Nobody can say how, though. Nobody who makes this claim has proposed a way to measure this mysterious physical-probability. Except by recourse to tallying events. Which, as we’ll see, cannot work as justification.
Here is a link to the coin-flip, probability-of-a-head-is-1 video (blog, Substack).
This is an excerpt from Chapter 4 of Uncertainty. All the references have been removed.
Schema
Our job in probability, so to speak, is to figure which side of the equation we’re on. All probabilities fit the schema Pr(P|Q), where P is the proposition of interest and Q are the premises, evidence, data, “for granteds”, or whatever you wish to call them. The Q must always be there; and so must the P. Now it could be that we could deduce a P from a given Q, like we do in syllogisms, but this needn’t be the case. Given Q = “All men are mortal and Socrates is a man” we could deduce P = “Socrates is mortal.” But given that same Q we could equally ask the probability of P = “Socrates would have liked Oldsmobiles.” The choice of P is ours. This P is, of course, silly, but probability, like logic, does not give any guidance from where or whence propositions arise. This point is basic: reflect on it.
Logic and probability speak about the connections between propositions, and not the propositions themselves. This choosing part of probability really is, or partly is, subjective, because the choice of P and Q is sometimes free, and sometimes supplied to us by outside agencies. But once P and Q are fixed—however they are fixed—probability is no longer subjective, and is itself fixed. This will be proved in the next chapter. For now, we need to recognize there are two actions we can take.
The first is in finding the best or “good” Q for a fixed P. Suppose P = “The Tigers win tonight’s game.” This is evidently a contingent proposition, in the sense there is nothing in the universe which would make P necessarily true. After the game is played, we can use the observation Q = “The Tigers won; hooray” and then the probability of P is 1 conditional on this evidence (the event is then also ontologically true). But before the game, what is the best Q? One possible Q is the tautology, “The Tigers will either win or they won’t.” From this, Pr(P|Q) is the unit interval, i.e. $(0,1)$. This is because (see below, and as we learned in Chapter 2) tautologies or any necessary truth cannot be used to deduce contingent propositions. Q contains implicitly information on the contingency of P, but that is all it contains. Another Q might be, “The Tigers won 4 of their last 5”. This gives some idea that P is likely, but we cannot deduce a single, fixed probability from this Q. This non-quantifiability is true of most situations in life.
It is only because of scientism that we believe we can impose numbers on everything. Now it may be that this 4-out-of-5 is the only Q we have or were provided. In that case, we’re done. The best we can say is the probability of P with this Q is (something like) “pretty good” or “fair.” No numerical probability is possible; or, at least, not without tacit premises which insist exactly what “4 out of 5” means regarding future events. Of course, some numerists (as we might call them) jump on examples like this and readily supply premises which become models. More on this later.
It could also be that we have an opportunity, within our means, of searching for additional evidence which is probative of P. This may be the scouting reports on both teams’ pitchers, whether the game is home or away, a longer record of wins and losses, injury reports, the weather, batting records, sports writers’ opinions, the consequence of the game (is it for the playoffs?), and on and on. There is the sense that as Q increases in richness or content the probability of P should head toward 0 or 1, as the case may be.
It is far from clear that in all cases, particularly those dealing with human behavior, a realistic Q can be found which puts P arbitrarily close to 0 or 1. Man has free will (if you say you don’t, you contradict yourself) and acts unpredictably. But if we had the experimental setup with the marble and wanted to know P = “The marble will arrive in slot A”, we have a much better sense, as described above, that we can discover a realistic Q (given precise measurements, say) such that the probability of P is very close to 0 or 1. The realistic qualifier is there because we can always and trivially produce a Q which puts the probability of any P at 0 or 1; e.g. Q = “P is impossible” or Q = “P must happen”.
Realistic evidence is that which itself is ontologically true or possible or necessarily true epistemologically (conditioned on still yet other acknowledged premises). Think of realistic evidence as that evidence which you must justify to outsiders dubious about P. For the marble, this includes the initial conditions, equations of motion, the gravitational field, friction, and so on. Given enough time and effort, we have the idea a pretty accurate Q can be discovered. The search for “Qs” for fixed “Ps” is what science is all about. And, as said, there are some P, such as in QM, for which no Q which leads to certainty will ever be found.
In our daily experience, say when we’re flipping coins to solve minor disputes, we don’t have the information necessary to find a Q beyond “There is a two-sided coin marked H and T, one side of which must show when flipped.” This is why we say the probability of an H is 1/2. It is with respect to that and only that evidence, but not to the evidence of the physicist who has taken pains to measure the coin flip “experiment” precisely. About the real coin in front of us we know nothing except that it shares the two qualities mentioned in the premises. It shares in the essence of other backyard coin flips. These premises become a model, a subject which is discussed more later.
The second action is to play detective, to discover the P which best accords with the given, fixed Q. Let Q = “A murder was reported at the mansion, in which resided the Duke and his lady, four aristocratic guests of varying and known histories, and twelve servants. The de-monocled body of Lord Wistful was found strangled in the library. The layout of the mansion and grounds are this and such. The train schedule and distance of the station from the mansion are in this table, etc.” In other words, all the standard clues British television detectives are taught to collect at crime scenes.
Now suppose you entertained P = “The President of the United States did the deed (though he probably had a good reason to do so, for he is a member of the party for which I habitually vote).” The president was not a guest of the Duke, and was not known to be within thousands of miles of the mansion. Still, P is contingent. Given this Q, which does not exclude any human being by name, it is possible that P is true, though it is very unlikely (recall we are accepting Q, which would include premises about how anybody could have done the deed). As above, given Q and including the tacit premise about contingency, the probability of “The President did it” is again the open unit interval.
Clearly, there are better suspects; e.g. P = “The butler did it.” Note carefully that Q is not “There are 18 suspects one of which is the murderer and the Butler is one of the suspects.” Just on that, and only that, evidence, the probability of P = “The Butler did it” is 1/18. Even this Q does not limit us, if examined closely; there is a tacit premise that some human being in the world did the deed, but that is it. Q is hardly definite, but it is definite enough for the police to begin their interviews. The police would not restrict themselves to the eighteen inhabitants; somebody unknown to the household could have committed the crime. With this Q, no P allows of a numerical probability, but given the nature of Q, some P will be more probable than others. “The butler did it” is an obvious one, and so is “Mrs Duke did it.” But those propositions flow from tacit premises in Q given evidence of past murders and so forth.
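The 1/18 above can be sketched in a few lines. This is only a toy, resting on assumptions that are not in the text: the suspect labels are invented, the butler is counted among the twelve servants, and the second Q adds a hypothetical premise ("none of the four guests did it") purely to show the same P taking a different probability under different evidence.

```python
from fractions import Fraction


def pr_did_it(suspect, candidates):
    """Pr("suspect did it" | Q), where Q says only that exactly one of
    `candidates` is the murderer and nothing else is known about them."""
    candidates = set(candidates)
    if suspect not in candidates:
        return Fraction(0)          # Q excludes this suspect outright
    return Fraction(1, len(candidates))


# Hypothetical labels for the eighteen inhabitants of the mansion.
household = (
    ["Duke", "Mrs Duke"]
    + [f"guest {i}" for i in range(1, 5)]      # four aristocratic guests
    + ["butler"]
    + [f"servant {i}" for i in range(1, 12)]   # the other eleven servants
)

# Q1 = "There are 18 suspects, one of which is the murderer."
pr_butler_q1 = pr_did_it("butler", household)

# Q2 = Q1 plus the assumed extra premise "none of the four guests did it."
not_guests = [s for s in household if not s.startswith("guest")]
pr_butler_q2 = pr_did_it("butler", not_guests)

print(pr_butler_q1, pr_butler_q2)
```

The function refuses to answer without its second argument, which is the point: change Q and the probability of the very same P changes with it. The richer Q of the story, clues and train tables and all, supplies no such tidy count, which is why no number falls out of it.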
Of course, the additions and culling of both sides of the probability equation are more natural than my stilted prose has it. As the murder investigation continues, Q is amended and the list of P’s tightens. This happens in scientific investigations, too.
So all probability is conditional. My experience is that after introducing this notion, those trained in classical probability and statistics are suspicious and reject the truth that all probability is conditional. The following exercise is usually convincing: try and write down an unconditional probability. Notation is a problem. Many write things like “Pr(P)”, which appears unconditional, but that’s only because the Q is removed off site. It’s always there, however. The danger is reifying the notation so that the conditions Q are thought not to be there because they’re not written. Once you try this, you’ll quickly become convinced. Hint: be careful not to forget the tacit premises of grammar and logic, a negligence which often convinces students they have finally “discovered” an unconditional probability.
Franklin wonders whether there are probabilistic relations between necessary truths. For instance, all the relevant facts, i.e. premises, might be known which support a mathematical theorem, i.e. a proposition of interest such as the Riemann Hypothesis, which, at this writing, remains unproved. It is not known, given the premises mathematicians have today, whether the theorem is proved or not, yet we would think that because all the necessary premises are there, and they really do logically imply truth of the theorem, but we cannot see how, that logical probability might be incomplete. In other words, why isn’t the Riemann Hypothesis certain given all the premises we have, premises which would presumably be the same once somebody comes through with the formal proof? If logical probability were complete, we should see the probability of the theorem was 1. But there is a flaw in this thinking.
The easy answer is that any set of premises is not only the premises themselves, but includes any number of tacit premises, as we have been assuming all along. For instance, consider you come upon a pile of unconnected electronic parts, resistors, capacitors, wire, solder, and so forth, and the proposition “This is a radio” comes to mind. All the parts for the radio are there, so the answer is that the proposition is true, sort of. The parts are there but the tacit premises of how they fit together are missing. Same thing with the mathematical theorem. We might have all the relevant premises, but how they fit together and in what order is not known. After the theorem is officially proved, it later becomes clear that the facts were necessary and sufficient and these are wondered over, yet how they fit together isn’t considered as important. But the form, or the how of it all fitting together, is also a premise. That is currently missing for the Riemann Hypothesis. This state of mind is natural because talking of the fitting together adds a mental burden when all one wants to convey is the importance of the theorem. Nevertheless, when discussing probability, like in discussing logic, tacit premises should never be forgotten.
If P= Newtonian gravity, and
Q= The evidence,
What is Pr(P|Q)?
Briggs ==> Could you add a little boiler-plate bit to each of these Lessons on how to “read out loud” — in words — your notations such as “Pr(PQ|E) = Pr(P|QE)Pr(Q|E).” and “Pr(P|Q)” and “$(0,1)$.” Some of us are exceptionally ignorant…(just asking for a friend…).
Kip,
That is just Bayes. We want, for this example, Pr(P|QE), which is the probability of P = “Biden wins” given Q = “the list of evidence you assume”, and E = “how I justify that list.”
The “$(0,1)$” is a sort of typo. Ignore the dollar signs; they’re for LaTeX, which is how I wrote the book. The “(0,1)” is the standard notation for a unit interval, i.e. all numbers between 0 and 1, but never 0 or 1.
tom,
You tell me. By themselves, you have nothing, because “the evidence” means nothing. You have to specify exactly precisely what this evidence is. Then we find out how it relates to P.
Notation and language is definitely a problem. I’ve had many conversations with statisticians where they insist that a certain probability is non-conditional, but then when asked how they calculated it they will say that they did so via the conditions of the problem.
This happens a lot with p-value calculations. If I ask them whether it is a conditional probability they will say no. It’s Pr(E) (with E being the event that a random variable produces a set of values with a test statistic equal to or greater than the observed value of the test statistic). It’s not Pr(E|X) with X being some other event, so it can’t be a conditional probability. Even if they use the notation (like Wikipedia does) of Pr(E|H_0) with H_0 being the null hypothesis, this usually doesn’t count as a conditional probability, because the null hypothesis isn’t an event and furthermore the probability isn’t calculated by the Pr(A|B) = Pr(AB)/Pr(B) formula.
Now obviously a p-value calculation depends on the choice of random variable and test statistic. This is obvious if you calculate it directly from the definition, though sadly most statistics students these days never do that. If you get someone who knows his stuff you will get a clever series of word games. No, the p-value isn’t “conditional” on the model but it “depends” on the choice of model and the model “intervenes” on the probability calculated for the p-value. They will also likely insist that this only applies to true frequentist random variables (so in fact none of the discussion applies to reality.) Most however are not this knowledgeable and will insist that the p-value is a bona fide “unconditional” probability that is a real property of the universe independent of any choices made.
Evidence for Newtonian gravity is everything from falling apples, the discovery of Neptune, to the Moon landings. There seems to be a huge amount of evidence for Newtonian gravity, including every time someone has ever thrown a ball.
I’m not so much interested in the precise value of P, but rather whether it exists or not.
Briggs: “Socrates would have liked Oldsmobiles.”
Professor Briggs, this proposition is true without qualification. As long as the Oldsmobile in question is a 1966 Olds 98 LS, gray, with six-way power seats, power windows and door locks, rocket 454 engine, and expansive interior with room for up to twelve beer drinking teenagers. I know because my father bought me such a car when I was a new driver, probably (there’s that word) because he figured I would be safer in a massive iron warpig while making the inevitable stupid mistakes of the neophyte. That car served me well, as evidenced by the fact I’m still around typing nonsense, and vindicated my father’s judgment, and allows me to claim that Socrates would have liked Oldsmobiles… as long as the Oldsmobile in question is a 1966… well, okay, that’s a condition, a qualification… hmm… I’ll have to think more about this, but don’t think I’m swallowing your hyperventilations uncritically, dude. I mean, c’mon, you know?!
Mr Pickey says:
“Tigers Win” or “Tigers Lose” /= 1
However “Tigers Win” or “Tigers Not Win” = 1
Because possibility of draw.
But liking lectures even if not understanding all 🙂