Uncertainty & Probability Theory: The Logic of Science
Video
Links:
Bitchute (often a day or so behind, for whatever reason)
HOMEWORK: Q = “The Metalunan interocitor (a die) must take one of 6 states when tossed: s_1 = 1, s_2 = 2, and so on.” P = “It takes state s_6 = 6”. What is Pr(P | Q)? And then what is Pr(P | Q & “fair die”)? That is, we add “fair die” to the Q we accept as true. We always accept Q is true. Always as in always.
NO CLASS NEXT MONDAY. CLASS RESUMES 15 JULY.
Lecture
Last week’s homework was to imagine a psychic guessing cards. She receives a chocolate when right and nothing when wrong. In her first 10 guesses (out of 52), she got 4 right and 6 wrong. Given that as our Q, what is Pr(she guesses next card right | Q)?
Now there was no way (unless you were a trained colleague) you could have got the answer for optimal guessing. The math is beyond that which we have done so far. I was interested instead in your reasoning.
One person nailed it. Nate (his answer is here). My heart soared like a hawk. A teacher is always delighted when a student understands. Like I said in the lecture, when you’re confronted with a difficult question, do not try to jump to the right answer. Try to first solve a simpler but equivalent problem. If your solution works on the simpler, then expand it to the harder. This is, after all, how many mathematicians work out theorems.
So here’s a simpler problem. The deck has 2 cards, a king and queen. You hold one up and think of it, and I try to receive the impression using ESP. I guess King. Alas, you say I am wrong. Given that, what is the chance, using an optimal guessing strategy, I get the next card right? Notice the implicit premise—these exist by the score in real problems, which everybody forgets!—that you tell me the truth.
Anyway, the probability is 1; it is certain. I remember I was wrong. If the first card was not King, then it must have been Queen, and so the second must be King. If I remembered that, then optimally I will guess King—and cannot be wrong.
But, if I instead was an NPR listener, not paying too strict attention, and forgetting what I was about, remembering only there was King and Queen, then the chance I get the second is 1/2.
Two different probabilities because two different Q. Change the evidence, change the probability. That is the one real lesson in this entire class. Never forget it.
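A minimal sketch (mine, not part of the lecture) of the two-card example, contrasting the two Q’s: a guesser who uses the right/wrong feedback against one who remembers only that the deck held a King and a Queen.

```python
# Two-card ESP sketch (not from the lecture). A King and a Queen are
# shuffled; the first guess is "King" and the sender truthfully says
# right or wrong. Using that feedback, the second guess is certain;
# forgetting it, the guesser is back to 1/2.
import random

def second_guess_right(remembers_feedback: bool) -> bool:
    deck = ["King", "Queen"]
    random.shuffle(deck)
    first_right = (deck[0] == "King")
    if remembers_feedback:
        # If told right, the remaining card must be the Queen; if wrong, the King.
        guess = "Queen" if first_right else "King"
    else:
        # Remembers only that a King and a Queen exist somewhere in the deck.
        guess = random.choice(["King", "Queen"])
    return deck[1] == guess

for remembers in (True, False):
    wins = sum(second_guess_right(remembers) for _ in range(100_000))
    print(remembers, wins / 100_000)   # ~1.0 with feedback, ~0.5 without
```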
There are more Q possible in this problem. Instead of telling the psychic right or wrong, we could show her the card after each guess (removing the implicit premise of the sender telling the truth). That provides even more feedback. And, if it is used optimally, the chance of each successive guess being correct grows much higher than in the naive, no-feedback case, where the cards are gone through and no indication of right or wrong is given until the end.
The math has been worked out in a nice paper by Persi Diaconis (I give him a shout-out in the video), Ron Graham, and Sam Spiro, based on work Persi did back when ESP was a hot topic in the 1970s: “Guessing about Guessing: Practical Strategies for Card Guessing with Feedback”. (If you’ve studied any analysis, you’ll get it.) This has applications everywhere, including clinical trials.
Forgetting there was feedback, and could be used, led some early psychic researchers to over-estimate the ability of their subjects. Because, as should now be clear, you can easily get more right guesses using feedback, even if you’re not psychic.
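A sketch of the feedback effect (my own toy version, not the strategies worked out in the Diaconis, Graham & Spiro paper), assuming a deck of 52 distinct cards: with no feedback any fixed guess sequence gets 1 right on average; with complete feedback (the card is shown after each guess) simply never repeating a card already seen averages about 4.5 right.

```python
# Card-guessing sketch (assumptions mine): 52 distinct cards.
# No feedback: a fixed guess sequence; expected correct = 1.
# Complete feedback: never guess a card already shown;
# expected correct = 1 + 1/2 + ... + 1/52, about 4.54.
import random

def no_feedback(deck):
    guesses = list(range(52))              # any fixed guessing order
    return sum(g == c for g, c in zip(guesses, deck))

def complete_feedback(deck):
    unseen, correct = set(range(52)), 0
    for card in deck:
        guess = next(iter(unseen))         # any not-yet-seen card is as good as another
        correct += (guess == card)
        unseen.discard(card)
    return correct

trials = 20_000
for strategy in (no_feedback, complete_feedback):
    total = sum(strategy(random.sample(range(52), 52)) for _ in range(trials))
    print(strategy.__name__, total / trials)
```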
Now, at long last, but slowly, we begin to assign more numbers beside 0 and 1 to probabilities! We start with the proportional syllogism.
This is an excerpt from Chapter 4 of Uncertainty. All the references have been removed. If you see stray dollar signs I forgot to remove, they indicate, to LaTeX, that math is happening.
The Proportional Syllogism
Given Q = “A Metalunan interocitor must be in one of n states, s_1, s_2, …, s_n and S is an interocitor” then the probability P = “S is in state s_j” equals 1/n (a tacit premise here and throughout is that n is finite; happily, in real life n always is). This value arises from the proportional or statistical syllogism. Carl Hempel originated the idea in statistics, and it is stressed in philosophy in the works of Stove, Williams, Groarke, and Franklin. The proportional syllogism arises in the natural way from recognizing the equality of probabilities in equations like this, called the symmetry of logical constants:
Pr(S is in state s_j|Q) = Pr(S is in state s_k|Q); j,k = 1,2,…,n.
There is no proof of this: its truth is intuitive, i.e. provided by induction. Now “intuitive” does not mean every person can be made to understand it (or any proposition!), only that some can. Notice carefully that there is no evidence whatsoever about the interocitor except that it can take certain states; especially lacking is any evidence about the symmetry of its workings. There may or may not be any; we have no clue. All we have is that the interocitor must take a state with one of n labels (Q insists on this). We have no idea how any of these states arise. We cannot argue from physical symmetry or indeed use any knowledge from physics or engineering, though some try. Except for the assurance that each of the n states is a possibility, the workings of the machine are a complete mystery to us and, in fact, can be no better than imaginary because, I shouldn’t have to add, there are in reality no Metalunans, thus there are no Metalunan interocitors. This is of zero importance, however, because logic is not concerned with reality but with the relations between propositions. Recall the French-speaking cat example. Given we know the machine has to be in one of n states, and that is all we know, then the probability it is any one of them just is equal.
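To spell out the deduction (a note of mine, not in the book, reading Q as saying S takes exactly one of the n states): the n probabilities sum to 1, and by the symmetry above they are all equal to some common value p, so

Pr(S is in state s_1|Q) + Pr(S is in state s_2|Q) + … + Pr(S is in state s_n|Q) = n × p = 1,

which gives p = 1/n.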
Interestingly, Persi Diaconis and ET Jaynes attempted proofs of the statistical syllogism which, they thought, avoided the necessity of the symmetry of logical constants, but it will be shown below that the proofs are circular and rely on the symmetry of logical constants after all. David Stove has a proof which also works but which has a quirk, and it will also be shown.
The proportional syllogism is a deduced principle of probability from the equi-probability of logical constants, as in the equation above. Whether in specific instances the results which flow from it match the results given by some other calculation device or principle, such as maximum entropy, is a nice coincidence, but does not obviate it. These principles, whatever nice or desirable properties they have, are not strictly part of probability. Take for example the premise Q = “90% of Martians wear hats” together with “George is a Martian”; then the probability P = “George wears a hat” is 90%, which is also so because of the proportional syllogism, but it is constructed differently. To see that, let GM = “George is a Martian”, GH = “George wears a hat”, and SM, SH the same for Sally; then
Pr(GH|Q and GM) = Pr(SH|Q and SM).
Here we have no idea how many Martians there are, except that there are enough to form an even 90%. Unlike in the original equation, the evidence also changes. Note that Pr(GH|Q and GM) ≠ Pr(SH|Q and GM) because Sally could be a human or Metalunan. In general, Pr(XH|Q and XM) = Pr(YH|Q and YM) where X and Y are names. The “symmetry”, to use that word loosely, comes in considering that any names (labels) X and Y can be used: there is no information to prefer any name over another, thus the probabilities are equal.
The probability “George wears a hat” given “50% of Martians wear hats and GM” is less than the probability “Sally wears a hat” given “90% of Martians wear hats and SM”, a fact which also follows from the proportional syllogism. Just as “George wears a hat” has higher probability given the premise “Most Martians wear hats and GM” compared to the premise “Few Martians wear hats and GM.” This example requires the tacit premise that “most” is more than “few”. Numbers aren’t needed. The probability “George wears a hat” given “X% etc.” is equal to the probability “Sally wears a hat” given “X% etc.”, with only the tacit premise, given our understanding of percentages, that X is somewhere from 0 to 100.
It is a simple principle of logic that if the argument from Q to P is valid, then the argument from “Q and T” to P is also valid, where T is any necessary truth. Intuitively, adding something which cannot possibly be false to Q adds “nothing”; we might say that it doesn’t change how P flows from Q; it is like multiplying a simple algebraic equation by 1. Given “All men are mortal and Socrates is a man” then “Socrates is mortal” is true; and nothing changes if we append to the premise a necessary truth such as “A is A”.
Common necessary truths are logical tautologies. Examples: “Either Mars now is 12 parsecs from Earth or it isn’t,” “If it is sunny then I will go swimming, which implies that if I do not go swimming it is not sunny” (we take the entire proposition here), “Either unicorns like chocolate or they don’t” (this is true whether unicorns exist or not), and the standby “P or not-P” (which we know is necessarily true based on tacit premises of logic). That last proposition means that if the argument from Q to P is valid, the argument from “Q and ‘P or not-P’” to P is also valid. Adding the necessary truth “P or not-P” did not change the argument in any way.
The same conditions hold in probability. If on the argument from Q to P we deduce the probability of P given Q to be some value, then adding a tautology or other necessary truth to Q does not and cannot change this value. Thus
Pr(P|Q) = Pr(P|Q and T).
And this holds even if Pr(P|Q) doesn’t have a numerical value or if it is an interval; it also holds if T = “P or not-P”. A concrete example. As above, given only Q = “A Metalunan interocitor must be in one of n states, s_1, s_2, …, s_n and S is an interocitor” then the probability P = “S is in state s_j” equals 1/n. Take the tautology T = “S is in state s_j or it isn’t.” T is necessarily true whether Metalunan interocitors exist or not, and true regardless of which state any of them might happen to be in. Adding T to our list of premises thus does nothing to the conclusion that P takes probability 1/n. T, because it is a tautology, does not suddenly change our notion of P, nor does it add any information about it. (NOTE NOT IN BOOK: This T does not even tell us how many states are possible!)
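A toy check, not from the book and assuming nothing except that the states are bare labels 1 through n: list the possibilities Q allows, note the tautology rules none of them out, and the proportion giving P is untouched.

```python
# Toy check (not from the book): n = 6 states allowed by Q.
# The tautology T = "S is in state s_j or it isn't" is true of every
# possibility, so conditioning on it removes nothing; the proportion
# of possibilities making P true is unchanged.
n, j = 6, 3
states_given_Q = list(range(1, n + 1))            # everything Q allows

P = lambda s: s == j                              # "S is in state s_j"
T = lambda s: (s == j) or (s != j)                # always True

pr_P_given_Q = sum(P(s) for s in states_given_Q) / len(states_given_Q)

states_given_Q_and_T = [s for s in states_given_Q if T(s)]
pr_P_given_Q_and_T = sum(P(s) for s in states_given_Q_and_T) / len(states_given_Q_and_T)

print(pr_P_given_Q, pr_P_given_Q_and_T)           # both 1/6, about 0.167
```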
Consider the probability of P = “S is in state s_j” given only T = “S is in state s_j or it isn’t.” Since again T is a tautology it gives no information about P. In particular it is false, though it is often asserted, that Pr(P|T) = 1/2. This is more easily believed once we recall that any necessary truth can take the place of T. Let T’ = “There are 82 million ducks in the world or there aren’t.” Then obviously Pr(P|T’) ≠ 1/2, but even stronger
Pr(P|T) = Pr(P|T’);
and this is so even though neither side produces a unique, single number. The closest Pr(P|T) comes to a number is the interval (0,1); note that this does not include the extremes, which is just another way of saying P is contingent. In this sense there is some information in T and T’, or rather in the words and grammar of T, T’, or P, which is usually that P is contingent, and which is enough to exclude complete certainty, but that’s all; the information in tautologies is thus of (literally) infinitesimal value (when it has any value).
Another version of the tautology is T” = “S might be in state s_j.” Implicit in this is that S might not be in state s_j, thus T = T”. Still another version is T3 = “S might be in state s_j or s_k” with the same tacit inclusion (though it can be argued this version says only these two states are possible; words matter!). Another: T4 = “S might be in state s_1 or s_2 or … or s_n”, again with the tacit admission they might not. Note carefully that T4 is not equivalent to Q = “A Metalunan interocitor must be in one of n states, s_1, s_2, …, s_n and S is an interocitor” because T4 never asserts that any interocitor must be in any state, only that it might be in one or another state, or none at all.
It turns out that Metalunan interocitors are actually that race’s version of dice, that n = 6, and that each “state” is a side uppermost upon tossing. Thus Q really equals “A Metalunan die when tossed must show one of 6 sides, labeled 1 through 6, and S is the side that shows on a toss”; then the probability P = “S is 3” equals 1/6. This also follows from the proportional syllogism. Adding tautologies, I hope it is now plain, changes nothing.
We finish the die next class!
Who is Matt?
1/6 and 1/6. Adding “fair die” to the already assumed true Q is irrelevant to the question of the states the Metalunan interocitor (die) must take.
So William, Nate said “What if we start like you said, Matt, and break this down to something easier”. Matt?
I get it, but I’m starting to doubt it.
For the most part your results and thinking about Q are the same as mine, except that I get there much more simply and directly by understanding P(E) as describing the information I have about E, and therefore not about E at all (I know, that’s where Q leads, but we don’t need 3000 years of angels and pinheads to get to that).
However, I’d previously taken the Monty Hall problem at face value because it’s kind of obvious that info about some elements in a finite list of choices constitutes knowledge about the remaining choices. But then you made me think about it and that led to a problem: I don’t think it possible to write a computer simulation for this that doesn’t beg the question. In effect, if you treat the Monty Hall process as unitary, then switching works (p = 2/3) – but if you treat it as several disjoint events in a process, switching achieves nothing. E.g., Monty hides a goat; person one makes a choice; Monty opens an untenanted door and ushers person one out. Person two comes in, is shown two doors, and asked to guess: p = 1/2.

In the math this difference comes about because the second process hides info from person two that was available to person one – but is that an artifact of the math or of reality? The simulations can’t show this (?) and I can find nobody who reports a large enough scale real-world test.
The same is true in the psychic’s case: randomize a deck of 52; pull ten cards; turn four of those over so your subject can see them; ask the subject to guess the next card: 1/48. You can chew on Q to get something a bit larger (e.g. 1/46.71), but I’d like to try it with maybe 1000 students before I’ll believe it.
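On the simulation question above, here is a minimal sketch (the assumptions are mine: the prize is placed uniformly, Monty knows where it is, always opens a goat door the player did not pick, and chooses at random when two such doors are available). From person one’s evidence, staying wins about 1/3 of the time and switching about 2/3; person two, whose evidence is only “two closed doors, one prize”, has nothing to distinguish the doors, which is the 1/2.

```python
# Monty Hall sketch. Assumptions (mine): prize placed uniformly; Monty
# always opens a goat door the player did not pick, choosing at random
# when two such doors are available.
import random

def play():
    prize = random.randrange(3)
    pick = random.randrange(3)                                   # person one's choice
    opened = random.choice([d for d in range(3) if d not in (pick, prize)])
    other = next(d for d in range(3) if d not in (pick, opened))
    return prize == pick, prize == other                         # stay wins, switch wins

trials = 100_000
stay = switch = 0
for _ in range(trials):
    s, w = play()
    stay += s
    switch += w

print(stay / trials, switch / trials)   # ~1/3 and ~2/3
# Person two, told only "two closed doors, one prize", cannot use the
# process above; on that evidence alone the doors are symmetric (1/2).
```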
OK, let’s work out P (ability to even remotely answer the homework | hasn’t read $150 stats textbook)
Charlie,
No need to spend the money. I’ve given you free the entire textbook, up to the point we’re at.
So the point of this is that for the purpose of statistics we automatically assume all outcomes are equal?
We’re gonna lose a ton of money in that Metalunan casino!
The reason why this homework question had me completely baffled is that the excerpt from the textbook claimed, with no evidence whatsoever other than a lot of hand waving and appeals to authority, that if there are 6 possibilities then the probability of each must be 1/6.
This is patent nonsense. In my mind, we can draw no conclusion whatsoever about the probability of the 6 different possibilities unless we have more evidence.
But then I realised that I was struggling because this conclusion is completely counter intuitive for me personally.
I would never ever assume that dice must be fair. I’d assume the exact opposite.
When we’re talking about dice or coin flips it sounds incredibly smart to assume equal probabilities. But, for example, human blood groups have 4 possible states and if someone claimed that the possibilities were equally probable they’d look like a complete imbecile when the results came in.
I eventually realised that it makes perfect sense in a statistical environment to assume that probabilities are equal. It gives us a testable theory, whereas my idea wouldn’t give us anything to test because the probability could be anywhere between 0 and 1, which doesn’t get us anywhere.
So I get it. We should assume equiprobability because at least it gives us something to test.
So we assume the probability of rolling a 6 is 1/6, and adding that it’s a fair die adds nothing because we already assumed equiprobability.
Charlie O,
Watch the video, did we?
Charlie,
The problem is that you are viewing probability as an inherent feature of the event. But it is only a reflection of our knowledge. If we only knew that there were four blood types that humans could have AND NOTHING ELSE about them, then the only sensible probability to be assigned is 1/4. But at the same time, when making decisions we would do well not to take the probability too seriously, since it came from such little information. For example, if that’s all I knew and someone asked “would you be willing to bet 1000 dollars with a payout of 4000 dollars that this person has a blood type of A?” I would say no.
The idea is not so much that the probabilities must “actually” be equally distributed, but that if we lack information that says that one event is more likely than another, any assignment of certainty to the events beyond equally distributing the probability would arbitrarily favor one event. It is akin to being asked to estimate a number between 0 and 1, with better rewards given to you the closer you are, and choosing .5. You don’t think that the number is actually .5, but to choose anything else would require arbitrarily assuming that the number is closer to 0 or closer to 1. Of course if we had more information (like “the guy who’s asking me to guess really likes picking high numbers”) then we might pick something else.
If we only assign “real” probabilities to things then no statistical analysis can be done. For example, suppose that I want to know the likelihood of someone in America voting for Biden and I say that the “real” probability of this is b/n where b is the number of people who actually vote for Biden in the next election and n is the number of people in America. Anything else is a 100% wrong answer. Well, then the only way that I can get that number is if I know exactly how the election is going to go down, with no error at all. In that case, why do I need statistical analysis?
The usefulness of probability really comes from helping us keep track of our information. This is what the Monty Hall problem illustrates. If you suddenly came out of a fugue state on the Monty Hall show and only knew that he was asking you to switch or keep the door, and you knew that one option would lead to a prize and the other would not, then your probability for winning on the switch is 50/50. In terms of decisions, there is nothing said that leads you to picking one over the other. If your knowledge includes information about how you got here (like starting with three doors), you can make a more refined analysis which turns your probability into 2/3 for switching and 1/3 for staying. Though note to get THAT probability, you have to take it as equally likely that the prize was in any of the three initial doors.

Likely Monty has some method for choosing where the prize is, and it may not be equally likely to be in door number 1, 2 or 3. For example, if he puts it behind door number 1 10% of the time, door number 2 40% of the time, and door number 3 50% of the time, then you can get a more refined probability still; e.g., if you picked door number 2 and he revealed that it wasn’t in door number 3, then you’d have a 4/5 chance of getting the prize by staying with door number 2 and a 1/5 chance of getting the prize by switching.

If we are aware that all we are really doing is quantifying how certain we are of our decision based on the information we have, this is all doable. But if we say it’s the “actual” probability of the event, independent of our information, then we cannot calculate probabilities for even very simple events, since they could depend on factors that we have no idea of and which could drastically alter the probability values.
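The arithmetic behind that 4/5 versus 1/5 (a note of mine, reading the reveal purely as “the prize is not behind door 3”):

Pr(prize behind 2 | prize not behind 3) = 0.4 / (0.4 + 0.1) = 4/5, and Pr(prize behind 1 | prize not behind 3) = 0.1 / 0.5 = 1/5.

If the evidence also included how Monty picks which goat door to open when he has a choice, the numbers would in general differ: change the evidence, change the probability.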
If there are six states, and we don’t know how the device distributes its results across those states, then there are an infinite number of ways that the results could be distributed. It’s intuitively obvious, however, that those infinite possibilities have an average, which is the evenly-distributed case. This average is the best choice in the absence of other information, given that it is the least distant from all the other possibilities combined. This is easiest to see with a simpler two-state device, say one that produces a one or a zero. In the extreme cases, it could always produce one or always produce zero. If we choose the 50/50 split, we’re as close to both of those two extremes as we can get. Nothing about this changes as the number of possible states increases.
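One way to make that “average of all the possibilities” idea concrete, as a sketch only and assuming “all the ways the results could be distributed” is read as uniform over the six-state probability simplex (a Dirichlet(1,…,1) draw): the average assignment comes out uniform.

```python
# Sketch: average many probability vectors drawn uniformly from the
# six-state simplex (Dirichlet with all parameters equal to 1).
# The average is close to (1/6, ..., 1/6). The uniform-over-the-simplex
# reading of "all possible distributions" is an assumption, not a given.
import numpy as np

rng = np.random.default_rng(0)
draws = rng.dirichlet(np.ones(6), size=200_000)
print(draws.mean(axis=0))   # each entry is about 0.1667
```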
“Fair die” implies that all outcomes have the same probability, making explicit that which we would have assumed anyway, adding nothing to our Q.
Pr(P|Q) = 1/6 in both cases.
Q = “The Metalunan interocitor (a die) must take one of 6 states when tossed: s_1 = 1, s_2 = 2, and so on.” P = “It takes state s_6 = 6”. What is Pr(P | Q)?
Pr(P|Q) = 1/6 by the proportional syllogism. As said in the lecture, adding tautologies changes nothing.
And then what is Pr(P | Q & “fair die”)? That is, we add “fair die” to the Q we accept as true. We always accept Q is true. Always as in always.
Same as above. Adding “fair die” changes nothing. In fact it’s a silly premise. What does “fair” even mean? Perfectly symmetric? Impossible! What other factors might impact fairness? Gravity?
Now it’s my turn to make a fool of myself. And I’m enjoying the course, whatever difficulties I find.
We know it’s a six-sided die, with what we know of those things. So, Pr(any specific result | it’s a six-sided die) = 1/6, fair or no. ‘Fair’ tells us that the chance of any possible state is the same as any other possible state, thus the proportional syllogism holds for it, and that nothing else we learn about the die will change that. If we do not know it is fair, then it is fair or not, which does not change the probability of any arbitrary outcome, but might lessen our confidence in any predictions we make until we investigate its fairness or bias.
The proportional syllogism doesn’t apply if we see one face is rounded, another weighted, and edges shaved.
But, if we know nothing about the Metalunan interocitor but that it MUST assume one of six states, with no info about those states other than that they are labeled 1 thru 6, then I do not see that the proportional syllogism holds any more than it holds for “you either have cancer or you don’t”. There are six states we know nothing about, less than any of us know about cancer in humans, so I’d say Pr = (0,1) for any specific state it may assume. Please correct me where I have erred, if I have erred.