Here, my friends, is the whole of probability and uncertainty, in highlight form. A quick overview of where we have been, what we’re doing, and where we’re going. All can watch this video or read the lecture, not just those in the Class.
Uncertainty & Probability Theory: The Logic of Science
Video
Links: YouTube * Twitter – X * Rumble * Bitchute * Class Page * Jaynes Book * Uncertainty
HOMEWORK: You must look up and discover a Research Shows paper and see if it conforms to the conditions given below. Have you done it yet?
Lecture
Although we are doing the Class, in which I specify from first principles, in complete and nauseating but necessary detail, the full philosophy of probability, I thought it well to explain the theory of probability, statistics, AI, machine learning, and uncertainty of any kind in as brief a form as possible.
We start with something we want to know, a proposition. Which can be mathematical, grammatical, a notion in some way intelligible. May I call this proposition Y? Notation can hinder, as I frequently insist, but here I hope it helps.
Y can be anything: the picture of a cat with a machine gun, the presence of cancer, a temperature, a dice throw, a count of unicorns, where your wife is, how much dark matter is in this region of space, whether this drug lowers your blood pressure, what answer to give a user query, whether you would have got the job had you worn a tie, where an electron is, whether Socrates is mortal. Anything.
Once we have Y, we seek evidence which tries to explain it. To explain means to understand the why of Y. Which for things in the world means the causes of Y, and the conditions which let causes flourish. That is science.
We gather everything we want to consider as an explanation of Y. The ‘everything’ means what it says; it includes the code, grammar, definitions, data, assumptions, premises, guesses, facts tacit, implicit and explicit, anything that is considered to help explain Y. Call all this information X. We then form this object (don’t let the notation frighten you):
Pr(Y|X)
which stands for “What we know about Y given (or assuming) X”. Sometimes we know all about Y, and sometimes we’re not sure. If we don’t know X, and instead assume other information, no matter how different from X, which information we might call W, then we have Pr(Y|W). This is true if the original X is added to over time, or subtracted from. Changing the evidence in any way changes the problem, just like in math changing any number changes the entire equation.
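A minimal sketch of how changing the evidence changes the answer. The die, the two evidence sets, and the helper `pr` are illustrative assumptions, not anything taken from the Class. Evidence is represented here by the list of states it leaves possible, and, as a further assumption, the evidence gives no reason to favor one state over another, so the probability is deduced by counting.

```python
from fractions import Fraction

def pr(y, states):
    """Pr(Y | evidence), where the evidence is represented by the list of
    states it leaves possible, with no reason given to favor any one."""
    states = list(states)
    if not states:
        raise ValueError("evidence allows no states at all")
    return Fraction(sum(1 for s in states if y(s)), len(states))

# Y = "a six shows" (hypothetical example)
def y(s):
    return s == 6

# X = "a six-sided object, sides labeled 1 to 6, exactly one will show"
X = [1, 2, 3, 4, 5, 6]
# W = X plus "the side showing is even"
W = [s for s in X if s % 2 == 0]

print(pr(y, X))  # 1/6
print(pr(y, W))  # 1/3
```

Same Y, different evidence, different probability. Neither number belongs to the die; each belongs to the pairing of Y with what is assumed.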
We can prove (and do in the Class) that this relationship Pr(Y|X) can be mathematical, and, that being so, we prove what mathematical form it takes, and, given that, we prove what numbers are associated with this mathematical form. The form is Bayes’ Theorem, and the numbers are between 0 and 1.
In other words, probability explains our knowledge of the relation between Y and X, and only the relation. Probability is thus a measure of uncertainty between propositions, which means it is not in the world ontologically but in our mind. Probability is mathematical epistemology. But notice I only said “can be mathematical.” Sometimes probability is not a fixed number. Not everything can be quantified.
Probability, like logic, is again an explanation of the relation between propositions (which can be complex). Probability is therefore the extension, or rather completion, of logic. It is that, and nothing else or more.
Let X = “All men are mortal; Socrates is a man.” And let Y = “Socrates is mortal”. I didn’t have to pick this X nor this Y. I could have picked Y = “Oranges are tart” with this X. But with the original,
Pr(Y|X) = 1,
as logic demands.
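The deduction can be sketched by brute force: list every truth assignment, keep those X allows, and see whether Y holds in all of them, none of them, or only some. The function `relation` and the three variable names are hypothetical, chosen only for this illustration. When X does not settle Y the sketch returns `None` rather than inventing a number.

```python
from itertools import product

def relation(y, x, variables):
    """Check Y against every truth assignment that satisfies X.
    Returns 1 if Y holds in all of them, 0 if in none, and None
    when X alone does not settle Y (no number is implied)."""
    worlds = [dict(zip(variables, values))
              for values in product([True, False], repeat=len(variables))]
    allowed = [w for w in worlds if x(w)]
    if not allowed:
        raise ValueError("X is self-contradictory")
    truths = [y(w) for w in allowed]
    if all(truths):
        return 1
    if not any(truths):
        return 0
    return None

variables = ["socrates_is_man", "socrates_is_mortal", "oranges_are_tart"]

# X = "All men are mortal; Socrates is a man", applied to Socrates.
def x(w):
    all_men_mortal = (not w["socrates_is_man"]) or w["socrates_is_mortal"]
    return all_men_mortal and w["socrates_is_man"]

print(relation(lambda w: w["socrates_is_mortal"], x, variables))      # 1
print(relation(lambda w: not w["socrates_is_mortal"], x, variables))  # 0
print(relation(lambda w: w["oranges_are_tart"], x, variables))        # None
```

The last line is the “Oranges are tart” case: this X says nothing about oranges, so no extreme value follows from it.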
Let X = “She is usually at the store this time on Thursdays, and today is Thursday.” And let Y = “My wife is at the store.” Thus
Pr(Y|X) = usually.
We don’t get a number because there is nothing in X that tells us what quantification specifies “usually”. For most matters in real life, this is good enough. In Science, we try to find those quantifications. Sometimes in our ardency and rush to do something we reify them, which leads to over-certainty.
Let X = “This complex physical model, containing decades’ worth of data, and a host of scientific information”. And let Y = “Tomorrow’s high temperature equals 70F”. Then
Pr(Y|X) = p,
where p will be some number in (0,1).
Let X = “All I know of the company and the type of people they hire for this kind of job, and my performance at the interview, only imagine if I wore a tie, which I forgot”. And let Y = “I get the job”. Then
Pr(Y|X) = high,
where “high” might even be a number if that X contained information of past results. Notice that this is a counterfactual, an impossible state, yet as in ordinary logic we are able to form a probability of the relation without difficulty.
That is it. That is all of probability. This is everything. It is no more complex than this. Every problem or question of uncertainty fits this scheme. Calculations might be complicated, or even impossible to compute except by approximation, and so a great deal of work goes into figuring how to figure these kinds of answers when X and Y grow fractious. But we never leave the schema, the formula Pr(Y|X).
Often, we do not know all that is needed to know about Y, so we pick things we hope are related to the causes and conditions of Y. For instance, let X = “A set of people who claimed to have drunk various amounts over the past year, and some other information on each individual, such as sex and age, and whether each had liver cancer or not, and a model which ties all these together; and this new person who claims to have drunk this-and-such, and his cancer status is unknown.” And let Y = “Person has liver cancer.” Then
Pr(Y|X) = p,
where as before p is in (0,1). If p is low, we suspect alcohol is not a strong cause, or any cause at all, in liver cancer. If p is large, we suspect alcohol might be a cause, but not always, in liver cancer. Because there is no proof one way or the other, unless p equals 0 or 1 and there is additional direct information in X about cause, which here there is not. Correlations we notice might be spurious. Knowledge of cause is difficult to come by, and not to be discovered in simple probabilities.
What to do in the face of uncertainty is entirely separate from the uncertainty itself. If the p is high in the liver cancer case, it does not follow that any action is warranted. What is best or good or ethical or remunerative to do is not a question of uncertainty, though there may be uncertainty involved inside decisions. For instance, if Y happens (or it doesn’t) an uncertain loss might occur, which might cost another uncertain cost to protect against it. The uncertainty in these decisions is handled like all uncertainty, but probability never tells you what goes into decisions. It never tells you what to do. It never tells you what Y and what X to pick in any situation. These choices are always up to you. But once a Y and X are picked, logic rigorously gives the relation, which is not subjective.
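A small sketch of why the same probability can lead to different actions. Minimizing expected loss is only one common decision rule, assumed here for illustration; the loss figures and the names `expected_loss` and `best_action` are hypothetical. The probability is held fixed and only the losses, which probability does not supply, are changed.

```python
def expected_loss(p, loss_if_y, loss_if_not_y):
    """Expected loss of one action, given Pr(Y|X) = p."""
    return p * loss_if_y + (1 - p) * loss_if_not_y

def best_action(p, losses):
    """losses maps action name -> (loss if Y, loss if not Y).
    Returns the action with the smallest expected loss."""
    return min(losses, key=lambda a: expected_loss(p, *losses[a]))

p = 0.9  # the same high Pr(Y|X) in both cases below

# Hypothetical losses, in arbitrary units.
cheap_harm = {"protect": (100, 100), "do nothing": (50, 0)}
costly_harm = {"protect": (100, 100), "do nothing": (1000, 0)}

print(best_action(p, cheap_harm))   # do nothing
print(best_action(p, costly_harm))  # protect
```

The high p was identical both times. What changed the choice was the losses, and those come from whoever is deciding, not from Pr(Y|X).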
THE END. That is it. That is all of probability. There are acres of rich mathematical detail and nuance, but that is the gist. Every problem is, or should be, the same in structure, only the details change.
AND YET… Like in anything, mistakes can be made, errors introduced, misunderstandings mistaken for truth. The same is true in probability. Although in any intellectual activity there are an infinite number of ways to go wrong, misunderstandings in probability are found in a handful of main areas. Let’s look at a couple.
TESTING
Let Y be our proposition as before. X is the same, but here it has as part of it “data”, i.e. observations relating certain measures in the world through a model to Y. That is, a model is also part of X. We still want
Pr(Y|X),
but testing gives us:
Pr(data in X more extreme than seen in X, according to model| A statement in the model asserting something about Y is false, and the rest of X) = p.
In English (though it is difficult to comprehend, such that nobody ever remembers), given we believe that something in a model which relates the quantifiable parts of X and Y is false, and that the data parts of X were used in making the model, we compute the probability of seeing data we did not see, but might (?) have, that is more extreme than the data we did see.
This is the “p-value.” There is no reason in the world to compute this strange creature. But it is calculated. Often. Worse, we are asked to equate Pr(Y|X) with (one minus) the p-value, but tacitly, implicitly. This is an obvious fallacy. A rife ubiquitous sanctioned fallacy.
For reasons I don’t think anybody fully understands, you cannot talk people out of this fallacy. All acknowledge we want Pr(Y|X), so why not compute what you want instead? Nobody knows.
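To see that the two objects are different numbers, here is a minimal sketch with made-up data. Everything in it is an assumption for illustration: the data (15 successes in 20 trials), the “null” that the chance of success is 0.5, Y = “the next trial is a success”, and the simple uniform-prior model (Laplace’s rule of succession) used to get Pr(Y|X). The p-value here follows the usual convention of counting data at least as extreme as what was seen.

```python
from math import comb

def p_value_upper(k, n, theta0=0.5):
    """Pr(K >= k | n trials, success chance theta0): the chance of data
    at least as extreme as seen, assuming the 'null' theta0 is true."""
    return sum(comb(n, j) * theta0**j * (1 - theta0)**(n - j)
               for j in range(k, n + 1))

def pr_next_success(k, n):
    """Pr(next trial is a success | k successes in n trials), under the
    assumed uniform-prior model (Laplace's rule of succession)."""
    return (k + 1) / (n + 2)

k, n = 15, 20  # hypothetical data: 15 successes in 20 trials

p_val = p_value_upper(k, n)
pr_y = pr_next_success(k, n)

print(round(p_val, 4))      # 0.0207
print(round(1 - p_val, 4))  # 0.9793
print(round(pr_y, 4))       # 0.7273
```

The p-value is “significant”, and one minus it is about 0.98, yet under this assumed model the probability of the thing we care about is about 0.73. A different model in X would give a different Pr(Y|X), but in no case is it read off the p-value.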
ESTIMATION
We want, as always,
Pr(Y|X)
but we get
f(model parameters|X).
Here X also contains data and a model. This function makes statements about the internal parts of the model, which are called parameters. Usually a single value of the parameter is “estimated”, a misnomer since the model is not part of reality, and neither are its parameters. Sometimes a form of uncertainty about the estimate is given, called either a confidence or credible interval. In practice these are identical. In theory, the former has a definition people claim to believe, but nobody does.
Whatever the difficulties these matters present, the worst is that if the function of the parameters behaves a certain way, we are asked to believe that Pr(Y|X) is certain, or near certain. It is a similar fallacy to the p-value fallacy, but not quite identical. The fallacy is also implicit, since there is never any calculation of Pr(Y|X), just of f.
In other words, output function f is given, usually in “mean” form, and we are asked to believe the uncertainty in output of the model is uncertainty in Y given X, which it isn’t. Overconfidence thus abounds.
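The gap between uncertainty in a parameter and uncertainty in Y can be sketched with a simple normal model. The simulated data, the normal model itself, and the 1.96 multiplier (a normal approximation in place of a t quantile) are all assumptions made for illustration. The first interval is about the model’s mean parameter; the second is about a new observable.

```python
import math
import random
import statistics

random.seed(1)
# Hypothetical data: 100 temperature-like measurements.
data = [random.gauss(70, 5) for _ in range(100)]

n = len(data)
m = statistics.mean(data)
s = statistics.stdev(data)
z = 1.96  # normal approximation; a t quantile would be slightly larger

# Interval for the model's mean parameter (shrinks as n grows).
param_half_width = z * s / math.sqrt(n)
# Interval for a new observable Y (does not shrink to zero as n grows).
pred_half_width = z * s * math.sqrt(1 + 1 / n)

print(f"parameter: {m:.1f} +/- {param_half_width:.1f}")
print(f"new Y:     {m:.1f} +/- {pred_half_width:.1f}")
print(f"ratio:     {pred_half_width / param_half_width:.2f}")  # 10.05
```

With 100 observations the interval for a new Y is about ten times wider than the interval for the parameter (the ratio is the square root of n + 1). Reporting only the parameter interval, and letting it stand in for uncertainty in Y, is where the overconfidence comes from.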
THE GREAT WHITE FUNCTION
The previous errors belong to the formal practice of statistics. The next is found in AI, machine learning and similar fields. AI (to use one word for them all) understands that we want, as always,
Pr(Y|X)
and nothing more. In the X is a much more elaborate model (a function) than in statistics. The hope is that with sufficient assiduous searching of all possible functions, one will be discovered which gives extreme values to Pr(Y|X), i.e. probabilities of 0 or 1—after which causal claims will be made.
We’ve already seen that claims of cause can be spurious. But they will be more likely to be believed when “artificial intelligence” certifies these false claims.
It’s not that a function cannot be found that gives Pr(Y|X) = 1 or 0. One always exists, which is the trivial function that says everything that happened had to; this function always gives a perfect model fit. This model is (correctly) never accepted, however, since this model gives terrible predictions (except in degenerate situations). So the search goes on for a better, more predictive f, which is well enough.
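The trivial function is easy to sketch. The observations below are made up, and the names `memorizer` and `accuracy` are hypothetical; the point is only that a lookup table of what already happened fits perfectly and predicts nothing.

```python
# Hypothetical observations: (x, y) pairs, x standing in for the measured
# "data" part of X, y for whether Y happened.
train = [(x, x % 3 == 0) for x in range(10)]
new = [(x, x % 3 == 0) for x in range(10, 15)]

# The trivial function: everything that happened had to.
memory = dict(train)

def memorizer(x):
    """Returns the recorded outcome for x, or None if x was never seen."""
    return memory.get(x)

def accuracy(model, pairs):
    """Fraction of pairs for which the model's output equals y."""
    return sum(model(x) == y for x, y in pairs) / len(pairs)

fit = accuracy(memorizer, train)     # perfect on what already happened
predict = accuracy(memorizer, new)   # silent on anything new

print(fit, predict)  # 1.0 0.0
```

Every probability this function gives for the old data is 0 or 1, and it has nothing to say about anything it has not already seen.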
The real problem, almost unnoticeable today but sure to grow, comes in understanding that anything short of the full explanation in X, i.e. all causes and conditions which give rise to Y, leaves one short. With a fixed set of data that are not measures of these causes and all conditions, no such perfect function from a correlation-only X to Y will be found.
Now this is a seemingly smaller error than the others, and in a philosophical sense it is. But our concern is practical. Once AI brushes aside statistics, as it will, and a “computer” has found what is touted to be the “perfect” function for each new set of data, people will be less and less able to resist believing it. Causal claims will continue to be believed with the same hope they are believed with p-values, but it will be worse because the AI model will also give recommendations (orders, really) about what to do about these claims.
What is right to do is, as said, outside the purview of probability. But our culture’s faith is scientism, and the more complex the math and code, the greater the belief. A sad fate awaits.
MORE
There are many more errors, mistakes, misconceptions and so on, but we leave these for another time.