Richard Feynman wasn’t the first to suggest negative probabilities. That honor goes to Paul Dirac, who introduced them and negative energies at the same time, both because he thought these things explained curious behavior in quantum mechanics (Dirac not knowing of substance), and because these strange concepts got some equations to work out the way Dirac wanted them to.
Getting equations to match Reality is our theme today: what that matching means, and what it doesn’t. Our thesis is that even when equations (i.e. models) match Reality, up to whatever standard, it does not follow that the guts of the equations themselves represent Reality, or are real substances. This thesis is yet another version of correlation is not causation, which is often forgotten, especially by those who are really good at math.
I’ll illustrate this thesis using negative probabilities, but it applies to equations of any kind.
Most will only want to read up to More Details, and skip everything after, which is mathematicalities for those who want to go deeper.
No one ever claimed Dirac, who was great at math, was easy reading, so it was left to Feynman, also great at math but who wrote well, to explain the idea of negative probability. He has a paper anybody with standard college mathematics can follow with ease, which you should read. I’ll paraphrase the first part here, stripping it to bare minimum. This paper is extremely useful because Feynman does us the service of making a mistake clearly.
(I’m changing the notation slightly to match the notation we use in the Class. Don’t forget: we deduced from first principles what probability means; it’s not frequency or bets, but logic.)
There is a roulette wheel with three slots, 1, 2, and 3. The wheel can vary its conditions two ways, A and B (say a man throws a switch to employ magnets in B), such that Pr(1|AE) = 0.3, Pr(2|AE) = 0.6, and Pr(3|AE) = 0.1; and Pr(1|BE) = 0.1, Pr(2|BE) = 0.4, and Pr(3|BE) = 0.5, where “E” is our evidence of this setup, and A and B the conditions.
Turns out the wheel is in condition A 70% of the time, and B 30% of the time, evidence which is also part of E. But when you walk up to the wheel, you don’t know whether the wheel is in A or B.
You want to bet. Given all this evidence, what is the probability of slot 1? Easy to figure:
Pr(1|E) = Pr(1|AE)Pr(A|E) + Pr(1|BE)Pr(B|E) = 0.3 x 0.7 + 0.1 x 0.3 = 0.24.
This is the probability to you, with respect to E. Not to the guy flipping the switch, who has different evidence. You can easily figure the probabilities of slots 2 and 3 in the same way. Because probability is not in the wheel, or in anything, except in your mind.
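If you want to check the arithmetic, here is a minimal sketch in plain Python (the dictionaries and variable names are mine, not Feynman’s):

```python
# Feynman's wheel with ordinary probabilities; numbers from the text.
pr_given_A = {1: 0.3, 2: 0.6, 3: 0.1}  # Pr(slot | A, E)
pr_given_B = {1: 0.1, 2: 0.4, 3: 0.5}  # Pr(slot | B, E)
pr_A, pr_B = 0.7, 0.3                  # Pr(A | E), Pr(B | E)

# Total probability: Pr(slot|E) = Pr(slot|AE)Pr(A|E) + Pr(slot|BE)Pr(B|E).
for slot in (1, 2, 3):
    p = pr_given_A[slot] * pr_A + pr_given_B[slot] * pr_B
    print(f"Pr({slot}|E) = {p:.2f}")
# Prints 0.24, 0.54, 0.22, which sum to 1, as they must.
```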
Next thing Feynman does is invoke negative probabilities. Everything stays the same for condition A, but for B, according to Feynman, Pr(1|BE’) = -0.4, Pr(2|BE’) = 1.2, and Pr(3|BE’) = 0.2. That E’ signals our new evidence.
Not only is there a negative probability, but there is the curious “1.2”, which of course is larger than 1. It has to be, so that Pr(1, 2, or 3|BE’) = 1. What do numbers larger than 1 mean? Feynman really doesn’t say. He cannot say specifically, because they don’t have any meaning. Nor do negative numbers for probability. We don’t need a negative number for Pr(not 1|BE’), because “not 1” is logically equivalent to “2 or 3”, thus Pr(not 1|BE’) = Pr(2|BE’) + Pr(3|BE’) = 1.2 + 0.2 = 1.4, which also equals 1 – (-0.4); the arithmetic is consistent, but 1.4 is another number without meaning as a probability.
We conclude that the strange numbers are nothing more than numbers in equations. What is Pr(1|E’) now?
Pr(1|E’) = Pr(1|AE’)Pr(A|E’) + Pr(1|BE’)Pr(B|E’) = 0.3 x 0.7 – 0.4 x 0.3 = 0.09.
Feynman emphasizes that all is well because this probability, of slot 1, is still positive, so it can be measured against Reality (we also get positive numbers for slots 2 and 3). This is his error, made clearly.
To see the error, let’s try Pr(1|BE”) = -0.8, Pr(2|BE”) = 1.6, and Pr(3|BE”) = 0.2. Pr(1, 2, or 3|BE”) = 1, as required. But then
Pr(1|E”) = Pr(1|AE”)Pr(A|E”) + Pr(1|BE”)Pr(B|E”) = 0.3 x 0.7 – 0.8 x 0.3 = -0.03.
This number can’t be matched or measured against Reality. But neither could Pr(1|BE”) nor Pr(1|BE’)! Nor could Pr(2|BE”) or Pr(2|BE’). Which Feynman skipped right over in a hurry to do the math of Pr(1|E’).
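The same sketch as before, fed the strange numbers, makes the slip visible (the E’ numbers are Feynman’s, the E” numbers ours from above):

```python
# Mixtures with "negative probabilities" in the B column.
pr_given_A  = {1: 0.3, 2: 0.6, 3: 0.1}
pr_given_B1 = {1: -0.4, 2: 1.2, 3: 0.2}  # Feynman's E'
pr_given_B2 = {1: -0.8, 2: 1.6, 3: 0.2}  # our E''
pr_A, pr_B = 0.7, 0.3

for label, pr_given_B in (("E'", pr_given_B1), ("E''", pr_given_B2)):
    probs = [pr_given_A[s] * pr_A + pr_given_B[s] * pr_B for s in (1, 2, 3)]
    print(label, [round(p, 2) for p in probs])
# E'  -> [0.09, 0.78, 0.13]: sensible-looking outputs from impossible inputs.
# E'' -> [-0.03, 0.9, 0.13]: now even the output is impossible.
```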
Meaning, of course, that just because we can get some equations to work out in one aspect, like sensible probabilities for Pr(1|E’), it does not mean that the innards of the equations that got us there represent Reality. The error is that those good at math and models don’t always pause to reflect on what the guts mean, not wholly. Or they are too quick to give interpretations to the innards. As long as the “main” equation (for Pr(1|E’) here) works, then the model must, we usually think, have something to do with Reality. Yet this need not be so.
The game was given away by no less than Stephen Hawking, here quoted by Haug in another attempt to get people to love negative probabilities because they make the equations work out:
I have done some work recently, on making supergravity renormalizable, by adding higher derivative terms to the action. This apparently induces ghosts, states with negative probability. However, I have found this is an illusion. One can never prepare a system in a state of negative probability. But the presence of ghosts means that one cannot predict with arbitrary accuracy. If one can accept that, one can live quite happily with ghosts.
Physicists have been too quick for a long time to ascribe being to the bits of their equations because the math, more or less, works out.
Think about this as you read the rest of Feynman’s paper, where he has lots more examples of equations that match Reality more or less well, but which have impossibilities, like negative probabilities, for guts. This is not uncommon in quantum mechanics or string theory. The lesson is that each of the bits in a model must themselves be verified, or deduced, against Reality. Mere matching is insufficient.
More Details
People trying to sell negative probabilities appeal to what David Stove called the Columbus Argument (from the old song, “They all laughed at Christopher Columbus…”). People didn’t like negative numbers, at first, and you can’t have negative five apples, but look how useful negative numbers are! All mathematicians accept them! Therefore, negative probabilities must be believed.
I don’t have a proof that such creatures cannot exist. But we do know all appeals to the Columbus Argument necessarily fail. Nor can we point to the usefulness of final equations that have NPs in them, as we saw above. To show they exist, we at least need to know exactly what they are, and precisely what larger-than-one (in absolute value) numbers mean as probabilities.
They couldn’t be negative evidence, because that’s the “wrong side of the bar.” That is, if we have Pr(A|E) for some A and E, we already can have Pr(A|E\N), where “E\N” is the evidence E subtracting (logically) the evidence N. That still produces a probability in [0,1], as expected. I made this point on Twitter to somebody touting Mark Burgin’s Theory of Knowledge:
Problem with this is that there is no p(r), but there is p(r|e_i), the probability of r accepting the evidence e_i. Could be that p(r|e_1) = 1, and that p(r|e_2) = 0, and that p(r|e_3) = q, where 0 < q < 1. In each of these p(not-r|e_i) = 1 – p(r|e_i), and the logic works. The difficulty [above] originates in an equivocation between the cases where “r is not true” and the cases where it is. It can only be known to be not true based on evidence assumed, which changes. Thus negative probability is not needed. And this equivocation happens because it is forgotten that all probability is conditional, with no exceptions. Spelling out the evidence (e_i) clarifies this, and shows that the real magic in “negative probability” lies in understanding what happens when the evidence changes.
The lure of infinity is strong in negative probabilists. Gábor J. Székely invented something he called “half of a coin” to boost NPs. He gave his half coin the probability generating function $\sqrt{\frac{1+z}{2}}$; he expanded this as an infinite series, and proved its coefficients sum to 1 (set z = 1). Call his evidence G.
Now you can “pull out” the probability mass function from a PGF by differentiating, evaluating at z = 0, and dividing the lot by k!, where k is the order of the derivative. The 0th derivative is the PGF itself, and so Pr(0|G) = sqrt(1/2) ~ 0.707. The 1st derivative at z = 0 is (1/4)(1/2)^(-1/2), so Pr(1|G) ~ 0.354.
The 2nd (dividing by 2!) is -(1/32)(1/2)^(-3/2), so Pr(2|G) ~ -0.0884.
What is this? How do we interpret it? The half coin’s infinite series of “probabilities” does sum to 1, as required of a probability. But you can see that every even derivative from the second on gives a negative probability. Can a “2” of this half coin ever be realized? If you say No, precisely what do you mean by No? You can wave your hands and say it’s a “superposition” or some such thing, but that’s only to give it a label. It does not explain it.
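Those coefficients are easy to check. A sketch with sympy (my code, not Székely’s):

```python
from sympy import sqrt, symbols

z = symbols("z")
pgf = sqrt((1 + z) / 2)          # Szekely's half-coin PGF

print(pgf.subs(z, 1))            # prints 1: the "probabilities" sum to 1

# Taylor coefficients at z = 0 are the alleged probabilities Pr(k|G).
expansion = pgf.series(z, 0, 7).removeO()
for k in range(7):
    c = expansion.coeff(z, k)
    print(f"Pr({k}|G) ~ {float(c):+.4f}")
# +0.7071, +0.3536, -0.0884, +0.0442, -0.0276, +0.0193, -0.0145:
# every even k >= 2 carries a negative "probability".
```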
Now Székely has a theorem (from this paper) that says if f is the generating function of a probability mass (density) function (pmf or pdf) that has negative probabilities, then there exist two genuine pdfs g and h such that fg = h. He remarks that for two quantities x and y, multiplying their PGFs gives the PGF of x + y. This is true. But in fg = h, f is not a genuine PGF, so fg = h is not a genuine convolution of two real quantities. Which merely means we have a computational aid of the following sort: Pr(A|E) = p + q, where we can let p and q (or even functions of them) roam wherever we like as long as the sum p + q is in [0,1]. However much this helps in calculating what we want, we haven’t invented negative probabilities; we’ve just used equations to make presumably harder calculations easier.
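The legitimate fact underneath is worth seeing on its own: multiplying the PGFs of two independent quantities convolves their pmfs. A sketch with numpy (the dice example is mine):

```python
import numpy as np

# pmf of one fair die, indexed 0..6 so that index = face value.
die = np.array([0, 1, 1, 1, 1, 1, 1]) / 6

# Multiplying two PGFs (polynomials in z) multiplies the polynomials,
# i.e. convolves the coefficient arrays: the product's coefficients
# are the pmf of the sum of the two dice.
two_dice = np.convolve(die, die)
print(two_dice[2], two_dice[7])  # Pr(sum=2) = 1/36, Pr(sum=7) = 6/36
```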
Haug (who quoted Hawking) uses NPs in finance models, and these models don’t always work out. To that he says, “Personally I don’t know of any financial model that at some point in time has not had a breakdown—there’s a reason we call them ‘models’.” Very true. Finance models, much more often than physical models, have nothing to do with Reality and are only correlational. NPs in this context, if they are acknowledged as mere crutches, might have some use.
But one mustn’t let them go to one’s head. Else one will end up like Székely, touting not only NPs, but negative variances, too!
Comments

My degree is aerospace engineering and my trade is high-school-level math teaching. Probability makes my eyes glaze. I had to take a class in college, much later in life, in order to qualify for the teaching thing, and I did fine, but only by slogging and memorizing. It never really clicked.
That said, weren’t imaginary numbers also invented purely to solve a problem that wasn’t otherwise solvable? They don’t really exist, you can’t count things with them, but they solve a lot of unsolvable problems.
I don’t understand your argument due to my lack of understanding of probability, but it seems to me that there are plenty of concepts that started as a way to make an equation work but eventually developed into something useful. Maybe we just haven’t gotten to the useful part yet with the ideas laid out.
Interesting stuff though, when my eyes weren’t glazing.
Heresolong echoes my immediate thoughts about negative probability.
We tend to assume certain things to be mathematically impossible, because they contradict intuition. Imaginary numbers are the best example of this. Heck, negative numbers are an example of this (how can one have -1 avocadoes?!).
So my question is whether or not “negative probability” doesn’t describe probability, but instead describes a more abstract concept, just as the square roots of negative numbers, or negative numbers themselves, describe a more abstract concept. You can’t have -1 avocadoes, any more than you can have sqrt(-1) avocadoes. But perhaps a negative probability points to yet another useful abstraction that cannot be represented by counting numbers or positive numbers or probabilities?
I totally agree that a negative probability isn’t a probability, just as counting numbers cannot be negative. But perhaps there is a reasonable abstraction that might make sense, as long as we include the caveat that these are not “probabilities” in the intuitive/normal sense? If they let you do math that leads to correct quantitative predictions, perhaps there is a useful concept here?
Heresolong:
I would look at it this way. There are many things that can be done in mathematics which initially seem to be meaningless but simplify the math and eventually have a deeper meaning when properly understood. Allowing measures like length and area to be negative would be an example of this which is even closer to the current discussion. Initially it might seem absurd that a length could be negative, but allowing it lets you combine many different results without worrying about the ordering of points, and eventually it becomes clear that we are talking about an oriented length (i.e. length together with the direction we are travelling along the path. You might think of this as a velocity along a number line, which is how people often interpret negative derivatives in Calculus.)
So certainly we shouldn’t reject strange notions out of hand because it is true that they COULD eventually mean something. But while there are many strange notions that eventually are shown to be meaningful, there are many MORE notions that are just complete nonsense. They may only be well defined on trivial or empty sets, or for some other reason cannot possibly describe anything interesting in reality. But if you are just defining things at random in mathematics you can say whatever you like.
Because of this we should be extremely skeptical about anything that does not seem to make sense. The success stories for such notions have always gone like this:
1.) This problem is really tricky to even state, and harder to solve.
2.) Hey, if we define this weird new concept the problem is easier.
3.) It turns out that you can study this concept mathematically, though its real world meaning is still unclear.
4.) Now that we’ve studied this concept for decades someone has come up with another way to view it, and that new approach does make sense in reality.
But since the 20th century the tendency has been to go this way:
1.) This problem is really tricky to even state, and harder to solve.
2.) Hey, if we define this weird new concept the problem is easier.
3.) Therefore this weird new concept is more real than things that we can actually observe.
That is, people do not develop the concept while being neutral towards whether it has any real world meaning, and instead develop it from a real world perspective immediately, regardless of how little sense it makes. The math is assumed to define reality, so if it is weird it is reality that is wrong and must be updated. Thus we skip over the development process that was used to understand how previous weird concepts fit in with reality, and nonsense can easily slip through.
Heresolong – I feel your pain. I’ve never been able to grasp probability. That’s one reason I found this blog, hoping to gain some understanding. It didn’t work, though; my brain seems to be wired wrong, although this blog has helped me accept that limitation. As an engineer, I have a strong aversion to untethered philosophy, and this blog has helped me see the connection between the two. And yes, I am able to have a happy and fulfilled professional life using only “frequentist” approaches, fully aware that I have been “doing it all wrong” the whole time. I solve problems and my employer continues to pay me.
The negative probability topic is interesting, especially the analogy to imaginary numbers. I have used imaginary numbers for so many years now that any time spent pondering them seems like a waste, like that mathematician/philosopher who spent a lifetime trying to prove that 1+1=2.
Consider a radio signal carrying a stream of data, say a WiFi signal or a cellular signal. Without fully embracing imaginary numbers, it would be exceedingly difficult to understand how these signals are generated and deciphered. Even the underlying carrier signal is viewed as the sum of a positive frequency component and a negative frequency component. What the heck is negative frequency? Beats me. But as an electrical engineer, I know it’s really there, and you ignore it at your peril. If you can’t bring yourself to believe that, you will really need to find a new profession.
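That negative component is easy to exhibit: a real cosine is the sum of complex exponentials at +f and -f, and the FFT of a real signal shows both. A minimal numpy sketch (sample rate and carrier frequency are arbitrary choices of mine):

```python
import numpy as np

fs, f0, n = 1000, 50, 1000              # sample rate (Hz), carrier (Hz), samples
t = np.arange(n) / fs
carrier = np.cos(2 * np.pi * f0 * t)    # real-valued carrier

spectrum = np.fft.fft(carrier) / n      # normalized spectrum
freqs = np.fft.fftfreq(n, d=1 / fs)
print(freqs[np.abs(spectrum) > 0.25])   # [ 50. -50.]: components at +f0 and -f0
```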
Back to negative probability. It sounds goofy at face value, I don’t understand it as a standalone concept or as a portal into a deeper concept, I don’t understand its usefulness, and I can see how it could be used for deception. So no, I’m not buying it.
Microsoft’s Copilot agrees that Briggs has coined a unique new word, “mathematicalities”. Well done, Sir.
PC,
Thanks.
I had not read the Feynman essay – thanks for pointing it out. As usual with him, I ended up thinking I understood something I don’t…
However, negative probabilities make sense in the same way that impossibles like infinity and i (root -1, not me) make sense: as intermediates in a process leading to real conclusions. Basically: if you can achieve something using these in ways that make sense but are easier/faster than other ways of achieving the same real result without invoking an impossible, then that use is legitimate even if the impossible is, in reality, not possible.
Further: one easy way to rationalize P(E)=-1.2 etc etc is to invoke the ultimate rationale for string theory – hidden dimensions. Thus the “has 5 apples, sells eight, gets four, ends with 1” example is impossible in a 3D universe because it omits the t part of our 3d+t universe – and you can similarly justify p(E)=-1.2 as real by imagining a (3+x)d +t universe for suitable x >1 in Z
The main problem with negative probabilities is that the probability of anything is DEFINED to lie between 0 and 1 inclusive. Probabilities outside this range are literally meaningless. Any use of a number outside the range 0 <= x <= 1 has nothing to do with probability.