
Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

April 18, 2018 | 17 Comments

One Out Of Five Babies Are Killed In England & Wales

I received this request from Steve Blendell (slightly edited for spelling):

Matt

How are you friend? Take a look at Prof Cotter’s letter – he’s a physicist. Do the stats stand up?

https://www.irishtimes.com/opinion/letters/the-eighth-amendment-1.3461099

The referendum is in May – our side have got off to a good strong start with posters.

Steve

The referendum is whether to repeal the Eighth Amendment which gives human beings a right to life. ‘No’ voters think killing the lives inside would-be mothers should be illegal, while ‘Yes’ voters want to draw their knives.

Ignore here the conceit, shared by all democracies, that such matters can be put to a (general) vote.

Cotter’s letter to the editor:

Sir, – Posters on my street for the No campaign state that the rate of terminations in England is either one in four (25 per cent) or one in five (20 per cent), depending on which poster I look at. It is also interesting to note that these data only refer to England. The reason for this is that if you include official 2016 statistics for Scotland and Wales, the overall rate drops to 14 per cent. Now 14 per cent is a long way from 25 per cent and doesn’t look good for the No campaign. So voters need to be aware of how statistics are being manipulated to encourage a no vote. – Yours, etc,

THOMAS G COTTER,
Crosshaven,
Co Cork.

Cotter apparently believes the (if true) slightly lower number of killings in England, Wales, and Scotland justifies killing multitudes more in Ireland. Which is incoherent. Either the killing is moral and allowable, or it isn’t. If it is, what’s the difference if the entire population decides to kill itself off?

Since that argument goes nowhere, let’s look at the numbers instead. Here is more or less what I told Blendell.

Here are the official statistics: Link (pdf).

They put the abortion ‘rate’ in England and Wales this way: ‘The age-standardised abortion rate was 16.0 per 1,000 resident women aged 15-44.’ That is calculated like this:

     number of abortions/number of women aged 15-44 (in thousands).

That’s one definition of ‘rate’, but, if I understand their purpose correctly, not the best one. The best is

     number of abortions/(number of births + number of abortions).

An equivalent way to put it is

     number of abortions/number of conceptions.

Call this the Real Abortion Rate, and contrast it to the official rate. The Real rate will be higher, and likely much higher, than the number they are touting, which includes all women, whether or not they were pregnant.

Suppose only 1 woman in that age group got pregnant and then killed her child. That’s a Real rate of 100%, but it would be a very small official rate. To find it, take that 1 and divide by all the women (in thousands) aged 15-44. It’s in the thousands of thousands (millions), anyway.

I could not find what the Real rate is for England and Wales, but according to one chart in 2013 there were about 53,900 thousand people (roughly 54 million) in England and 3,100 thousand (3.1 million) in Wales. If women aged 15-44 were, say, 20% of these totals, then the total is 11,400 thousand women aged 15-44, more or less, in 2016.

Now that same report said there were 190,406 abortions in 2016. So that would put my estimate of the official rate per 1,000 women at

190,406/11,400 ≈ 16.7,

which is close to the 16.0 they got, meaning that 20% guess of the number of women in that age bracket is pretty good.

But if only 1 woman was pregnant and killed her child, the Real rate would be 100%, but the ‘official’ abortion rate would be 1/11,400 ≈ 0.00009, which is mighty small! This is only used to show that the definition of ‘rate’ matters.

More than 1 woman got pregnant. Here’s the official stats for England and Wales: Link.

Extrapolating gives about 900,000 conceptions in 2016, maybe slightly higher, maybe lower. They do not account for multiple births per woman, nor are miscarriages counted. But 900,000 is in the ballpark. That would make the Real abortion rate about

190,406/900,000 = 21%.

That 21% is NOT per 1,000 women like the 16 above is, so be very careful making comparisons. This says (roughly) 1 out of EVERY 5 ‘conceptions’ are killed. Which is huge. That varies by age group, with (as the official report says) the highest rates around 22, i.e. the most fecund years.
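For anybody who wants to check the arithmetic, here is a minimal R sketch of the two ‘rates’ using the figures quoted above; the 20% population share and the 900,000 conceptions are the rough guesses from the text, not official inputs.

abortions   = 190406  # England & Wales, 2016, from the official report
women_15_44 = 11400   # in thousands; the 20% guess applied to ~57 million people
conceptions = 900000  # rough extrapolation from the conception statistics

abortions / women_15_44 # ~16.7 per 1,000 women: the 'official' style of rate
abortions / conceptions # ~0.21: the Real rate, about 1 in 5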

Therefore this is how I would do the posters:

ONE OUT OF FIVE BABIES ARE KILLED IN ENGLAND & WALES.

Maybe accounting for uncertainties it’s 0.5 out of 5, or 1.5 out of 5. But 1 is a reasonable guess. I didn’t do Scotland, but you get the idea.

The numbers will all be meaningless. Statistics are (almost) useless. Those who want to kill do not care how many are killed. They just want to kill. Polls and bookies are predicting bloodlust wins, incidentally.

Image grabbed from here. Notice the hilariously inept ‘Trust us.’

Post corrected of my innumeracy. Bonus pic.

April 10, 2018 | 16 Comments

A Beats B Beats C Beats A

Thanks to Bruce Foutch who found the video above. Transitivity is familiar from ordinary numbers. If B > A and C > B and D > C, then D > A. But only if the numbers A, B, C and D behave themselves. They don’t always, as the video shows.

What’s nice about this demonstration is that the ordering is by probability and not by expected value. Hence the “10 gazillion” joke. “Expected” is not exactly a misnomer, but it does have two meanings. The plain-English definition tells you an expected value is a value you’re probably going to see sometime or another. The probability definition doesn’t match that, or matches only sometimes.

Expected value is purely a mathematical formalism. You multiply the—conditional: all probability is conditional—probability of a possible outcome by the value of that possible outcome, and then sum them up. For an ordinary die, this is 1/6 x 1 + 1/6 x 2 + etc. which equals 3.5, a number nobody will ever see on a die, hence you cannot plain-English “expect” it.

It’s good homework to calculate the expected value for each of the dice in the video. It’s better homework to calculate the probabilities B > A, C > B, D > C, and D > A.
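Since the faces of the video’s dice aren’t reproduced here, the sketch below uses Efron’s dice, a standard intransitive set, relabeled so the cycle runs the same way as the post describes it. It only shows how the two orderings are computed; swap in the video’s faces to do the homework proper.

# Efron's dice, a standard intransitive set (NOT the video's dice),
# relabeled so that B beats A, C beats B, D beats C, and A beats D.
A = c(3,3,3,3,3,3)
B = c(4,4,4,4,0,0)
C = c(5,5,5,1,1,1)
D = c(6,6,2,2,2,2)

sapply(list(A=A, B=B, C=C, D=D), mean) # expected values: 3.00, 2.67, 3.00, 3.33

# probability die x beats die y, enumerating all 36 equally likely face pairs
p.beats = function(x, y) mean(outer(x, y, ">"))

p.beats(B, A); p.beats(C, B); p.beats(D, C) # each 2/3
p.beats(A, D) # also 2/3: the ordering is a cycle, not a ladder

Notice B beats A two times out of three even though B has the lower expected value, which is the probability-versus-expected-value contrast in miniature.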

It’s not that expected values don’t have uses, but that they are sometimes put to the wrong use. The intransitive dice example illustrates this. If you’re in a game rolling against another player and what counts is winning, then you’ll want the probability ordering. If you’re in a game and what counts is some score based on the face of the dice, then you might want to use the expected value ordering, especially if you’re going to have a chance of winning 10 gazillion dollars. If you use the expected value ordering and what counts is winning, you will in general lose if you pick one die and your opponent is allowed to pick any of the remaining three.

Homework three: can you find a single change to the last die such that it’s now more likely to beat the first die?

There are some technical instances using “estimators” for parameters inside probability models which produce intransitivity and which I won’t discuss. As regular readers know I advocate eschewing parameter estimates altogether and moving to a strictly predictive approach in probability models (see other posts in this class category for why).

Intransitivity shows up a lot when decisions must be made. Take the game rock-paper-scissors. What counts is winning. You can think of it in this sense: each “face” of this “three-sided die” has the same value. Rock beats scissors which beats paper which beats rock. There is no single best object in the trio.

Homework four: what is the probability of one R-P-S die beating another R-P-S die? Given that, why is it that some people are champions of this game?

R-P-S dice in effect are everywhere, and of course can have more than three sides. Voting provides prime cases. Even simple votes, like where to go to lunch. If you and your workmates are presented choices as comparisons, then you could end up with a suboptimal choice.

It can even lead to indecision. Suppose it’s you alone and you rated restaurants with “weights” given by the probabilities of the dice in the video (the weights aren’t necessary; it’s the ordering that counts). Which do you choose? You’d pick B over A, C over B, and D over C. But you’d also pick A over D. So you have to pick A. But then you’d have to pick B, because B is better than A. And so on.

People “break free” of these vicious circles by adding additional decision elements, which have the effect of changing the preference ordering (adding negative elements is possible, too). “Oh, just forget it. C is closest. Let’s go.” Tastiness and price, which might have been the drivers of the ordering before, are jettisoned in favor of distance, which for true distances provides a transitive ordering.

That maneuver is important. Without a change in premises, indecision results. Since a decision was made, the premises must have changed, too.

Voting is too large a topic to handle in one small post, so we’ll come back to it. It’s far from a simple subject. It can also be a depressing one, as we’ll see.

April 5, 2018 | 22 Comments

The Gremlins Of MCMC: Or, Computer Simulations Are Not What You Think

I don’t think we’re clear on what simulation is NOT. RANDOMNESS IS NOT NECESSARY, for the simple reason randomness is merely a state of knowledge. Hence this classic post from 12 June 2017.

“Let me get this straight. You said what makes your car go?”

“You heard me. Gremlins.”

“Gremlins make your car go.”

“Look, it’s obvious. The car runs, doesn’t it? It has to run for some reason, right? Everybody says that reason is gremlins. So it’s gremlins. No, wait. I know what you’re going to say. You’re going to say I don’t know why gremlins make it go, and you’re right, I don’t. Nobody does. But it’s gremlins.”

“And if I told you instead your car runs by a purely mechanical process, the result of internal combustion causing movement through a complex but straightforward process, would that interest you at all?”

“No. Look, I don’t care. It runs and that it’s gremlins is enough explanation for me. I get where I want to go, don’t I? What’s the difference if it’s gremlins or whatever it is you said?”

MCMC

That form of reasoning is used by defenders of simulations, a.k.a. Monte Carlo or MCMC methods (the other MC is for Markov Chain), in which gremlins are replaced by “randomness” and “draws from distributions.” Like the car run by gremlins, MCMC methods get you where you want to go, so why bother looking under the hood for more complicated explanations? Besides, doesn’t everybody agree simulations work by gremlins—I mean, “randomness” and “draws”?

Here is an abbreviated example from Uncertainty which proves it’s a mechanical process and not gremlins or randomness that accounts for the success of MCMC methods.

First let’s use gremlin language to describe a simple MCMC example. Z, I say, is “distributed” as a standard normal, and I want to know the probability Z is less than -1. Now the normal distribution’s cumulative probability has no closed-form solution, meaning I cannot just plug in numbers and calculate an answer. There are, however, many excellent approximations to do the job near enough, meaning I can with ease calculate this probability to reasonable accuracy. The R software does so by typing pnorm(-1), which gives 0.1586553. This gives us something to compare our simulations to.

I could also get at the answer using MCMC. To do so I randomly—recall we’re using gremlin language—simulate a large number of draws from a standard normal, and count how many of these simulations are less than -1. Divide that number by the total number of simulations, and there is my approximation to the probability. Look into the literature and you will discover all kinds of niceties to this procedure (such as computing how accurate the approximation is, etc.), but this is close enough for us here. Use the following self-explanatory R code:


n = 10000      # number of simulations ("draws")
z = rnorm(n)   # simulate ("draw") n values from a standard normal
sum(z < -1)/n  # fraction below -1: the approximation of Pr(Z < -1)

I get 0.158, which is for applications not requiring accuracy beyond the third digit peachy keen. Play around with the size of n: e.g., with n = 10, I get for one simulation 0.2, which is not so hot. In gremlin language, the larger the number of draws the closer will the approximation "converge" to the right answer.

All MCMC methods are the same as this one in spirit. Some can grow to enormous complexity, of course, but the base idea, the philosophy, is all right here. The approximation is not seen as legitimate because we can match it against a near-analytic answer, because we can't do that for any situation of real interest (if we could, we wouldn't need simulations!). It is seen as legitimate because of the way the answer was produced. Random draws imbued the structure of the MCMC "process" with a kind of mystical life. If the draws weren't random---and never mind defining what random really means---the approximation would be off, somehow, like in a pagan ceremony where somebody forgot to light the black randomness candle.

Of course, nobody speaks in this way. Few speak of the process at all, except to say it was gremlins; or rather, "randomness" and "draws". It's stranger still because the "randomness" is all computer-generated, and it is known computer-generated numbers aren't "truly" random. But, somehow, the whole thing still works, like the randomness candle has been swapped for a (safer!) electric version, and whatever entities were watching over the ceremony were satisfied the form has been met.

Mechanics

Now let's do the whole thing over in mechanical language and see what the differences are. By assumption, we want to quantify our uncertainty in Z using a standard normal distribution. We seek Pr(Z < -1 | assumption). We do not say Z "is normally distributed", which is gremlin talk. We say our uncertainty in Z is represented using this equation by assumption.

One popular way of "generating normals" (in gremlin language) is to use what's called a Box-Muller transformation. Any algorithm which needs "normals" can use this procedure. It starts by "generating" two "random independent uniform" numbers U_1 and U_2 and then calculating this creature:

Z = \sqrt{-2 \ln U_1} \cos(2 \pi U_2),

where Z is now said to be "standard normally distributed." We don't need to worry about the math, except to notice that it is written as a causal, or rather determinative, proposition: "If U_1 is this and U_2 is that, Z is this with certainty." No uncertainty enters here; U_1 and U_2 determine Z. There is no life to this equation; it is (in effect) just an equation which translates a two-dimensional straight line on the interval 0 to 1 (in 2-D) to a line with a certain shape which runs from negative infinity to positive infinity.

To get the transformation, we simply write down all the numbers in the paired sequence (0.01, 0.01), (0.01, 0.02), ..., (0.99, 0.99). The decision to use two-digit accuracy was mine, just as I had to decide n above. This results in a sequence of pairs of numbers (U_1, U_2) of length 9801. For each pair, we apply the determinative mapping of (U_1, U_2) to produce Z as above, which gives (3.028866, 3.010924, ..., 1.414971e-01). Here is the R code (not written for efficiency, but transparency):


ep = 0.01 # the (st)ep
u1 = seq(ep, 1-ep, by = ep) # gives 0.01, 0.02, ..., 0.99
u2 = u1

z = NA # start with an empty vector
k = 0 # just a counter
for (i in u1){
  for (j in u2){
    k = k + 1
    z[k] = sqrt(-2*log(i))*cos(2*pi*j) # the transformation
  }
}
z[1:10] # shows the first 10 numbers of z

The first 10 numbers of Z map to the pairs (0.01, 0.01), (0.01, 0.02), (0.01, 0.03), ..., (0.01, 0.10), since the loop holds U_1 fixed while U_2 steps along. There is nothing at all special about the order in which the (U_1, U_2) pairs are input. In the end, as long as the "grid" of numbers implied by the loop are fed into the formula, we'll have our Z. We do not say U_1 and U_2 are "independent". That's gremlin talk. We speak of Z in purely causal terms. If you like, try this:

plot(z)

We have not "drawn" from any distribution here, neither uniform nor normal. All that has happened is some perfectly simple math. And there is nothing "random". Everything is determined, as shown. The mechanical approximation is got the same way:

sum(z < -1)/length(z) # the denominator counts the size of z

which gives 0.1608677, which is a tad high. Try lowering ep, which is to say, try increasing the step resolution and see what that does. It is important to recognize the mechanical method will always give the same answer (with same inputs) regardless of how many times we compute it. Whereas the MCMC method above gives different numbers. Why?
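Here, for instance, is a quick demonstration of that difference (a sketch of my own, not from the original post): five fresh runs of the gremlin recipe give five slightly different answers, while the grid calculation never budges.

replicate(5, sum(rnorm(10000) < -1)/10000) # five MCMC runs, five slightly different answers
sum(z < -1)/length(z)                      # the mechanical grid: identical every time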

Gremlins slain

Here is the gremlin R code, which first "draws" from "uniforms", and then applies the transformation. The ".s" are to indicate simulation.

n = 10000
u1.s = runif(n) # the "random draws" of uniforms
u2.s = runif(n)
z.s = sqrt(-2*log(u1.s))*cos(2*pi*u2.s) # the same Box-Muller transformation as before
sum(z.s < -1)/n # the approximation of Pr(Z < -1)

The first time I ran this, I got 0.1623, which is much worse than the mechanical, but the second time I got 0.1589, which is good. Even in the gremlin approach, though, there is no "draw" from a normal. Our Z is still absolutely determined from the values of (u1.s, u2.s). That is, even in the gremlin approach, there is at least one mechanical process: calculating Z. So what can we say about (u1.s, u2.s)?

Here is where it gets interesting. Here is a plot of the empirical cumulative distribution of U_1 values from the mechanical procedure, overlaid with the ECDF of u1.s in red. It should be obvious the plots for U_2 and u2.s will be similar (but try!). Generate this yourself with the following code:


plot(ecdf(u1), xlab="U_1 values", ylab="Probability of U1 < value", xlim=c(0,1), pch='.')
lines(ecdf(u1.s), col=2)
abline(0, 1, lty=2)

The ECDF of U_1 is a coarse step function; after all, there are only 99 values, while u1.s is of length n = 10000.

Do you see it yet? The gremlins have almost disappeared! If you don't see it---and do try and figure it out before reading further---try this code:

sort(u1.s)[1:20]

This gives the first 20 values of the "random" u1.s sorted from low to high. The values of U_1 were 0.01, 0.02, ... automatically sorted from low to high.

Do you see it yet? All u1.s is is a series of ordered numbers on the interval from 1e-6 to 1 - 1e-6. And the same for u2.s. (The 1e-6 is R's native display resolution for this problem; this can be adjusted.) And the same for U_1 and U_2, except the interval is a mite shorter! What we have are nothing but ordinary sequences of numbers from (roughly) 0 to 1! Do you have it?

The answer is: The gremlin procedure is identical to the mechanical!

Everything in the MCMC method was just as fixed and determined as the other mechanical method. There was nothing random, there were no draws. Everything was simple calculation, relying on an analytic formula somebody found that mapped two straight lines to one crooked one. But the MCMC method hides what's under the hood. Look at this plot (with the plot screen maximized; again, this is for transparency not efficiency):

plot(u1.s, u2.s, col=2, xlab='U 1 values', ylab='U 2 values')
u1.v = NA; u2.v = NA
k = 0
for (i in u1){
  for (j in u2){
    k = k + 1
    u1.v[k] = i
    u2.v[k] = j
  }
}
points(u1.v, u2.v, pch=20) # these are (U_1, U_2) as one long vector of each

The black dots are the (U_1, U_2) pairs and the red the (u1.s, u2.s) pairs fed into the Z calculation. The mechanical is a regular grid and the MCMC-mechanical is also a (rougher) grid. So it's no wonder they give the same (or similar) answers: they are doing the same things.

The key is that the u1.s and u2.s themselves were produced by a purely mechanical process as well. R uses a formula no different in spirit from the one for Z above, which, if fed the same numbers, always produces the same output (stick in a known state W, which determines u1.s, etc.). The formula is called a "pseudorandom number generator", where by "pseudorandom" they mean not random; purely mechanical. Everybody knows this, and everybody knows this, too: there is no point at which "randomness" or "draws" ever comes into the picture. There are no gremlins anywhere.
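You can see this for yourself (a minimal sketch of my own, not from Uncertainty): pin down the generator's state and the "draws" come out identical every time.

set.seed(42) # fix the generator's internal state (the known W above)
runif(5)     # five "random" uniforms

set.seed(42) # restore the same state...
runif(5)     # ...and the same five numbers come out: purely mechanical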

Now I do not and in no way claim that this grunt-mechanical, rigorous-grid approach is the way to handle all problems or that it is the most efficient. And I do not say the MCMC car doesn't get us where we are going. I am saying, and it is true, there are no gremlins. Everything is a determinate, mechanical process.

So what does that mean? I'm glad you asked. Let's let the late-great ET Jaynes give the answer. "It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought."

We can believe in gremlins if we like, but we can do better if we understand how the engine really works.

There's lots more details, like the error of approximation and so forth, which I'll leave to Uncertainty (which does not have any code).

Bonus code

The value of -1 was nothing special. We can see the mechanical and MCMC procedures produce normal distributions which match almost everywhere. To see that, try this code:

plot(ecdf(z), xlab="Possible values of Z", ylab="Probability of Z < value", main="A standard normal")
s = seq(-4, 4, by=ep)
lines(s, pnorm(s), lty=2, col=2)
lines(ecdf(z.s), lty=3, col=3)

This is the (e)cdf of the distributions: mechanical Z (black solid), gremlin (green dotted), analytic approximation (red dashed). The step in the middle is from the crude step in the mechanical. Play with the limits of the axis to "blow up" certain sections of the picture, like this:

plot(ecdf(z), xlab="Possible values of Z", ylab="Probability of Z < value", main="A standard normal", xlim=c(-1,1))
s = seq(-4, 4, by=ep)
lines(s, pnorm(s), lty=2, col=2)
lines(ecdf(z.s), lty=3, col=3)

Try xlim=c(-4,-3) too.

Homework

Find the values of U_1 and U_2 that correspond to Z = -1. Using the modern language, what can you say about these values in relation to the (conditional!) probability Z < -1? Think about the probabilities of the Us.

What other simple transforms can you find that correspond to other common distributions? Try out your own code for these transforms.

April 4, 2018 | 19 Comments

Quantum Potency & Probability

A note on a complex subject. Ed Feser points us to the paper “An Aristotelian Approach to Quantum Mechanics” by Gil Sanders, and in turn Sanders points us to “Taking Heisenberg’s Potentia Seriously” by Kastner, Kauffman, and Epperson. Feser’s book Scholastic Metaphysics is also not to be missed.

Heisenberg was, of course, brought up when Aristotle’s notions on the distinction between act and potency were still taught. He thought those ideas useful in explaining quantum (= discrete) curiosities that were flooding through physics.

Sanders’s paper is a gentle and informative introduction to these topics, while Kastner et al. go a little deeper. Below are some quotes. I believe they are useful in dispelling the recurring idea that probabilities are ontic, i.e. real things. Probability is purely epistemological, a relative measure of evidence, whereas potency is a real feature of objects. I urge you to read the papers themselves; they are not long. If you know about Aristotelian beneathphysics already, then jump to the end about probability.

Sanders (the set up in brief):

A wave function is a complete mathematical description of the properties of particles (represented as state vectors) in a physical system. By itself the wave function is a superposition of all possible state vectors. With Schrödinger evolution, the wave function evolves as a linear superposition of different states. It is deterministic in that the current vector state will physically determine the resulting vector state. If we could know all the proceeding conditions, we could predict with certainty what the resulting state vector would be. The wave function generally evolves in accord to Schrödinger, but once some form of measurement is performed, the wave function collapses in the sense that it no longer operates in accord to Schrödinger’s equation but in accord to the collapse postulate. Through a linear combination of these state vectors, the once indefinite superposition of state vectors nondeterministically produces some definite state vector. In other words, the collapse postulate tells us that once a particle is measured, it is no longer in a superposition of different states but collapses into a particle with definite properties and a definite position in a nondeterministic manner.

Sanders (Aristotle is on his way):

The methodology of physics is such that it must use the exceedingly abstract tools of mathematics in order to perform its inquiry. Mathematics is inherently quantitative and structural by nature, thus it is in principle incapable of capturing qualitative aspects of nature in the same way that a metal detector is in principle incapable of detecting plastic. Whatever does not fit this quantifiable method, like immanent teleology and causal powers, must be ignored; only mathematically definable properties are discoverable. The wave function, for example, is a mere abstract equation that is standardly interpreted to be a representation of something concrete, but as to what that is we do not know. At best physics can only give us a partial description of reality (unless abstract structure is all that exists), it fails to tell us what is the inner qualitative nature of the thing that exhibits this mathematical structure.

Sanders (Aristotle has arrived):

According to the world renowned physicist, Heisenberg, the wave function “was a quantitative version of the old concept of “potentia” in Aristotelian philosophy. It introduced something standing in the middle between the idea of an event and the actual event, a strange kind of physical reality just in the middle between possibility and reality” (1958, 41)…

A potentia is simply a thing’s potential to have its qualities or substance changed. For example, a piece of glass has the potential to shatter or it has the potential to melt into a fluid. The former kind of change is a change of qualities or accidents, whereas the latter is a change in substance. This stands in contrast to actus, which refers to the way a thing actually is here and now… A potentiality should not be confused with mere possibility. It is possible for a unicorn to exist, but it is not possible for a piece of glass to become a unicorn because it lacks that potential whereas it does have the potential to break. A piece of glass’ actuality limits the potential range of things that can be actualized.

Sanders (Aristotle has filled the room):

[Modern physics restricts] the “real” to actuality because their view of matter is still mechanistic, where material objects are mere forms, which corresponds only to actuality. The Aristotelian conception of matter is decidedly hylomorphic in that all material substances are composed of form and matter. Form (or structure) corresponds to actuality, whereas matter corresponds to the potency that persists through change. This matter is the substrate of a material substance that is receptive to different forms, whereas the form gives definite structure to the matter…Since matter and form are just more specific instances of potency and actuality, we already know that this analysis is plausible given the above argument for Aristotle’s act-potency distinction.

Sanders (skipping over a justification of hylomorphism and a proof that potency has a kind of existence, then this):

Additionally, hylomorphism entails a gradual spectrum of material beings with greater degrees of potentiality to greater degrees of actuality. Something has greater actuality if it has more determinate form (or qualities) and something has higher potency if it is more indeterminate with respect to being more receptacle to various forms. For example, a piece of clay has higher potency insofar as it is more malleable than a rock and thus more receptacle to various forms. A rock can likewise be modified to receive various forms, but it requires a physical entity with greater actuality or power to do so because it has more determinate form as a solid object… [H]ylomorphism predicts that you will find higher levels of potency because you are getting closer to prime matter. This is precisely what we find in QM. The macroscopic world has more actuality, which is why we experience it as more definite or determinate, whereas the microscopic world has far less actuality, thereby creating far less determinate behavioral patterns.

Sanders (finally QM):

Let’s start with the wave function, which if you recall, initially describes several mathematical possibilities (aka superposition) prior to collapse. QM forces us to reify the wave function in some way because by itself it would suggest that the quantum world only exists when we are measuring it, which is rather absurd….It’s far more plausible to interpret the wave function as real insofar as it describes a range of potential outcomes for particles that are low in act but great in potency. This view reinterprets superpositions as being the potentials of a thing or state, not as actual states in which all possibilities are realized.

Sanders (more QM):

Thus collapse occurs when there is contact between a perceptible object and a non-perceptible particle whereby contact with the perceptible object actualizes a particular potential (spin-y as opposed to spin-x) of the particle into a definite state. The actualization of certain outcomes at measurement has the result of affecting the range of potential outcomes of some other particle: “actual events can instantaneously and acausally affect what is next possible” (Kastner, 2017)… This problem is resolved if you’re an Aristotelian. Suppose you intended to visit Los Angeles but unbeknownst to you an earthquake sunk that traffic-ridden city into the ocean. This actualized event changed the range of potential places that I (or anyone else) could visit without acting upon other persons. In other words, actuality cannot directly alter a distant actuality without interaction but it can instantaneously and acausally change a distant range of potentials.

Kastner (skipping over the same material discussed in Sanders; the Kastner PDF was built, I’m guessing, from Windows, making it very difficult to cut and paste from; thus my laziness explains why I quote them less):

We thus propose a new kind of ontological duality as an alternative to the dualism of Descartes: in addition to res extensa, we suggest, with Heisenberg, what may be called res potentia. We will argue that admitting the concept of potentia in to our ontology is fruitful, in that it can provide an account of the otherwise mysterious nonlocal phenomena of quantum physics and at least three other related mysteries (‘wave function collapse’; loss of interference on which-way information; ‘null measurement’), without requiring any change to the theory itself…

As indicated by the term ‘res,’ we do conceive of res potentia as an ontological extant in the same sense that res extensa is typically conceived—i.e. as ‘substance,’ but in the more general, Aristotelian sense, where substance does not necessarily entail conflation with the concept of physical matter, but is rather merely “the essence of a thing . . . what it is said to be in respect of itself”.

Of course, “one cannot ‘directly observe’ potentiality, but rather only infer it from the structure of the theory.” If we could measure it directly, it would be actus not potentia. They use the phrase quantum potentia (QP).

Probability

The belief that all things had to be all act, pure actus, and contain no potentia accounts for many of the confusions about QM. One of those confusions was over the concepts of probability and “chance”. Physicists were reluctant to throw away the useful idea of cause; there had to be some causal reason “collapse” was happening. That collapse is the movement of a potential to an actual, but they didn’t see it that way, thinking the superposition of waves was all act. How did this happen? Probability was discovered to be indispensable in applying and understanding QM. Thus some thought probability itself was ontic, that chance was an objective feature of the world, and that probability/chance was the causal agent that selected the collapse point.

After all, isn’t QM probability calculated as a mathematical-function of the physical-wave-function? Didn’t that make probability real?

Well, no. It’s true the probability is a mathematical-function, something like the “square of the corresponding amplitude in a wave function”. The probability thus takes as input aspects of reality, a reality (the wave) which contains both act and potential, and spits out a number. But so what? Conditioning on measures of real things doesn’t turn thoughts about things into the things themselves or into causal forces. (This does not rule out the mind-body projecting energy, but I don’t believe it can, and that is not what turning thoughts into causal forces means.)

If I tell you this bag has one black and one white ball and one must be drawn out blind, the probability of drawing a black is a function of reality all right, but your thoughts about that probability aren’t what is causing your hand to grasp a ball. There is no probability in the bag. Or in your hand, or anywhere except in your thought. That’s easy to see in balls-in-bags because, as the two papers emphasize, we are dealing with objects that contain mostly act. That the balls have the potential to be all sorts of places in the bag is what makes the probability calculation non-extreme (not 0 or 1).

This is made even more obvious by recalling two physicists can have different probabilities for the same QM event. Just as two people could have two different probabilities for balls in bags. Person A has the probability 1/2, given just the premise above, but Person B notices the bottom of the bag is transparent; Person B has probability 1 of drawing the black. Physicist A knows everything about the measurement apparatus except for one thing newly learned by B, an additional physical measure. Both have different probabilities. It will turn out, in both cases, B makes better predictions. But in neither case could the probabilities have caused anything to happen. Indeed, Person B has an extreme probability because the cause of selecting black is perfectly known—and obviously isn’t the probability.

Physicist B does not have the advantage Person B has. For in Physicist B’s case, we have a proof that we can never reach extreme probabilities for a certain class of correlated (in the physics use of that word) events. It has to be something in act that moves the potential in the wave to act (“collapse”), but what that is is hidden from us. That isn’t “hidden variables”; that’s an understanding that our knowledge of cause is necessarily incomplete.

Consider Kastner:

[W]e might plan to meet tomorrow for coffee at the Downtown Coffee Shop. But suppose that, unbeknownst to us, while we are making these plans, the coffee shop (actually) closes. Instantaneously and acausally, it is no longer possible for us (or for anyone no matter where they happen to live) to have coffee at the Downtown Coffee Shop tomorrow. What is possible has been globally and acausally altered by a new actual (token of res extensa). In order for this to occur, no relativity-violating signal had to be sent; no physical law had to be violated. We simply allow that actual events can instantaneously and acausally affect what is next possible…which, in turn, influences what can next become actual, and so on.

They mean causal in the efficient cause sense, of course; and we needn’t agree with them about physical “laws”. The probability, in their minds (they being ignorant of the closing), that they will meet inside the coffee shop is high (close to or equal to 1 depending on individual circumstances). That they will meet inside won’t happen, though. They did not have the right information upon which to condition. That knowledge was not a hidden variable in any causal sense. Bell lives on.

Now about how all this works in individual experiments, and the relation to probability, we’ll leave for another time.