
Author: Briggs

April 5, 2018 | 20 Comments

The Gremlins Of MCMC: Or, Computer Simulations Are Not What You Think

I don’t think we’re clear on what simulation is NOT. RANDOMNESS IS NOT NECESSARY, for the simple reason randomness is merely a state of knowledge. Hence this classic post from 12 June 2017.

“Let me get this straight. You said what makes your car go?”

“You heard me. Gremlins.”

“Gremlins make your car go.”

“Look, it’s obvious. The car runs, doesn’t it? It has to run for some reason, right? Everybody says that reason is gremlins. So it’s gremlins. No, wait. I know what you’re going to say. You’re going to say I don’t know why gremlins make it go, and you’re right, I don’t. Nobody does. But it’s gremlins.”

“And if I told you instead your car runs by a purely mechanical process, the result of internal combustion causing movement through a complex but straightforward process, would that interest you at all?”

“No. Look, I don’t care. It runs and that it’s gremlins is enough explanation for me. I get where I want to go, don’t I? What’s the difference if it’s gremlins or whatever it is you said?”


That form of reasoning is used by defenders of simulations, a.k.a. Monte Carlo or MCMC methods (the other MC is for Markov Chain), in which gremlins are replaced by “randomness” and “draws from distributions.” Like the car run by gremlins, MCMC methods get you where you want to go, so why bother looking under the hood for more complicated explanations? Besides, doesn’t everybody agree simulations work by gremlins—I mean, “randomness” and “draws”?

Here is an abbreviated example from Uncertainty which proves it’s a mechanical process and not gremlins or randomness that accounts for the success of MCMC methods.

First let’s use gremlin language to describe a simple MCMC example. Z, I say, is “distributed” as a standard normal, and I want to know the probability Z is less than -1. Now the cumulative normal distribution has no closed-form expression, meaning I cannot just plug in numbers and calculate an exact answer. There are, however, many excellent approximations to do the job near enough, meaning I can with ease calculate this probability to reasonable accuracy. The R software does so by typing pnorm(-1), which gives 0.1586553. This gives us something to compare our simulations to.

I could also get at the answer using MCMC. To do so I randomly—recall we’re using gremlin language—simulate a large number of draws from a standard normal, and count how many of these simulations are less than -1. Divide that number by the total number of simulations, and there is my approximation to the probability. Look into the literature and you will discover all kinds of niceties to this procedure (such as computing how accurate the approximation is, etc.), but this is close enough for us here. Use the following self-explanatory R code:

n = 10000
z = rnorm(n)
sum(z < -1)/n

I get 0.158, which is for applications not requiring accuracy beyond the third digit peachy keen. Play around with the size of n: e.g., with n = 10, I get for one simulation 0.2, which is not so hot. In gremlin language, the larger the number of draws the closer will the approximation "converge" to the right answer.
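The "niceties" about accuracy mentioned above can be sketched in a few lines. This is only an illustration, assuming the usual binomial standard error for a proportion; the set.seed() call and the names p_hat, se and p_ref are additions for repeatability and are not part of the original example.

```r
# Sketch: the same simulation, plus a rough standard error for the estimate.
# set.seed() is added here only so the run can be repeated exactly.
set.seed(1)
n <- 10000
z <- rnorm(n)
p_hat <- sum(z < -1) / n              # simulated approximation of Pr(Z < -1)
se <- sqrt(p_hat * (1 - p_hat) / n)   # approximate (binomial) standard error
p_ref <- pnorm(-1)                    # reference value from R's built-in approximation
c(estimate = p_hat, std_error = se, reference = p_ref)
```

Raising n shrinks the standard error roughly like 1/sqrt(n), which is the "convergence" spoken of in gremlin language.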

All MCMC methods are the same as this one in spirit. Some can grow to enormous complexity, of course, but the base idea, the philosophy, is all right here. The approximation is seen as legitimate not just because we can match it against a near-analytic answer, because we can't do that for any situation of real interest (if we could, we wouldn't need simulations!). It is seen as legitimate because of the way the answer was produced. Random draws imbued the structure of the MCMC "process" with a kind of mystical life. If the draws weren't random---and never mind defining what random really means---the approximation would be off, somehow, like in a pagan ceremony where somebody forgot to light the black randomness candle.

Of course, nobody speaks in this way. Few speak of the process at all, except to say it was gremlins; or rather, "randomness" and "draws". It's stranger still because the "randomness" is all computer-generated, and it is known computer-generated numbers aren't "truly" random. But, somehow, the whole thing still works, like the randomness candle has been swapped for a (safer!) electric version, and whatever entities were watching over the ceremony were satisfied the form has been met.


Now let's do the whole thing over in mechanical language and see what the differences are. By assumption, we want to quantify our uncertainty in Z using a standard normal distribution. We seek Pr(Z < -1 | assumption). We do not say Z "is normally distributed", which is gremlin talk. We say our uncertainty in Z is represented using this equation by assumption.

One popular way of "generating normals" (in gremlin language) is to use what's called a Box-Muller transformation. Any algorithm which needs "normals" can use this procedure. It starts by "generating" two "random independent uniform" numbers U_1 and U_2 and then calculating this creature:

Z = \sqrt{-2 \ln U_1} \cos(2 \pi U_2),

where Z is now said to be "standard normally distributed." We don't need to worry about the math, except to notice that it is written as a causal, or rather determinative, proposition: "If U_1 is this and U_2 is that, Z is this with certainty." No uncertainty enters here; U_1 and U_2 determine Z. There is no life to this equation; it is (in effect) just an equation which translates a two-dimensional straight line on the interval 0 to 1 (in 2-D) to a line with a certain shape which runs from negative infinity to positive infinity.
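The determinative nature of the formula can be seen by wrapping it in a function. This is a minimal sketch; the name box_muller is a hypothetical helper chosen for illustration, and the formula inside is the one written above.

```r
# Hypothetical helper: the transformation written above, as a plain function.
box_muller <- function(u1, u2) {
  sqrt(-2 * log(u1)) * cos(2 * pi * u2)
}

a <- box_muller(0.01, 0.01)
b <- box_muller(0.01, 0.01)
identical(a, b)   # TRUE: the same (U_1, U_2) always determines the same Z
```

Nothing in the function consults a clock, a seed, or anything else; the output depends only on the two inputs.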

To get the transformation, we simply write down all the numbers in the paired sequence (0.01, 0.01), (0.01, 0.02), ..., (0.99, 0.99). The decision to use two-digit accuracy was mine, just as I had to decide n above. This results in a sequence of pairs of numbers (U_1, U_2) of length 9801. For each pair, we apply the determinative mapping of (U_1, U_2) to produce Z as above, which gives (3.028866, 3.010924, ..., 1.414971e-01). Here is the R code (not written for efficiency, but transparency):

ep = 0.01 # the (st)ep
u1 = seq(ep, 1-ep, by = ep) # gives 0.01, 0.02, ..., 0.99
u2 = u1

z = NA # start with an empty vector
k = 0 # just a counter
for (i in u1){
  for (j in u2){
    k = k + 1
    z[k] = sqrt(-2*log(i))*cos(2*pi*j) # the transformation
  }
}
z[1:10] # shows the first 10 numbers of z

The first 10 numbers of Z map to the pairs (0.01, 0.01), (0.01, 0.02), (0.01, 0.03), ..., (0.01, 0.10), since the inner loop runs over U_2. There is nothing at all special about the order in which the (U_1, U_2) pairs are input. In the end, as long as the "grid" of numbers implied by the loop is fed into the formula, we'll have our Z. We do not say U_1 and U_2 are "independent". That's gremlin talk. We speak of Z in purely causal terms. If you like, try this:
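One possible check on the claim that input order is irrelevant, sketched here with expand.grid() instead of the double loop (an assumption made for brevity, as are the names grid, rev_grid, z_fwd and z_rev): feed the same pairs in reverse order and compare the counts.

```r
ep <- 0.01
u <- seq(ep, 1 - ep, by = ep)
grid <- expand.grid(u1 = u, u2 = u)   # every (U_1, U_2) pair, 99 * 99 rows
z_fwd <- sqrt(-2 * log(grid$u1)) * cos(2 * pi * grid$u2)

rev_grid <- grid[nrow(grid):1, ]      # the same pairs, fed in reverse order
z_rev <- sqrt(-2 * log(rev_grid$u1)) * cos(2 * pi * rev_grid$u2)

sum(z_fwd < -1) == sum(z_rev < -1)    # TRUE: the count does not depend on order
```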


We have not "drawn" from any distribution here, neither uniform nor normal. All that has happened is some perfectly simple math. And there is nothing "random". Everything is determined, as shown. The mechanical approximation is got the same way:

sum(z < -1)/length(z) # the denominator counts the size of z

which gives 0.1608677, which is a tad high. Try lowering ep, which is to say, try increasing the step resolution and see what that does. It is important to recognize the mechanical method will always give the same answer (with same inputs) regardless of how many times we compute it. Whereas the MCMC method above gives different numbers. Why?
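To try several step sizes without rewriting the loop, here is one possible sketch. The helper grid_prob and the particular values of ep are illustrative assumptions; outer() is used in place of the double loop and evaluates the same grid.

```r
# Hypothetical helper: mechanical approximation of Pr(Z < -1) for a given step.
grid_prob <- function(ep) {
  u <- seq(ep, 1 - ep, by = ep)
  # outer() applies the transformation to every (U_1, U_2) pair on the grid
  z <- outer(u, u, function(a, b) sqrt(-2 * log(a)) * cos(2 * pi * b))
  mean(z < -1)
}

eps <- c(0.01, 0.005, 0.002)   # illustrative step sizes
est <- sapply(eps, grid_prob)
data.frame(ep = eps, estimate = est, abs_error = abs(est - pnorm(-1)))
```

Calling grid_prob() twice with the same ep returns the same number both times, which is the point being made about the mechanical method.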

Gremlins slain

Here is the gremlin R code, which first "draws" from "uniforms", and then applies the transformation. The ".s" are to indicate simulation.

n = 10000
u1.s = runif(n)
u2.s = runif(n)
z.s = sqrt(-2*log(u1.s))*cos(2*pi*u2.s)
sum(z.s < -1)/n

The first time I ran this, I got 0.1623, which is much worse than the mechanical, but the second I got 0.1589 which is good. Even in the gremlin approach, though, there is no "draw" from a normal. Our Z is still absolutely determined from the values of (u1.s, u2.s). That is, even in the gremlin approach, there is at least one mechanical process: calculating Z. So what can we say about (u1.s, u2.s)?

Here is where it gets interesting. Here is a plot of the empirical cumulative distribution of U_1 values from the mechanical procedure, overlaid with the ECDF of u1.s in red. It should be obvious the plots for U_2 and u2.s will be similar (but try!). Generate this yourself with the following code:

plot(ecdf(u1),xlab="U_1 values", ylab="Probability of U1 < value", xlim=c(0,1),pch='.')
lines(ecdf(u1.s), col=2)
abline(0,1,lty=2)

The values of U_1 are a rough step function; after all, there are only 99 values, while u1.s is of length n = 10000.
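If a number is wanted in place of a picture, one rough sketch is to measure how far each ECDF strays from the dashed diagonal. The comparison points s and the names dev_grid and dev_sim are assumptions made for illustration; set.seed() is added only for repeatability.

```r
set.seed(1)                       # added only so the run can be repeated
ep <- 0.01
u1 <- seq(ep, 1 - ep, by = ep)    # mechanical values
u1.s <- runif(10000)              # "simulated" values
s <- seq(0, 1, by = 0.001)        # points at which to compare with the diagonal

dev_grid <- max(abs(ecdf(u1)(s) - s))    # largest gap, mechanical
dev_sim <- max(abs(ecdf(u1.s)(s) - s))   # largest gap, simulated
c(mechanical = dev_grid, simulated = dev_sim)
```

Both gaps should be small, which is another way of saying both sets of numbers lie close to the same straight line.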

Do you see it yet? The gremlins have almost disappeared! If you don't see it---and do try and figure it out before reading further---try this code:

sort(u1.s)[1:20]


This gives the first 20 values of the "random" u1.s sorted from low to high. The values of U_1 were 0.01, 0.02, ... automatically sorted from low to high.

Do you see it yet? All u1.s is, is a series of ordered numbers on the interval from 1e-6 to 1 - 1e-6. And the same for u2.s. (The 1e-6 is R's native display resolution for this problem; this can be adjusted.) And the same for U_1 and U_2, except the interval is a mite shorter! What we have are nothing but ordinary sequences of numbers from (roughly) 0 to 1! Do you have it?

The answer is: The gremlin procedure is identical to the mechanical!

Everything in the MCMC method was just as fixed and determined as the other mechanical method. There was nothing random, there were no draws. Everything was simple calculation, relying on an analytic formula somebody found that mapped two straight lines to one crooked one. But the MCMC method hides what's under the hood. Look at this plot (with the plot screen maximized; again, this is for transparency not efficiency):

plot(u1.s,u2.s, col=2, xlab='U 1 values',ylab='U 2 values')
u1.v = NA; u2.v = NA
k = 0
for (i in u1){
  for (j in u2){
    k = k + 1
    u1.v[k] = i
    u2.v[k] = j
  }
}
points(u1.v,u2.v,pch=20) # these are (U_1, U_2) as one long vector of each

The black dots are the (U_1, U_2) pairs and the red the (u1.s, u2.s) pairs fed into the Z calculation. The mechanical is a regular grid and the MCMC-mechanical is also a (rougher) grid. So it's no wonder they give the same (or similar) answers: they are doing the same things.

The key is that the u1.s and u2.s themselves were produced by a purely mechanical process as well. R uses a formula no different in spirit from the one for Z above, which if fed the same numbers always produces the same output (stick in known W which determines u1.s, etc.). The formula is called a "pseudorandom number generator", where by "pseudorandom" they mean not random; purely mechanical. Everybody knows this, and everybody knows this, too: there is no point at which "randomness" or "draws" ever comes into the picture. There are no gremlins anywhere.
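The "stick in known W" remark can be checked directly. In this sketch the seed passed to set.seed() plays the part of W; the value 42 and the names a and b are arbitrary choices for illustration.

```r
set.seed(42)      # the known starting value (the "W" of the text)
a <- runif(5)
set.seed(42)      # same starting value again
b <- runif(5)
identical(a, b)   # TRUE: same input to the generator, same "random" numbers
```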

Now I do not and in no way claim that this grunt-mechanical, rigorous-grid approach is the way to handle all problems or that it is the most efficient. And I do not say the MCMC car doesn't get us where we are going. I am saying, and it is true, there are no gremlins. Everything is a determinate, mechanical process.

So what does that mean? I'm glad you asked. Let's let the late-great ET Jaynes give the answer. "It appears to be a quite general principle that, whenever there is a randomized way of doing something, then there is a nonrandomized way that delivers better performance but requires more thought."

We can believe in gremlins if we like, but we can do better if we understand how the engine really works.

There are lots more details, like the error of approximation and so forth, which I'll leave to Uncertainty (which does not have any code).

Bonus code

The value of -1 was nothing special. We can see the mechanical and MCMC procedures produce normal distributions which match almost everywhere. To see that, try this code:

plot(ecdf(z),xlab="Possible values of Z", ylab="Probability of Z < value", main="A standard normal")
s = seq(-4,4,by=ep)
lines(s,pnorm(s),lty=2,col=2)
lines(ecdf(z.s),lty=3,col=3)

This is the (e)cdf of the distributions: mechanical Z (black solid), gremlin (green dot-dashed), analytic approximation (red dashed). The step in the middle is from the crude step in the mechanical. Play with the limits of the axis to "blow up" certain sections of the picture, like this:

plot(ecdf(z),xlab="Possible values of Z", ylab="Probability of Z < value", main="A standard normal", xlim=c(-1,1))
s = seq(-4,4,by=ep)
lines(s,pnorm(s),lty=2,col=2)
lines(ecdf(z.s),lty=3,col=3)

Try xlim=c(-4,-3) too.


Find the values of U_1 and U_2 that correspond to Z = -1. Using the modern language, what can you say about these values in relation to the (conditional!) probability Z < -1? Think about the probabilities of the Us.

What other simple transforms can you find that correspond to other common distributions? Try out your own code for these transforms.
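As one possible starting point, and only a sketch: the inverse-CDF mapping for the exponential, X = -ln(1 - U)/rate, can be fed a plain grid of U values in the same mechanical fashion. The rate of 2, the cutoff of 1 and the variable names are illustrative assumptions.

```r
ep <- 0.001
u <- seq(ep, 1 - ep, by = ep)   # plain grid on (0, 1), no "draws"
rate <- 2                       # illustrative value
x <- -log(1 - u) / rate         # inverse of the exponential CDF, applied to the grid
c(grid = mean(x < 1), reference = pexp(1, rate = rate))
```

The two numbers should agree closely, and shrinking ep should bring them closer still.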

April 4, 2018 | 19 Comments

Quantum Potency & Probability

A note on a complex subject. Ed Feser points us to the paper “An Aristotelian Approach to Quantum Mechanics” by Gil Sanders, and in turn Sanders points us to “Taking Heisenberg’s Potentia Seriously” by Kastner, Kauffman, and Epperson. Feser’s book Scholastic Metaphysics is also not to be missed.

Heisenberg was, of course, brought up when Aristotle’s notions on the distinction between act and potency were still taught. He thought those ideas useful in explaining quantum (= discrete) curiosities that were flooding through physics.

Sanders’s paper is a gentle and informative introduction to these topics, while Kastner et al. go a little deeper. Below are some quotes. I believe they are useful in dispelling the recurring idea that probabilities are ontic, i.e. real things. Probability is purely epistemological, a relative measure of evidence, whereas potency is a real feature of objects. I urge you to read the papers themselves; they are not long. If you know about Aristotelian beneathphysics already, then jump to the end about probability.

Sanders (the set up in brief):

A wave function is a complete mathematical description of the properties of particles (represented as state vectors) in a physical system. By itself the wave function is a superposition of all possible state vectors. With Schrödinger evolution, the wave function evolves as a linear superposition of different states. It is deterministic in that the current vector state will physically determine the resulting vector state. If we could know all the proceeding conditions, we could predict with certainty what the resulting state vector would be. The wave function generally evolves in accord to Schrödinger, but once some form of measurement is performed, the wave function collapses in the sense that it no longer operates in accord to Schrödinger’s equation but in accord to the collapse postulate. Through a linear combination of these state vectors, the once indefinite superposition of state vectors nondeterministically produces some definite state vector. In other words, the collapse postulate tells us that once a particle is measured, it is no longer in a superposition of different states but collapses into a particle with definite properties and a definite position in a nondeterministic manner.

Sanders (Aristotle is on his way):

The methodology of physics is such that it must use the exceedingly abstract tools of mathematics in order to perform its inquiry. Mathematics is inherently quantitative and structural by nature, thus it is in principle incapable of capturing qualitative aspects of nature in the same way that a metal detector is in principle incapable of detecting plastic. Whatever does not fit this quantifiable method, like immanent teleology and causal powers, must be ignored; only mathematically definable properties are discoverable. The wave function, for example, is a mere abstract equation that is standardly interpreted to be a representation of something concrete, but as to what that is we do not know. At best physics can only give us a partial description of reality (unless abstract structure is all that exists), it fails to tell us what is the inner qualitative nature of the thing that exhibits this mathematical structure.

Sanders (Aristotle has arrived):

According to the world renowned physicist, Heisenberg, the wave function “was a quantitative version of the old concept of “potentia” in Aristotelian philosophy. It introduced something standing in the middle between the idea of an event and the actual event, a strange kind of physical reality just in the middle between possibility and reality” (1958, 41)…

A potentia is simply a thing’s potential to have its qualities or substance changed. For example, a piece of glass has the potential to shatter or it has the potential to melt into a fluid. The former kind of change is a change of qualities or accidents, whereas the latter is a change in substance. This stands in contrast to actus, which refers to the way a thing actually is here and now… A potentiality should not be confused with mere possibility. It is possible for a unicorn to exist, but it is not possible for a piece of glass to become a unicorn because it lacks that potential whereas it does have the potential to break. A piece of glass’ actuality limits the potential range of things that can be actualized.

Sanders (Aristotle has filled the room):

[Modern physics restricts] the “real” to actuality because their view of matter is still mechanistic, where material objects are mere forms, which corresponds only to actuality. The Aristotelian conception of matter is decidedly hylomorphic in that all material substances are composed of form and matter. Form (or structure) corresponds to actuality, whereas matter corresponds to the potency that persists through change. This matter is the substrate of a material substance that is receptive to different forms, whereas the form gives definite structure to the matter…Since matter and form are just more specific instances of potency and actuality, we already know that this analysis is plausible given the above argument for Aristotle’s act-potency distinction.

Sanders (skipping over a justification of hylomorphism and a proof that potency has a kind of existence, then this):

Additionally, hylomorphism entails a gradual spectrum of material beings with greater degrees of potentiality to greater degrees of actuality. Something has greater actuality if it has more determinate form (or qualities) and something has higher potency if it is more indeterminate with respect to being more receptacle to various forms. For example, a piece of clay has higher potency insofar as it is more malleable than a rock and thus more receptacle to various forms. A rock can likewise be modified to receive various forms, but it requires a physical entity with greater actuality or power to do so because it has more determinate form as a solid object… [H]ylomorphism predicts that you will find higher levels of potency because you are getting closer to prime matter. This is precisely what we find in QM. The macroscopic world has more actuality, which is why we experience it as more definite or determinate, whereas the microscopic world has far less actuality, thereby creating far less determinate behavioral patterns.

Sanders (finally QM):

Let’s start with the wave function, which if you recall, initially describes several mathematical possibilities (aka superposition) prior to collapse. QM forces us to reify the wave function in some way because by itself it would suggest that the quantum world only exists when we are measuring it, which is rather absurd….It’s far more plausible to interpret the wave function as real insofar as it describes a range of potential outcomes for particles that are low in act but great in potency. This view reinterprets superpositions as being the potentials of a thing or state, not as actual states in which all possibilities are realized.

Sanders (more QM):

Thus collapse occurs when there is contact between a perceptible object and a non-perceptible particle whereby contact with the perceptible object actualizes a particular potential (spin-y as opposed to spin-x) of the particle into a definite state. The actualization of certain outcomes at measurement has the result of affecting the range of potential outcomes of some other particle: “actual events can instantaneously and acausally affect what is next possible” (Kastner, 2017)… This problem is resolved if you’re an Aristotelian. Suppose you intended to visit Los Angeles but unbeknownst to you an earthquake sunk that traffic-ridden city into the ocean. This actualized event changed the range of potential places that I (or anyone else) could visit without acting upon other persons. In other words, actuality cannot directly alter a distant actuality without interaction but it can instantaneously and acausally change a distant range of potentials.

Kastner (skipping over the same material discussed in Sanders; the Kastner PDF was built, I’m guessing, from Windows, making it very difficult to cut and paste from; thus my laziness explains why I quote them less):

We thus propose a new kind of ontological duality as an alternative to the dualism of Descartes: in addition to res extensa, we suggest, with Heisenberg, what may be called res potentia. We will argue that admitting the concept of potentia in to our ontology is fruitful, in that it can provide an account of the otherwise mysterious nonlocal phenomena of quantum physics and at least three other related mysteries (‘wave function collapse’; loss of interference on which-way information; ‘null measurement’), without requiring any change to the theory itself…

As indicated by the term ‘res,’ we do conceive of res potentia as an ontological extant in the same sense that res extensa is typically conceived—i.e. as ‘substance,’ but in the more general, Aristotelian sense, where substance does not necessarily entail conflation with the concept of physical matter, but is rather merely “the essence of a thing . . . what it is said to be in respect of itself”.

Of course, “one cannot ‘directly observe’ potentiality, but rather only infer it from the structure of the theory.” If we could measure it directly, it would be actus not potentia. They use the phrase quantum potentia (QP).


The belief that all things had to be all act, pure actus, and contain no potentia accounts for many of the confusions about QM. One of those confusions concerned the concepts of probability and “chance”. Physicists were reluctant to throw away the useful idea of cause; there had to be some causal reason “collapse” was happening. That collapse is the movement of a potential to an actual, but they didn’t see it that way, thinking the superposition of waves was all act. How did this happen? Probability was discovered to be indispensable in applying and understanding QM. Thus some thought probability itself was ontic, that chance was an objective feature of the world, and that probability/chance was the causal agent that selected the collapse point.

After all, isn’t QM probability calculated as a mathematical-function of the physical-wave-function? Didn’t that make probability real?

Well, no. It’s true the probability is a mathematical-function, something like the “square of the corresponding amplitude in a wave function”. The probability thus takes as input aspects of reality, a reality (the wave) which contains both act and potential, and spits out a number. But so what? Conditioning on measures of real things doesn’t turn thoughts about things into the things themselves or into causal forces. (This does not rule out the mind-body projecting energy, but I don’t believe it can, and that is not what turning thoughts into causal forces means.)

If I tell you this bag has one black and one white ball and one must be drawn out blind, the probability of drawing a black is a function of reality all right, but your thoughts about that probability aren’t what is causing your hand to grasp a ball. There is no probability in the bag. Or in your hand, or anywhere except in your thought. That’s easy to see in balls-in-bags because, as the two papers emphasize, we are dealing with objects that contain mostly act. That the balls have the potential to be all sorts of places in the bag is what makes the probability calculation non-extreme (not 0 or 1).

This is made even more obvious by recalling two physicists can have different probabilities for the same QM event. Just as two people could have two different probabilities for balls in bags. Person A has the probability 1/2, given just the premise above, but Person B notices the bottom of the bag is transparent; Person B has probability 1 of drawing the black. Physicist A knows everything about the measurement apparatus except for one thing newly learned by B, an additional physical measure. Both have different probabilities. It will turn out, in both cases, B makes better predictions. But in neither case could the probabilities have caused anything to happen. Indeed, Person B has an extreme probability because the cause of selecting black is perfectly known—and obviously isn’t the probability.

Physicist B does not have that advantage Person B has. For in Physicist B’s case, we have a proof that we can never reach extreme probabilities for a certain class of correlated (in the physics use of that word) events. It has to be something in act that moves the potential in the wave to act (“collapse”), but what that is is hidden from us. That isn’t “hidden variables”; that’s an understanding that our knowledge of cause is necessarily incomplete.

Consider Kastner:

[W]e might plan to meet tomorrow for coffee at the Downtown Coffee Shop. But suppose that, unbeknownst to us, while we are making these plans, the coffee shop (actually) closes. Instantaneously and acausally, it is no longer possible for us (or for anyone no matter where they happen to live) to have coffee at the Downtown Coffee Shop tomorrow. What is possible has been globally and acausally altered by a new actual (token of res extensa). In order for this to occur, no relativity-violating signal had to be sent; no physical law had to be violated. We simply allow that actual events can instantaneously and acausally affect what is next possible…which, in turn, influences what can next become actual, and so on.

They mean causal in the efficient cause sense, of course; and we needn’t agree with them about physical “laws”. The probability, in their minds, ignorant of the closing, that they will meet inside the coffee shop is high (close to or equal to 1 depending on individual circumstances). That they will meet inside won’t happen, though. They did not have the right information upon which to condition. That knowledge was not a hidden variable in any causal sense. Bell lives on.

Now about how all this works in individual experiments, and the relation to probability, we’ll leave for another time.

April 3, 2018 | 11 Comments

Not All Conspiracies Are Theories

I don’t often do this, but just watch.

Why, it’s almost as if the news is coordinated—and a danger to our democracy. I admire most the look of utter sincerity on the faces of the newsreaders.

Yes, I’m aware these planted stories are by an organization that “leans right”. But we have often, often heard similar clips from “leans left” outlets, where each of them uses precisely the same words on a “story”. Two minutes searching brings us here (with annoying laugh track), here (the lead is the same as previous), here, here (some repeats), and best one here (some reps, but you must see the CNN fakery). It’s a staple gag of talk radio to stitch together clips from mainstream stories of the day, where we hear all reporters say exactly the same thing, but I can’t locate any on line.

The only difference between those clips and the video above is that the video above is masterfully edited. Sincerity.

End of line.

April 2, 2018 | 26 Comments

Oh, Hell

Hell either exists or it does not, a tautological statement, and therefore necessarily true. If it does not, then you have nothing to worry about, nothing to be sorry for, and there is nothing you need do in preparing for your death, which will certainly come. There is also nothing you need do in conducting your life, either, to avoid going to a place that does not exist. The only fear is that other people might persecute or punish you for what they, and not necessarily you, feel are crimes. Fearing death is irrational because death brings only non-existence or possibly something else which is not Hell.

If Hell exists, then either some or none go there. If none go there, then again there is nothing for you to worry about or confess. Your behavior, no matter how reprehensible (by whatever definition), can never earn eternal punishment. For Hell, by definition, is eternal separation from God, which is the worst punishment there is. If Hell exists, so does God. If some go to Hell you have to approach your salvation from being sent there with fear and trembling, to coin a phrase.

If there is no Hell it does not follow obviously that after death comes non-existence. But that is because of the ambiguity in existence. Some argue that our lives (all life) are strictly bio-mechanical, meaning we are nothing but machines, albeit complicated ones. If we are machines, it may be possible to duplicate your particular machinery, if only in software, and so duplicate you. But this is impossible given the proof that our existences cannot be wholly material. Our intellects and wills are non-material, i.e. non-physical processes, and therefore cannot be duplicated in any physical way, which includes software. Thus if there is no Hell and no God, non-existence is necessarily your patrimony. From that we deduce your existence is utterly without meaning or importance. End of story.

There are two major beliefs which propound that Hell could exist (and thus so does God) but none go there. One is called universalism, the other is annihilationism. Universalism argues all—yes, even Yours Truly, who certainly deserves Hell if it exists—are eventually brought into God’s grace. Universalism is agnostic on the question of Hell’s existence. Under universalism, it is logically possible people may go to Hell, but they would stay there only a finite time.

If universalism is true, again there is no reason to moderate your behavior in any way, for no matter what you do, it will be forgiven. Cursing God, denying Him, worshiping Satan, murder, rape, listening to NPR, whatever. Nothing you do has the least consequence in “earning” salvation, which is automatic. So you may as well have at it! Eat, drink, and kill, for tomorrow we die. And are received into God’s arms. It is, of course, logically possible that God orders Heaven into a hierarchy and that, because of your misdeeds, you earn a lower spot in paradise, but that you go there, under universalism, is guaranteed.

Annihilationism argues that only the good, considered under God’s rules, reach Heaven, and that the bad are snuffed out, their souls annihilated. Bad people reach complete non-existence. You may call this a punishment if you wish, but it is an odd use of the word, because at the end there is no you to be punished. Where once you were, nothing is left, just as if God did not exist. If annihilationism holds, your behavior matters, but only to the extent you wish to attain Heaven, which is either earned by yourself, or is given to you by God using whatever lights He decides. If you would prefer to use your time on earth to “live it up”, doing whatever you like, the worst that can happen to you is nothing.

If Hell exists and either universalism or annihilationism holds, it is curious that God would send his son Jesus to be sacrificed on a cross. Sacrificed for what? Perhaps, in God’s mind, it was a necessary bookkeeping maneuver, a box to be ticked before universalism or annihilationism could be implemented. There was, however, no need for Jesus to come and tell us to (say) be nice to one another, for there is no need to be nice to one another if nobody is sent to eternal Hell. There is no need to do anything, except what you want. Do what thou wilt shall be the whole of the law. The worst that can happen to you is non-existence, or a possible finite sentence in Hell.

If Hell does not exist, what are we to make of Jesus? Just after the parable of the talents, in which the “wicked, lazy servant” is cast “into the darkness outside, where there will be wailing and grinding of teeth”, Jesus, his very self, said:

When the Son of Man comes in his glory, and all the angels with him, he will sit upon his glorious throne, and all the nations will be assembled before him. And he will separate them one from another, as a shepherd separates the sheep from the goats.

He will place the sheep on his right and the goats on his left…

Then he will say to those [goats] on his left, ‘Depart from me, you accursed, into the eternal fire prepared for the devil and his angels.’

He ends by saying the evil “will go off to eternal punishment, but the righteous to eternal life” (my, and I wonder if they were also our Lord’s, emphasis).

If there is no Hell, Jesus was either lying, mistaken, or not quite in his right mind. Not just on this occasion, but on many.

Jesus said that “at the end of the age” the “angels will go out and separate the wicked from the righteous and throw them into the fiery furnace, where there will be wailing and grinding of teeth.”

Jesus said:

If your hand causes you to sin, cut it off. It is better for you to enter into life maimed than with two hands to go into Gehenna, into the unquenchable fire. And if your foot causes you to sin, cut it off. It is better for you to enter into life crippled than with two feet to be thrown into Gehenna. And if your eye causes you to sin, pluck it out. Better for you to enter into the kingdom of God with one eye than with two eyes to be thrown into Gehenna, where their worm does not die, and the fire is not quenched.

John, quoting from the source, said: “But as for cowards, the unfaithful, the depraved, murderers, the unchaste, sorcerers, idol-worshipers, and deceivers of every sort, their lot is in the burning pool of fire and sulfur, which is the second death.”

Jude said “Likewise, Sodom, Gomorrah, and the surrounding towns, which, in the same manner as they, indulged in sexual promiscuity and practiced unnatural vice, serve as an example by undergoing a punishment of eternal fire.”

Paul was convinced of Hell’s existence:

For it is surely just on God’s part to repay with afflictions those who are afflicting you, and to grant rest along with us to you who are undergoing afflictions, at the revelation of the Lord Jesus from heaven with his mighty angels, in blazing fire, inflicting punishment on those who do not acknowledge God and on those who do not obey the gospel of our Lord Jesus.

These will pay the penalty of eternal ruin, separated from the presence of the Lord and from the glory of his power, when he comes to be glorified among his holy ones and to be marveled at on that day among all who have believed, for our testimony to you was believed.

There are many other quotations, but these are sufficient here. None are denied by those who believe in God and in the non-existence of Hell, or by those who believe in Hell but say no one goes there or no one goes there permanently. These people say Jesus’s direct warnings about Hell were admonitory and not meant to be taken literally. They say that this was Jesus’s way of calling people to proper behavior. If so, this is like a teacher threatening to send a student to detention, when both the teacher and the student know there is no such thing as detention. Or like threatening to punish a criminal with an imaginary whip. It is not only useless, but bizarre and dishonest.

Why would you follow a man who said such things?