*Note: this post ran originally on 1 December 2015, but because I’ve seen the DA crop up here and there recently, I thought it a good time to revisit. Nothing has been changed, except that I corrected a typo by removing a stray ‘$’ from the LaTeX. More about the DA is in my book Uncertainty.*

The Doomsday Argument! No, not global warming. The one that predicts the total number of humans who will ever live. It’s also called the Carter catastrophe; the same Carter famous for the anthropic principle. Here’s the Wiki article (HT to reader Nate West).

To solve this problem, the only rule we need is this: All probability is conditional—and conditional *only* on the information provided. The idea is that you’re born, you notice your birth, and you reason that your place in the order of all human births is nothing special. From that, can we conclude how many more of us we expect? This situation is analogous, at first, to balls in a bag.

Our evidence is X = “There are N balls labeled 1 through N in a bag, from which only one will be removed.” The probability of Y = “The ball will have label j, where j is from 1 to N inclusive” is 1/N, via the statistical syllogism. We deduce via the language used that N is finite (no bag can hold an infinite amount of any real thing).

Reach into the bag and pull out the ball B. It will have a label; call it B = j. Our evidence is now augmented: we have *in toto* X’ = “X and The ball has label j”. What can we say about N? Well, given X’, the probability N is less than j is 0, and the probability N is at least j is 1, both of which are obvious. But what about these interesting and relevant probabilities (both given X’, naturally): “N equals j”, “N is greater than j”?

*We do not know*.

Why? Because there is *no* information in X or X’ about the possible values of N, except that N must be at least equal to j (given X and not X’), information which is deduced. Now *mentally* you might *add* information that is *not provided*, by, say, thinking to yourself, “This j is awfully low and that’s such a big bag; therefore, surely N is large.” Or “I know this Briggs, who is a trickster. He made the bag big on purpose. N is small.” Or anything, endlessly. *None* of these additions are part of the problem (the stated evidence), however, and all such moves are “illegal” in probability. You cannot use information not provided. It is against the law!

Now suppose we legally augment our X and, for fun, say that N is some number in the set S. We don’t need to know much about S, except that it exists, is finite, and contains only natural numbers. Thus, X now equals “There are N balls labeled 1 through N in a bag, from which only one will be removed; and N is a number in the set S.” Given X, the probability “N is s_i (one of the set S)” is 1/#S, where “#S” stands for the number of elements in S (its cardinality, if you like), and where I’ll assume the s_i are increasing in i, with s_I the largest. What about the probability that the ball withdrawn has label j? Here it gets tricky, so let’s be careful.

The key lies in realizing the bounds of j are between 1 and the largest value of S. First suppose N = s_1. We want:

Pr(B = j | N = s_1, X).

This is 1/s_1 for j = 1 to s_1, and 0 for all those j up to s_I (the largest value of S). Now

Pr(B = j | N = s_2, X)

equals 1/s_2 for j = 1 to s_2, and 0 for all values up to s_I. From this, we notice we have to be careful about specifying j precisely. From total probability we know

Pr(B = j | X ) = Pr(B = j | N = s_1, X) * Pr(N=s_1|X) + … + Pr(B = j | N = s_I, X) * Pr(N=s_I|X)

where knowledge of j is relevant to the probability. If j = 1, then

Pr(B = 1 | X ) = [(1/s_1) + … + (1/s_I)] * (1/#S)

but if j is a number larger than, say, s_1 but smaller than s_2 (call this j’), then

Pr(B = j’ | X ) = [0 + (1/s_2) + … + (1/s_I)] * (1/#S)

and so forth for other j (don’t forget S is *known*).
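The total-probability sum above is easy to check by machine. What follows is only a minimal sketch under the stated X (N equally likely to be any member of a known S); the function name `prob_label` is my own invention for illustration, and exact fractions are used so no rounding creeps in.

```python
from fractions import Fraction

def prob_label(j, S):
    """Pr(B = j | X), where X says N is one of the members of S, each equally likely."""
    S = sorted(set(S))
    prior = Fraction(1, len(S))            # Pr(N = s_i | X) = 1/#S
    total = Fraction(0)
    for s in S:
        # Pr(B = j | N = s, X) is 1/s when 1 <= j <= s, and 0 otherwise
        likelihood = Fraction(1, s) if 1 <= j <= s else Fraction(0)
        total += likelihood * prior
    return total

S = range(20, 41)                          # an example S, chosen only for illustration
print(float(prob_label(1, S)))             # label 1 is possible under every member of S
print(float(prob_label(21, S)))            # label 21 is impossible if N = 20
# Sanity check: the probabilities over all possible labels add to one.
print(sum(prob_label(j, S) for j in range(1, max(S) + 1)) == 1)
```

The last line is the useful one: whatever S you feed in, the labels from 1 to the largest member of S must exhaust the possibilities.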

The ball is withdrawn and B = j. Can we now say anything more about N? As before, there is 0 probability N is less than j, and so if j is greater than some s_i, there is 0 probability N equals those s_i. We can do more, using the good reverend’s rule, but it’s still tricky:

Pr(N = s_i | B = j, X) = Pr( B = j | N = s_i, X) * Pr( N = s_i | X) / Pr( B = j | X).

First suppose j = 1, then

Pr(N = s_i | B = 1, X) = [(1/s_i) * (1/#S)] / ([(1/s_1) + … + (1/s_I)] * (1/#S))

= (1/s_i) / [ 1/s_1 + 1/s_2 + … + 1/s_I]

If you stare at that fraction for a moment, and recall that the s_i are given in increasing number, you realize that values of smaller N are more probable than larger values. As a for-instance, suppose S = {20,21,…,40}, which has cardinality 21. Given X, the probability “B = 1” is (1/20 + 1/21 + … + 1/40) * (1/21) ≈ 0.0348. Thus Pr(N = 20 | B = 1, X) ≈ 0.0684, Pr(N = 21 | B = 1, X) ≈ 0.0652, etc. out to Pr(N = 40 | B = 1, X) ≈ 0.0342. Notice that these probabilities do not change for j between 1 and 20.

In this same example, next let j = 21, then

Pr(N = s_i | B = 21, X) = Pr( B = 21 | N = s_i, X) * Pr( N = s_i | X) / Pr( B = 21 | X).

For “N = 20”, the first term on the right equals 0, and so Pr(N = 20 | B = 21, X) = 0, as desired. For “N = 21”, we have

Pr(N = 21 | B = 21, X) = Pr( B = 21 | N = 21, X) * Pr( N = 21 | X) / Pr( B = 21 | X).

= [ (1/21) * (1/21) ] / ([0 + 1/21 + 1/22 + … + 1/40] * (1/21))

= (1/21) / [0 + 1/21 + 1/22 + … + 1/40] = 0.06994537,

and Pr(N = 22 | B = 21, X) = 0.06676604, and so on out to Pr(N = 40 | B = 21, X) = 0.03672132.
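For those who would rather compute than stare, here is a minimal sketch of the same Bayes calculation for any j, assuming (as in the example) that X makes every member of S equally likely. The name `posterior_N` is hypothetical, introduced only for this illustration.

```python
from fractions import Fraction

def posterior_N(j, S):
    """Pr(N = s | B = j, X) for each s in S, with N equally likely to be any member of S."""
    S = sorted(set(S))
    # Numerators of Bayes's rule; the common factor 1/#S cancels top and bottom.
    weights = {s: (Fraction(1, s) if 1 <= j <= s else Fraction(0)) for s in S}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("j is larger than every member of S, which X rules out")
    return {s: w / total for s, w in weights.items()}

S = range(20, 41)
post = posterior_N(21, S)
print(float(post[20]))   # 0.0, since N = 20 cannot produce a ball labeled 21
print(float(post[21]))   # roughly 0.0699
print(float(post[40]))   # roughly 0.0367
# Any j from 1 to 20 rules out no member of this S, so the answer is the same.
print(posterior_N(1, S) == posterior_N(20, S))
```

Swap in any other finite S of natural numbers and any j no larger than its biggest member; the function does nothing more than the arithmetic written out above.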

Collecting all these tidbits leads to the conclusion that smaller (but not impossible) values of N are always more likely than larger, regardless of the value of j. Why? That’s easy. *Before* we see B, the possible values of N are s_1, s_2, and so on up to s_I, each equally likely. *After* we see B, some values of N (from S) might now be impossible, but since j will always be less than any remaining possible larger members of S, smaller values of N are closer to j than larger, thus smaller values are more likely. Simple as that.

What does this have to do with Doomsday? Everything. The crucial step was in conjuring the set S. Where did that come from? I made it up. S was known throughout the second part of the calculations and unknown through the first part. When S was unknown, N was unknown, and there was *nothing* we could say about N except that it had to be as large as j. I mean *nothing* in its literal, logical sense. In that case, given *only* that you witness your birth order, your B = j that is, we are blind about the future of humanity.

When S was known, we had a rough idea of what N was, which we tightened slightly by learning where N might not be (by removing the ball). But for an S with large cardinality, we aren’t learning much by viewing B. S is what we started with, and something very like S is what we ended with. But this is cheating because I made the S up. We wanted N, of which we are ignorant, and then we pretend we know an S that tells us something but not everything about N! All the other solutions to the Doomsday argument I have seen also make up S, but then they add an extra layer of cheating. We posited a discrete finite S, from which we deduced that N might equal any of its members with equal probability (before seeing B). But those who conjure up more creative S often fix the set so that smaller values of S are more likely (hence smaller values of N are more likely, even before we see B). Some form of exponential “distribution” for S is popular. Some even use non-probability arguments (called “improper priors”), which is triply cheating.

Once S is fixed, however it is fixed, the calculations flow in the same manner as above, but it’s easy to see that smaller values of N are always going to be more likely than larger, and that’s because the j will always be smaller (or no greater) than the maximum value of S. And given that some let S toodle out to infinity, it’s no shock at all to discover that N is not expected to be big.
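To see how the same arithmetic runs when S is fixed some other way, here is a rough sketch that accepts any made-up weights on the members of S. Everything specific in it is an assumption for illustration only: the function name `posterior_with_prior`, the example S, and the decaying weights (each member nine-tenths as probable as the one before) are not anybody’s actual Doomsday setup.

```python
from fractions import Fraction

def posterior_with_prior(j, prior):
    """Pr(N = n | B = j, X), where prior maps each candidate n to Pr(N = n | X)."""
    # Pr(B = j | N = n, X) is 1/n for n >= j and 0 otherwise.
    weights = {n: p * Fraction(1, n) for n, p in prior.items() if n >= j}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no candidate N is as large as j")
    return {n: w / total for n, w in weights.items()}

S = range(20, 41)
uniform = {n: Fraction(1, len(S)) for n in S}

# Hypothetical decaying weights, normalized so they add to one.
raw = {n: Fraction(9, 10) ** (n - 20) for n in S}
norm = sum(raw.values())
decaying = {n: w / norm for n, w in raw.items()}

for name, prior in [("uniform", uniform), ("decaying", decaying)]:
    post = posterior_with_prior(1, prior)
    print(name, float(post[20]), float(post[40]))
```

With weights that already favor small N, seeing B only piles more probability onto the small end, which is the extra layer of cheating described above.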

Thus the Doomsday Argument is really a non-problem which includes its own answer in its formulation, which is cheating. Of course, it makes perfect sense to ask the question of how many of us there will be left, but trying to discover the answer using only your birth order is doomed to failure (beyond proving that N must be at least as large as j). Since all probability is conditional on only the information supplied, many different answers for our future numbers are possible. It’s easy to think of probative information: demographics, politics, epidemics, apocalypses (rocks from the sky, Christ’s return, etc.), and on and on. (Of course, some of these sets of information may lead to the guesses people have made about S.) I do not (now) have a good answer how to use these to put uncertainty on (the real) N.

**Update** Bayes’s theorem isn’t all that.

The difficulty lies in misunderstanding Bayes’s theorem, which some mistakenly write like this:

Pr(N = s_i | B = j) = Pr( B = j | N = s_i) * Pr( N = s_i ) / Pr( B = j ),

where the evidence about N in X is left off (finding the denominator is no problem because Pr( B = j ) = SUM_i Pr( B = j | N = s_i) * Pr( N = s_i )). Pr( N = s_i ) is thus “naked” (and violates the rule that all probability is conditional), yet users of Bayes’s theorem are trained to posit “priors” like this, and so posit one they do. It seems, say critics of the theory, that these priors are pulled from thin air. The critics are right. It’s completely arbitrary to conjure a Pr( N = s_i ), and so the resulting Pr(N = s_i | B = j) cannot be trusted. (I have much more about this kind of thing in my forthcoming book.)

Of course, I made up my own “prior”, but referenced as being a deduction from X. The probability Pr(N = s_i | B = j, X) is thus *true*. The attention then focuses on X, where it belongs. Why this X? No reason at all. If we’re after the best information about N, that is what should go into X. But it has to be information that is not N itself, like my S was. My S was merely a presumption that I already knew a lot about N; it was N by proxy, but a fuzzy proxy. Cheating, like I said.

It’s not Bayes’s theorem that’s the problem. It works just fine when we supplied information in X about S. But it also worked dandy when X was just “There are N balls labeled 1 through N in a bag, from which only one will be removed.” I didn’t display the equation at the time, but it’s there. I’ll leave it as homework for you to show.

**Update** I’m graduating a comment I made in reply to Steve Brookline to the main post, because it highlights what I think is the central error people make in the DA. SB’s comments should be examined for orientation. I’m repeating them here in concise form.

A standard application of the DA starts by asking for this: Pr(N < 20j) (the 20 comes from the magic number in statistics). Note the missing conditions. Accepting the bare notation, then Pr(N < 20j) = Pr(N/20 < j) = Pr(j > N/20) = 1 – Pr(j <= N/20) = 1 – 0.05 = 0.95. It is said Pr(j <= N/20) = 0.05 because j is “uniform” or is “uniformly distributed”, as if probability has life. The fatal error has been made, because we notice that this result appears to hold regardless what value N or j has. But there just is no such thing as “Pr(N < 20j)”.

We have to be careful with the notation. There is no such thing as unconditional probability, and when you drop the conditions, which often makes manipulating the equations easier, you run the risk of introducing error, which is what happens in the standard doomsday argument. Here’s what we want.

Pr(N < 20*j | B = j, X) = Pr(B = j | N < 20*j, X) * Pr(N < 20*j | X) / Pr(B = j | X).

(For why we want this, see SB’s comments.) Now X can be anything relevant; it at least says there are balls 1 through N, but it must also say *something* about N (directly or implied).

Suppose X contains information that N is in the set {1, 2, …, 19}. Then Pr(N < 20*j |X) = 1 for any j. Never forget j runs from 1 to N, which is where things go awry: j is (in the classical language) dependent on N; in the new (and proper) language, knowledge of N is relevant to knowledge of j.

*This is it*: it appears, because of loose notation, many forget that j and N are related. Steve used the notion of cutting a string; but of course, that can only be done quantumly (i.e. discretely), so the example is the same. Knowledge of the place j where you cut depends on knowledge of the length of the string N, and vicesy versey.

You can work it out, but the result is the right-hand-side is 1/1, and thus Pr(N < 20*j | B = j, X) = 1, as expected. So right here is all the proof I need to show that at least one “prior” on N ruins that 95% finding.

Here’s another one. Suppose X says N = 20. Then Pr(N < 20*j |X) = 0 for j = 1, and Pr(N < 20*j |X) = 1 for j > 1. Again, you can work it out, but it amounts to the same thing, that Pr(N < 20*j | B = j, X) = 0 when j = 1, else it equals 1 for all other j.

Again, suppose X says N is in set {20, 21, …, 40}. Starts to get interesting. I leave this one as a homework, too.
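If you would like to check your work on the cases above, here is a minimal sketch that computes Pr(N < 20*j | B = j, X) directly, assuming X says N is equally likely to be any member of a given set S. The name `prob_N_less_than_20j` is hypothetical, made up for this illustration.

```python
from fractions import Fraction

def prob_N_less_than_20j(j, S):
    """Pr(N < 20*j | B = j, X), with N equally likely to be any member of S."""
    candidates = [s for s in set(S) if s >= j]   # members of S still possible given B = j
    if not candidates:
        raise ValueError("j is larger than every member of S, which X rules out")
    # Posterior weight of each candidate is proportional to 1/s.
    weights = {s: Fraction(1, s) for s in candidates}
    total = sum(weights.values())
    return sum(w for s, w in weights.items() if s < 20 * j) / total

# N in {1, ..., 19}: the answer is 1 whatever j is.
print({prob_N_less_than_20j(j, range(1, 20)) for j in range(1, 20)})
# N = 20: 0 when j = 1, and 1 for every other j.
print(prob_N_less_than_20j(1, [20]), prob_N_less_than_20j(2, [20]))
# N in {20, ..., 40}: run it and see.
for j in (1, 2, 3):
    print(j, float(prob_N_less_than_20j(j, range(20, 41))))
```

None of these outputs need come anywhere near 0.95, because the answer depends on what X says about N.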

Vincent Price found out his B = N in https://en.wikipedia.org/wiki/The_Last_Man_on_Earth_%281964_film%29

And in another more recent fictional artistic effort, COP21 is predicting B is not too far from N.

“Given X, the probability “N is s_i (one of the set S)” is 1/#S,”

I am confused by this. Are we just taking for granted that N was selected from the set S by some method assigning equal probability to each element of S? I didn’t notice anything in the suppositions that would necessarily imply that.

KiteX3,

Good question. The language you used to ask it speaks of a view of probability which makes it real in some sense “…that N was selected from the set S by some method…”

Nothing like that is true. Probability represents only our ignorance. N *is* one of the set of S, we know not which. N is not “selected” in any sense at all. N is *caused* to be whatever number it is by, in this case, innumerable forces. If we knew these causes, we’d know N. But we don’t; and, as I’ve said, my S is a cheat.

But why, accepting the cheating, is N equally likely to be any of the s_i? Well, we derive that via the statistical syllogism, which itself relies on the symmetry of logical constants. We deduce that our *uncertainty in* N is that N can be any one of S’s members, and that each is equally likely. This is not because we are “indifferent”; it is because that is what the evidence we have demands. After seeing B, the evidence calls for a different understanding of the uncertainty in N.

Additional evidence to add: “You do not know the day nor the hour”

The whole doomsday argument is new to me, and it’s just as silly sounding as the Drake Equation. Sure, you can throw together some math that answers a question, but if the evidence for the question comes from someone’s keyster then I don’t know what they want me to do with their answer.

I think I might disagree with Mr. Briggs here, though I’m not sure. If you take a piece of string of length M and cut it at a point chosen uniformly at random, then tell me the length L of one of the two pieces (choose one of the two uniformly at random) then Prob(M<20*L)=0.95.

I don't think it matters at all what the prior for M is.

SteveBrooklineMA,

Have a stab at defining this formally, using Bayes, for instance. Don’t forget to include the conditions in your probability. There is no such thing as “Pr(M<20*L)", but there could be some "Pr(M<20*L | something)".

Update: to save time, and keeping things discrete for ease:

Pr(N < 20*j | B = j, X) = Pr(B = j | N < 20*j, X) * Pr(N < 20*j | X) / Pr(B = j | X).

Spot the curiosity?

Briggs-

I’ll ponder! Meanwhile, in terms of your setup, the claim is that if we have a bag of N balls, and we pull out ball j, then

Prob( N < 20 j )

= Prob( N/20 < j)

= 1 – Prob(j= 1-(N/20)/N

= 0.95

I think this is pretty much the Doomsday Argument.

Steve B,

Yep, that’s their argument. The trick is spotting the flaw. I owe a piece to the Stream. Once I’m finished, if you haven’t had your Aha! moment yet, I’ll update with the answer. There’s a big hint in my previous comment, in the way I wrote the equation.

Ugh. Symbol for “less than or equal” causes problems.

Prob( N < 20 j )

= Prob( N/20 < j)

= 1 – Prob(j = 1-(N/20)/N

= 0.95

Well, I see that you would need some way of calculating the terms in Bayes Rule that you’ve written above. But perhaps I was being sloppy, using j instead of B:

Prob(N less than 20*B)

= Prob(N/20 less than B)

= 1- Prob(B less than or equal to N/20)

= 1 – floor(N/20)/N

>= 1-(N/20)/N

= 0.95

And no, I don’t think you need any prior on N to say that. N is a parameter, B is uniform on 1,2,…N. Regardless of what N is, or what distribution it might have, the estimate holds. Can I say that if X is normally distributed with mean mu and standard deviation sigma, then Prob(X<mu)=1/2? I don't think I need a distribution on mu to say that.

Steve B,

We have to be careful with the notation. There is no such thing as unconditional probability, and when you drop the conditions, which often makes manipulating the equations easier, you run the risk of introducing error, which is what happens in the standard doomsday argument. Here’s what we want.

Pr(N < 20*j | B = j, X) = Pr(B = j | N < 20*j, X) * Pr(N < 20*j | X) / Pr(B = j | X).

Now X can be anything relevant; it at least says there are balls 1 through N, but it must also say *something* about N (directly or implied).

Suppose X contains information that N is in the set {1, 2, …, 19}. Then Pr(N < 20*j |X) = 1 for any j. Never forget j runs from 1 to N, which is where things go awry: j is (in the old language) dependent on N; in the new language, knowledge of N is relevant to knowledge of j. You can work it out, but the result is the right-hand-side is 1/1, and thus Pr(N < 20*j | B = j, X) = 1, as expected. So right here is all the proof I need to show that at least one “prior” on N ruins that 95% finding.

Here’s another one. Suppose X says N = 20. Then Pr(N < 20*j |X) = 0 for j = 1, and Pr(N < 20*j |X) = 1 for j > 1. Again, you can work it out, but it amounts to the same thing, that Pr(N < 20*j | B = j, X) = 0 when j = 1, else it equals 1 for all other j.

Again, suppose X says N is in set {20, 21, …, 40}. Starts to get interesting. I leave this one as a homework, too.

In your example, what you have to write to be proper is, Pr (X < mu | mu = some number, sigma = another number, details about normal). In that case, yes, the probability is 1/2, but only because we deduce that from all that information after the bar.

With humans, you can only be born while there are humans around giving birth. So instead of drawing just yourself from the bag, you need to pull out 7 billion of them right now.

Which makes the argument a bit different. For instance, with a growing population there is no chance at all that you can have 7 billion people alive when there are just 6 billion of them living. So a growing population is always living at the very beginning of the interval; there is not much chance of having some Copernican position.

Sander,

That kind of information is usually ignored in the first-order attacks of the problem. It’s highly relevant to discovering information about the real N, of course.

@Briggs

shouldn’t the ball drawing process have some resemblance to human procreation? You should use two different colors, and you can only draw balls from the sack while there are at least two colors on the table. After some time you put random balls away, not in the sack, which is equivalent to assuming there is no reincarnation.

This whole fixed number of people in the sack reminds me too much of predestination 😉

Sander,

Well, not if all you care about is the birth order and your supposed ordinariness. If you really care to nail N, you have to do what you suggest and say something (at least) about real procreation.

Also note that this argument form is used in physics to discuss inflation and other cosmological ideas. It’s much more important than just estimating the number of people.

So do we agree that 0.95 is less than or equal to Prob( N is less than or equal to 20 B | a natural number N, B uniform on 1…N, X )? That to me is the Doomsday Argument.

Steve B,

No, we don’t.

“The idea is that you’re born, you notice your birth, and you reason that your place in the order of all human births is nothing special.” – Briggs

Why make the (rather baseless, in my view) assumption that your birth is ‘nothing special’? All that follows in your article is based upon what is, after all, just accepted as being axiomatic when it might possibly not be, and if it turns out that one’s birth order IS special, then your entire argument just falls to pieces. What I would like to know is how you ‘reasoned’ your way into believing what you do here.

The cardinality of S= {20,21,…,40} is 21, not 41.

It is confusing and is not the usual practice to use the same notation X to represent different sets of background information.

*made up…thus true.*

Isn’t there a contradiction here? What does it mean to say the probability is true? True?

Steve B,

For presentation simplicity, the background information X = “There are N balls labeled 1 through N in a bag, from which only one will be removed” is omitted in all the probabilities below. Given X and that we observe a ball labeled j (B=j), the Bayesian setup:

Likelihood function: Pr(B=j|N=n)

Prior: Pr(N=n)

Posterior: Pr(N=n|B=j)= Pr(B=j |N=n)*Pr(N=n)/Pr(B=j)

(Conditional probability definition, total probability and Bayes’ rules are used to derive the posterior probability distribution.)

So, you calculate the likelihood, and indeed the probability Pr(B ≤ n/20 |N=n) = floor(n/20)/n.

However, Briggs calculates the posterior probability P(N< 20*j | B = j).

As usual, the postulation of the prior probability distribution, which involves possible values of N and its probability distribution, can be problematic.

JH,

Good catch on the typo. It’s fixed.

But there just is no such thing as “Pr(N=n)” and so forth, because there is no such thing as probability which isn’t conditional.

Briggs,

Well-defined probabilities require premises. Trivial fact. Writing out all the premises in math probability statements is a bit silly for my taste (especially when the premises are clearly stated in my comment). If one uses the notation “|”, it also implies that one can apply conditional probability definition (cpd) to all events or propositions, but can you apply cpd to Pr(B=j|X), where B and X are defined in my previous comment? Some philosophers use a subscript Pr_X(B=j) to differentiate the two kinds of “conditionality.”

JH,

That “trivial fact” leads to the conclusion that all probability must be defined as conditional (“cpd” in your words). That is the point in dispute. The notation you used, as I said, can be used without harm if one is careful. But many times the “bare” notation leads to mistakes, as it has done here, and as I proved. I think I’ll graduate my comments to the main post so they’re not missed.

Thanks JH , DECEMBER 1, 2015 AT 10:28 PM, that makes sense to me. I was simply taking N as a given, not a random variable. Writing N=n explicitly makes it clearer. Though I’m not sure Briggs agrees that

Pr(B less than or equal to n/20 |N=n, B uniform, whatever else) = floor(n/20)/n.

Steve B,

JH is wrong, but wrong in the standard way. The term “random variable” needs to die just as much as p-values, confidence intervals, and the like.

I do not agree with your equation, and I gave a *proof* why not. Pay particular attention to the discussion of how knowledge of where you cut and N are relevant.

So, why am I wrong? In the standard way? Again, please do not infer/imagine more than what I have written.

Shouldn’t the rational belief we have in the model itself be part of the conditionals? Briggs argues that that is not the case, but there is some information available that is not taken into account, namely the number of people alive today, and the number of people being alive in the past, and how long that past is supposed to be.

Combining those numbers we can see that the number of people alive today is very dissimilar to the number of people alive on any one day in most of the past, which is orders of magnitude less. Which means that the Copernican Principle should be expected to give a very good answer.

BTW, cosmologists always state that use of the Copernican Principle has to be checked, which they then do. The Principle generates observation proposals that can then be used to refine the models. It is not the end point of the discussion.

Sorry, the Copernican principle should NOT be expected to …

Sander,

Of course all of these are relevant things, as are the other matters you pointed out; relevant for truly understanding how many more of us there are to come. But they are not part of the “official” DA. The DA only seeks to show how much time is left given you witness your birth order.

Isn’t this the German Tank Problem in a prettier dress?

Your (or Mr Bayes) methodology appeared to have been applied to a real situation, estimate of German tank production:

https://www.wired.com/2010/10/how-the-allies-used-math-against-german-tanks/

The article provides the equation from the question:

“Suppose one is an Allied intelligence analyst during World War II, and one has some serial numbers of captured German tanks. Further, assume that the tanks are numbered sequentially from 1 to N. How does one estimate the total number of tanks?”