Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

March 6, 2018 | 35 Comments

The Solution To The Doomsday Argument

Today, a classic re-post which deserves to be far better known.

The Doomsday Argument! No, not global warming. The one that predicts the total number of humans who will ever live. It’s also called the Carter catastrophe; the same Carter famous for the anthropic principle. Here’s the Wiki article (HT to reader Nate West).

To solve this problem, the only rule we need is this: All probability is conditional—and conditional only on the information provided. The idea is that you’re born, you notice your birth, and you reason that your place in the order of all human births is nothing special. From that, can we conclude how many more of us we expect? This situation is analogous, at first, to balls in a bag.

Our evidence is X = “There are N balls labeled 1 through N in a bag, from which only one will be removed.” The probability of Y = “The ball will have label j, where j is from 1 to N inclusive” is 1/N, via the statistical syllogism. We deduce via the language used that N is finite (no bag can hold an infinite amount of any real thing).

Reach into the bag and pull out the ball B. It will have a label; call it B = j. Our evidence is now augmented: we have in toto X’ = “X and the ball has label j”. What can we say about N? Well, given X’, the probability N is less than j is 0, and the probability N is at least j is 1, both of which are obvious. But what about these interesting and relevant probabilities (both given X’, naturally): “N equals j”, “N is greater than j”?

We do not know.

Why? Because there is no information in X or X’ about the possible values of N, except that N must be at least equal to j (given X’, not X), information which is deduced. Now mentally you might add information that is not provided, by, say, thinking to yourself, “This j is awfully low and that’s such a big bag; therefore, surely N is large.” Or “I know this Briggs, who is a trickster. He made the bag big on purpose. N is small.” Or anything, endlessly. None of these additions are part of the problem (the stated evidence), however, and all such moves are “illegal” in probability. You cannot use information not provided. It is against the law!

Now suppose we legally augment our X and, for fun, say that N is some number in the set S. We don’t need to know much about S, except that it exists, is finite, and contains only natural numbers. Thus, X now equals “There are N balls labeled 1 through N in a bag, from which only one will be removed; and N is a number in the set S.” Given X, the probability “N = s_i” (for any s_i in S) is 1/#S, where “#S” stands for the number of elements in S (its cardinality, if you like); I’ll assume the s_i are increasing in i. What about the probability that the ball withdrawn has label j? Here it gets tricky, so let’s be careful.

The key lies in realizing the bounds of j are between 1 and the largest value of S. First suppose N = s_1. We want:

Pr(B = j | N = s_1, X).

This is 1/s_1 for j = 1 to s_1, and 0 for j from s_1 + 1 up to s_I (the largest value of S). Now

Pr(B = j | N = s_2, X)

equals 1/s_2 for j = 1 to s_2, and 0 for j from s_2 + 1 up to s_I. From this, we notice we have to be careful about specifying j precisely. From total probability we know

Pr(B = j | X ) = Pr(B = j | N = s_1, X) * Pr(N=s_1|X) + … + Pr(B = j | N = s_I, X) * Pr(N=s_I|X)

and where knowledge of j is relevant to the probability. If j = 1, then

Pr(B = 1 | X ) = [(1/s_1) + … + (1/s_I)] * (1/#S)

but if j is a number larger than s_1 but no greater than s_2 (call this j’), then

Pr(B = j’ | X ) = [0 + (1/s_2) + … + (1/s_I)] * (1/#S)

and so forth for other j (don’t forget S is known).
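
For those who like to check such things, here is a minimal sketch in Python (the function name is mine, and the particular S, the same {20, 21, …, 40} used in the example below, is only an illustration):

```python
# Pr(B = j | X) by total probability:
# Pr(B = j | X) = SUM_i Pr(B = j | N = s_i, X) * Pr(N = s_i | X),
# with Pr(N = s_i | X) = 1/#S and Pr(B = j | N = s_i, X) = 1/s_i if j <= s_i, else 0.

S = range(20, 41)  # an illustrative S = {20, 21, ..., 40}

def pr_b_given_x(j, S):
    """Pr(B = j | X): average the likelihoods 1/s_i over the members of S."""
    return sum(1 / s for s in S if j <= s) / len(S)

print(pr_b_given_x(1, S))   # every s_i contributes: ~0.0348
print(pr_b_given_x(21, S))  # the s_1 = 20 term drops out: ~0.0324
```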

The ball is withdrawn and B = j. Can we now say anything more about N? As before, there is 0 probability N is less than j, and so if j is greater than some s_i, there is 0 probability N equals those s_i. We can do more, using the good reverend’s rule, but it’s still tricky:

Pr(N = s_i | B = j, X) = Pr( B = j | N = s_i, X) * Pr( N = s_i | X) / Pr( B = j | X).

First suppose j = 1, then

Pr(N = s_i | B = 1, X) = [(1/s_i) * (1/#S)] / ([(1/s_1) + … + (1/s_I)] * (1/#S))

     = (1/s_i) / [ 1/s_1 + 1/s_2 + … + 1/s_I]

If you stare at that fraction for a moment, and recalling that the s_i are given in increasing number, you realize that smaller values of N are more probable than larger values. As a for-instance, suppose S = {20,21,…,40}, which has cardinality 21. Given X, the probability “B = 1” is (1/20 + 1/21 + … + 1/40) * (1/21) = 0.03480016. Thus Pr(N = 20 | B = 1, X) = 0.06841786, Pr(N = 21 | B = 1, X) = 0.06515986, etc. out to Pr(N = 40 | B = 1, X) = 0.03420893. Notice that these probabilities do not change for j between 1 and 20.

In this same example, next let j = 21, then

Pr(N = s_i | B = 21, X) = Pr( B = 21 | N = s_i, X) * Pr( N = s_i | X) / Pr( B = 21 | X).

For “N = 20”, the first term on the right equals 0, and so Pr(N = 20 | B = 21, X) = 0, as desired. For “N = 21”, we have

Pr(N = 21 | B = 21, X) = Pr( B = 21 | N = 21, X) * Pr( N = 21 | X) / Pr( B = 21 | X).

     = [ (1/21) * (1/21) ] / ([0 + 1/21 + 1/22 + … + 1/40] * (1/21))

     = (1/21) / [0 + 1/21 + 1/22 + … + 1/40] = 0.06994537,

and Pr(N = 22 | B = 21, X) = 0.06676604, out to Pr(N = 40 | B = 21, X) = 0.03672132.
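
Those figures are easy to verify with a short sketch (Python again; the 1/#S prior factors cancel in Bayes’ theorem, leaving normalized likelihoods):

```python
# Posterior by Bayes' theorem:
# Pr(N = s_i | B = j, X) = Pr(B = j | N = s_i, X) * Pr(N = s_i | X) / Pr(B = j | X).

S = range(20, 41)

def posterior(j, S):
    """Pr(N = s | B = j, X) for each s in S, as a dict."""
    likelihood = {s: (1 / s if j <= s else 0.0) for s in S}
    total = sum(likelihood.values())  # the 1/#S factors have cancelled
    return {s: lik / total for s, lik in likelihood.items()}

post = posterior(21, S)
print(post[20])  # 0.0: N = 20 is now impossible
print(post[21])  # ~0.0699, matching the text
print(post[40])  # ~0.0367: smaller (possible) N stays more probable
```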

Collecting all these tidbits leads to the conclusion that smaller (but not impossible) values of N are always more likely than larger, regardless of the value of j. Why? That’s easy. Before we see B, the possible values of N are s_1, s_2, and so on up to s_I, each equally likely. After we see B, some values of N (from S) might now be impossible, but since j can never exceed any remaining member of S, the chance 1/s_i of drawing that j from a bag of s_i balls is always larger for smaller s_i; thus smaller values are more likely. Simple as that.

What does this have to do with Doomsday? Everything. The crucial step was in conjuring the set S. Where did that come from? I made it up. S was known throughout the second part of the calculations and unknown through the first part. When S was unknown, N was unknown, and there was nothing we could say about N except that it had to be as large as j. I mean nothing in its literal, logical sense. In that case, given only that you witness your birth order, your B = j that is, we are blind about the future of humanity.

When S was known, we had a rough idea of what N was, which we tightened slightly by learning where N might not be (by removing the ball). But for an S with large cardinality, we aren’t learning much by viewing B. S is what we started with, and something very like S is what we ended with. But this is cheating because I made the S up. We want N, of which we are ignorant, and then we pretend we know an S that tells us something but not everything about N! All the other solutions to the Doomsday argument I have seen also make up S, but then they add an extra layer of cheating. We posited a discrete finite S, from which we deduced that N might equal any of its members with equal probability (before seeing B). But those who conjure up more creative S often fix the set so that smaller values of S are more likely (hence smaller values of N are more likely, even before we see B). Some form of exponential “distribution” for S is popular. Some even use non-probability arguments (called “improper priors”), which is triply cheating.

Once S is fixed, however it is fixed, the calculations flow in the same manner as above, but it’s easy to see that smaller values of N are always going to be more likely than larger, and that’s because the j will always be smaller (or no greater) than the maximum value of S. And given that some let S toodle out to infinity, it’s no shock at all to discover that N is not expected to be big.

Thus the Doomsday Argument is really a non-problem which includes its own answer in its formulation, which is cheating. Of course, it makes perfect sense to ask the question of how many of us there will be left, but trying to discover the answer using only your birth order is doomed to failure (beyond proving that N must be at least as large as j). Since all probability is conditional on only the information supplied, many different answers for our future numbers are possible. It’s easy to think of probative information: demographics, politics, epidemics, apocalypses (rocks from the sky, Christ’s return, etc.), and on and on. (Of course, some of these sets of information may lead to the guesses people have made about S.) I do not (now) have a good answer for how to use these to put uncertainty on (the real) N.

Update Bayes’s theorem isn’t all that.

The difficulty lies in misunderstanding Bayes’s theorem, which some mistakenly write like this:

Pr(N = s_i | B = j) = Pr( B = j | N = s_i) * Pr( N = s_i ) / Pr( B = j ),

where the evidence about N in X is left off (finding the denominator is no problem because Pr( B = j ) = SUM_i Pr( B = j | N = s_i) * Pr( N = s_i )). Pr( N = s_i ) is thus “naked” (and violates the rule that all probability is conditional), yet users of Bayes’s theorem are trained to posit “priors” like this, and so posit one they do. It seems, say critics of the theory, that these priors are pulled from thin air. The critics are right. It’s completely arbitrary to conjure a Pr( N = s_i ), and so the resulting Pr(N = s_i | B = j) cannot be trusted. (I have much more about this kind of thing in my forthcoming book.)

Of course, I made up my own “prior”, but I referenced it as a deduction from X. The probability Pr(N = s_i | B = j, X) is thus true. The attention then focuses on X, where it belongs. Why this X? No reason at all. If we’re after the best information about N, that is what should go into X. But it has to be information that is not N itself, like my S was. My S was merely a presumption that I already knew a lot about N; it was N by proxy, but a fuzzy proxy. Cheating, like I said.

It’s not Bayes’s theorem that’s the problem. It works just fine when we supplied information in X about S. But it also worked dandy when X was just “There are N balls labeled 1 through N in a bag, from which only one will be removed.” I didn’t display the equation at the time, but it’s there. I’ll leave it as homework for you to show.

Update I’m graduating a comment I made in reply to Steve Brookline to the main post, because it highlights what I think is the central error people make in the DA. SB’s comments should be examined for orientation. I’m repeating them here in concise form.

A standard application of the DA starts by asking for this: Pr(N < 20j) (the 20 coming from statistics’ magic number, 0.05 = 1/20). Note the missing conditions. Accepting the bare notation, then Pr(N < 20j) = Pr(N/20 < j) = Pr(j > N/20) = 1 – Pr(j <= N/20) = 1 – 0.05 = 0.95. It is said Pr(j <= N/20) = 0.05 because j is “uniform” or is “uniformly distributed”, as if probability has life. The fatal error has been made, because we notice that this result appears to hold regardless of what value N or j has. But there just is no such thing as “Pr(N < 20j)”.

We have to be careful with the notation. There is no such thing as unconditional probability, and when you drop the conditions, which often makes manipulating the equations easier, you run the risk of introducing error, which is what happens in the standard doomsday argument. Here’s what we want.

Pr(N < 20*j | B = j, X) = Pr(B = j | N < 20*j, X) * Pr(N < 20*j | X) / Pr(B = j | X).

(For why we want this, see SB’s comments.) Now X can be anything relevant; it at least says there are balls 1 through N, but it must also say something about N (directly or implied).

Suppose X contains information that N is in the set {1, 2, …, 19}. Then Pr(N < 20*j |X) = 1 for any j. Never forget j runs from 1 to N, which is where things go awry: j is (in the classical language) dependent on N; in the new (and proper) language, knowledge of N is relevant to knowledge of j.

This is it: it appears, because of loose notation, many forget that j and N are related. Steve used the notion of cutting a string; but of course, that can only be done quantumly (i.e. discretely), so the example is the same. Knowledge of the place j where you cut depends on knowledge of the length of the string N, and vicesy versey.

You can work it out, but the right-hand side comes to 1/1, and thus Pr(N < 20*j | B = j, X) = 1, as expected. So right here is all the proof I need to show that at least one “prior” on N ruins that 95% finding.

Here’s another one. Suppose X says N = 20. Then Pr(N < 20*j |X) = 0 for j = 1, and Pr(N < 20*j |X) = 1 for j > 1. Again, you can work it out, but it amounts to the same thing, that Pr(N < 20*j | B = j, X) = 0 when j = 1, else it equals 1 for all other j.
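
You can check these cases with a few lines (a sketch only; S encodes whatever X says about N, so the homework below is a one-line change):

```python
# Pr(N < 20*j | B = j, X): the posterior mass on those s in S with s < 20*j.

def pr_doom(j, S):
    """Pr(N < 20*j | B = j, X), with X saying N is in the set S."""
    likelihood = {s: (1 / s if j <= s else 0.0) for s in S}
    total = sum(likelihood.values())
    return sum(lik for s, lik in likelihood.items() if s < 20 * j) / total

print(pr_doom(1, [20]))          # 0.0: X says N = 20, and N < 20 is false
print(pr_doom(2, [20]))          # 1.0: N < 40 is true, as for every j > 1
print(pr_doom(1, range(1, 20)))  # 1.0: the first "prior", N in {1, ..., 19}
```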

Again, suppose X says N is in set {20, 21, …, 40}. Starts to get interesting. I leave this one as a homework, too.

More about the DA is in my book Uncertainty.

March 1, 2018 | 13 Comments

Infinity — Part 1 of N

We’ve discussed infinity before, but since the subject is inexhaustible, we’re discussing it again.

Infinity comes in sizes, as many readers know. What’s less appreciated is how these sizes relate to questions in epistemology, physics, even theology. We’ll explore some of these facets in future articles. For now, some basics.

The first and smallest infinity is the set of natural numbers. This infinity is not small. All together now: Just how big is it? We are tempted to say it is incomprehensibly big, and there is some truth to that, but the sticking point is that difficult word comprehensible.

Let’s see if we can get a feel for this tiniest of infinities. Count 1, 2, 3, and keep going. Never stop. Eventually we get to 10^10, which is 10 billion. How far is that from the end? Infinitely far away. Take that billion and make it the exponent, i.e. 10^billion. How far is that one from the end? Same answer: an infinite distance.

Now that might seem like a lot of zeros after the 10, but it’s a pittance, not even a drop. Exponents are not very handy for counting really large numbers, so let’s work with tetrations. They look just like exponents, except the superscript is on the other side. So ²10 = 10^10, and ³10 = 10^10^10, and so on: ᵏ10 is a tower of k 10s.

We’re getting really big here. Consider B = ᵇⁱˡˡⁱᵒⁿ10, which is 10^10^… a billion times. Big number! Bigger than we will ever need for any counting of physical objects. But it’s still infinitely far from the end. Next try BB = ᴮ10, which is 10^10^… not just a billion times, but B times. This is so huge it can’t be well thought of. But it’s still tiny compared to the tiniest infinity.

Well, we can keep going, BBB = ᴮᴮ10, BBBB = ᴮᴮᴮ10, et cetera. Do that B times, then do it BB more times, then BBB more times, and then…you get the idea. You’ll eventually stop, and come to a finite number that is beyond anybody’s ability to grasp (if B isn’t already). But whatever this number is, it is still infinitely far away from the smallest infinity.
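
A few lines of Python make the tetration notation concrete (a sketch; only the smallest cases are computable, which is rather the point):

```python
def tetrate(base, height):
    """A tower of `height` copies of `base`: base^base^...^base."""
    result = 1
    for _ in range(height):
        result = base ** result
    return result

print(tetrate(10, 2))  # 10^10 = 10,000,000,000
print(tetrate(2, 4))   # 2^2^2^2 = 65,536
# tetrate(10, 3) already has about ten billion digits, and
# B = tetrate(10, 10**9) is beyond any hope of writing down,
# yet both remain infinitely far from the first infinity.
```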

What I’m trying to give you is an appreciation of how mind-bogglingly big the first infinity is. The number is so large that numbers less than it are all we would ever need if we’re interested in counting (and, yes, many mathematicians call this infinity a number).

After this “simple” counting infinity probably comes what are called, in a playful fit of whimsy, real numbers. I say “probably” because nobody knows for sure if there is another kind of infinity between the counting kind and the so-called continuum, where the reals live. That there is no differently sized infinity between the counting numbers and the reals is called the continuum hypothesis, which many believe is true, but which (as Gödel and Cohen showed) can be neither proved nor disproved from the standard axioms of set theory.

Anyway, one way to think about the reals is to take any two counting numbers, like 1 and 2, and imagine stuffing an infinite number of numbers in between. Count these “stuffing numbers” however you like. Then take any two of these next to one another, still inside 1 and 2, like 1.1000000001 and 1.1000000002, and stuff another infinity of numbers in between them. And keep doing this for any two successive numbers.

You can go on packing numbers into the gaps like this until you come to a point where you have formed a dense flood of numbers, the succession infinitesimally increasing.

The problem, which may be obvious to you, is that if you’re not careful, this infinity doesn’t seem as big as the counting infinity. That’s because it’s impossible, at least for me, to envision what an infinitely dense succession of numbers looks like, when I can’t even tell you what ᴮᴮᴮᴮᴮᴮᴮᴮᴮ10 looks like. Best I can do is to tell you that the number of natural numbers between 1 and ᴮᴮᴮᴮᴮᴮᴮᴮᴮ10 is infinitely smaller than the number of reals between 1 and 2.

Strangely, counting big natural numbers is hard, yet working with reals is easy. That ease produces in mathematicians a sort of hubris, or rather, forgetfulness. We’re so used to calculating with reals that we forget just how impossibly large the continuum is. The forgetfulness arises when we try to apply real-number equations to things in existence. Are there any actual objects that correspond to the continuum? Depends on what you mean by “actual.”

Skip that question—for now—and think of this. As big as the natural counting infinity is, and as infinitely larger are the reals, there are more infinities larger still. They are comprehensible only in the sense we know they exist, and because we know minimal things about them, such as their ordering. But I don’t think anybody grasps what these numbers are really like. Not when we can’t even say what a tower of a billion Bs, ᴮᴮᴮ…ᴮ10, is like.

So how many sizes of infinity are there? And what might this have to do with a proof of God’s existence? Great questions. We’ll take those up another time. For those who are adept at math and want to read more, I recommend the paper “Infinite Sets and Infinite Sizes” by the very aptly named Gary Hardegree.

February 27, 2018 | 6 Comments

Some Doctors Want More People Taking Antidepressants

There are calls for “at least a million more Britons” to be put on antidepressants. This is odd because Britain’s National Health Service already “prescribed a record number of antidepressants” in 2016.

That represented “a massive 108.5% increase on the 31 [million] antidepressants which pharmacies dispensed in 2006.” In the States, one estimate is that 12% are already on these drugs.

Still, the clarion call for ever more drugs was sounded after the results of a new statistical analysis were announced.

The study was “Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis.” It was led by Andrea Cipriani and published in The Lancet.

Do antidepressants alleviate or ameliorate the suffering caused by acute major depressive disorder? In some cases, the analysis says the answer appears to be yes. Which means that in some cases, the answer appears to be no. This is another way of saying that antidepressants don’t always work, or do not work for all people all of the time.

And that means that, at least for some, placebos are as “effective” as the active chemicals in antidepressants. The authors admit “Depressive symptoms tend to spontaneously improve over time and this phenomenon contributes to the high percentage of placebo responders in antidepressant trials.”

Placebos, it should go without saying, do not carry any risk of side effects. Actual drugs do; about which, more in a moment.

Caution Over the Results

Now this was not an original study, but a re-look at old studies called a “meta-analysis.” As a statistician, I often say, only half-jokingly, that meta-analyses are conducted to “prove” what individual studies could not. If the results from individual studies were clear and robust, meta-analyses would hardly be needed. On the other hand, a meta-analysis can provide a vantage individual studies cannot. The limitations of the method must be kept in mind.

Only studies that treated acute depression were examined here. What about side-effects? Cipriani cautioned “that some of the adverse effects of antidepressants occur over a prolonged period, meaning that positive results need to be taken with great caution, because the trials in this network meta-analysis were of short duration.”

The results of the meta-analysis indicate antidepressant effectiveness is not strong, classed as medium to small effect sizes. The authors warn “Given the modest effect sizes, non-response to antidepressants will occur.” Meaning not all who are given the drugs will respond to them.

Now the study’s reported statistical measures are highly specialized and take definite meaning only inside a mathematical system. The details are too technical to go into, but naive use of reported measures can exaggerate effectiveness.

If you’re not taking so many pills you can’t see straight, click here to read the rest.

February 21, 2018 | 15 Comments

Science Is Not The Most Important Subject

Stream: Science Is Not The Most Important Subject

What’s with all the kowtowing to science among religious folks?

As soon as a scientist, or science cheerleader, starts talking about the “unbridgeable” divide between religion and science, a Christian apologist trots out and pleads “There is no contradiction between science and Christianity.”

Well, there isn’t. But the Christian has the wrong attitude. There is no need of meek acceptance of science’s superior ground. Science does not hold the hill. It is down in the valley boasting big. Christians need to recognize this. When a scientist starts waving his slide rule around in a menacing manner, the Christian should say, “What is wrong with you people?”

The Limitations of Science

Science is terrific. But it isn’t everything. It isn’t even most things. Knowing the weight of a neutrino won’t tell you why stealing is a sin. Neither can positing some mathematical formula for altruism and selfish genes tell you why men cooperate. All arguments along this line are circular or invalid, anyway. They either assume what they want to know, like that rape is wrong. Or they assume that alone among men, the scientist has escaped the pull of his biology and can tell you how things really are.

Look. Figuring how to create a magnetic monopole won’t get you into Heaven. It won’t keep you out, either. So why are scientists so combative about religion?

The suspicion—more like the raw, rabid hope—of some scientists is that a culture which embraces science will eschew religion. Science will allow humanity to leave its infancy behind and lead it to a bright, happy future where everybody goes around chatting about the reproductive habits of newts.

But not discussing why it’s right or wrong to kill yourself. Scientists figure they can handle those tough questions themselves, and then tell the rest of us their “discovery.” This is a vain hope.

The Unmeasurable Cannot Be Measured

Science can speak only about the measurable properties of things. That’s it. Nothing more. About elementary fermions science is teeming with a lot o’ news. It has many cheerful facts about your brainwaves when you take a snooze.


You can click here and observe the words at the link, but science will never tell you why they are important.