# Symmetry, Priors, Logical Probability, Infinities, and Needless Paradoxes

One reason some reject the notions of logical probability and Bayesian statistics is that assignments of probability under symmetry are said to generate paradoxes. However, as I will show, this is so only for illegal jumps to hyperspace; that is, excursions to infinity.

Consider this poser by Joseph Bertrand: We have knowledge that a cube has sides less than or equal to 2 cm. Now, according to symmetry, or Keynes’s Principle of Indifference, given this evidence E, the probability the cube has sides less than or equal to 1 cm equals the probability the sides are greater than 1 cm but less than or equal to 2 cm.

However, E also tells us we have a cube, and cubes have volume. If the sides are 2 cm or less, then the volume is 8 cm^{3} or less. And this is where the trouble starts, for we can invoke symmetry again and say the probability that the cube has volume 4 cm^{3} or less, given E, should be equal to the probability the cube has volume from 4 cm^{3} to 8 cm^{3}.

But a cube with a volume of 4 cm^{3} has sides equal to the cube root of 4 cm^{3}, or about 1.59 cm. Trouble! Because we now have symmetry telling us that the probability the cube has sides less than 1 cm is the same as the probability it has sides less than 1.59 cm: both come out to 1/2. Oops.
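To make the clash concrete, here is a minimal sketch in Python. It assumes the two readings of "symmetry" correspond to a uniform distribution on side length and a uniform distribution on volume; the function and variable names are invented for illustration.

```python
from fractions import Fraction

MAX_SIDE = 2  # cm, the upper bound given by the evidence E


def prob_side_at_most_uniform_on_side(s, max_side=MAX_SIDE):
    """P(side <= s) if indifference is applied to the side length."""
    return Fraction(s) / Fraction(max_side)


def prob_side_at_most_uniform_on_volume(s, max_side=MAX_SIDE):
    """P(side <= s) if indifference is applied to the volume instead."""
    return Fraction(s) ** 3 / Fraction(max_side) ** 3


# Same event (side <= 1 cm), two different "indifferent" answers.
p_by_side = prob_side_at_most_uniform_on_side(1)
p_by_volume = prob_side_at_most_uniform_on_volume(1)

# Side length that splits the volume range in half: cube root of 4 cm^3.
median_side_under_uniform_volume = (MAX_SIDE ** 3 / 2) ** (1 / 3)

print(p_by_side, p_by_volume, round(median_side_under_uniform_volume, 2))
```

The same event gets probability 1/2 under one reading and 1/8 under the other, which is the inconsistency in numbers.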

What to do? Well, if you’re like most people, you toss out logical and Bayesian probability. Worse, there are dozens of examples like this. For the fertile mine, generating paradoxes is easy! Really, every time a Bayesian assigns a prior, he runs into this trouble: merely changing the units of measurement is often enough to incur this bizarre inconsistency.

But there is more to this problem than meets the mind. Logicians, philosophers, and statisticians have been too quick to dismiss logical probability on these grounds. Here’s why.

E does *not* just contain knowledge that the cube’s sides are 2 cm or less. Implicit in E is the idea that the cube is *infinitely divisible*. This fact, known but unappreciated, is the cause of *all* difficulties.

How many cubes do you know of, in real life, that are infinitely divisible? I say there are none. Even at the submicroscopic level, a cube—an actual, fleshy *there-it-is* cube^{a}—is made of discrete building blocks. These blocks are not infinite in number, nor are they infinitesimally small. They are finite and of definite, non-zero size.

Let’s take one of these real cubes; say, a cube which is made of blocks, each 1 cm on a side. We can now re-state our original problem: we have knowledge that a cube has sides less than or equal to 2 cm. This knowledge—and the knowledge that the real cube is made of a finite number of discrete blocks—forms our new E. Now under symmetry, or via the principle of indifference, the probability that the cube’s sides are 1 cm or less is again equal to the probability that the sides are greater than 1 cm.

What about volume? Well, given E, what are the possibilities? Only two: the volume can be 1 cm^{3} just in case the length of the cube’s sides is 1 cm, or the volume can be 8 cm^{3} just in case the length of the cube’s sides is 2 cm.

*The volume can take no other values but these two!* Under E, we know we have a cube, which means the length of a side cannot be 0, therefore the volume cannot be zero. We also know that the cube is made of discrete blocks of a definite size. Thus, volumes like 4 cm^{3}, or 3 or 7 or whatever, are not just unlikely, *they are impossible*.

The *only* two possible volumes are 1 cm^{3} and 8 cm^{3}. Under symmetry, or the principle of indifference, the probability the volume is 1 cm^{3} is equal to the probability the volume is 8 cm^{3}. And either of those statements is the same as saying the cube’s length is either 1 cm or 2 cm.

What about transformation of units? No problem here either, because the discrete blocks which make the cube have a fixed, definite length.

What if the cube is made (on a side) from up to N definite blocks? Again, no problem.^{b} Lengths are 1, 2, … N; and the only possible volumes are the cubes of these. Even stronger, any measured quality or dimension—not just volume—can only be discrete.
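A small sketch of the finite case, assuming the side length is counted in whole blocks and each count from 1 to N is equally probable; the helper names are hypothetical. Because cubing is one-to-one on the positive integers, indifference over sides and indifference over volumes assign the same probabilities.

```python
from fractions import Fraction


def indifference_over_sides(n_max):
    """Equal probability for each side length 1, 2, ..., n_max (in blocks)."""
    return {side: Fraction(1, n_max) for side in range(1, n_max + 1)}


def induced_volume_distribution(side_probs):
    """Carry the side-length probabilities over to volume = side**3."""
    volume_probs = {}
    for side, p in side_probs.items():
        volume = side ** 3
        volume_probs[volume] = volume_probs.get(volume, Fraction(0)) + p
    return volume_probs


# The two-block case from the text: sides 1 or 2, volumes 1 or 8.
volumes_n2 = induced_volume_distribution(indifference_over_sides(2))

# A larger N: every possible volume still gets probability 1/N.
volumes_n5 = induced_volume_distribution(indifference_over_sides(5))

print(volumes_n2)
print(volumes_n5)
```

Each possible volume inherits exactly the probability of its side length, so no second, conflicting assignment can arise.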

Is assuming what appears to be true (that the universe, or at least our ability to measure it, is discrete and finite) a limitation on the theory of probability? No! N can be as big as you like! Let it grow, *grow*, GROW! Just don’t let it hit infinity, and we will *never* have any paradoxes creep into our calculations.

Would it surprise you to learn that this criticism of infinity is old? Nowadays our position is called *constructivism* or *finitism*. But let’s recall what our man Leopold Kronecker, he of “product” fame and enemy of Georg Cantor, said: “God made the integers; all else is the work of man.” Amen, brother Kronecker, amen.^{c}

————————————————————————————

^{a}The reality of the cube is not essential, not required. Nor is the equality of size of the discrete chunks which make up objects and measurements. What is required is finiteness and discreteness.

^{b}In rare cases, we have to keep track whether N is divisible by 2, and then only to make coherent statements (those times when we want to chop the sides up into parcels and say something about those parcels).

^{c}Obviously, there is much, much more to be said on this subject.

Steven Strogatz did a cool math primer in the Times earlier this year that touched on #3, sort of. I think it’s a fun primer on how mathematics came into being and should be recommended reading for everyone.

Read here.

“fertile mine” — guano pit.

If we decide, a priori, that there is equal probability of cube sides less than 1 cm or between 1 cm and 2 cm, then how can you also declare that there is equal probability of volume less than 4 cm3 or between 4 and 8 cm3? The equivalent prior would have to be transformed using the functional relationship between length and volume for a cube, yes? What am I missing?

Rats. The HTML tags for superscript do not work here. Briggs has some magical trick using symbol sets or something.

Kevin,

Re: comment # 2: Bwaa Ha Ha Ha!

Re: comment # 1: You’d think so, but no. Jaynes did some work in this area with transformation groups. It’s really a huge problem, causing people great distress. By that I mean, most think that these paradoxes are a devastating critique of Bayesianism. Even Bayesians think so. In practice, nobody much worries about it.

This seems like it should be obvious* to anyone who has familiarity with mathematical infinities. As always, once infinity shows up, get rid of normal intuition.

* The most dangerous words in math: “It is therefore obvious that…”

Briggs,

Re: Kevin’s point (also the point I came here to make), can you say a little bit about what causes the problems? It seems to me that there is an implicit assumption of a uniform distribution on the lengths of the sides of the cube. If so, why can’t we treat the volume as a function of the random variable side-length, and use the standard transformation-of-an-RV rule to get the distribution on the volume?

Whether or not you are willing to say a bit more about the issue, I would also be curious to know where I could read more on this on my own.

Finally, didn’t Hamming claim (perhaps jokingly) something along the lines that the universe is really discrete and continuous math is just a convenient approximation?

I always liked this one:

a = b

a^2 = a*b

a^2-b^2 = a*b-b^2

(a+b)(a-b) = b(a-b)

(a+b) = b

a+a = a

2a = a

2 = 1

Fortunately in this case, the math “more” intuitively points to the infinity. Like noahpoah and Kevin, I am curious if there is a detection technique…

I was reminded of the following paradox, which I first saw in Martin Gardner’s Mathematical Games column:

http://en.wikipedia.org/wiki/Bertrand_paradox_%28probability%29

I think the question at issue is verbal- Just how are the lengths of the edges of the cube distributed “randomly”? Are you selecting from a uniform distribution of edges, or surface areas, or volumes?

I am afraid I don’t understand the problem.

Given a cube has edge lengths less than 2″, what is the probability that it has edge lengths less than 1″?

This doesn’t make any sense to me. Do we discuss the set of all cubes? Is there a distribution function that needs to be defined? I see the way out, as Briggs discusses, if we limit the discussion to a set of possible cubes.

Otherwise, I don’t think the question can be answered.

Alan, Doug,

Re: Bertrand’s paradox. The even bigger problem is, in a discrete world, there are no circles. Nor are there, often, perfect chords.

The problem isn’t verbal, nor does it lie in the definition. There are certainly ways—and it has been done many times—to describe these sorts of paradoxes mathematically.

I don’t love the summary, but you can read more about finitism here.

If the length of each side of a cube is uniformly distributed over the interval (0, 2] cm, then P(0 < a side <= 1) = P(1 < a side <= 2) = 1/2. We can find the distributions of the surface area and the volume using a standard calculus method, but the resulting distributions are not uniform. Anyway, one of the points here, I think, is that one may naively make the mistake of concluding that P(volume <= 1) = 1/8 since the maximum volume of the cube is 8 cm^3.
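The calculus step can be sketched as follows, assuming the side length L is uniform on (0, 2] cm and writing V = L^3 for the volume. The symbols L, V, F_V and f_V are notation introduced here for illustration.

```latex
\begin{align*}
F_V(v) &= P(V \le v) = P\left(L \le v^{1/3}\right) = \frac{v^{1/3}}{2}, \qquad 0 < v \le 8, \\
f_V(v) &= \frac{d}{dv} F_V(v) = \frac{1}{6}\, v^{-2/3}, \\
P(V \le 1) &= F_V(1) = \frac{1}{2}, \qquad P(V \le 4) = \frac{4^{1/3}}{2} \approx 0.79 .
\end{align*}
```

The density of V falls as v grows, so it is not uniform, and under this assumption P(V <= 1) is 1/2 rather than 1/8.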

I do appreciate the beauty of Kronecker products, but “God made the integers; all else is the work of man”? To some people, God is man-made, therefore all, integers or not, is the work of man. *_^

I think commenters are missing the point, though perhaps I am. The point of the cube example, it seems to me, is to demonstrate that there is no canonical (*) prior distribution on the size of the cube, given only that the length of its side is less than 2cm. It might be argued that a uniform distribution is the only reasonable distribution to choose when there is no reason to expect any cube size over any other. But “size” is not well-defined. Does it mean volume? Area of a side? Length of an edge? A uniform distribution on the volume is fundamentally different from a uniform distribution on the length of the side. Thus the given information is not enough to make the choice of prior canonical.

Briggs is saying this problem is not present when the number of possible sizes is finite. In this case, regardless of which measure of size is used, the uniform distribution is essentially the same.

* canonical in the mathematical sense:

http://en.wikipedia.org/wiki/Canonical#Mathematics