Lesson n+1: Measurement & Observables

Just a very crude sketch today: it is not complete by any stretch. Naturally, the students in the summer class don’t receive this level of information.

Best we can tell, the universe is, at base, discrete. That is, space comes to us in packets, chunks of a definite size, roughly 10^-35 meters on a side. You may think of quantum mechanics; quantum, after all, means discrete.

Now, even if this isn’t so; that is, even if the universe proves to exist as an infinitely divisible continuum, it will still be true that we cannot measure it except discretely.

Take, for example, a physician reading blood pressure with an ordinary sphygmomanometer, the cuff with a pump and the small analog dial. At best, a physician can reliably, at a glance, gauge blood pressure to within 1 millimeter of mercury. Even digital versions of this instrument fare little better.

But, of course, these instruments can improve. The readout can continue to add decimal places as the apparatus better discerns the amount of mercury forced through a tube, even to the point, but no further, of counting individual molecules. Fractional or continuous molecules aren’t in it.

Further, every measurement is also constrained by certain bounds, which are a function of the instrument itself and the milieu in which it is employed. That is, actual measurements do not, and cannot, shoot off to infinity (in either direction).

Every measurement we take is like this. This means that when we are interested in some observable, particularly in quantifying the uncertainty of this observable, we know that it can take only one value out of a set of values. That is, the observable can take only one value at a time.

I am considering what is called a “univariate” observable; also called a point measurement. It doesn’t matter if the observable is “multivariate”, also called a vector measurement. If a vector, then each element in the vector can take only one out of a set of values at any one time.

We also know that any set of measurements we take is finite. Finite can be very large, of course, but large is always short of infinite. We might not know, and often do not know, how many measurements we can take of any observable, but we always know that this count will be finite.

The situation of measuring any observable at discrete levels a finite number of times is exactly like the following situation: a bag contains N objects, some of which may be labeled 1 and the others something else. That is, any object may be a 1 or it may not be. That statement is a tautology; and based on the very limited information in it, all we can tell is that an object labeled 1 is possible.

In this bag, then, there can be no objects labeled 1, 1 such object, 2 such objects, and so on up to all N objects. We want the probability that no objects have a 1, that just one does, and so on. Through the theorem of the symmetry of individual constants (which we can prove another day), it is easy to show that the probability of any particular outcome is 1 / (N + 1), because there are N + 1 possible outcomes.
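The bag setup above can be sketched in a few lines. This is a minimal illustration, with N = 10 chosen arbitrarily; the only content is that each of the N + 1 possible counts of 1s gets the same probability.

```python
# Sketch of the bag: N objects, each either labeled 1 or not.
# With only the tautology above as information, each possible total
# count of 1s (0 through N) gets probability 1 / (N + 1).
from fractions import Fraction

N = 10  # hypothetical bag size, for illustration only

# Probability the bag holds exactly m objects labeled 1, for m = 0..N
prob = {m: Fraction(1, N + 1) for m in range(N + 1)}

# The N + 1 outcomes exhaust the possibilities, so the probabilities sum to 1
assert sum(prob.values()) == 1
print(prob[0])  # the chance that no object is labeled 1
```

Exact fractions are used rather than floats because nothing here requires approximation: the distribution follows from counting outcomes alone.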

This is, of course, the uniform distribution, in line with what people usually call an “ignorance” or “flat” prior. But it is not a prior in the usual sense. It is different because there are no parameters here, only observables. This small fact becomes the fundamental basis of the marriage of finite measurement with probability.

Suppose we take a few—something less than N—objects from the bag and note their labels. Some, none, or all of these objects will have a 1. Importantly, the number of 1s we saw in our sample gives us some information about the possible values of the rest of the objects left in the bag.

No matter the value of N, we can work out the probability that no remaining objects are labeled 1, that just one is, and so on. Again, no parameters are needed. We are still talking about observables and observables only.

We can continue this process by removing more, but not yet all, objects from the bag. This gives us updated information, which we can use to update the probability that no objects remaining are labeled 1, that just one is, and so on. (For those who know, this is a hypergeometric distribution.)
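The update just described can be computed directly. The sketch below starts from the uniform 1/(N+1) assignment, weights each possible total count of 1s by the hypergeometric probability of the observed sample, and normalizes; the values N = 10, n = 4, k = 1 are hypothetical, chosen only for illustration.

```python
# Updating the probability of the number of 1s left in the bag, given
# a sample of n objects of which k were labeled 1 (drawn without
# replacement).  No parameters appear: only counts of observables.
from fractions import Fraction
from math import comb

def remaining_ones_probs(N, n, k):
    """Probability of each possible number of 1s among the N - n objects
    still in the bag, after seeing k 1s in a sample of n."""
    weights = {}
    # The total count m of 1s must allow for k 1s and n - k non-1s
    for m in range(k, N - (n - k) + 1):
        # Hypergeometric weight: ways to see this sample if the bag held m 1s
        weights[m - k] = comb(m, k) * comb(N - m, n - k)
    total = sum(weights.values())
    return {r: Fraction(w, total) for r, w in weights.items()}

probs = remaining_ones_probs(N=10, n=4, k=1)
assert sum(probs.values()) == 1
print(probs[0])  # chance no 1s remain in the bag
```

Because the prior over the total count is uniform, it cancels out of the ratio, which is why only the hypergeometric counting terms appear in the weights.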

Once more, we still have no need of parameters; we still talk of observables. All this assumed we knew N, and that N was finite. But if we do not know N, yet do know it is “large”, we can take it to the limit, and then use the resulting probabilities as approximations to the true ones. (This limit is the binomial distribution.) The limiting distribution then speaks of parameters; it is important to understand that they arise only because of the limiting (approximating) operation.
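The limiting claim can be checked numerically. The sketch below compares the finite-bag (hypergeometric) probability of a sample to its binomial approximation as N grows with the fraction of 1s held fixed; the fraction p = 0.3 and sample size n = 5 are arbitrary illustrative choices, and p is exactly the parameter that appears only in the limit.

```python
# As N grows with the fraction of 1s fixed, the finite-bag probabilities
# approach the binomial form.  The parameter p exists only in the limit.
from math import comb

def hypergeom_pmf(N, M, n, k):
    # P(k ones in a sample of n, without replacement, bag of N with M ones)
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def binom_pmf(n, p, k):
    # The limiting (approximating) form; this is where a parameter appears
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p, n, k = 0.3, 5, 2
for N in (20, 200, 20000):
    M = int(p * N)  # keep the fraction of 1s fixed as the bag grows
    print(N, hypergeom_pmf(N, M, n, k), binom_pmf(n, p, k))
```

For small N the two disagree noticeably (drawing without replacement changes the odds as the bag empties); by N = 20000 the difference is negligible, which is the sense in which the binomial is an approximation to the true finite probabilities.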

Well, you might have the idea. If we do not know N, and cannot say it is “large”, we can apply the same logic to its value as we did to the labels. Point is, all of probability can fit into a scheme where no parameters are ever needed, where everything starts with the simplest assumptions, and ends quantifying uncertainty in only what can be measured.

7 Comments

  1. So we know N and we’ve got a selection of balls in our hand. We’re going to use the hypergeometric distribution to calculate the probability of various future outcomes. Now the numbers you plug in to your hypergeometric calculation have been taken from the observed values but they are applied to the unobservable contents of the bag. Why does this manoeuvre justify saying that these numbers are not “parameters”?

  2. From the previous post: “Since p-values—by design!—give no evidence about the truth or the falsity of the null model, it’s a wonder that their use ever caught on. But there is a good reason why they did. That’s for next time.”

    Ever watch the late local news where they have a headline that engenders curiosity (the hook) such as, “How a cat in a tree foiled a bank robbery!,” after which they then say Coming up! or Next! but what they really mean is Last.

    So is it safe to assume the p-value discussion will be continued in Lesson N+M+1? 😉

  3. I’m curious how your approach to logical probability fits in with Bayes’ Theorem. Suppose I have a two-sided coin, exactly one side of which is heads. When flipped, exactly one side shows. Some Bayesians might model this by letting p denote Prob(heads) and assuming that p has a uniform distribution on the interval (0,1). My understanding is that you’d say Prob(heads)=1/2. Your model seems equivalent to assuming the distribution of p is a Dirac delta function with peak at 1/2.

    http://en.wikipedia.org/wiki/Dirac_delta_function

    If the coin is flipped many times, and the results recorded, then with either prior model distribution, Bayes’ Theorem can be used to revise the model. But with the Dirac delta prior, this posterior model will be that same Dirac delta. In effect, your prior model is so strong that no amount of experimental evidence will allow for an updated model that differs from Prob(heads)=1/2.

  4. Best we can tell, the universe is, at base, discrete. That is, space comes to us in packets, chunks of a definite size, roughly 10^-35 meters on a side. You may think of quantum mechanics; quantum, after all, means discrete.

    Well, you’d rather not 😉
    Because it would be horribly wrong.
    10^-35 m is just the Planck length. It is rather small and plays a role for certain variables like the radii of black holes and such.
    Of course, not every variable that has the meter as its unit sees anything special at the Planck length, and the word “quantum” has no special relation with space or time.
    A photon wavelength (unit: m) can be arbitrarily small or large; the Planck length is no kind of “limit”.
    On the contrary, if it were, it can easily be shown that Lorentz invariance (e.g. relativity) would be violated, and it is extremely unlikely, bordering on impossibility, that a statement violating Lorentz invariance would be correct.
    So space and time are continuous all the way to 0, with some special things happening in some special cases around the Planck scales.

    Now, even if this isn’t so; that is, even if the universe proves to exist as an infinitely divisible continuum, it will still be true that we cannot measure it except discretely.
    As we have seen above, this indeed isn’t so.
    I am not sure I understand the second part well, because while the number of measurements is an integer, the measurement itself is some real number.
    But when we are measuring real physical observables like photon wavelengths or photon energies (E=h.f), it is clearly not equivalent to drawing balls from a bag.
    The spectrum is continuous, and the probability of measuring any given number, e.g. exactly 1, is exactly 0.
    There is not a countable number of outcomes but an uncountable number (like the number of points in a segment).

  5. I forgot to add that, the accuracy of the measurement being always finite, it follows that the number of possible measurement results is not only countable but finite.
    This is something different from the real continuous spectrum of an observable.
    The strictly mathematical proof that a computer model can never reproduce a chaotic system is based on this principle.
    But a statement about computers and a statement about reality are 2 very different things.

  6. Tom,

    You are a sweetheart. Thanks very much.

    Once I am done with this class—which is killing me, time wise, energy wise, and otherwise, though I love my students—I will answer this more.

    I may re-do or answer on Sunday/Monday.

  7. I can’t see why I deserved thanks, William, but am of course happy to discuss these fundamental things 🙂
    So, to pile up a bit on your weekend (personally I will be sitting on my terrace, drinking coffee and reading The Children of Húrin), I would like to add one more comment relating to the finiteness of possible measurement results.

    As our lives are finite, we are unable to write any real number.
    We can only deal (physically) with natural numbers, and even then, not all of them.
    So every time we need real numbers, and this is actually all of the time, we need to extrapolate observations obtained with a finite number of natural numbers.
    For instance, if I measure some length that I know to be a real number between 0 and 1 with 10^-35 m accuracy (admittedly a darn big accuracy), I will only obtain results among 10^35 different numbers. 10^35 is large but finite. So any measurement between 0 and 10^-35 will give the single number 0 as a result, yet there is an (uncountable) infinity of non-zero numbers that I am really measuring.

    We are lucky: extrapolations often work.
    But it is extremely dangerous to take that for granted in every case.
    It is not only dangerous because I may get values wrong, but I can also get logical conclusions wrong.

    I always liked the chaotic-system example.
    Let’s suppose that I have a superfast computer and solve the Lorenz system numerically with a step of 10^-43 sec (very hard to do better, because this is the Planck time).
    Doing N runs with different initial conditions, I observe that all solutions are extremely irregular but periodic after long times.
    So now I can begin to do statistics and formulate conclusions like “The probability that the period is between X and Y is P(X,Y)”. Etc.
    I spend my time writing peer-reviewed papers about period distributions in chaotic systems.
    Yet I have only wasted my life and could not be farther from the truth.
    The “no intersection theorem” proves that a chaotic system can NEVER be periodic.
    All my statistics and conclusions were just artefacts due precisely to the fact that I worked within a finite set of 10^43 numbers and forgot that even if it is large, it is not infinite.
