Jelle de Jong writes in to ask:

Working as a quant analyst in finance I recently got interested in the Briggsian/Jaynesian/Bayesian interpretation of probability but am still struggling a bit with it. When reading your book/blog I was wondering what you mean when you say the ‘true value of a parameter.’ For a situation where we can imagine a (clearly defined) underlying population (say a population of people of which we have measured some property for only a sample) it seems clear what the connection is between the model (parameters) and the data-generating system, but if you would ‘estimate’ a binomial parameter how would you interpret this ‘estimated’ probability? Jaynes writes in his book that estimating a probability is a logical incongruity (Jaynes’ Probability Theory 9.10 on p. 292). Do you interpret the estimated parameter as a property of the (hypothetical) underlying distribution (i.e. the fraction of successes in an infinite sample) that can be estimated with corresponding quantification of uncertainty? But as this is just a model, in what sense can we speak of a true value of this parameter? (The only truth is that the process will generate a number of successes.) Can we give the estimated binomial parameter such a physical interpretation, or is it only possible to assign a success probability, in which case it would be incoherent to assign a distribution to this estimated probability?

I hope you can take the time to shed some light on my question.

With Kind Regards,

I’d start by putting my name last, in smaller font, and in parentheses, and then prefixing Laplace, Keynes (yes, that one), and especially David Stove, who all took a logical view of probability. Historically, this turned out to be wise because the logical view is the correct one.

Consider the evidence—*assumed as true*—that E = “We have a six-sided object which when tossed shows one side, and just one side is labeled 6.” Given this evidence, I *deduce* the following:

Pr( ‘6 shows’ | E ) = p = 1/6.

Let’s add to our evidence by saying we’re going to A = “toss this six-sided object n times.” Then we can ask questions like this (to abuse, as they say, the notation):

Pr( ‘k 6’s show’ | E & A) = binomial(n,p,k)

where we again have *deduced* what the probabilities are. The ‘n’, ‘k’, and ‘p’ are all *parameters* of the binomial; and they are the true values, too. They follow from assuming as true E and A and by assuming we’re interested in k ‘successes’, i.e. k 6’s showing. And this is not the only time where we can deduce the value of a parameter, i.e. have complete knowledge of it; many situations are similar.
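As a minimal sketch of that deduction, the few lines below evaluate binomial(n,p,k) with exact fractions. The function name `binomial_pmf` and the choice n = 10 are illustrative assumptions, not anything fixed by E or A.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p, k):
    """Pr(exactly k successes in n tosses), each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = Fraction(1, 6)  # deduced from E: one side of six is labeled 6
n = 10              # hypothetical number of tosses for A (an assumption)

# Every value is deduced from E & A alone; no observations are needed.
for k in range(n + 1):
    print(k, float(binomial_pmf(n, p, k)))
```

Nothing here is estimated: change E (say, two sides labeled 6) and the deduced numbers change with it.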

Now suppose instead we observe a game in which a ball is tossed into a box the bottom of which has holes, only some of which are colored blue. The box is a carnival game, say. We want to know, given all this information which we’ll label F, this:

Pr( ‘ball falls in blue hole’ | F ) = θ

From just F the only thing we can deduce is that 0 < θ < 1: θ isn’t 0 because some of the holes are blue, and it isn’t 1 because we know that not all holes are blue; beyond that, F tells us nothing. The point to emphasize is that we have *deduced* the true value (in this case values) of the parameter, which is 0 < θ < 1. (Actually, we do know more; we know the number of holes is finite, and this is actually a lot of information; however, for the sake of this post, we’ll ignore that information: but see this paper which works out this entire point rigorously.)

If we add to F another “A”, and consider n tosses of the ball, we deduce this:

Pr( ‘k blue holes’ | F & A) = binomial(n,θ,k)

where again 0 < θ < 1. We have complete knowledge of two parameters, n and k, but θ remains (mostly) a mystery.

And here we must stop unless we gather more evidence. We can make this evidence up (why not? we’ve done so thus far) or we can add evidence in the form of observational propositions: “On the first toss, the hole was not blue,” “On the second toss, the hole was blue,” and so on.

Given F and A and this new observational evidence we can call “X” (where the number of tosses in X is finite), we can deduce:

Pr( θ = t | F & A & X) = something(t)

for every possible value of t (where we have already deduced t can only live between 0 and 1; the value of ‘something’ relies on t). Very well, but this only gives us information about θ, which is only of obscure interest. It says nothing, for instance, about how many balls will go into blue holes, or the probability they will fall into blue holes. It’s just some parameter which assumes F, A, and X are true.
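One way to make ‘something(t)’ concrete is sketched below. It rests on an assumption the post does not make explicit: that F is read as putting a flat (uniform) weight on every θ between 0 and 1, so that ‘Pr( θ = t | … )’ is really a density. Under that assumption, observing b blue holes in m tosses gives a Beta(b + 1, m − b + 1) density. The counts (3 blue in 10 tosses) and the function names are hypothetical.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_density(t, blue, tosses):
    """Density of theta at t (0 < t < 1) after `blue` blue holes in `tosses`
    tosses, ASSUMING a uniform prior on theta: a Beta(blue+1, tosses-blue+1)."""
    a, b = blue + 1, tosses - blue + 1
    return exp((a - 1) * log(t) + (b - 1) * log(1 - t) - log_beta(a, b))


# Hypothetical X: 3 blue holes observed in 10 tosses.
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(t, posterior_density(t, blue=3, tosses=10))
```

With no observations at all (`blue=0, tosses=0`) the function returns 1 everywhere, which is just the flat starting assumption handed back.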

To get the probability of actual balls going into k actual (new) holes, we’d have to take our binomial(n,θ,k) model and hit it with Pr( θ = t | F & A & X), which you can think of as a weighted average of the binomial for every possible value of θ. Mathematically, we say we integrate out θ because the result of this operation is

Pr( ‘k new blue holes’ | F & A & X & n new tosses) = something(k)

where you can see there is no more (unobservable) θ and where the ‘something’ relies on k. This works even if n = k = 1 (new tosses).
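Here is a hedged sketch of what ‘something(k)’ can look like. It assumes, beyond what the post states, a uniform weighting of θ over (0, 1) before seeing X. Under that assumption the integral has a closed form, the beta-binomial. The observed counts (3 blue in 10 tosses) and the function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive(k, n_new, blue, tosses):
    """Pr(k blue holes in n_new NEW tosses | blue of tosses seen so far),
    with theta integrated out under an ASSUMED uniform prior (beta-binomial)."""
    a, b = blue + 1, tosses - blue + 1
    return comb(n_new, k) * exp(log_beta(k + a, n_new - k + b) - log_beta(a, b))


# Hypothetical X: 3 blue holes in 10 tosses; predict the next 5 tosses.
for k in range(6):
    print(k, predictive(k, n_new=5, blue=3, tosses=10))

# The n = k = 1 case: probability that the very next ball finds a blue hole.
print(predictive(1, n_new=1, blue=3, tosses=10))
```

For n = k = 1 this reduces to (b + 1)/(m + 2), Laplace’s rule of succession, and θ appears nowhere in the answer.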

It’s not useful to speak of θ as the “probability” of a ball going through a blue hole: that last equation gives that, and there is no θ in it.

Now, all statistics problems where new data is expected can and should be done in this manner. Almost none are, though.

Hope this helps!

Categories: Philosophy, Statistics

W. Edwards Deming used to say that there was no such thing as a measurement, there was only the result of applying a measuring system to an object. If the same object is measured using two different systems, it will in general deliver two different sets of results. This is the problem of operational definition.

What, for example, is the diameter of a glass bottle? It has an infinite number of diameters: at various heights and points around the circumference? All measurement is an act of sampling. It matters a great deal how “the” measurement is obtained. I have encountered cases in which supplier and customer measured the “earring” of aluminum, the volume of a can, or some other quality in two different ways, leading to a certain amount of inter-company urinary olympics. In other cases, two different instruments were used of slightly different design, or two different instruments one of which had lost its calibration.

Deming used to speak disparagingly of “19th century statistics” like hypothesis tests and said that those who thought of Shewhart’s control charts as “on-going t-tests” had missed the point. And A.C. Rosander in his *Case Studies in Sample Design* noted one of the ways in which traditional hypothesis tests differed from reality — theory v. practice — was that real processes were dynamic, not an urn full of colored beads.

Just recently integrated out theta because of a discussion about rocket launch risks.

“What, for example, is the diameter of a glass bottle? It has an infinite number of diameters: at various heights and points around the circumference? All measurement is an act of sampling. It matters a great deal how ‘the’ measurement is obtained.”

Which is why engineering specifications always have a tolerance. Some of them can be quite narrow. The avionics boxes NASA uses have faces so flat they will stick together.

“two different instruments one of which had lost its calibration.”

Check under the bench. It often rolls there.

There we go again, integrating out the parameter of the binomial distribution. We used the Beta distribution, right, and it looked almost easy. I’d like to see it done with the Poisson distribution.

So I’d like to know whether a die is weighted towards “6”? How do I answer this question using the posterior predictive distribution? Please don’t tell me what I’d like to know is not of your interest…

JH,

Easy: collect the X, predict new X, see if the new X accord with the old under the model F&A or under a new model indicating weightedness.

Rich,

Easy too. Try it with JAGS (or Winbugs). Or, even better, eschew the infinities and treat everything as finite (see the linked article).

DAV,

And it’s the last place we usually check.

YOS,

Deming was right!

Mr. Briggs,

What does “the new X accord with the old under the model F&A or under a new model indicating weightedness” mean exactly? What is the definition of “accord”? What is X?

Say, we toss a die 10 times and obtain {1,4,2,6,6,2,3,6,5,6}. What is your old X and new X in this case? What would be your answer to the question of whether the die is weighted towards ‘6’? Should I sue the casino for using an unfair die? (A yes-or-no question)

Hi,

Thanks for your answer! I think I’m getting there, I have to reconcile the new way of thinking with a lot of frequentist training. I have some summer reading (Stove etc.) to do too I think!

Kind Regards,

Jelle