From Bill Raynor:
Hello Matt,
You like to discuss P(Y|M) a lot, but haven’t spent much time talking about the practical construction of that.
A topic I’d like to see: a constructive development of P(Y|X) for some real problem involving real objects…Every time I’ve chatted with a Bayesian about priors, I get a lot of handwaving and mathematical idealization (hello, mathematical Platonism) but very little in the way of real examples. The resulting math is very pretty and elegant, yadda, yadda. The part where I ask “what does that physically imply…” gets rather vague, quickly….
I have in mind something rather Kolmogorovian:
1. define a finite reference set of objects and measurements on those objects (e.g. body weights usually have a body attached…) You can use a finite sampling frame if you wish.
2. define a set of observable propositions on those objects and measurements (mutually exclusive and exhaustive, no infinities or absolute continuity, etc.) including the means of measurement. (e.g. the Brewers beat the Yankees, the mean difference between two (blocked) partitions of objects.) If it involves means, show how object/measurements really are additive — e.g. weights of grain from a field plot.
3. Assign an additive measure, and hence a probability, to those propositions.
There are innumerable practical uses of Pr(Y|X). The worn stock examples were made for this. Let X = ‘This interocitor must take one of n states’, then if Y = ‘This interocitor is in state s = i’, we have Pr(Y|X) = 1/n.
This is as practical as it gets. Casinos use it; everybody does. And without priors! Most probability is done in this uncomplicated way. Pure probability, no formal models, no parameters.
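The state-counting deduction above is simple enough to write down in a few lines. This is only a minimal sketch of the symmetry argument; the six-state interocitor is, of course, as fictional as the original.

```python
from fractions import Fraction

def pr_state(n_states: int) -> Fraction:
    """Pr(Y|X), where X = 'this device must take one of n states'
    and Y = 'this device is in state s = i'.  By the symmetry of
    the premise alone, each state gets probability 1/n."""
    return Fraction(1, n_states)

# A six-state interocitor: each state gets probability 1/6.
print(pr_state(6))  # 1/6
```

No priors, no parameters: the probability is deduced directly from the evidence X.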
Now let’s do your Kolmogorovian scheme to see it’s exactly the same thing. Take two blocked objects, A and B, which have finite, discrete measurements taken on them, as all measurements are.
Next, since you’re interested in the means, or presumably some function of the means, we can compute A’ and B’, which are the means (also discrete and finite). We want Y = f(A’,B’).
Finally, we gather X, evidence probative about Y, call it some additive measure if you like, and compute Pr(Y|X).
Done!
See, I told you it was the same.
You can make it more complicated, but that doesn’t change the end result. Make it complicated by making X into math.
It could be, and it would be ideal, that you can deduce from X the probability of Y, as we did with the interocitor. The strategy is to consider the nature of the measurements. I give some examples in Uncertainty. Mostly folks are too anxious for the deduction, which won’t be simple, and start cramming ad hoc models into X.
Any number of models are used, most of them continuous approximations. There could be one model for A, another for B, which then implies some sort of model for A’ and B’, which in turn implies a model for Y.
All these models are usually parameterized. These parameters, being part of the models, are just another part of X. The models of the parameters, the priors, are also part of X.
Any past observations, if any, are also part of X. In the end, still Pr(Y|X).
“Why no specifics, Briggs? We want concrete examples.”
Did you try the many examples in the class? Lots of common cases. There’s no general solution, but many, many particular ones.
“Not yet. I saw them, but didn’t try. Too busy. What really worries me are the priors. They’re so much nonsense.”
Sure, like the ad hoc models themselves to which the priors are related. But they might, if you do it right, be reasonable nonsense, good approximations.
Here’s the idea. Start with a model for A and B, or start with A’ or B’, or even start with Y. It makes no difference: in the end we still get Pr(Y|X). If there are parameterized models, and you want to try different parameters or different models with their different parameters, then you have X_1, X_2, X_3, …, each of these the full X for that choice of model and parameter uncertainty; including, of course, any past observations and the other evidence that went into suggesting the models used.
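One way to sketch how the pieces of X combine: when X contains several candidate models M_1, …, M_k with weights Pr(M_i|X), the total probability of Y is the weighted average of the per-model probabilities. The numbers below are invented for illustration only.

```python
from fractions import Fraction

# Hypothetical per-model probabilities Pr(Y|M_i, X) -- made up.
pr_y_given_model = [Fraction(1, 10), Fraction(1, 4), Fraction(2, 5)]
# Hypothetical model weights Pr(M_i|X) -- also made up; they sum to 1.
pr_model = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

# Total probability: Pr(Y|X) = sum_i Pr(Y|M_i, X) * Pr(M_i|X).
pr_y = sum(py * pm for py, pm in zip(pr_y_given_model, pr_model))
print(pr_y)  # 41/200
```

The models, the parameters, and their weights are all just parts of X; the answer is still a single Pr(Y|X).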
Which is “best” depends on the uses to which you put the model, the decisions you make with it. X_17 might be great for you, lousy for me; maybe I prefer X_2. Who knows?
None of these are the true model, the cause of Y. If we knew the cause, we wouldn’t be worried about all this other nonsense. And the model isn’t true because we haven’t deduced it like we did with the interocitor.
“This is not a satisfying answer.”
Is it not? It is the true answer. It’s complete. There was no point going on about some specific example (examples are in the class anyway). The idea is what counts.
Matt,
details, details, details….
I was hoping for an example with specifics. Specifics about the set of Y’s and the X’s followed by specifics on the construction of P(Y|X) for all (discrete, finite) Y. I’m already familiar with the “assume a spherical unicorn” approach and have used it on many occasions.
As an example, you could consider the R.A.F. example of paired pots of plants in Chapter 21 of Statistical Methods For Research Workers (p.44ff). Very finite, discrete measurements, additive and so on. He shows how to construct a randomization distribution for a mean of differences. How would you do it in a practical case?
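For readers who haven’t seen the construction I mean: here is a minimal sketch of a randomization distribution for a mean of paired differences. The numbers are invented for illustration; they are not Fisher’s data.

```python
from itertools import product
from statistics import mean

# Hypothetical paired differences (treatment minus control) for
# 5 matched pairs of pots -- made-up numbers.
diffs = [4, -1, 3, 2, 5]
observed = mean(diffs)

# Under the null, each pair's sign is exchangeable: enumerate all
# 2^5 = 32 sign assignments to build the randomization
# distribution of the mean difference.
dist = [mean(s * d for s, d in zip(signs, diffs))
        for signs in product([1, -1], repeat=len(diffs))]

# Two-sided p-value: fraction of arrangements with a mean at
# least as extreme as the one observed.
p = sum(abs(m) >= abs(observed) for m in dist) / len(dist)
print(p)  # 0.125 with these made-up numbers
```

Everything here is finite and discrete: the reference set is the 32 equally weighted sign arrangements, nothing more.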
I’m sorry, still not enough concrete examples for me to know what you’re specifically talking about and how it is claimed to have an advantage over standard methods.
Justin
“assume a spherical unicorn” – Bill_R
One spherical unicorn coming up.
https://youtu.be/nQlF-dpU5lw
Would you like fries with that?
More seriously, I’m with both Bill and Justin above asking for more specifics. Are there any? Or is the reason we are referred to interocitors because there aren’t?
Yonason,
Good one! I’ll skip the fries, though. Do spherical unicorns eat keto fries?
Matt, is this a sufficient outcry from the masses? The sheep look up…
All,
If you guys have already seen the class examples, then it could be fun to provide links to ready-to-use data, with code for traditional models, if available.
I’m coming to the conclusion that statisticians are to math what lawyers are to the general population.
====================================
RE – A case from Briggs’ excellent book.
3 balls in a bag. What are the odds you have all the same color? It depends on how you got there.
(A) – 3B and 3W in an urn. Remove 3, one at a time, and insert in bag. Chance of all 3B = 1/20.
(B) – 6B and 6W in urn. Remove 3 as in A. Chance of all 3B = 1/11
(C) – general case for XB and XW is P=(1/4)[(X-2)/(2X-1)], which for large X approaches 1/8.
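The three scenarios are quick to check by multiplying the draw-by-draw odds. A small sketch (the function name is mine, not from the book):

```python
from fractions import Fraction

def all_black(n_black, n_white, draws=3):
    """Probability that `draws` balls drawn without replacement
    from an urn of n_black black and n_white white are all black."""
    p, total = Fraction(1), n_black + n_white
    for k in range(draws):
        p *= Fraction(n_black - k, total - k)
    return p

print(all_black(3, 3))              # 1/20  (scenario A)
print(all_black(6, 6))              # 1/11  (scenario B)
print(float(all_black(500, 500)))   # about 0.1246, approaching 1/8
```
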
Other scenarios (models?) can be devised which result in different probabilities. I don’t know what Bill and Justin are looking for, but I hope that short illustration gives an idea of the kind of e.g., I need to be able to begin to see how the most general abstract case can be applied to a potentially real scenario.
P.S. – the e.g., in my last was of my own devising. It was not provided in the book. Sorry if that wasn’t clear.
—
3 balls in a bag. What are the odds you have all the same color? It depends on how you got there.
(A) – 3B and 3W in an urn. Remove 3, one at a time, and insert in bag. Chance of all 3B = 1/20.
—
(3/6)x(2/5)x(1/4)=1/20
Yes.
But Yonason, that’s the point: it depends on the sample space, reference set, X, whatever you want to call it. These examples actually support frequentism and parameters, the standard methods.
I have several books on urn theory, for example.
You’d need to show an example that Briggs can solve only, or better, using his method.
Justin
Justin,
Frequentism is not needed. Just pure probability, with deductions based on whatever assumptions you bring. Such as numbers, colors, whatever.
“You’d need to show an example that Briggs can solve only, or better, using his method.” – Justin
I would be happy if he gave me detailed enough examples of ANY model/method, hopefully contrasting their effectiveness and application, with real world examples (not interociters!). One thing he does in the book is to say of the case of one black and two white (BWW) that there are 3 ways of getting it (i.e., permutations): (BWW), (WBW), (WWB). However, in the model I have used, permutations are irrelevant. You don’t need them to get a probability. And that is one thing I want to understand: WHEN do you apply permutations, and when not? For my case…
(B,W,W) – [(X/2X)*(X/(2X-1))*((X-1)/(2X-2))]
(W,B,W) – [(X/2X)*(X/(2X-1))*((X-1)/(2X-2))]
(W,W,B) – [(X/2X)*((X-1)/(2X-1))*(X/(2X-2))]
All give the same answer. Permutations are irrelevant here, though I know that in some cases it is essential to use permutations to calculate a correct result (energy states in quantum chemistry, e.g.).
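That the three orderings agree is easy to verify mechanically for a few values of X. A sketch using exact fractions (the function is mine, built straight from the three products above):

```python
from fractions import Fraction as F

def orderings(X):
    """Probability of each ordering of one black and two white
    balls drawn without replacement from X black and X white."""
    bww = F(X, 2*X) * F(X, 2*X - 1) * F(X - 1, 2*X - 2)
    wbw = F(X, 2*X) * F(X, 2*X - 1) * F(X - 1, 2*X - 2)
    wwb = F(X, 2*X) * F(X - 1, 2*X - 1) * F(X, 2*X - 2)
    return bww, wbw, wwb

for X in (2, 3, 10, 100):
    bww, wbw, wwb = orderings(X)
    print(X, bww == wbw == wwb)  # True for every X
```
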
So, the problem for me isn’t that Briggs is wrong, he’s not, only that the information he gives is so sparse that only a statistician can decode it. My request is that he descend from his Mt. Olympus and impart a bit of wisdom to us commoners, i.e., speaka da english, palease, and thank you.
“Frequentism is not needed. Just pure probability, with deductions based on whatever assumptions you bring. Such as numbers, colors, whatever.” – Briggs
Smoke and mirrors?
But, from my e.g., I got probabilities of between 1/20 and 1/8 for all the same color, depending on model and conditions… and your pure probability for 3 the same color yielded only 1/8, regardless of how it was arrived at.
Another scenario…
After reading about 3 balls in a bag in your excellent book, the desire to possess them becomes so wildly popular that many companies incorporate to meet the demand. Here’s how one company rises to the challenge.
“Balls in Bags” hires Chuck, Duane, Bob and Alice. Chuck fills bags with 3 black balls. Duane with 3 white; Bob with 1 black and 2 white; Alice with 1 white and 2 black. They all fill bags at the same rate. The probability of any configuration is then 1/4, regardless of the mix.
One day, Bob is out due to the flu, and so Alice has to sub for him. But even though she produces as many bags as before, only half will be configured as she or Bob had before. Now the probability is 1/3 for each all-same-color configuration, and 1/6 for either (B,W,W) or (W,B,B).
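Under one reading of the story (Alice splitting her time evenly between her own mix and Bob’s), the arithmetic works out as below. The names and rates are of course hypothetical, straight from the scenario above.

```python
from fractions import Fraction as F

# Normal day: Chuck (BBB), Duane (WWW), Bob (BWW), Alice (WBB),
# all filling bags at the same rate, so each configuration gets
# a quarter of total production.
normal = {"BBB": F(1, 4), "WWW": F(1, 4), "BWW": F(1, 4), "WBB": F(1, 4)}

# Bob out sick: three fillers remain, and Alice splits her time
# between her own mix and Bob's, so each mixed config gets half
# of her one-third share.
bob_out = {"BBB": F(1, 3), "WWW": F(1, 3), "BWW": F(1, 6), "WBB": F(1, 6)}

# The weights sum to 1 in both scenarios.
print(sum(normal.values()), sum(bob_out.values()))  # 1 1
# Probability of an all-same-color bag under each scenario:
print(normal["BBB"] + normal["WWW"])    # 1/2
print(bob_out["BBB"] + bob_out["WWW"])  # 2/3
```

Same bags, different filling process, different probabilities: the probability follows from how the bags got filled, not from the bags alone.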
Of course it depends on what the most likely method of filling the bags is, and the extent to which any model applies, but I see no way of generalizing from what Briggs says in the book (or here) to be able to arrive at an understanding of what might happen in the real world.
If I could do it myself, I wouldn’t need to be reading here. I don’t need to be made to feel inadequate because I can’t do it, especially since I’ve been given the impression that if I read this material I’ll somehow magically be able to. Not gonna happen without some concrete examples, though.
What a disappointment when the thrill is gone…
https://www.youtube.com/watch?v=CzUgX-HB9tA
…when I ain’t never had it to begin with.
Yonason,
Agreeing with Justin on this. The method specifies the reference set and the weights which in turn defines the probability. I call it a reference set as it can be derived from a sampling framework, a permutation or randomization framework (my favorite), a prior distribution, etc. etc. If you don’t define the method, then the probability can be ill-defined.
Yes, we can be like lawyers sometimes. I was asking Matt to demonstrate how he does it, in a purely logical fashion (without reference to sampling distributions or permutation/randomization distributions), for practical problems.
Bayesian approaches can be handy if you have prior data to define stuff, e.g. empirical Bayes and shrinkage estimates.
@Bill_R
Thanks Bill. I see, as I thought might be the case, that we were posing different questions. As you can see, my concerns are more pedestrian than yours. Not being able to fly, yet, I need a scaffolding to be able to ascend for a more panoramic view. In the meantime, I just wanted to give my current view from ground level. Hard to tell the junk from the items of value from this vantage point, without some assistance.
So, basically, you were asking a more advanced question. That still doesn’t help me, but thanks for giving me a straight answer.