William M. Briggs

Statistician to the Stars!


The Chicago Way With Chik-Fil-A

There is a sense in which we all argue that is statistical. For instance, if we say of a group that it sports an “arrogant attitude” we do not mean that all members of that group are arrogant, but that many of the group are, or that a clear majority of the leaders of the group are arrogant.

Or we might say, “Progressives are inflexible and unwilling to listen to counter-argument” and by this we do not mean that each and every soul who labels herself a progressive is inflexible and unwilling to listen to counter-argument, but we do assert that many who self-label as progressive are stubborn in this way, and we do claim that the clear majority of progressive leadership fits this bill.

This is a harmless manner of speaking. It saves time and words as long as there is a clear understanding that the claims are statistical and not intended to apply to each group member. As a for instance, take when Chik-fil-A President Dan Cathy said, “I pray God’s mercy on our generation that has such a prideful, arrogant attitude to think that we have the audacity to define what marriage is about” we do not assume Cathy means all souls who comprise “our generation”—a group which (by definition) includes Cathy himself—possess an “arrogant attitude.” He only meant the majority of this generation’s leaders are prideful and arrogant.

Proof is had in a letter Boston Mayor, and progressive leader, Tom Menino wrote to Cathy after Menino heard Cathy did not toe the progressive line. Menino admitted, re: the “prideful” charge, “guilty as charged.” But poor Menino, straining his education, has confused the shade of meaning of prideful. So skip to where Menino wrote “I was angry to learn [Chik-fil-A was searching for a site in Boston].” Menino goes on to bluff that Boston won’t allow Cathy to open his business in Boston. Guilty as charged on the arrogance, too.

More arrogance proof was furnished by Chicago alderman Joe Moreno, who also heard Cathy’s words, and who then denied Cathy’s company a building permit. Moreno said, “Chik-fil-A approached me with a paper bag in the usual way, but when I opened it, I was shocked to discover the ‘lettuce’ was real.” Chicago, the most corrupt of American cities, run entirely by Democrats for all of living memory, yet blaming all its ills on Republicans, has its rules that Cathy has not yet learned.

Chik-fil-A’s official values are Christian. It closes on Sunday. Its owner prays. It has touted Biblical passages. The company says that it is their intent to “treat every person with honor, dignity and respect — regardless of their belief, race, creed, sexual orientation or gender.”

Not letting this crisis go to waste, Rahm Emanuel said, “Chick-fil-A values are not Chicago values.” This is true. Chicago is well known for not treating every person with honor, dignity, and respect. It particularly eschews those values for the poor souls who would buck the machine.

Here Emanuel was speaking in a statistical sense. He did not mean all Chicagoans’ values are opposite of Chik-fil-A’s, but he meant all of its leaders do not hold with treating people with dignity, etc. In particular, almost certainly a majority of Chicago’s black, and a majority of Chicago’s Hispanic, and a majority of Chicago’s immigrant, and probably a just-plain majority of Chicago’s population profess values the same as Chik-fil-A’s, and say the same thing about gay “marriage” as Cathy does.

This has always been an embarrassment for the (mostly white) progressive leadership who when asked about this “disparity” change the subject faster than a cash “application fee” disappears into the dark pockets of a, well, of a Chicago alderman. But no worries: progressive leaders see it as their duty to look after these disadvantaged folks; they’ll bring ‘em around to the left point-of-view eventually.

And they’ll do it the Chicago way: by dangling trinkets in front of constituents which can be had by merely signing over their souls, or, if that fails, by good, old-fashioned intimidation and thuggery. Refuse to swear the oath of progressive allegiance? So far we can be thankful the only punishment is banishment.

The Chicago way is the Obama way. It is Mr Obama’s demand, and his value, that all women be given “free” birth-preventing pills and “free” pills which will kill the life which has somehow managed to slip by the first “free” pill. It is his demand, and his value, that those whose religions forbid the use of these controlled poisons (for lack of a better word; they are not drugs to heal or to medicate) be made to abandon their convictions.

Mr Obama, not speaking in the statistical sense, calls this stance a “compromise.”

True Value Of A Parameter

Jelle de Jong writes in to ask:

Working as a quant analyst in finance I recently got interested in the Briggsian/Jaynesian/Bayesian interpretation of probability but am still struggling a bit with it. When reading your book/blog I was wondering what you mean when you say the ‘true value of a parameter.’ For a situation where we can imagine a (clearly defined) underlying population (say a population of people of which we have measured some property for only a sample) it seems clear what the connection is between the model (parameters) and the data-generating system, but if you would ‘estimate’ a binomial parameter how would you interpret this ‘estimated’ probability? Jaynes writes in his book that estimating a probability is a logical incongruity (Jaynes’ Probability Theory 9.10 on p. 292). Do you interpret the estimated parameter as a property of the (hypothetical) underlying distribution (i.e. the fraction of successes in an infinite sample) that can be estimated with corresponding quantification of uncertainty? But as this is just a model, in what sense can we speak of a true value of this parameter? (The only truth is that the process will generate a number of successes.) Can we give the estimated binomial parameter such a physical interpretation or is it only possible to assign a success probability, but then it would be incoherent to assign a distribution to this estimated probability.

I hope you can take the time to shed some light on my question.

With Kind Regards,

I’d start by putting my name last, in smaller font, and in parentheses, and then prefixing Laplace, Keynes (yes, that one), and especially David Stove who all took a logical view of probability. Historically, this turned out to be wise because the logical view is the correct one.

Consider the evidence, assumed as true, that E = “We have a six-sided object which when tossed shows just one side, and just one side is labeled 6.” Given this evidence, I deduce the following:

     Pr( ‘6 shows’ | E ) = p = 1/6.

Let’s add to our evidence by saying we’re going to A = “toss this six-sided object n times.” Then we can ask questions like this (to abuse, as they say, the notation):

     Pr( ‘k 6’s show’ | E & A) = binomial(n,p,k)

where we again have deduced what the probabilities are. The ‘n’, ‘k’, and ‘p’ are all parameters of the binomial; and they are the true values, too. They follow from assuming as true E and A and by assuming we’re interested in k ‘successes’, i.e. k 6’s showing. And this is not the only time where we can deduce the value of a parameter, i.e. have complete knowledge of it; many situations are similar.
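For those who like to see the numbers, here is a minimal sketch of that deduction in Python. The function name binomial mirrors the notation above; the choice of n = 10 tosses is my own assumption, made only for illustration.

```python
from fractions import Fraction
from math import comb


def binomial(n, p, k):
    """Probability of exactly k 'successes' in n tosses, each with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = Fraction(1, 6)  # deduced from E: one side of six is labeled 6
n = 10              # hypothetical number of tosses (the A in the text)

# Pr('k 6's show' | E & A) for every possible k
for k in range(n + 1):
    print(k, float(binomial(n, p, k)))

# The probabilities over all k must sum to exactly 1
total = sum(binomial(n, p, k) for k in range(n + 1))
```

Exact fractions are used so that nothing is hidden by rounding: every number printed follows from E and A and nothing else.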

Now suppose instead we observe a game in which a ball is tossed into a box the bottom of which has holes, only some of which are colored blue. The box is a carnival game, say. We want to know, given all this information which we’ll label F, this:

     Pr( ‘ball falls in blue hole’ | F ) = θ

From just F the only thing we can deduce is that 0 < θ < 1: θ isn’t 0 because some of the holes are blue, and it isn’t 1 because we know that not all holes are blue; beyond that, F tells us nothing. The point to emphasize is that we have deduced the true value (in this case values) of the parameter, which is 0 < θ < 1. (Actually, we do know more; we know the number of holes is finite, and this is actually a lot of information; however, for the sake of this post, we’ll ignore that information: but see this paper which works out this entire point rigorously.)

If we add to F another “A”, and consider n tosses of the ball, we deduce this:

     Pr( ‘k blue holes’ | F & A) = binomial(n,θ,k)

where again 0 < θ < 1. We have complete knowledge of two parameters, n and k, but θ remains (mostly) a mystery.

And here we must stop unless we gather more evidence. We can make this evidence up (why not? we’ve done so thus far) or we can add evidence in the form of observational propositions: “On the first toss, the hole was not blue,” “On the second toss, the hole was blue,” and so on.

Given F and A and this new observational evidence we can call “X” (where the number of tosses in X is finite), we can deduce:

     Pr( θ = t | F & A & X) = something(t)

for every possible value of t (where we have already deduced t can only live between 0 and 1; the value of ‘something’ relies on t). Very well, but this only gives us information about θ, which is only of obscure interest. It says nothing, for instance, about how many balls will go into blue holes, or the probability they will fall into blue holes. It’s just some parameter which assumes F, A, and X are true.

To get the probability of actual balls going into k actual (new) holes, we’d have to take our binomial(n,θ,k) model and hit it with Pr( θ = t | F & A & X), which you can think of as a weighted average of the binomial for every possible value of θ. Mathematically, we say we integrate out θ because the result of this operation is

     Pr( ‘k new blue holes’ | F & A & X & n new tosses) = something(k)

where you can see there is no more (unobservable) θ and where the ‘something’ relies on k. This works even if n = k = 1 (new tosses).
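Here is one way that integration might look in code. This is a sketch only, and it leans on an assumption the text above does not make: that before seeing X every value of θ between 0 and 1 is weighted equally (a flat prior). Under that assumption the integral has a closed form, so no numerical integration is needed. The names predictive and beta_fn, and the made-up observations (3 blue holes in 10 tosses), are mine.

```python
from fractions import Fraction
from math import comb, factorial


def beta_fn(a, b):
    """Beta function B(a, b) for positive integers, as an exact fraction."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def predictive(k, n, x, m):
    """Pr('k new blue holes' | F & A & X & n new tosses).

    X is taken to be 'x blue holes seen in m old tosses'. A flat prior
    on theta is ASSUMED; theta itself is integrated out and never appears
    in the answer.
    """
    return comb(n, k) * beta_fn(k + x + 1, n - k + m - x + 1) / beta_fn(x + 1, m - x + 1)


x, m = 3, 10  # hypothetical observational evidence X
n = 5         # number of new tosses

dist = [predictive(k, n, x, m) for k in range(n + 1)]
for k, pr in enumerate(dist):
    print(k, float(pr))

# The n = k = 1 case: chance the very next ball falls in a blue hole
next_blue = predictive(1, 1, x, m)
```

With the flat prior, the n = k = 1 case reduces to (x + 1)/(m + 2). A different prior would give different numbers, but the shape of the answer is the same: a probability of observables, with no θ in sight.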

It’s not useful to speak of θ as the “probability” of a ball going through a blue hole: that last equation gives that, and there is no θ in it.

Now, all statistics problems where new data is expected can and should be done in this manner. Almost none are, though.

Hope this helps!

What Probably Isn’t: Heat Waves and Nine Feet Tall Men: Part II

McKibben’s Folly

Suppose it is true that we have E = “A six-sided object, just one side of which is labeled 6, and when tossed only one side will show.” We want the probability of R1 = “A 6 shows.” That is, we want

     (2) Pr( R1 | E ) = 1/6.

This result only follows from the evidence in E to R1. It has nothing to do with any die or dice you might have. We are in French-speaking-cat land (see Part I and links within) and only interested in what follows from given information. In particular, we don’t need in E information about “fairness” or “randomness” or “weightedness” or anything else.

(For those seeking depth, (2) is true given our knowledge of logic.)

Now suppose I want to know Rn = “A 6 shows n times.” This is just

     (3) Pr( Rn | E & n) = (1/6)^n,

where if we want a number we have to change the equation/expression/proposition. Anyway, let n be large, as large as you like: (3) grows smaller as n grows larger. Except in the limit, (3) remains a number greater than 0, but if n is bugambo big, (3) is tiny, small, wee. Indeed, let n = 126 and (3) becomes 8.973 x 10^-99, a number which is pretty near 0, but still of course larger than 0. Right, Mr McKibben?
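Anybody can check the arithmetic. A small sketch in Python, using exact fractions so that the tiny number is never rounded to zero along the way (the function name is my own):

```python
from fractions import Fraction


def prob_all_sixes(n):
    """Equation (3): Pr(a 6 shows n times | E & n) = (1/6)^n, kept exact."""
    return Fraction(1, 6) ** n


value = prob_all_sixes(126)

# Show it in scientific notation with three decimals
print(f"{float(value):.3e}")

# However large n gets, the exact value stays strictly above zero
still_positive = prob_all_sixes(10_000) > 0
```

The last line makes the point of the paragraph: tiny, small, wee, but never 0 for any finite n.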

Since we assume in (3) that E is true (and n is given), the result is of very little interest. All we have are changing and decreasing numbers as n increases. End of story.

Now suppose we turn equation (3) around and ask

     (4) Pr( E | Rn & n) = ?

In words, given we have seen a 6 show in n consecutive tosses what is the chance E is true? That is, given we have seen, or we assume we have seen a 6 show in n consecutive tosses, what is the chance that we have a six-sided object, just one side of which is labeled 6, and when tossed only one side will show? On this reading, “a 6 shows in n consecutive tosses” implies at least part of E: although not completely. There isn’t information that the object is 6-sided, or that just one side can show, but there is evidence that a 6 is there.

In other words, “Rn & n” gives us very little information except that E is possible; therefore

     (4′) 0 < Pr( E | Rn & n) < 1

and that is the best we can do. And this is so no matter how large n is, no matter how small (3) is. What follows from this is that E is not proved true or false no matter what (3); indeed, in (3) we assume E is true. Again: E cannot be proved true or false whatever (3) is or whatever n is. And this is it. The end.

Yet somehow E “feels false” if n is large. It is not false, as we have just proved, but it might feel that way. And that’s because we reason something like this: given our B = “experience at other times and places with data and situations that vaguely resembles the data and situation I saw this time, simple models like E turned out to be false and other, more complicated models, turned out to be true.” That is,

     (5) Pr( E | B ) = small.

This is well enough, though vague. It has nothing to do with (4) however: (4) is entirely different. Plus, (5) says absolutely nothing about any rival “theories” to E. And then there are these interpretations:

     (6) Pr( E | The only theory under consideration is E ) = 1,


     (7) Pr( E | E can’t be true [because of (3)]) = 0.

Equation (7) is true, but circular; equation (6) is also true because unless we have specified a rival theory for how the data arose, we have no choice but to believe E (this is key). Equation (7), which contains a fallacy, is what many have in mind when judging E.

Now let M = “A 6 shows on roll 1, a 6 shows on roll 2, …, a 6 shows on roll n.” Then

     (8) Pr( M | Rn & n) = 1.

Our “model” M is obviously true, though (8) is also circular. But we can “uncircle” it by changing M to M’ = “A 6 always shows.” Then

     (8′) Pr( M’ | Rn & n) = 1.

Of course, if at the n+1-th roll a 6 does not show, we have falsified M’ (but not M; we’d have to write Pr(M’|Rn & n & not-six on n+1) = 0).

M and M’ are very good models because they explain the data perfectly. But equations (8) or (8′) have nothing to do with (4) or (4′). To judge between E and M, we’d have to start with a statement which assigns a prior belief (before seeing the data R) about which of these models is true; then after seeing the data we can update the probability either model is true. But we cannot, using just R, say “E is false” or “M is true.”
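To make the updating concrete, here is a hedged sketch in Python. It compares E against M’ (“A 6 always shows”) and assumes, which the text does not, that these two are the only models on the table, so their prior probabilities sum to 1. The prior values fed in are invented for illustration.

```python
from fractions import Fraction


def posterior_E(prior_E, n):
    """Pr(E | Rn & n & 'either E or M-prime is true').

    prior_E is the belief in E before seeing the data; the rest of the
    prior goes to M-prime. Both the two-model setup and the prior are
    ASSUMPTIONS made for this sketch.
    """
    like_E = Fraction(1, 6) ** n  # equation (3): chance of the data under E
    like_M = Fraction(1)          # M-prime says a 6 always shows
    prior_M = 1 - prior_E
    return like_E * prior_E / (like_E * prior_E + like_M * prior_M)


# Even odds beforehand, then a single 6 is seen
print(posterior_E(Fraction(1, 2), 1))

# Strong prior belief in E, then ten 6s in a row
print(float(posterior_E(Fraction(999, 1000), 10)))

# If E is the only theory under consideration, no run of 6s can budge it
only_E = posterior_E(Fraction(1), 126)
```

Notice what the data alone cannot do: the posterior for E depends on the prior at every step, and it never reaches 0 unless the prior was already 0.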

The main point, if it is not already obvious, is that any observation will not prove a model or belief false, unless it’s a very special and rare situation like M’ (and observing something M’ said was impossible). What we really or often mean when we say “Just look at R; E is false!” is that we have some rival model N in mind, a model which we are sure is true. McKibben is convinced that N = “death-from-the-skies global climate disruption” is true but uses an equation like (3) to prove N. This is a fallacy. Eq. (3) has nothing in the world to do with N; if E is true then no observation can prove E false, or even show it is unlikely because, of course, E is true.

Now for real temperatures, the model N could be true; but so could many other rival models, and so could a model like E, suitably modified. The climate Ec can be “the temperature can fall into one and only one of three buckets, labeled low, normal, high.” Thus Pr(high | Ec) = 1/3 and so on. Or Pr(high n times in a row | Ec & n) = (1/3)^n as before. This means

     (9) 0 < Pr( Ec | high n times in a row & n) < 1

just as before. Since there is no news about N in (9), N is irrelevant.

And just as before, we can start with an a priori judgment about the likelihood of N or Ec being true; and after seeing the data we can update these judgments. It will be the case, unless the a priori judgment is very skewed towards Ec, that N will be more likely than Ec given the data.

But this does not mean that N is more likely true when judged against other models of the climate. We can, as we have just seen, compare N only against the straw man Ec, but this gives no evidence whatsoever about N and (say) W = “a climate model which does not assume the world will soon end unless new taxes are raised and given to politicians” (or any other climate model we might imagine).

It is therefore cheating, like all straw man arguments are cheating, not to use the best available competitor to N.

What Probably Isn’t: Heat Waves and Nine Feet Tall Men: Part I

Probability is screwy, and we statisticians do a horrible, rotten job of teaching it. The first thing students learn in normal statistics classes is about “measures of central tendency” or some such thing. The idea of what probability means and why anybody would have the slightest interest in “central tendency” is never broached. As a consequence, students leave statistics classes with a bunch of half-remembered formulas and no clear idea of what probability is.

This is unfortunate, because it allows educated men like Rolling Stone’s Bill McKibben to write the following:

June broke or tied 3,215 high-temperature records across the United States. That followed the warmest May on record for the Northern Hemisphere — the 327th consecutive month in which the temperature of the entire globe exceeded the 20th-century average, the odds of which occurring by simple chance were 3.7 x 10^-99, a number considerably larger than the number of stars in the universe.[see note at bottom of page]

Poor man! Poor readers! McKibben actually believes he has said something of interest; he has worked himself into a lather over these numbers and goes on to say things like “the seriousness of our predicament”. McKibben figures that such a small number can only mean that we are doomed, unless, of course, massive amounts of money are taken from this country’s citizens and given to its politicians to apply as they see fit.

Now over the last week I tried to explain, via two examples, just what probability is and what it isn’t, and why numbers like McKibben’s aren’t of the slightest interest. See this post about global warming and this one about nine feet tall men. And if you find yourself disagreeing with me, read this one about foundations. You must at least read the first two posts because I assume them below.

What Probability Is

Suppose I let the symbol Q stand for “There are no men taller than nine feet,” and the expression D = “I observe a man 8.979 feet tall.” Let’s take this equation, or as some readers prefer to say, expression:

     (1) Pr(D | Q)

and try to solve it.

Equation (1) is a matter of logic. It is just the same as Lewis Carroll’s French speaking cats: We know that if R = “All cats are creatures understanding French and some chickens are cats” that the proposition F = “Some chickens are creatures understanding French” is true; that is Pr(F | R) = 1. And this is so even if nobody ever, not ever never, in no possible world in no possible time, never never never measures or observes or sees or posits on genetical arguments any cats understanding French. It is true even if we learn tomorrow from God Himself that He has decreed that it is a logical and physical impossibility that any cat could understand French. F given R is true and that is that: and it is true because, again, logic only makes statements about the connections between propositions. Logic is mute on the propositions themselves.

All logic, which is to say all probability, because it is solely interested in the connection between expressions, must regard propositions as fixed. In any given equation, we cannot add or subtract from these expressions: we must leave them as they are: they are not to be touched: they are sacrosanct: they exist as they are and are carved out of uncuttable stone: we are forbidden upon pain of death to manipulate them in any way. For I testify unto every man that heareth the words of these theorems, If any man shall add unto these propositions, God shall add unto him the plagues that are written in Greenpeace press releases: And if any man shall take away from the words of these propositions, God shall take away his part out of the Book of Life. I am not sure how much more of a dire warning I can issue. Don’t touch Q or D!

Equation (1) says that assuming Q is true, assuming, that is, that there are no men taller than 9 feet, that it is true that there are no men taller than 9 feet, that it is impossible there are men taller than 9 feet, that God himself has willed that there are no men taller than 9 feet, that in any possible world there cannot be men taller than 9 feet, that it is just a fact, immovable, imperturbable, irrevocable that no man can be taller than 9 feet—even if we want one to be, even if we can imagine it to be so, even if real men are actually observed to be taller than 9 feet, even if you yourself are 9’1″—given, as I say, all that, what is the chance you see a man a quarter-inch short of 9 feet?

Well, on reading D to mean seeing a man shorter than 9 feet, (1) is certain, i.e. Pr(D|Q) = 1; or on reading D to mean seeing a man precisely 8.979 feet—the actual writing of D after all, and we know we should not touch D—the best we can say is 0 < Pr(D|Q) < 1 because we have no information on how heights are distributed; all we know is that heights are contingent, meaning it is not certain (given the information we have) that all men must be precisely 8.979 feet. And therefore all we can say is “I don’t know.”

We must judge equation (1) as written! Not as we imagine it to be written, or how it might be written differently if we change the meaning of Q and D. Or about how we feel about Q and D. How it is written and nothing else.

It’s kind of funny, but if we turn probability into math there wouldn’t be the slightest interest or confusion. Suppose instead Q = “X < 9” and D = “X = 8.979” where X is just some number unrelated to any physical real thing. Then Pr(D | Q) no longer seems mysterious. In this case it’s hard to see where to add bits about, “In my opinion, we might see X larger than 9” or “I would suspect that if X did equal 8.979 then X will be greater than 9.” Indeed, if anybody did announce the latter, you would regard him as eccentric. You’d say to him, “Listen, pal. These are just numbers. They don’t mean anything. And by assumption, no number can be greater than 9. So you are speaking out of your hat.”

Or change them again: Q = “Just half of all winged blue cats who understand French are taller than 9 feet” and D = “Observe a winged blue cat who understands French standing 8.979 feet”. Once again, we are not tempted to change Q and D and we interpret them as written.

Today’s lesson: don’t touch the propositions!

In Part II: McKibben’s Fantasy


If there were only 3.7 x 10^-99 stars in the universe, there would not even be 1 star. 3.7 x 10^-99 is of course less than 1.

Men Nine Feet Tall And Bayes Theorem

The OFloinn put up a most readable and recommended essay When is Weather Really “Climate”? and in one of the comments a reader named Gyan in part said:

Many economists and radical empiricists claim to reduce the whole of rationality to the Bayes’ Theorem. But John Derbyshire in his popular book on Riemann Hypothesis provides a curious counterexample.
Suppose you have a proposition that no man is more than nine feet tall. Then you find a man just a quarter inch short of nine feet.
Should your confidence in the proposition increase or not?
By Bayes’ it seems it should but common sense tells me that it should decrease.

I admit to flying somewhat blind here, because I don’t have Derbyshire’s book and can’t read his example; nevertheless, nothing ventured etc.

The economists and radical empiricists are partly right: it’s Bayes all the way, but only in the logical sense, i.e. the sense in which Bayes describes the probabilistic relationship between propositions, just as traditional logic only describes the logical relationship between propositions. About the origin of the propositions, and of fundamental truth, about which propositions are worthy of entertaining and which not, Bayes and logic are silent. In other words, radical empiricism is false, as is just-plain empiricism, and most of what economists say is best left unsaid. But of these things, another day. On to the example!

For ease of writing, let Q = “No man is more than nine feet tall,” and let D (for data) = “You find a man just a quarter inch short of nine feet.” These are two propositions and we can use Bayes, i.e. extended logic, to say something about their relationship. For instance, we can ask

     (1) Pr( D | Q )

or we can ask

     (2) Pr( Q | D ).

These probabilities are not the same, and are rarely the same for any two propositions; and unless you are clear about which you mean, you can easily mix them up.

Equation (1) is easily solvable. It says given that we know, or accept as true, that no man can be taller than nine feet, what is the probability of seeing a man less than nine feet, specifically a man a quarter inch shorter than nine feet. The answer is, in this interpretation, 1, or 100%. Of course it is! We have just said that it is a fact that no man can be taller; and here is a man who is indeed not taller.

This interpretation is not the same as F = “Any man a quarter inch shy of nine feet”. That would be

     (3) Pr( F | Q )

and to answer it fully would require we know more about the distribution of heights (F is about any old man; D is about a man). What we do know about heights is this: we know, via deduction, they are greater than zero feet, and, by assumption, they are less than nine feet. Therefore, the best we could say about (3) is that its probability is between 0 and 1. Now you might be tempted to say it is closer to 0 than to 1, but that is because you are implicitly adding information to Q, to the right-hand-side. That is, you might add information to Q about your experience with real heights of real men, experience which suggests a decreasing probability for very high heights. If you say (3) is closer to 0 than to 1, you are actually answering

     (4) Pr( F | Q & My experience about actual heights)

which you can see is not (3) and is therefore not an answer to (3).

Now turn the question around and answer (2): this is the chance that no man is taller than nine feet given we have seen one just shy of that number. The answer feels like it will be close to 0, but again that is because we are not strictly answering (2)—the strict answer to (2) is unknown, or perhaps just between 0 and 1 if we assume the contingent nature of these events. But what we really think we are answering is

     (5) Pr( Q | D & My experience about actual heights),

and that seems to make (5) close to 0. Let’s call E = “My experience about actual heights.”

What about Bayes’s theorem? Well, it’s easy to work out that (5) is equal to (via Bayes’s theorem):

     (5′) Pr( Q | D & E) = Pr( D | Q & E )Pr( Q | E )/Pr( D | E ).

This “updates” our belief in Q from Pr(Q | E) to Pr( Q | D & E) based on observing our “data” D. About the exact value to Pr(Q | E), I don’t know (here’s another point where we depart from economists and empiricists: Bayes does not claim all probabilities are quantifiable). As long as E doesn’t contain information contradictory to Q, such that Q is false given E, then we’re okay. In my mind, using my E, Pr(Q | E) is high, close to 1 (my E says I don’t know of any man taller than nine feet).

That leaves us Pr( D | Q & E ) and Pr(D | E) to figure out. We can attack Pr(D|E) directly or it turns out that Pr(D|E) = Pr(D|Q&E)Pr(Q|E) + Pr(D|not-Q&E)Pr(not-Q|E). The first part is just a repeat of the numerator, and “not-Q” means “it is false that no man is more than nine feet tall.” Let’s be lazy and answer Pr(D|E) directly: this is the probability of seeing a man 8′ 11.75″ given E. Pr(D|E) might be close to 0. But then so will Pr( D | Q & E ).

We already assumed Pr(Q|E) was “large”, so that if Pr( D | Q & E ) < Pr(D|E) then Pr(Q|D&E) < Pr(Q|E), i.e. our belief in Q shrinks after seeing D. But if Pr( D | Q & E ) > Pr(D|E) then Pr(Q|D&E) > Pr(Q|E) and our belief in Q increases after seeing D. Whether “Pr( D | Q & E ) < Pr(D|E)” or “Pr( D | Q & E ) > Pr(D|E)” is true depends entirely on E, which since it is so fuzzy makes this problem difficult and (sometimes) seemingly against intuition.
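A small sketch in Python shows both directions of the update in (5′). Every number here is invented; they stand in for a fuzzy E and are not measurements of anything. The function name update and its argument names are likewise my own.

```python
from fractions import Fraction


def update(prior_q, d_given_q, d_given_not_q):
    """Equation (5'): Pr(Q | D & E) from Pr(Q | E) and the two likelihoods.

    Pr(D | E) is expanded as
    Pr(D|Q&E)Pr(Q|E) + Pr(D|not-Q&E)Pr(not-Q|E).
    """
    d_given_e = d_given_q * prior_q + d_given_not_q * (1 - prior_q)
    return d_given_q * prior_q / d_given_e


prior = Fraction(9, 10)  # hypothetical Pr(Q | E): "large"

# Case 1: seeing D is likelier if Q is true, so belief in Q increases
up = update(prior, Fraction(1, 100), Fraction(1, 1000))

# Case 2: seeing D is likelier if Q is false, so belief in Q shrinks
down = update(prior, Fraction(1, 1000), Fraction(1, 100))

print(float(prior), float(up), float(down))
```

Same theorem, same data, opposite conclusions: which case applies is settled by E, and Bayes’s theorem is silent on which E you ought to hold.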


© 2014 William M. Briggs
