Here’s the PDF of my talk at the DDP 30th Annual Meeting, “Statistical Follies and Epidemiology.” They picked the title.

Have fun!

This is a picture of Google’s Eric Schmidt & Chicago’s Rahm “Crisis” Emanuel from today’s *Wall Street Journal* story “Google Move Buoys Chicago Tech Hub.”

Chicago! The city of the walking, and voting, dead: in Chicago when they say they are a “Democrat forever”, they mean it. The Windy City! A fell wind: Chicago is murder central: yes, the bodies are really stacking up—just in time for November. The city of Clout! Where knowing the secret aldermanic handshake is a must to do business.

Schmidt and Emanuel worked out a deal to shift a few taxpayers from the suburbs to the city to create, in Emanuel’s words, a “digital Mecca.” A place where pilgrims go to worship and pray, to kneel and adore. Chicago’s version of an Apple store.

But forget all that and look at the picture. Schmidt is not a poor man: his net worth isn’t a googol, but it’s orders of magnitude more than thine or mine. This is a man who if he wanted a candy bar could buy Hershey’s. And Emanuel, he is Boss. Snap his fingers and dozens of scurrying staffers will appear from beneath the cracks to do his bidding.

Both men could therefore afford to dress like men. Both men chose not to.

Schmidt looks like he has just pulled an all-night coding session, played a few hours of Warcraft to unwind, then crashed in the corner atop the (soft) pizza boxes, only to be awakened for a meeting he nearly forgot. The wrinkled, ill-fitting, billowy pullover does nothing to mask his sinking paunch; if anything, it exaggerates the swell. He hasn’t learned that short sleeves are only for vacation and while sporting. He gets points for having a man’s haircut, though, and more for workmanlike glasses.

It’s nice he’s humble enough to wear a name tag announcing “Hi I’m Eric Schmidt”, but a man in his position should embrace the authority that is his. It simplifies. And did you notice he’s carrying a MacBook Air? Wonder if he was able to install Chrome on it.

We can’t see his leggings, but there is a dark patch in the photo which suggests jeans. But I believe they are dark cotton; teacher pants, as Joe Queenan’s daughter calls them.

The overall effect is slobby. There is nothing to him which commands respect, except for knowledge of his bank balance which, while important, should never be the sole criterion of moral worth.

Emanuel is entirely different. He is a politician and is anxious to dress as a “man of the [computer] people”, though he knows he is not. Schmidt pulls that look off because he is. Look carefully: Emanuel is wearing French cuffs, but rolled up. French cuffs are not man-of-the-people wear. The shirt overflows his pants because this was a shirt cut to fit inside pants whose waist is where it should be, and not for jeans, which rest low on the hips.

His jeans are rich-people jeans. These are defined as jeans, made out of cotton, but made to look as non-jean-like as possible while still being jeans. They say, “I’m hip, but I have more money and taste than you.” The last point is debatable because a pair of trousers cut for a gentleman would be cheaper and would look better; but then they wouldn’t be jeans. His belt is expensive, though at least the buckle is simple and tasteful.

Emanuel also scores for his simple haircut—and then immediately loses his points for strapping a cell phone to his belt. He loses more for the shirt pocket pencil. But this is balanced out by his *lack* of a name tag. Chicagoans had better damn well know who he is. No computer or other toy for Emanuel, either. This is what staffers should carry.

No tie, or other neck-balancing accessory for either man, of course. And no jacket, either. A jacket is nature’s best creation: it covers flaws and accentuates merits. Schmidt’s gut and sloping shoulders need the masking a jacket would have provided, and Emanuel’s diminutive stature could have been heightened by a well-cut coat.

Just when was it that our betters started dressing worse than us?

There is a sense in which we all argue that is statistical. For instance, if we say of a group that it sports an “arrogant attitude” we do not mean that *all* members of that group are arrogant, but that many of the group are, or that a clear majority of the leaders of the group are arrogant.

Or we might say, “Progressives are inflexible and unwilling to listen to counter-argument” and by this we do not mean that each and every soul who labels herself a progressive is inflexible and unwilling to listen to counter-argument, but we do assert that many who self-label as progressive are stubborn in this way, and we do claim that the clear majority of progressive leadership fits this bill.

This is a harmless manner of speaking. It saves time and words as long as there is a clear understanding that the claims are statistical and not intended to apply to each group member. As a for instance, take when Chick-fil-A President Dan Cathy said, “I pray God’s mercy on our generation that has such a prideful, arrogant attitude to think that we have the audacity to define what marriage is about.” We do not assume Cathy means *all* souls who comprise “our generation”—a group which (by definition) includes Cathy himself—possess an “arrogant attitude.” He only meant the *majority* of this generation’s leaders are prideful and arrogant.

Proof is had in a letter Boston Mayor, and progressive leader, Tom Menino wrote to Cathy after Menino heard Cathy did not toe the progressive line. Menino admitted, re: the “prideful” charge, “guilty as charged.” But poor Menino, straining his education, has confused the shade of meaning of *prideful*. So skip to where Menino wrote “I was angry to learn [Chick-fil-A was searching for a site in Boston].” Menino goes on to bluff that Boston won’t allow Cathy to open his business in Boston. Guilty as charged on the arrogance, too.

More *arrogance* proof was furnished by Chicago alderman Joe Moreno, who also heard Cathy’s words, and who then denied Cathy’s company a building permit. Moreno said, “Chick-fil-A approached me with a paper bag in the usual way, but when I opened it, I was shocked to discover the ‘lettuce’ was real.” Chicago, the most corrupt of American cities, run entirely by Democrats for all of living memory, yet blaming all its ills on Republicans, has its rules that Cathy has not yet learned.

Chick-fil-A’s official values are Christian. It closes on Sunday. Its owner prays. It has touted Biblical passages. The company says that it is their intent to “treat every person with honor, dignity and respect — regardless of their belief, race, creed, sexual orientation or gender.”

Not letting this crisis go to waste, Rahm Emanuel said, “Chick-fil-A values are not Chicago values.” This is true. Chicago is well known for not treating every person with honor, dignity, and respect. It particularly eschews those values for the poor souls who would buck the machine.

Here Emanuel was speaking in a statistical sense. He did not mean *all* Chicagoans’ values are the opposite of Chick-fil-A’s; he meant all of its *leaders* do not hold with treating people with dignity, etc. In particular, almost certainly a *majority* of Chicago’s black population, a *majority* of its Hispanic population, a *majority* of its immigrant population, and probably a just-plain *majority* of Chicago’s population profess values the same as Chick-fil-A’s, and say the same thing about gay “marriage” as Cathy does.

This has always been an embarrassment for the (mostly white) progressive leadership who, when asked about this “disparity”, change the subject faster than a cash “application fee” disappears into the dark pockets of a, well, of a Chicago alderman. But no worries: progressive leaders see it as their duty to look after these disadvantaged folks; they’ll bring ‘em around to the left point-of-view eventually.

And they’ll do it the Chicago way: by dangling trinkets in front of constituents which can be had by merely signing over their souls, or, if that fails, by good, old-fashioned intimidation and thuggery. Refuse to swear the oath of progressive allegiance? So far we can be thankful the only punishment is banishment.

The Chicago way is the Obama way. It is Mr Obama’s demand, and his value, that all women be given “free” birth-preventing pills and “free” pills which will kill the life which has somehow managed to slip by the first “free” pill. It is his demand, and his value, that those whose religions forbid the use of these controlled poisons (for lack of a better word; they are not drugs to heal or to medicate) be made to abandon their convictions.

Mr Obama, not speaking in the statistical sense, calls this stance a “compromise.”

Jelle de Jong writes in to ask:

Working as a quant analyst in finance I recently got interested in the Briggsian/Jaynesian/Bayesian interpretation of probability but am still struggling a bit with it. When reading your book/blog I was wondering what you mean when you say the ‘true value of a parameter.’ For a situation where we can imagine a (clearly defined) underlying population (say a population of people of which we have measured some property for only a sample) it seems clear what the connection is between the model (parameters) and the data-generating system, but if you would ‘estimate’ a binomial parameter how would you interpret this ‘estimated’ probability? Jaynes writes in his book that estimating a probability is a logical incongruity (Jaynes’ Probability Theory 9.10 on p. 292). Do you interpret the estimated parameter as a property of the (hypothetical) underlying distribution (i.e. the fraction of successes in an infinite sample) that can be estimated with a corresponding quantification of uncertainty? But as this is just a model, in what sense can we speak of a true value of this parameter? (The only truth is that the process will generate some number of successes.) Can we give the estimated binomial parameter such a physical interpretation, or is it only possible to assign a success probability? But then it would be incoherent to assign a distribution to this estimated probability.

I hope you can take the time to shed some light on my question.

With Kind Regards,

I’d start by putting my name last, in smaller font, and in parentheses, and then prefixing Laplace, Keynes (yes, that one), and especially David Stove who all took a logical view of probability. Historically, this turned out to be wise because the logical view is the correct one.

Consider the evidence—*assumed as true*—that E = “We have a six-sided object which when tossed shows one side and just one side is labeled 6.” Given this evidence, I *deduce* the following:

Pr( ‘6 shows’ | E ) = p = 1/6.

Let’s add to our evidence by saying we’re going to A = “toss this six-sided object n times.” Then we can ask questions like this (to abuse, as they say, the notation):

Pr( ‘k 6’s show’ | E & A) = binomial(n,p,k)

where we again have *deduced* what the probabilities are. The ‘n’, ‘k’, and ‘p’ are all *parameters* of the binomial; and they are the true values, too. They follow from assuming as true E and A and by assuming we’re interested in k ‘successes’, i.e. k 6’s showing. And this is not the only time where we can deduce the value of a parameter, i.e. have complete knowledge of it; many situations are similar.
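The deduction above can be sketched in a few lines of Python. This is a minimal illustration only; the function name is mine, and everything numeric follows from assuming E and A true:

```python
from math import comb

def binomial_pmf(n, p, k):
    """Pr(k successes in n trials | success probability p) -- here all
    three parameters are deduced, not estimated, from E and A."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p = 1 / 6   # deduced from E: exactly one of six sides is labeled 6
n = 10      # from A: toss the object n times
for k in range(3):
    print(f"Pr({k} sixes in {n} tosses | E & A) = {binomial_pmf(n, p, k):.4f}")
```

Nothing here is “random” in any mystical sense: change the evidence E and A, and different probabilities are deduced.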

Now suppose instead we observe a game in which a ball is tossed into a box the bottom of which has holes, only some of which are colored blue. The box is a carnival game, say. We want to know, given all this information which we’ll label F, this:

Pr( ‘ball falls in blue hole’ | F ) = θ

From just F the only thing we can deduce is that 0 < θ < 1: θ isn’t 0 because some of the holes are blue, and it isn’t 1 because we know that not all holes are blue; beyond that, F tells us nothing. The point to emphasize is that we have *deduced* the true value (in this case a range of values) of the parameter, which is 0 < θ < 1. (Actually, we do know more; we know the number of holes is finite, and this is a lot of information; however, for the sake of this post, we’ll ignore it: but see this paper which works out this entire point rigorously.)

If we add to F another “A”, and consider n tosses of the ball, we deduce this:

Pr( ‘k blue holes’ | F & A) = binomial(n,θ,k)

where again 0 < θ < 1. We have complete knowledge of two parameters, n and k, but θ remains (mostly) a mystery.

And here we must stop unless we gather more evidence. We can make this evidence up (why not? we’ve done so thus far) or we can add evidence in the form of observational propositions: “On the first toss, the hole was not blue,” “On the second toss, the hole was blue,” and so on.

Given F and A and this new observational evidence we can call “X” (where the number of tosses in X are finite), we can deduce:

Pr( θ = t | F & A & X) = something(t)

for every possible value of t (where we have already deduced t can only live between 0 and 1; the value of ‘something’ relies on t). Very well, but this only gives us information about θ, which is only of obscure interest. It says nothing, for instance, about how many balls will go into blue holes, or the probability they will fall into blue holes. It’s just some parameter which assumes F, A, and X are true.

To get the probability of actual balls going into k actual (new) holes, we’d have to take our binomial(n,θ,k) model and hit it with Pr( θ = t | F & A & X ), which you can think of as a weighted average of the binomial over every possible value of θ. Mathematically, we say we integrate out θ, because the result of this operation is

Pr( ‘k new blue holes’ | F & A & X & n new tosses) = something(k)

where you can see there is no more (unobservable) θ and where the ‘something’ relies on k. This works even if n = k = 1 (new tosses).
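Here is one way the integrating-out step can be carried through numerically. To get a closed form I have assumed a Beta prior on θ (uniform by default) — an assumption, since F only deduces 0 < θ < 1 — and the names and the made-up counts are mine:

```python
from math import comb, exp, lgamma

def log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical safety."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def predictive(k, n_new, blue, not_blue, a=1.0, b=1.0):
    """Pr(k new blue holes | F & A & X & n_new new tosses): the binomial
    averaged over the posterior for theta. With a Beta(a, b) prior the
    integral has the closed beta-binomial form; no theta appears."""
    a_post, b_post = a + blue, b + not_blue   # posterior after observing X
    return comb(n_new, k) * exp(
        log_beta(k + a_post, n_new - k + b_post) - log_beta(a_post, b_post)
    )

# X: say 7 of 10 observed tosses fell in blue holes (illustrative data)
print(predictive(1, 1, 7, 3))   # Pr(next ball is blue) = 8/12 under a uniform prior
```

Notice the answer is a probability of an *observable* (new blue holes), with θ gone, which is the whole point.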

It’s not useful to speak of θ as the “probability” of a ball going through a blue hole: that last equation gives that, and there is no θ in it.

Now, all statistics problems where new data is expected can and should be done in this manner. Almost none are, though.

Hope this helps!

**McKibben’s Folly**

Suppose it is true that we have E = “A six-sided object, just one side of which is labeled 6, and when tossed only one side will show.” We want the probability of R_{1} = “A 6 shows.” That is, we want

(2) Pr( R_{1} | E ) = 1/6.

This result only follows from the evidence in E to R_{1}. It has nothing to do with any die or dice you might have. We are in French-speaking-cat land (see Part I and links within) and only interested in what follows from given information. In particular, we don’t need in E information about “fairness” or “randomness” or “weightedness” or anything else.

(For those seeking depth, (2) is true given our knowledge of logic.)

Now suppose I want to know R_{n} = “A 6 shows n times.” This is just

(3) Pr( R_{n} | E & n) = (1/6)^{n},

where if we want a number we have to change the equation/expression/proposition. Anyway, let n be large, as large as you like: (3) grows smaller as n grows larger. Except in the limit, (3) remains a number greater than 0, but if n is bugambo big, (3) is tiny, small, wee. Indeed, let n = 126 and (3) becomes 8.973 x 10^{-99}, a number which is pretty near 0, but still of course larger than 0. Right, Mr McKibben?

Since we assume in (3) that E is true (and n is given), the result is of very little interest. All we have are changing and decreasing numbers as n increases. End of story.

Now suppose we turn equation (3) around and ask

(4) Pr( E | R_{n} & n) = ?

In words, given we have seen a 6 show in n consecutive tosses, what is the chance E is true? That is, given we have seen, or we assume we have seen, a 6 show in n consecutive tosses, what is the chance that we have a six-sided object, just one side of which is labeled 6, and when tossed only one side will show? On this reading, “a 6 shows in n consecutive tosses” implies at least part of E, though not completely. There isn’t information that the object is six-sided, or that just one side can show, but there is evidence that a 6 is there.

In other words, “R_{n} & n” gives us very little information except that E is possible; therefore

(4′) 0 < Pr( E | R_{n} & n) < 1

and that is the best we can do. And this is so no matter how large n is, no matter how small (3) is. What follows is that E is not proved true or false no matter what (3) is; indeed, in (3) we assume E is true. Again: E cannot be proved true *or* false whatever (3) is or whatever n is. And this is it. The end.

Yet somehow E “feels false” if n is large. It is not false, as we have just proved, but it might feel that way. And that’s because we reason something like this: given our B = “experience at other times and places with data and situations that vaguely resemble the data and situation I saw this time, simple models like E turned out to be false and other, more complicated, models turned out to be true.” That is,

(5) Pr( E | B ) = small.

This is well enough, though vague. It has nothing to do with (4) however: (4) is entirely different. Plus, (5) says absolutely nothing about any rival “theories” to E. And then there are these interpretations:

(6) Pr( E | The only theory under consideration is E ) = 1,

and

(7) Pr( E | E can’t be true [because of (3)]) = 0.

Equation (7) is true, but circular; equation (6) is also true because unless we have specified a rival theory for how the data arose, we have no choice but to believe E (this is key). Equation (7), which contains a fallacy, is what many have in mind when judging E.

Now let M = “A 6 shows on roll 1, a 6 shows on roll 2, …, a 6 shows on roll n.” Then

(8) Pr( M | R_{n} & n) = 1.

Our “model” M is obviously true, though (8) is also circular. But we can “uncircle” it by changing M to M’ = “A 6 always shows.” Then

(8′) Pr( M’ | R_{n} & n) = 1.

Of course, if at the n+1-th roll a 6 does not show, we have falsified M’ (but not M; we’d have to write Pr(M’|R_{n} & n & not-six on n+1) = 0).

M and M’ are very good models because they explain the data perfectly. But equations (8) or (8′) have nothing to do with (4) or (4′). To judge between E and M, we’d have to start with a statement which assigns a prior belief (before seeing the data R) which of these models were true; then after seeing the data we can update the probability either model is true. But we cannot, using just R, say “E is false” or “M is true.”
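The “prior belief, then update” step just described can be sketched numerically. The 50/50 prior, and the pretense that E and M’ are the only two models under consideration, are my illustrative assumptions:

```python
def posterior_E(n, prior_E=0.5):
    """Pr(E | n sixes in a row) when the only rival entertained is
    M' ('a 6 always shows'), whose likelihood for this data is 1."""
    like_E = (1 / 6) ** n        # Pr(R_n | E), i.e. equation (3)
    like_M = 1.0                 # Pr(R_n | M')
    num = prior_E * like_E
    return num / (num + (1 - prior_E) * like_M)

for n in (1, 5, 20):
    print(n, posterior_E(n))     # E grows improbable, but never impossible
```

The posterior for E shrinks toward 0 as n grows, yet never reaches it: E becomes improbable relative to a *named rival*, which is a different thing from R alone proving E false.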

The main point, if it is not already obvious, is that any observation will not prove a model or belief false, unless it’s a very special and rare situation like M’ (and observing something M’ said was impossible). What we really or often mean when we say “Just look at R; E is false!” is that we have some rival model N in mind, a model which we are sure is true. McKibben is convinced that N = “death-from-the-skies global climate disruption” is true but uses an equation like (3) to prove N. This is a fallacy. Eq. (3) has nothing in the world to do with N; if E is true then no observation can prove E false, or even show it is unlikely because, of course, E is true.

Now for real temperatures, the model N could be true; but so could many other rival models, and so could a model like E, suitably modified. The climate E_{c} can be “the temperature can fall into one and only one of three buckets, labeled low, normal, high.” Thus Pr(high | E_{c}) = 1/3 and so on. Or Pr(high n times in a row | E_{c} & n) = (1/3)^{n} as before. This means

(9) 0 < Pr( E_{c} | high n times in a row & n) < 1

just as before. Since there is no news about N in (9), N is irrelevant.

And just as before, we can start with an a priori judgment about the likelihood of N or E_{c} being true; and after seeing the data we can update these judgments. It will be the case, unless the a priori judgment is very skewed towards E_{c}, that N will be more likely than E_{c} given the data.

But this does not mean that N is more likely true when judged against other models of the climate. We can, as we have just seen, compare N only against the straw man E_{c}, but this gives no evidence whatsoever about N and (say) W = “a climate model which does not assume the world will soon end unless new taxes are raised and given to politicians” (or any other climate model we might imagine).

It is therefore cheating, like all straw man arguments are cheating, not to use the best available competitor to N.

© 2014 William M. Briggs
