
Category: Statistics

The general theory, methods, and philosophy of the Science of Guessing What Is.

September 15, 2008 | 43 Comments

How many geniuses are there?

Here is a question I often give on exams:

What is the probability that the next child to be born will be a genius? Give me a number and fully explain your answer.

There is not, of course, a single correct answer. What I just said is an important point, so let’s not skip lightly over it: there is no correct answer; at least, there is no way anybody can know the correct answer.

That nobody can know with certainty answers to questions of this type is underappreciated. I want people to learn this because we are, as I often say, too sure of ourselves.

What I want to see in the answer is acknowledgment of the ambiguities. First, what is a genius? Surely that word is overused to a remarkable extent. For example, this list says, with a straight face, authoress JK Rowling and movie maker Steven Spielberg are geniuses. I often have the idea that to not call some eminence a genius is nowadays taken as a slight. However, a moment’s thought suffices to show that people exaggerate, if you are willing to take that moment.

The next step is to think of some geniuses for the sake of comparison. It’s best to think of dead ones so that you are not overly influenced by current events. After all, only history can truly judge genius. If you agree with even part of this, you will have made the next most important step: admitting that you can be biased.

How about some dead geniuses? Einstein pops into nearly everybody’s head first. Then, for me, Mozart, Beethoven, Shakespeare, Newton, and the guy who invented beer. No, I’m not joking about that last name. The point is my historical knowledge is modest, and most of the names I pick are men from the last 500 years, and most are from Western culture. Humanity is older than 500, of course, and there are other cultures besides our own, so I know that my knowledge of who is a genius is limited. That’s what got me to thinking about the brilliant soul who invented beer. He did so, probably in Sumer, before people wrote down incredible deeds of this sort.

This line of thought eventually leads to other cultures (Confucius, maybe Lao Tzu) and other times where writing was non-existent (was there just one person responsible for the wheel and agriculture?). There must be a lot of geniuses I don’t know, and some that nobody can ever know.

Next step is to count, and to acknowledge that exact counting is an impossibility. Still, we can count to the nearest order of magnitude. This means “power of 10”, and it represents an enormously popular method of approximation. If you can get your answer to within “an order of magnitude” (a power of 10), you are doing well. The first power of 10, or 10^1, is just 10. The second power is 10^2 = 100, and so on.

So how many geniuses? Certainly more than 10, definitely fewer than 10,000, or the 4th order of magnitude. Could there have been 100 geniuses? Given my above list, I say yes. 1000? I’m less likely to believe this number, but since I have said that there were lots of geniuses who went unsung, I can’t exclude it. Still, an order of magnitude more than this seems too large.

We have done a lot so far, but we still haven’t answered the question “What is the probability that the next child to be born will be a genius?” The answer will look something like # of geniuses who have ever lived / # of people who ever lived. Coming to this equation is crucial. This is because the question implies—I emphasize, it does not explicitly state—we are asking a question about all humanity. And all humanity certainly means all humans who have ever lived.

Thus far, we have nailed down the numerator in this equation to the nearest order of magnitude or so (10^2 to 10^3). How about the denominator?

What evidence do we have? Well, there are about, to an order of magnitude, 10^10 or 10 billion people alive today. 100 years from now, nearly all of these people will be dead and a new set, probably of the same order of magnitude, will take its place. Anyway, 100 billion people alive 100 years from now feels way too large to me, and 1 billion way too small, especially given recent population trends.

100 years ago, there were about an order of magnitude fewer people alive (nearly all of them different from the set we have today), or about 10^9 or 1 billion. How many 100s of years can we go back? About 2000, since the best guess is humanity arose about 200,000 years ago. That’s close enough; it’s within an order of magnitude. Without doing any math, just going by gut, we can guess that adding today’s 10^10 to last century’s 10^9 (11 billion so far), and to the previous 1,998 or so centuries’ diminishing contributions (each previous century had fewer people), we arrive at about 10^11, or 100 billion.

Was that larger than you had first guessed? This number usually surprises most people. But having a guess gives us our denominator, so that we can finally solve our equation, which is

10^2 / 10^11 = 10^-9

or, if there were 10^3 geniuses, 10^-8. In words, it’s anywhere from 1 in a billion to 1 in 100 million.
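For anyone who wants to check the arithmetic, here is a minimal sketch of the ratio described above. The function names are mine, invented for illustration, and the inputs are only the order-of-magnitude guesses from this post, not measured counts.

```python
def genius_probability(geniuses, people_ever_lived=10**11):
    """Guessed geniuses divided by guessed number of people who ever lived."""
    return geniuses / people_ever_lived


def one_in(geniuses, people_ever_lived=10**11):
    """The same ratio expressed as '1 in N' (integer arithmetic, no rounding)."""
    return people_ever_lived // geniuses


# Both numerator guesses from the text: 10^2 and 10^3 geniuses.
for geniuses in (10**2, 10**3):
    p = genius_probability(geniuses)
    print(f"{geniuses:>5} geniuses: p = {p:.0e}, or 1 in {one_in(geniuses):,}")
```

Change either guess by a power of 10 and the answer moves by a power of 10, which is the whole point of working in orders of magnitude.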

Not very good odds, right?

This was a lot of thinking for such a simple question, wasn’t it? If you had written down, as students often do, an answer “1 in a 100” or “1 in 1000” you would have got the answer wrong. Both answers imply that we should be flooded with geniuses, an answer which no observation supports.

An oft-heard complaint among professors is that students don’t think about the answers they give. I agree with this, but I think it’s more than just students. It holds for professors and ordinary civilians, too.

“1 in a 100” is absurd, and far too certain. Just a few moments’ thought shows this. How many answers that we give in life are just as absurd?

Some kids will write, “I don’t know.” I usually give them 1 point for this because, after all, it is the strictly correct answer. But that answer is too certain itself. We do know something about the answer and we can answer it partially. We should always quantify uncertainty in any question and not seek the easy way out by giving answers that are too certain.

Here, for fun, is another question I give:

How many umbrellas are there in New York City?

September 14, 2008 | 9 Comments

Much too certain: miscellaneous Sunday topics

Today, a topic that I mean to expand, greatly, in the coming weeks. That theme, as you might have guessed, is that too many people are too certain about too many things. Nothing more than pointers to a couple of articles and some commentary for the moment.

Sad Day

Famed shark hunter Frank Mundus, who supplied the words to live by “A PhD don’t mean shit”, died this week. We earlier looked at Mundus’s philosophy in the essay “The BS octopus.” Mundus often proved that having letters after his name didn’t make him a better fisherman.

Don’t Go To College

On the same theme, Charles Murray’s new book on why most people don’t need to and shouldn’t go to college is out and making the blogs. Murray points out the obvious: not everybody is equipped to go to college, but most people are encouraged to do so. Many businesses want applicants to “have a degree” (same phrase we used in The BS Octopus). Meaning that the business doesn’t care what the applicant knows, since he can learn what he needs on the job, but they only want the letters after the name. Ridiculous.

Murray also does the simple math to show that most people would be better off skipping college altogether and heading for the trades: electrician (what I would have done), plumber, craftspeople of all types, farmers, and so on. Not everybody (and this is a shocker) graduates with distinction and makes the salaries at the top end of the range. Besides, Murray says, college should be saved for the people with the brains to do it. Sound harsh and non-egalitarian? Well, not everybody has the body to be an athlete, nor the looks to be a model, nor the talent to become an actress or musician. We never have a hard time telling people the hard truth in those cases, but we’re squeamish about telling others that they might not be smart enough for school. It also goes smack in the face of one of the guiding principles of the Enlightenment: education can cure all ills.

High school guidance counsellors push too many people towards college. And colleges take them in. Generates big business, too. I speak from experience when I tell you that college is not for the majority. Since that is a true statement, but people desire college for the majority, this, in part, explains why college is not what it used to be, and why there are so many of them. Many modern-day colleges operate like expansion leagues in sports: too little talent to spread around, leading to watered down performance.

I taught too many kids who should not have been in my class. Sweet kids, mostly, big hearts. I never gave anybody a grade less than they deserved, but I admit to helping contribute to grade inflation. I recall two young men in one class. Both were part-time bouncers, both struggled, worked hard and came on time. They never missed a class or a quiz. Both were dumb as posts, but I loved them. They should have failed, but I weakened (I am a softy at heart) and I passed them. These guys were not isolated incidents, not for me, and not for many, many other professors I know.

Voting Complete

My study for guessing who will win the presidential election is now closed. It went well until one gentleman, who called himself a “godless liberal”, posted the survey on his blog. He called me a “right wing” blogger, and this somehow gave permission for his readers to go nuts and stuff the ballot box with all kinds of nonsense, despite my pleas that they behave like good citizens. The good news is that I know who these “voters” are so I’ll be able to remove them from the answers.

Can’t, of course, post results on the actual survey until after the election. I might say some things about the ballot box stuffers before then.

September 12, 2008 | 48 Comments

The limits of acceptable criminal behavior to combat global warming

I want to ask a favor of my regular readers and of those who occasionally come here to seek an alternate view. You can help me spread the word.

Yesterday, we discussed the sad plight of Dr X, a once eminent scientist who appeared in a British court with the express purpose to justify criminal behavior. His argument was that the crime committed was necessary to bring attention to, and to modify the consequences of, harmful global warming.

Let’s not talk about whether Dr X had the right to speak as he did (I would argue that he did, even if he represented his employer; others, obviously, take the opposite view). Let’s also not talk about whether Dr X’s climatological theory is sound (I would say parts are, parts aren’t; others religiously say all right or all wrong).

What I want to do is to build a list of the criminal behavior that supporters of Dr X would say is justifiable or acceptable with regard to anthropogenic global warming. The list of non-criminal behavior is obviously long and varied and not of interest here. This list will only include acts that are expressly forbidden by law (misdemeanors or felonies or their equivalents in other countries).

I do not want to be facetious nor do I want people who are not supporters of Dr X writing and saying “I think they would accept Y.” Let’s only use this list to keep track of criminal activities that are legitimately believed to be allowable or justifiable. Please send in real quotes and verifiable links. Let’s also keep this fair—no euphemisms.

So far, this is what we have:

Acceptable Crime | Supporters
Vandalizing property belonging to energy companies | Dr Hansen of NASA

Obviously, we can find lots of people on various message boards who would justify any behavior, so let’s try to keep this list only for people or organizations of authority (we can be somewhat loose about what that means).

Gav, Tam, what do you say?

September 8, 2008 | 47 Comments

Demonstration of how smoothing causes inflated certainty (and egos?)

I’ve had a number of requests to show how smoothing inflates certainty, so I’ve created a couple of easy simulations that you can try in the privacy of your own home. The computer code is below, which I’ll explain later.

The idea is simple.

  1. I am going to simulate two time series, each of 64 “years.” The two series have absolutely nothing to do with one another, they are just made up, wholly fictional numbers. Any association between these two series would be a coincidence (which we can quantify; more later).
  2. I am then going to smooth these series using off-the-shelf smoothers. I am going to use two kinds:
    1. A k-year running mean; the bigger k is, the more smoothing there is.
    2. A simple low-pass filter with k coefficients; again the bigger k is, the more smoothing there is.
  3. I am going to let k = 2 for the first simulation, k = 3 for the second, and so on, until k = 12. This will show that increasing smoothing dramatically increases confidence.
  4. I am going to repeat the entire simulation 500 times for each k (and for each smoother) and look at the results of all of them (if we did just one, it probably wouldn’t be interesting).
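The steps listed above can be sketched in a few lines of Python. This is not the code from the post, only a minimal stand-in under assumptions of my own: the made-up numbers are independent Gaussian white noise, only the running mean is shown (the low-pass filter is left out because its coefficients are not given here), and the p-value comes from the Fisher z approximation to the classical correlation test. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def running_mean(x, k):
    """k-point running mean; the output is k - 1 points shorter than x."""
    return [sum(x[i:i + k]) / k for i in range(len(x) - k + 1)]


def pearson_r(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def corr_p_value(a, b):
    """Two-sided p-value from the Fisher z approximation (assumed here).

    Like the classical test, it treats the points as independent,
    which is exactly what smoothing violates.
    """
    z = math.atanh(pearson_r(a, b)) * math.sqrt(len(a) - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def spurious_rate(k, n_years=64, n_sims=500, seed=0):
    """Fraction of simulations with p < 0.05 after k-year smoothing."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Two unrelated series: assumed Gaussian white noise.
        x0 = [rng.gauss(0, 1) for _ in range(n_years)]
        x1 = [rng.gauss(0, 1) for _ in range(n_years)]
        if corr_p_value(running_mean(x0, k), running_mean(x1, k)) < 0.05:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # k = 1 is no smoothing at all, included as a baseline.
    for k in range(1, 13):
        print(f"k = {k:2d}: share of p-values below 0.05 = {spurious_rate(k):.2f}")
```

With no smoothing (k = 1) the share should sit near the nominal 5%. It should climb as k grows, since neighboring smoothed points share most of their inputs and are no longer independent.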

Neither of the smoothers I use is in any way complicated. Fancier smoothers would just make the data smoother anyway, so we’ll start with the simplest. Make sense? Then let’s go!

Here, just so you can see what is happening, are the first two series, x0 and x1, plotted together (just one simulation out of the 500). On top of each is the 12-year running mean. You can see the smoother really does smooth the bumps out of the data, right? The last panel of the plot shows the two smoothed series, now called s0 and s1, next to each other. They are shorter because you have to sacrifice some years when smoothing.

[Figure: smoother 1 series]

The thing to notice is that the two smoothed series eerily look like they are related! The red line looks like it trails after the black one. Could the black line be some physical process that is driving the red line? No! Remember, these numbers are utterly unrelated. Any relationship we see is in our heads, or was caused by us through poor statistical methodology, and not in the data. How can we quantify this? Through this picture:

[Figure: smoother 1 p-values]

This shows boxplots of the classical p-values in a test of correlation between the two smoothed series. Notice the log-10 y-axis. A dotted line has been drawn to show the magic value of 0.05. P-values less than this wondrous number are said to be publishable, and fame and fortune await you if you can get one of these. Boxplots show the range of the data: the solid line in the middle of the box says 50% of the 500 simulations gave p-values less than this number, and 50% gave p-values higher. The upper and lower part of the box designate that 25% of the 500 simulations have p-values greater than (upper) and 25% less than (lower) this number. The outermost top line says 5% of the p-values were greater than this; while the bottommost line indicates that 5% of the p-values were less than this. Think about this before you read on. The colors of the boxplots have been chosen to please Don Cherry.

Now, since we did the test 500 times, we’d expect that we should get about 5% of the p-values less than the magic number of 0.05. That means that the bottommost line of the boxplots should be somewhere near the horizontal line. If any part of the box itself sticks below the dotted line, then the conclusion you make based on the p-value is too certain.

Are we too certain here? Yes! Right from the start, at the smallest lags, and hence with almost no smoothing, we are already way too sure of ourselves. By the time we reach a 10-year lag, a commonly used choice in actual data, we are finding spurious “statistically significant” results 50% of the time! The p-values are awful small, too, which many people incorrectly use as a measure of the “strength” of the significance. Well, we can leave that error for another day. The bottom line, however, is clear: smooth, and you are way too sure of yourself.

Now for the low-pass filter. We start with a data plot and then overlay the smoothed data on top. Then we show the two series (just 1 out of the 500, of course) on top of each other. They look like they could be related too, don’t they? Don’t lie. They surely do.

[Figure: smoother 2 series]

And to prove it, here are the boxplots again. About the same results as for the running mean.

[Figure: smoother 2 p-values]

What can we conclude from this?

The obvious.

BORING DETAILS FOLLOW