
Category: Philosophy

The philosophy of science, empiricism, a priori reasoning, epistemology, and so on.

May 17, 2010 | 33 Comments

You Too Can Be A Genius After 10,000 Hours

No, you cannot. That title is a lie, and, judging by a recent spate of books on the subject, a popular one.

Ann Hulbert of Slate has compiled a list of books which preach the Gospel of Success (HT A&LD).

Gladwell’s Outliers: The Story of Success was not, appropriately enough, a bolt of original genius when it appeared in November 2008. Geoffrey Colvin’s Talent Is Overrated: What Really Separates World-Class Performers From Everybody Else had come out a month earlier. The following spring brought Daniel Coyle’s The Talent Code: Greatness Isn’t Born. It’s Grown. Here’s How. …This spring David Shenk’s The Genius in All of Us: Why Everything You’ve Been Told About Genetics, Talent, and IQ Is Wrong has gotten several raves. Hot on its heels arrives Bounce: Mozart, Federer, Picasso, Beckham, and the Science of Success, by Matthew Syed, a former Olympic ping-pong player turned journalist.

Gladwell and his followers are rotten statisticians. They look upon their sample of the successful and say, “Hark! These shiny examples have all worked hard; their dedicated efforts brought them to the top. So too can elbow grease land you on the pinnacle.”

Diligence is key! After putting in enough hours of practice, anybody can reach the heights of their profession. Talent is a nicety, not a necessity.

These beliefs are obviously false, and based on bad sampling. It is fine to catalog the habits of the successful, but it is a mistake to conclude that those habits are solely responsible for achievement. Why? Because this neglects the vastly larger, and hidden, pool of people who adopted the same habits but who were not successful.

It’s true that mere talent is rarely sufficient to propel one to the top, but without it, one will not go far. Pete Rose had hustle, but he also had talent. Edison was right: genius is one percent inspiration and ninety-nine percent perspiration. The error comes with believing that one-hundred percent perspiration can make up for the lack of one percent inspiration.

We can trace the error back to the Enlightenment—particularly Locke and his tabula rasa. If everybody were a blank slate, then all were equal, all could achieve the same. Yet we observe differences; therefore, those differences must have arisen because of disparities in education and culture. Remove the disparities and—voilà!—equality is restored.

This unsound argument—its premises so earnestly desired—was seized by intellectuals, who to this day unquestioningly claim it as an obvious truth. They pet it lovingly; it is their precious. It lies to them. It tells them that they are great, too, but unrecognized. Paradoxically, it tells them that there were no great men, there was only circumstance, prejudice, effort, and luck. Anybody could be a Newton had they only had the proper upbringing.

How did such a ridiculous belief spread? David Stove said, “a twentieth-century professor of history can hardly be a hero himself, and he naturally finds it comfortable to believe that no one else can either.” Well, envy is, after all, one of the seven deadly sins.

Now, I am 6’2″ and 200 pounds, but I will never, no matter my willpower, no matter how many hours I put in, become a successful jockey. Nor will I ever be found on the offensive line of the Detroit Lions, sad as that team is. Equally, successful jockeys, even if they expend 20,000 hours of “deliberate” practice, will never become successful NBA players.

These kinds of statements are never (well, rarely) controversial. Why? Because physical differences are readily observed: even academics can appreciate that short people do not make great basketball players.

But differences in mental ability are not easily observed. The science of phrenology having fallen into disrepute, one cannot, at a glance, tell a mundane brain from an excellent one. Therefore, the reasoning goes, since I cannot see a difference, it does not exist.

Our culture suffers dreadfully from the natural corollaries of this specious argument: all can be educated and should go to college, all can learn calculus and evolutionary theory, all are talented and deserve a ribbon, your business will succeed if you press these buttons, it’s what’s inside that matters, learn to love yourself, everybody is good at something, it’s not your fault.

The worst is the belief that if only more money were spent, then circumstances could be fashioned so that all students will be above average. Spending per student is ever increasing, rising faster than inflation, yet performance stagnates. The solution? Spend more.

May 9, 2010 | 18 Comments

Are Men Smarter Than Women?

The Question

No. That is to say, Yes. But not really. Actually, what we have here is a badly phrased question: just what do we mean when we ask “Are men smarter than women”?

We’re asking this again, because (via HotAir) The Daily Mail has asked. And, even though that paper is, as many readers have insisted, England’s equivalent of the New York Post, an article by retired professor Richard Lynn has, as they say in journalism, stirred up controversy.

Judging by the comments garnered at the paper and at HotAir, most do not understand the question, or else willfully misunderstand it. Part of the difficulty is that the question is badly put.

Interpretation one: all men are smarter than all women. This is false, obviously. And not on any theoretical grounds: its falsity rests on a solid empirical base.

Interpretation two: some men are smarter than some women. This is clearly true; it has been amply empirically verified. But so has its converse: some women are smarter than some men.

Interpretation three: some men are smarter than all women. This is true; but once more, so is its opposite: some women are smarter than all men. How can this be?

What’s Smarter?

Because we, like everybody else, have been playing fast and loose with the word “smarter.” We’re letting everybody interpret the word, via the question, in any way they like.

Before we go further, we have to understand the differences in the types of evidence used in answering who’s “smarter.” There are three: empirical observations, “theory”, and counterfactual arguments.

We can dismiss theoretical “evidence” immediately. There are only two theories of any importance. The first we might call political correctness. This states that the sexes are equal, no matter what, and if there are any observed differences between the sexes, it is because one sex has successfully dominated another for century upon century.

This theory falsifies itself, and obviously. For if one sex has successfully dominated another for centuries, it is clear that that sex is smarter in the art of domineering, which is to say, politics. Therefore, both sexes are not equal in all things. Any riposte based on physical differences also fails.

Further, the idea behind this theory is counterfactual. It says that if a certain situation did not hold historically, the observational evidence (discussed next) would have been different. There is no way to know whether any counterfactual like this is true. Desire is no substitute for evidence.

The second theory is that, by fiat, all women are smarter than all men at mental activity X, whatever X might be. It has to be “mental activity” because of the obvious physical differences between the sexes. This theory might be true for some mental activity not yet rigorously classified, but it has not been observed to be true for common mental activities. Which is to say, some men have been observed to be better than some women at some activities, even though most women are better than most men at those activities. I have in mind “local” politics, though it’s not important whether I’m right about that.

Direct empirical observations tell us, in particular mental activities, who the smartest person (singular!) was. There is, of course, subjectivity in this, because of the difficulty of rigorously defining the scope of the mental activity. Take physics: here, Isaac Newton is probably tops. Therefore we can say that this man was smarter than all women—but also smarter than all other men.

The difficulty here, and a big one, is that only a certain few activities are gauged worthy of tracking. Physics and math are two of these worthies, local politics (“interpersonal relations”) is not. Therefore, whenever we say “smarter” we are including a value judgment about which mental activities are important, and in what contexts. This is why, when asking the main question, we must specify an activity.


Again, take physics. Any list—from all of history; we cannot exclude portions of our sample—of the best of the best includes a majority of men. However, what’s not tracked is the worst of the worst. That is, we cannot draw on all of history to find the stupidest. But we can look locally (in time and geography): here, we discover that the stupidest include a majority of men, too.

From this evidence, we can conclude that men exhibit more variability than do women. We also know that the averages of the scores used to track these activities show men and women are roughly the same. These observations hold across a wide variety of routinely tracked mental activities. The implication of these facts gives us a working answer to the big question.

Take any equally sized group of men and women. Given the evidence we have compiled, and knowing nothing else except the sex of these individuals, the probability that men will outnumber women at the top of a commonly tracked (and valued) mental activity is greater than 50%. The “top” has to be some fraction less than half.

Switch “top” to “bottom” and the conclusion remains the same. We have no or little evidence whether this holds for non-tracked, or non-valued mental activities: it probably does not hold.

Also, the conclusion holds only for groups of sufficient size. If, say, there were only one man and one woman, the probability, given the same evidence, is (approximately) 50% that the man will be smarter than the woman. This also means that the man is just as likely to be stupider.

Note that the “nothing else” includes ages, education, country of origin, and so on. If we do know other probative information, then this naturally modifies the conclusion.
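The conclusion above can be illustrated with a quick simulation. This is a sketch with invented numbers, not data: both sexes are given identical average ability, and men are given a modestly larger spread (the variability claim made above).

```python
import random

random.seed(42)

N_TRIALS = 1000   # simulated groups
GROUP = 500       # men and women per group
TOP_FRAC = 0.10   # the "top" must be some fraction less than half

def simulate(sd_men=1.1, sd_women=1.0):
    """Fraction of groups in which men outnumber women in the top decile,
    given equal average ability but a larger male spread (an assumption)."""
    men_majority = 0
    for _ in range(N_TRIALS):
        men = [random.gauss(0.0, sd_men) for _ in range(GROUP)]
        women = [random.gauss(0.0, sd_women) for _ in range(GROUP)]
        everybody = [(s, "m") for s in men] + [(s, "w") for s in women]
        everybody.sort(reverse=True)                # best scores first
        top = everybody[: int(TOP_FRAC * 2 * GROUP)]
        if sum(1 for _, sex in top if sex == "m") > len(top) // 2:
            men_majority += 1
    return men_majority / N_TRIALS

print(simulate())  # well above 0.5, despite identical averages
```

Switching `reverse=True` to `reverse=False` examines the bottom instead, with the same result; setting `sd_men=1.0` collapses the effect to roughly a coin flip.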


IQ is measured by taking a test or tests. From this observational evidence, it is supposed that those who score high are “smart” and that those who score low are “stupid.” The difficulty is that the IQ is said to measure “general intelligence” and not intelligence of a more specific type, like physics ability. IQ is in large measure a distraction.

Whether or not there is such a thing as general intelligence, or whether intelligence is multi- or unidimensional, is not relevant to our main conclusion. If we can narrowly and rigorously define a mental activity—like ability to do physics—then that is enough for us.

Whether or not somebody who evinces a large IQ score might be able to perform any given mental activity well is not relevant.

May 5, 2010 | 11 Comments

Confidence Intervals, Logic, Induction


“Because all the many flames observed before have been hot is a good reason to believe this flame will be hot” is an example of an inductive argument, and a rational one.

An inductive argument is an argument from contingent (not logically necessary) premises which are, or could have been, observed, to a contingent conclusion about something that has not been, and may not be able to be, observed. An inductive argument must also have its conclusion say of the unobserved something like what its premises say of the observed.

In classical, frequentist statistics inductive arguments are forbidden—not frowned upon, but disallowed. Even some Bayesians have adopted the skeptical belief that inductive arguments are “ungrounded”, or that there is a “problem” with induction. This is not the time to enter into a discussion of why these thoughts are false: David Stove’s masterpiece “The Rationality of Induction” can be consulted for particulars.

Anyway, only academics pretend to be mystified by induction, and only in writing. They never act as if induction is irrational. For example, I’ve never met a skeptical philosopher willing to jump off a tall building. I assume inductive arguments to be rational and unproblematic.

There are deductive arguments and non-deductive arguments; not all non-deductive arguments are inductive ones, though it is a common mistake to say so (perhaps because Carnap often made this slip). Logical probability can be used for any type of argument. Frequentist probability is meant to represent a substitute for deductive arguments in non-deductive contexts, somewhat in line with Popper’s ideas on falsification. We will talk about that concept in a different post.

Confidence Intervals

In frequentist statistics, a confidence interval (CI) is a function of the data. Custom dictates a “95%” CI, though the size is irrelevant to its interpretation. Often, at least two data points must be in hand to perform the CI calculation. This, incidentally, is another limitation of frequentist theory.

The CI says something about the value of an unobservable parameter or parameters of a probability model. It does not say anything about observables, theories, or hypotheses. It merely presents an interval (usually contiguous) that relates to a parameter.

Its interpretation: If the “experiment” in which you collected your data were to be repeated a number of times that approached the limit of infinity, and in each of those experiments you calculated a CI, then 95% of the resulting (infinite) set of CIs will “cover” the true value of the parameter.

Please read that over until you have assimilated it.


Problem one: The “experiment” must be, but almost never is, defined rigorously. But even when it is, the idea that you could recreate an experiment that is identical to the milieu in which you collected your original data an infinite number of times is ludicrous. That milieu must be the same in each re-running—except that it must be “randomly” different. That “randomly” is allowed to remain vague and undefined.

Problem two, and the big one. Even if you can satisfy yourself that an infinite number of trials is possible, you are still confronted with the following fact. The CI you calculated from your data has only one interpretation: either the true value of the parameter lies within it or it does not. Pause here. That is all you are ever allowed to say.

The italicized statement—a tautology—is so important, so crucial, so basic to the critique of frequentist theory that few can keep it in focus, yet nothing could be simpler.

Whatever interval you construct, no matter how wide or how small, the only thing you are allowed to say is that the true value of the parameter lies within it or it does not. And since any interval whatsoever also meets this tautological test—the interval [1, 2] for example—then the CI we have actually calculated in our problem means nothing.

Yet everybody thinks their CI means something. Further, everybody knows that as more data are collected, the calculated CIs grow narrower, which seems to indicate that our confidence about where the true value of the parameter lies grows stronger.

This is false. Strictly false, in frequentist theory.

Making any definite statement about a CI other than the above-mentioned tautology is a mistake. The most common is to say that “there is a 95% chance that the parameter lies within the CI.” That interpretation is a Bayesian one.

The other, just mentioned, is to say that narrower CIs are more certain about the value of the parameter than are wider CIs. That is an inductive argument which attempts to bypass the implications of the tautology.


The gentleman who invented confidence intervals, Jerzy Neyman, knew about the interpretational problems of confidence intervals, and was concerned. But he was more concerned about inductive arguments, which he thought had no business in statistics.

Neyman tried to take refuge in arguments like this: “Well, you cannot say that there is a 95% chance that the actual value of the parameter is in your interval; but if statisticians everywhere were to use confidence intervals, then in the long run, 95% of their intervals will contain their actual values.”

The flaw in that workaround argument is obvious (make sure you see it). And so, with nowhere else to turn, in 1937 Neyman resorted to a dodge and said this: “The statistician…may be recommended…to state that the value of the parameter…is within [the just calculated interval]” merely by an act of will.

Since that time, statisticians have been illicitly willing CIs to mean more than they do.

Induction and CIs

You will often read of “numerical simulation experiments” in which a statistician tries out his new method of estimating a parameter. He will simulate a necessarily finite run of CIs where the true value of the parameter is known, and note the percentage of simulated CIs that cover the parameter.

If the percentage is close to 95%, then the statistician will state that his procedure is good. He will convince himself that his method is giving proper results: that is—this is crucial—he will convince himself that his estimation method/theory is likely to be true.

Just think: he will use an inductive argument from his observed experimental data to infer that future CIs will be well behaved. But this is forbidden in classical statistics. You are nowhere allowed to infer the probability that a theory is true or false: nowhere.

Any such inference is the result of using induction.

Of course, classical statisticians everywhere use induction, especially when interpreting the results of studies. We just never seem to remember that the frequentist theory of probability forbids such things. Challenge two: find one study whose conclusions do not contain inductive arguments.

Logical Probability, Bayesian

Any theory of probability should be all-encompassing. It shouldn’t just work for the technical apparatus inside a probability model, and not work for events outside that limited framework. A proper theory should apply to its technical apparatus, its conclusions and the extrapolations made from them. The Bayesian and logical theories of probability, of course, are general and apply to statements of any kind.

Somehow, frequentists can use one form of probability for their models and then another for their interpretations of those models. This inconsistency is rarely noted; perhaps because it is more than an inconsistency: it is fatal to the frequentist position.

Now, if you have ever had any experience with CIs, you know that they often “work.” That is, we can interpret them as if there were a 95% chance that the true value of the parameter lies within them, etc.

This is only an artifact caused by the close association of Bayesian and classical theory, where the Bayesian procedure opts for “non-informative” priors. This is coincidental, of course, because the association fails to obtain in complex situations.
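The coincidence is easy to exhibit in the simplest case. For a normal mean with known sigma and a flat prior, the posterior for the mean is Normal(x̄, σ²/n), so the 95% credible interval lands exactly on the 95% CI. The numbers below are invented:

```python
import math

# Hypothetical data summary (invented numbers): n observations,
# sample mean xbar, known sigma
n, xbar, sigma = 25, 10.2, 2.0
z = 1.96  # customary 95% multiplier

# Frequentist 95% confidence interval for the mean
half = z * sigma / math.sqrt(n)
ci = (xbar - half, xbar + half)

# Bayesian posterior under a flat ("non-informative") prior:
# mu ~ Normal(xbar, sigma^2 / n), so the 95% credible interval is
post_sd = sigma / math.sqrt(n)
credible = (xbar - z * post_sd, xbar + z * post_sd)

print(all(math.isclose(a, b) for a, b in zip(ci, credible)))  # True
```

In richer models, with informative priors or awkward likelihoods, the two intervals part company, which is the point made above.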

There are those who would reconcile frequentist and Bayesian theories. They say, “What we want are Bayesian procedures that have good frequentist properties.” In other words, they want Bayesian theory to operate at the individual problem level, but they want the compilation, or grouping, of those cases to exhibit “long-run” stability.

But this is merely Neyman’s mistake restated. If each individual problem is ideal or optimal (in whatever sense), then the group of them, considered as a group, is also ideal or optimal. Plus, you do not want to sacrifice optimality in your study for the tenuous goal of making groups of studies amenable to frequentist desires.


As always: we should cease immediately teaching frequentist mathematical theory to all but PhD students in probability. In particular, no undergraduates should hear of it; nor should casual users.

Next stop: p-values.

May 2, 2010 | 13 Comments

The Truth of Things: When Probability & Statistics Cannot Be Used

I am always struggling with (my limited ability of) finding ways to describe the philosophy behind logical probability, especially to people who have a difficult time unlearning classical frequentist theory. This post is more for me to test a sketch of an explanation than to be a complete explication of that theory. I am writing to those who already know statistics.

If a theory—or hypothesis, argument, whatever—cannot be deduced it remains uncertain. Formally or informally, probability is used to quantify this uncertainty.

Consider your trial for murder. Your guilt must be established in the collective mind of the jury “beyond reasonable doubt.” That phrase acknowledges that the certainty of your guilt or innocence is unattainable—in their minds—but its probability can still be established.

Incidentally, whether the phrase “beyond reasonable doubt” is a historical accident or the result of careful logical reasoning is irrelevant. Its common meaning is enough for us.

Through an obviously informal process, each jury member begins the trial with an idea of your guilt, which is modified as each new piece of evidence arises, and through discussions with other jury members. There are mathematical ways to model this process, but at best these models are crude idealizations.

Bayes’s formula can illustrate how jurors update their evidential probabilities, but since nobody—even the jurors—knows how to verbalize how each piece of evidence relates to the probability of guilt, these models aren’t much help.
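To see why the models are crude, here is the odds form of Bayes’s formula as a juror-updating sketch. Every number in it is made up; the whole difficulty, as just said, is that nobody can actually supply these likelihood ratios:

```python
def update_odds(prior_odds, likelihood_ratios):
    """Odds form of Bayes's formula: posterior odds equal prior odds
    multiplied by the likelihood ratio of each piece of evidence in turn."""
    odds = prior_odds
    for lr in likelihood_ratios:
        odds *= lr
    return odds

def odds_to_prob(odds):
    return odds / (1.0 + odds)

# Made-up numbers: a juror starts at even odds of guilt, then hears three
# pieces of evidence; ratios above 1 favor guilt, the last favors innocence.
posterior_odds = update_odds(1.0, [4.0, 2.5, 0.5])  # 1 * 4 * 2.5 * 0.5 = 5.0
print(odds_to_prob(posterior_odds))                 # 5/6, about 0.833
```

The arithmetic is trivial; assigning defensible likelihood ratios to testimony is not, which is why such models are of little practical help to a jury.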

A probability model can be written in the form of a statement: “Given this background evidence, my uncertainty in this observable is quantified by this equation.”

The background evidence is crucial: it is always there, usually (unfortunately!) tacitly. All statements made, no matter how far downstream in analysis, are conditional on this evidence.

You possess different background evidence than does the jury. Your evidence allows you to state “I did it” or “I did not do it.” In other words, this special case allows you—and only you—to say, “Given my experience, the probability that I did the deed is zero”; or one, as the case might be.

Deduction, then, is a trivial form of probability model.

The observable in this case is the crime: in statement form, “You committed the murder.” The truth of that observable is not knowable with certainty to the jurors given any probability model.

The probability model itself is where the confusion comes in. It cannot exist for this, or for any, unique situation.

A classical model a statistician might incorrectly use to analyze the jury’s collective mind is “Their uncertainty in your guilt is quantified by a Bernoulli distribution.” This model has a simple mathematical form: the chance of “success” is θ, where 0 < θ < 1. Notice that those bounds are strict. People mistakenly call the parameter θ of the Bernoulli “the probability” (of your guilt). It is not—unless the parameter θ has been deduced and equals a precise number (not a range). If we do not know the value of θ, then it is just a parameter. In itself, it means nothing.

The probability (of your guilt) can be found by accounting for the uncertainty in the parameter. This is accomplished by integrating out the uncertainty in θ—essentially, the probability (of guilt) is a weighted average of possible values of θ given the evidence and background information.
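For the record, when a Beta prior is put on θ this integration has a closed form: the probability of guilt, with θ averaged out against its posterior, is (a + successes) / (a + b + n), which is Laplace’s rule of succession when a = b = 1. A sketch with hypothetical counts:

```python
from fractions import Fraction

def predictive_prob(successes, n, a=1, b=1):
    """Probability of 'success' with theta integrated out against its
    Beta(a, b) posterior: the weighted average of theta, which reduces
    to (a + successes) / (a + b + n)."""
    return Fraction(a + successes, a + b + n)

# Hypothetical counts: 7 "successes" observed in 10 comparable past
# cases, under a uniform Beta(1, 1) prior
print(predictive_prob(7, 10))  # 2/3
```

Note that the answer is a probability of the observable itself; θ has disappeared from the statement entirely, which is the point of the next paragraph.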

But just think: we do not need θ. We have a unique situation—your trial!—and the jury, not you, ponders your probability of guilt; the jurors certainly do not invoke an unobservable parameter. The trial evidence modifies actual probabilities, not parameters.

Now, if we wanted to analyze specific kinds of trials—note the plural: that “s” changes all—then and only then, and as long as we can be exact in the kinds of trials we mean, we can model trial outcomes.

This model is useless for outcomes of trials we have already observed. And why? Because the evidence we have is the outcomes of those trials—whose outcomes we know! Silly to point out, right? But surprisingly, this easy fact, and its immediate consequences, is often forgotten.

Another way to state this: We only need to model uncertainty for events which are uncertain. We can model your trial, but only assuming it is part of a set of (finite!) trials, the nature of which we have defined. The nature of the set-model tells us little, though, about your trial with your unique evidence.

The key is that there does not exist in the universe a unique definition for kinds of trials. We have to specify the definition in all its particulars. This, of course, becomes that background information we started with. Under which definition—exactly!—does your trial lie?

It is the uncertainty of ascriptions of guilt in those future trials that is of interest, and not of unobservable parameters.

Oh, remind me to tell you how mathematical notation commonly used in probability & statistics interferes with clear thinking.