Tyranny Of The Mean

The mean can be nasty. This should come as news to no one. But since this topic made it into the New York Times (the good Lord knows why), it has become news and so must be discussed.

One Stephanie Coontz, associated with something called the Council on Contemporary Families—and since “families” is modified by “contemporary” one immediately has the suspicion that “families” does not mean “families”, but never mind—trumpeted a report called “The Trouble with Averages” written by some guy who sees fit to end his name with “PhD.”

The sin warned against is using just one number to summarize uncertainty in a thing which can take more than one value. The one number is the numerical average, a quantity which is often asked to bear burdens far beyond its capability.
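Here is a minimal illustration using two made-up data sets. These are hypothetical numbers, not taken from Coontz's report. Both have the same mean, but one barely varies and the other is all over the place. The mean alone cannot tell them apart. A measure of spread can, and here that measure is the population standard deviation from Python's `statistics` module.

```python
from statistics import mean, pstdev

# Two hypothetical data sets with the same mean but very different spread.
steady = [49, 50, 50, 51]
erratic = [5, 20, 80, 95]

for name, data in [("steady", steady), ("erratic", erratic)]:
    # The means agree; only the spread says how uncertain the next value would be.
    print(f"{name}: mean={mean(data)}, sd={pstdev(data):.1f}")
```

If you quote only the mean (50 in both cases), the difference disappears entirely.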

The average is often used to define what is “normal”, with the implication that deviations from it are “Abby Something”, to quote Igor. The more slavish the devotion to this concept, the more the world appears insane, because hitting the average becomes increasingly difficult.
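When values fall into distinct clusters, it can happen that nobody sits at the average at all. The sketch below uses hypothetical incomes with the same 90%/10% split a commenter uses further down.

```python
from statistics import mean, mode

# Hypothetical population: 90 people earn 1000, 10 people earn 10000.
incomes = [1000] * 90 + [10000] * 10

avg = mean(incomes)              # 1900
typical = mode(incomes)          # 1000, the most common value
at_average = incomes.count(avg)  # how many people actually earn the mean

print(avg, typical, at_average)
```

By this measure, not one person in the group is "normal."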

This applies to people and things. You can say the normal temperature is X degrees, and as long as you define exactly how this was calculated, you’re on solid ground, but only an activist would fret at any departure from this number and suspect foul play.

It might be that the average man grieves (say) 8 months after the death of his wife (one of Coontz’s examples), but that doesn’t mean that a man who stops crying at 2 months is hard-hearted, nor that a man who wears sackcloth for two years is insane.

Using just the average to define “normal” in people is dangerously close to the fallacy of defining moral truths by vote. Come to think of it, isn’t that what the Diagnostic and Statistical Manual of Mental Disorders does? Plus, even “extremes” might not be “abnormal” in the sense of undesirable or harmful; it all depends on the behavior and our understanding of biology and morality.

Planning on the average for physical things can make sense, but only in the rare cases where the average is all that matters. Engineers don’t design bridges to withstand only average loads.
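The sketch below shows why the average is the wrong design target. It relies on a deliberately simple, assumed load model: normally distributed loads with a hypothetical mean of 100 t and a standard deviation of 15 t. Real load models are typically skewed and are handled with extreme-value methods and safety factors. The only point here is that under any symmetric model, the average load is exceeded about half the time.

```python
from statistics import NormalDist

# Hypothetical, illustrative load model (not a real design standard).
loads = NormalDist(mu=100, sigma=15)

# Probability that a given load exceeds the average load.
p_exceed_mean = 1 - loads.cdf(loads.mean)

# Load exceeded only once in a thousand occasions under this model.
design_load = loads.inv_cdf(0.999)

print(f"P(load > mean) = {p_exceed_mean:.2f}")
print(f"99.9th percentile load = {design_load:.1f} t")
```

A bridge built for the mean load would be overloaded on roughly every other occasion.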

Unless the item of interest is fixed and unchanging, in which case the numerical mean is the only value that can occur, the point of calculating an average is to help quantify the uncertainty in the thing. If a thing varies, the mean will always be incomplete, and relying on it alone will lead to overconfidence.

And don’t forget: probability doesn’t exist as a physical thing; it is instead the measure of uncertainty.

Anyway, not much of a post or a lesson today. Instead I’ll put the burden on you. What are some good examples where the mean, and only the mean, is an adequate summary?

Update: Coontz used the word “outliers”. There are no such things. There can be mismeasured data, i.e. incorrect data, say when you tried to measure air temperature but your thermometer fell into boiling water. Or there can be errors in recording the data: transpositions and so forth. But excluding mistakes, so that the numbers you recorded are the numbers you meant to measure, there are no outliers. There are only measurements which do not accord with your theory about the thing of interest.

Far too often I find people throwing out real data because it doesn’t fit their preconceptions, i.e. model. Nutty behavior.
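The sketch below follows the spirit of the update. A value far from what a model expects is a reason to look for a mistake, not a reason to delete it. The fill weights are hypothetical, and the model is assumed: a simple control-chart-style rule of mean ± 3 sample standard deviations of a baseline run. Nothing here comes from Coontz's report. The 76-for-67 value echoes the transcription slip a commenter describes.

```python
from statistics import mean, stdev

# Hypothetical baseline fill weights (g) and a batch of new readings.
baseline = [66.8, 67.1, 67.0, 66.9, 67.2, 67.0, 66.9, 67.1]
new_values = [67.0, 66.8, 76.0, 67.1]  # 76.0 might be a transposed 67.0

# The limits are a statement about our model of the process, not about the data.
center = mean(baseline)
sd = stdev(baseline)
lower, upper = center - 3 * sd, center + 3 * sd

# Flag values the model did not expect, but keep every value: a flag is a prompt
# to investigate (bad gauge, transcription, wrong lot), not permission to discard.
records = [(x, not (lower <= x <= upper)) for x in new_values]

for value, flagged in records:
    print(value, "INVESTIGATE" if flagged else "ok")
```

Nothing is thrown away. If the investigation finds a real mistake, such as a transposition, that is the legitimate reason to correct or exclude the value.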


Thanks to Andrew Kennett for pointing us to this topic.


  1. This should appeal to your professional interests: what about using the mean value of exam scores in order to arrive at a semester grade? Back in the days of Excel 4.0 I had a spreadsheet that kept track of all of my students’ grades, and at the end of the semester I merely assigned grades using the old-fashioned scale of 90+=A, etc. Is this a valid use of a mean?

  2. An example of the mean as sufficient? That’s easy. The calculation of student grades as the average (usually weighted) of assignment, lab, and test marks. In this case no one would accept outliers as sufficient. I’ll take my best answer to represent the whole semester’s work, please. Dream on, kid.

  3. Baseball batting averages. For decades it was an adequate measure of offensive power – simple, understandable, clean. The SABRmetricians have gone way beyond it now, but it’s still the handiest metric for comparing players or their accomplishments over a career. It defines Ted Williams, for example — the last player to bat over .400 for a season.

    In the same way, a hockey goalie’s goals-against average tells right away whether he’s a Vezina trophy candidate or not.

  4. When dealing with data reasonably modeled by Poisson, the mean is sufficient, because the variance equals the mean (the std. dev. is its square root). (So there.)
    + + +

    I’ll have to disagree slightly about outliers; at least so far as statistical process control goes. They are useful in determining when data may have come from a different process or population. Parts may have been pulled from the wrong lot, or the setting on a machine may have changed. Examples: tablet potency in tablets pressed from a second bulk; PbO paste weight on battery grids after a change in paste density from the mill; etc. Of course, there is a model. There are no self-explanatory facts. There must always be a theory from which the facts take their meaning and value. But an outlier, in my craft, was a value outside the expected range (however defined) of process variation.

    Naturally also, as you note, for detecting errors in the measurement process; as when an engineer measured a metal residual quite different from those recorded by QC, said difference being traced to the use of a damaged gauge. In another case, a liquid pharmaceutical fill weight was transcribed 76 instead of 67; and so on.

    What is illegitimate, as I think we agree, is to simply discard outliers qua outliers, or to automatically “adjust” them with a computerized algorithm. (Does Al Gore have rhythm?) Our procedure was to use outliers as signals for investigation for assignable causes of variation.

  5. “probability doesn’t exist as a physical thing”-> what about quantum mechanics?… :] #gasolineinthefire

  6. Off-topic, but I thought I’d let you know that I found your book Breaking The Law Of Averages very useful in my own work recently. I’m a physical scientist and I know I’m no statistician. Your book enabled me to use correlation coefficients and p-values correctly, whilst understanding many of the ways in which such tools are misused.

  7. “Engineers don’t design bridges to withstand only average loads.”

    But, I have to say, they DO design them using computer models 🙂

  8. Poisson distribution of positive values?
    Quite naturally. Basic situation: low-probability events, but vast opportunity. (Infinite opportunity, for all practical purposes.)

    Example 1: number of calls received at a maintenance hot line in a pharmaceutical plant. Outliers: scientists who called multiple times on same issue.
    Example 2: number of times pumps failed in a chemical plant. Deviation from Poisson due to pumps being in distinct services (e.g., water vs. caustic), analyzed separately. Identified pumps in need of special maintenance attention for that service.
    Example 3: stones (bits of refractory material) in glass bottles. Outliers (to Poisson model) flagged problems with glass shop walls.
    + + +
    Other situations called for normal, lognormal, exponential, extreme value, et al. models. Normal=sum of many small random causes. Lognormal=product of many small random causes. Extreme value=maximum (or minimum) of stresses or loads on a system. Etc.

  9. Ye olde —

    A Perito / exponential / power law distribution may be more appropriate for situations that are a product of many small causes.

    Energy released in earthquakes, brightness of stars, money, and street addresses all have these extremely long right-tailed distributions.

    I am looking for a distribution that describes market returns: something bell-shaped, but with much fatter tails than a normal distribution. The t distribution works okay, not for any theoretical reason, but because it is more representative of my data. I have had Cauchy suggested, but it can be tricky to work with.

  10. @DougM
    You mean Pareto, right?

    The exponential shows up a lot in reliability engineering as a model for component failures with a constant hazard rate. And indeed the examples you mention all have the property of having an operational zero point, usually (but not always) numerically zero. Example: the number of calibration tests needed to calibrate an underwater telecommunications repeater. Since there is always one test, the variable is actually c-1, the number of re-tests.

  11. A bimodal distribution is one example of the failure of the mean. If 90% of the population has an income of 1000 and 10% has an income of 10000, you get a mean of 1900, which no one in the group has. The mode is often better than the mean, and a lot of people interpret the mean as being the mode.

  12. I can’t tell you exactly what distribution a variable has, but here is what I would do to fit an appropriate one to the data.

    For a continuous variable, regardless of what the variable is, plot a histogram of the data first. A histogram helps one decide the shape of the data distribution. Next, consult this ever-growing list for possible choices: http://en.wikipedia.org/wiki/List_of_probability_distributions.

    A discrete probability distribution usually has some special properties and assumptions, and is therefore used for specific cases, for example the binomial and Poisson distributions.

  13. Brian Joiner did histograms of random normal numbers and found markedly “non-normal” looking histograms even with samples of n=100. You’d be better off with a probability plot. Even so, you need to consider the nature of the cause system. In some cases, the causes are as likely to increase the variable as decrease it; in others, most variation is more likely to be to one side than the other. For example, a die depresses a cylindrical column, after which the metal “springs back” slightly (provided the force of the die exceeds the yield point of the metal). The heights of the columns afterward will follow a modestly skewed distribution. The height can never be shorter than the travel of the die, but might be taller by the amount of spring-back. Similarly, the amount of residual material in batches of reactor crude following a distillation can never be greater than the amount of residual from the reaction itself, but might be less depending on the efficiency of the distillation. (In fact, the effectiveness of the distillation column is better measured by the delta between %X in the crude vs. %X in the distillate, since the reactor is also subject to variation.)
    + + +
    More commonly overlooked is the fact that any process output is the combined result of the process itself plus the measurement process. And as we segue from mechanics to electronics to chemistry to biology to (hawk-ptooei) social “sciences,” the repeatability of the measurement becomes more and more problematic. I worked with chemical lab tests whose repeatability and reproducibility were as great as 50% of the tolerance range.
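Comments 4 and 8 both lean on the Poisson model. The simulation sketch below shows two things. First, counts of rare events with many opportunities behave like Poisson counts. Second, it is their variance, not their standard deviation, that comes out close to the mean. Every parameter (1000 opportunities, a 0.003 chance each, 2000 periods) is hypothetical.

```python
import random
from statistics import mean, pvariance

rng = random.Random(42)  # fixed seed so the run is repeatable

OPPORTUNITIES = 1000  # hypothetical: chances for an event in one period
P_EVENT = 0.003       # hypothetical: small chance at each opportunity
PERIODS = 2000        # number of simulated periods

def count_events() -> int:
    """Count rare events in one period: many opportunities, each with a small chance."""
    return sum(rng.random() < P_EVENT for _ in range(OPPORTUNITIES))

counts = [count_events() for _ in range(PERIODS)]

m = mean(counts)
v = pvariance(counts)
print(f"mean = {m:.2f}, variance = {v:.2f}, std dev = {v ** 0.5:.2f}")
```

The mean and variance should both land near 3, while the standard deviation lands near its square root (about 1.7). Because one number fixes the whole distribution, the mean really is an adequate summary in this one case.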
