William M. Briggs

Statistician to the Stars!

What happened to sultry?

I like Jessica Rabbit. Her voice, I mean—Kathleen Turner. Throaty, a hint of edgy raspiness, alluring, damn sexy.

But I just found out that I was wrong. Turns out I do not like sultry voices like I think I do. Instead, peer-reviewed research has proved that “High-pitched voices are most attractive.” This can only mean that I require re-education to correct my incorrect choice. Experts have weighed in!

Actually, of course, and because I can’t continue being facetious, I want to highlight a very common piece of poor “science” journalism, based on questionable research. I want to dissect this article paragraph by paragraph (don’t worry, it’s not long), to show how to spot garbage.

The article, entitled “High-pitched voices are most attractive”, by Dave Munger over at Cognitive Daily, summarizes a paper by Feinberg and others entitled “The role of femininity and averageness of voice pitch in aesthetic judgments of women’s voices” in the journal Perception.

First, the title (the reporter’s, not the paper’s). It is false. Not just “maybe not true” but false as in “ridiculously untrue.” It is not true in my case, nor in many, many other cases. So why would somebody write such a headline? Laziness, probably.

The article starts with some unmemorable fluff about some celebrity, finally moving to the sentence “In general we perceive higher voices as more feminine.” This is true. But it is one of those statements that every single human already knew was true—who didn’t know females had higher voices?—only it couldn’t be “proved” true until “research” said it was. This unfortunate attitude is now commonplace. It isn’t true until some “researcher” does “research” to show it’s true. Nonsense.

Let’s not lose sight of what the author is trying to prove: “High-pitched voices are most attractive.” What is the evidence for this (false) statement? Well, this: researchers “recorded the voices of 123 young women as they pronounced five vowel sounds: ah, ee, eh, oh, and oo. Then ten male volunteers rated each voice for attractiveness.”

How many men? 10, or “ten”, or “just one more than nine.” No doubt these men were chosen from a broad population to ensure a wide range of opinion was captured. Just kidding. They weren’t. As in many papers, the “researchers” grabbed a bunch of men who were close at hand and hoped for the best. That is, these young American men were assumed to have the same tastes as Vietnamese, Yemenese, Chinese, Siamese, and other-eses, of all age, economic, social, etc. backgrounds. The same comments can be made about the women. Did they all speak English? Have the same accent? Did they vary their tone to fit the circumstance? Etc., etc.

The chance that the sample used was representative of all humans? The words “near zero” come to mind. And we haven’t even begun to ask why just five English vowel sounds would be representative of all sounds, nor how the manner of speech and the words used are mixed up in how attractiveness is rated.

A statistical graph is then shown, which I do not have the heart to reproduce. It is a scatterplot, showing the relationship of the frequency of the spoken vowel sounds with the attractiveness rating. A straight (and curved) line is drawn through the points. It is said to be “statistically significant.” This means that the p-value of the slope of the regression line is less than 0.05. What is a p-value? It is the probability of seeing a statistic (which is a function of the estimate of that slope) larger than the one we actually got, given that the experiment were repeated an infinite number of additional times and given that the slope is actually 0. Yes, complicated and confusing. That’s classical statistics for you.
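That convoluted definition can be made concrete with a small simulation. This is a sketch of the idea only, not the paper’s data: the pitch range, the 123-woman sample size, and the “observed” slope are all made-up numbers chosen to resemble the scatterplot being discussed. We simulate the experiment many times under the assumption that the true slope is 0, and ask how often a slope at least as big as the observed one turns up.

```python
# Illustrative sketch only -- NOT the study's data.  We simulate the null
# hypothesis (attractiveness unrelated to pitch, true slope = 0) over and
# over, and count how often the slope statistic comes out at least as
# large as a hypothetical "observed" slope.  That frequency is the p-value.
import numpy as np

rng = np.random.default_rng(0)
n = 123                                  # assumed: one rating per woman
pitch = rng.uniform(180, 260, size=n)    # Hz, roughly the plotted range

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

observed_slope = 0.01                    # made-up "observed" value

# Repeat the experiment with ratings that are pure noise (slope really 0).
sims = np.array([slope(pitch, rng.normal(3.5, 1.0, size=n))
                 for _ in range(10_000)])

p_value = np.mean(np.abs(sims) >= abs(observed_slope))
print(f"p-value: {p_value:.3f}")
```

A small p-value says only that the observed slope would be unusual if the slope were exactly 0 and the experiment were rerun endlessly; it says nothing about how big or useful the effect is.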

But, statistically speaking, the line is crap. Pick a frequency, a low one, like 180-190 Hertz. Attractiveness ratings range from just over 2 to about 5.5, just the same as they do for higher frequencies. For one high-frequency lady (about 250 Hertz), the attractiveness was low; and there were far fewer high-pitched sounds to sample from. The researchers have made the common error of conflating the “statistical significance” of the parameter (slope) of the regression line with the actual difference in observable attractiveness ratings. We do not see which woman was which — some women’s voices might have been better than others regardless of pitch. To prove that the reporter does not understand what he has just seen, he repeats “Higher-pitched voices are more attractive” right after discussing the graph.
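The conflation complained of above is easy to demonstrate with simulated data (again, not the study’s data; the true slope, noise level, and pitch range below are invented for illustration): a regression slope can come out “significant” while the scatter at any single pitch dwarfs the entire predicted effect.

```python
# Sketch (simulated, invented numbers): a tiny true slope buried in big
# individual variation.  The permutation p-value often comes out small,
# yet the predicted change over the whole pitch range is no bigger than
# the ordinary spread of ratings at any one pitch.
import numpy as np

rng = np.random.default_rng(1)
n = 123
pitch = rng.uniform(180, 260, size=n)                  # Hz (assumed range)
rating = 2.0 + 0.01 * pitch + rng.normal(0, 0.8, size=n)

def slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

b = slope(pitch, rating)
a = rating.mean() - b * pitch.mean()

# Permutation p-value: shuffling ratings destroys any real association.
perm = np.array([abs(slope(pitch, rng.permutation(rating)))
                 for _ in range(5_000)])
p_value = np.mean(perm >= abs(b))

resid_sd = np.std(rating - (a + b * pitch))            # spread at any pitch
print(f"slope = {b:.4f}, permutation p = {p_value:.4f}")
print(f"predicted swing over 80 Hz: {b * 80:.2f} rating points")
print(f"typical spread at one pitch (sd): {resid_sd:.2f}")
```

The predicted swing across the entire 80 Hz range is comparable to the residual spread at a single pitch: “significance” of the slope parameter is not the same thing as a visible difference in observable ratings.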

There are other problems with this graph. There are about 123 points. Fine. There were 123 ladies. But each recorded 5 sounds, and there were 10 men. Shouldn’t there be 123 x 5 x 10 = 6,150 points? Because there are not, some prior data manipulation must have taken place. A summarization has been done (the mean attractiveness per woman). I have no idea how this summarization would affect the results. Nor would the researchers, because knowing would require some sort of model, which is not present.
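Here is a sketch of what that summarization throws away, using simulated ratings (the rating scale, means, and spreads are invented; only the 123 x 5 x 10 structure comes from the article). Collapsing 6,150 ratings to 123 per-woman means hides how much the vowels and the ten raters disagree about the very same woman.

```python
# Sketch (simulated data, NOT the paper's): 123 women x 5 vowels x 10
# raters = 6,150 ratings, collapsed to 123 means for the scatterplot.
# The means are far less variable than the raw ratings they summarize,
# so the plot understates the disagreement in the data.
import numpy as np

rng = np.random.default_rng(2)
women, vowels, raters = 123, 5, 10
ratings = rng.normal(3.5, 1.0, size=(women, vowels, raters)).clip(1, 7)

per_woman_mean = ratings.mean(axis=(1, 2))         # the 123 plotted points
within_woman_sd = ratings.std(axis=(1, 2)).mean()  # disagreement, discarded

print(f"points plotted: {per_woman_mean.size}")
print(f"spread of the 123 means: {per_woman_mean.std():.2f}")
print(f"average within-woman spread: {within_woman_sd:.2f}")
```

Each mean of 50 ratings is much less variable than the ratings themselves, so a plot of means can look far tidier than the underlying data; without a model for the vowel and rater effects, nobody knows what the averaging did to the conclusions.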

The researchers knew at least some of their limitations because, as the reporter reports, they asked “[W]ill simply raising the pitch of a female voice make it more attractive, or are there other factors involved?” So they picked “three groups of five voices…: five low-pitched, five medium-pitched, and five high-pitched.” After this—I swear this is true—“a computer program was used to artificially raise and lower the pitch of each of these voices.” Then “volunteers listened to high- and low-pitch versions of each voice and indicated which was more attractive.”

Again, they claim that higher-pitched voices were picked slightly more often as being attractive (another bad statistical graph is shown).

But wait a minute. “A computer program was used to artificially raise and lower the pitch of each of these voices”? Yes, “a computer program was used to artificially raise and lower the pitch of each of these voices.”

Good thing that was the end of the article, because I couldn’t take any more. From 123 women, just 15 (that’s “three groups of five voices”) were supposed to represent all human females everywhere. These 15 voices were changed by a computer algorithm to sound how a computer programmer thought they should sound. After they made the voices sound like they wanted them to sound, they asked other people how they sounded. Oh, good grief.

If the overall finding is that “men prefer high-pitched voices”, then I would say that it is true, but everybody already knew it was true because everybody knows women have higher-pitched voices than men, and that men prefer women (generally), therefore they must prefer higher-pitched voices (generally).

But just writing that into a paper will not get you published.


  1. Matt:
Would a better test be to explore the revenue records of those ladies(?) on the 900 chat lines at night? Clearly we would not be limited to 10 listeners. One could then correlate revenue with pitch. This would also have the redeeming feature of potentially increasing the revenue stream of these companies by helping them select more productive employees. Of course this may mean we have to reframe the hypothesis to include “sexiness” rather than “attractiveness” — perhaps this is where Jessica Rabbit comes in.

By the way, the R² on the chart looks pretty lame. Was it reported in the article?

  2. Clearly the study doesn’t hold up to scrutiny. However, as a male with a bass voice, I’d like to think that women strongly prefer lower pitched voices. Sadly, there’s nothing in my life experience that confirms it, and look at all the adoration the tenors get. 🙁

  3. Briggs

    July 7, 2008 at 1:48 pm

I do not love, and do not advocate using, R² as a measure of model goodness.

It measures how well that straight line agrees with the observed data, by comparing the observations to estimates of them made from plug-in estimates of the unobservable coefficients.

In this case it would not be interesting, since the picture shows the model is nearly useless. Quantifying that is not needed and should be discouraged, because it gives an over-inflated sense of the model’s true worth, even for models as bad as this one.
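For the curious, here is what an R² like the one on that chart might look like, computed on simulated data of the kind discussed in the post (the slope, noise level, and pitch range are invented; this is not the paper’s data):

```python
# Sketch (simulated, invented numbers): R^2 for a weak linear relationship
# of the sort the scatterplot shows.  R^2 is just the squared correlation
# between x and y, i.e. the fraction of variance the line "explains".
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(180, 260, size=123)                # pitch in Hz (assumed)
y = 2.0 + 0.01 * x + rng.normal(0, 0.8, size=123)  # tiny slope, big noise

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2
print(f"R^2 = {r_squared:.3f}")   # small: the line explains little
```

A small R² and a small p-value can coexist happily: the slope is distinguishable from zero, but the line still tells you almost nothing about any individual rating.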

  4. Point well taken, but does this mean that judgments about a random scatterplot should simply be that the implicit or explicit underlying model needs to be better defined? That, plus exclusion of “trivial” models, would certainly change how many models and research studies are concocted in the social sciences.

That paper would get a passing grade in my high-school science class. Goodness knows how it actually gets published and picked up by mainstream media.

    I can’t believe what passes for science these days.

  6. Beauty is in the ear of the belistener. One man’s coo is another man’s screech.

