Statistics

Random Samples In Polls Not Needed; Other Common Poll Errors

Pajamas Media—my new pals—conducted, or rather commissioned, their own poll in the Scott Brown–Martha Coakley race.

It found Brown ahead by 15.4%, a whopping lead, after calling more voters but reporting on 946 “likely” ones. Other organizations had Brown up by no more than a point or two, or even down by the same amount. The huge—it is huge—difference between PJM and its rivals needs explaining.

Some said that PJM’s poll was not “random”; several insisted on it. Nobody questioned this “truth”—that polls have to be conducted randomly.

They do not and should not be. The word “random”, as we have discussed many times, does not mean what most think it does.

Suppose what is not true: that PJM’s poll correctly identified 946 voters. It did so by asking this question:

Only a small percentage of all voters will cast a ballot in this Tuesdays [sic] special election for US Senate. How likely is it that you will actually vote in this election on January 19th? If you will definitely vote press 1. If you might or might not vote press 2. If you probably wont [sic]1 vote press 3.

The results were stated in terms of likely voters (those who pressed 1 or 2). This is common practice, but it is sumamente (extremely; I’m in CA and practicing my Spanish) misleading. Likely does not mean “will vote”; it means “intends to vote or might not vote.”

There are many dangers betwixt wanting to vote and actually voting. Not everybody who said they will definitely vote will vote. Some who said they will not vote will. And the fraction of those who said they might or might not vote is variable.

This ambiguity is a large source of polling error. To estimate it would require re-calling the same people (all of them, not just the 946 who pressed 1 or 2) and asking whether or not they voted. And then hoping they tell us the truth.

Thus the second source of error: people lie. They lie like dogs, they fib, they prevaricate, they make statements at variance with the truth. If you call a lefty (righty) and he suspects that the polling organization is right (left), he is more likely to lie. People lie not just by saying they will vote for the opposite candidate; they also say they are undecided when they are truly not. They say they will not vote when they will, and vice versa.

Sometimes people misunderstand the questions and answer the opposite of what they truly mean. So bad questions and poor English—by either party—are a third source of error.

Of these three, the largest is in reporting on “likely” voters instead of on actual voters.

Then there is the question of who was sampled. Ideally, we would sample only people who will actually vote; since this is impossible, we are led to report on who said they would vote (or are “likely” to). But again, let us suppose we have polled only actual voters.

The idea is that, if we have polled 1000 people, the fraction of these who said they would vote for Brown will match the fraction of people who actually vote for him. A fourth source of error arises if some who said they would vote for one candidate change their minds and vote for another, a not-rare occurrence. Ignore this error, too.

Why does it feel wrong to just call up 1000 actual voters and report on the fractions found? Assuming we had a list of actual voters, why not randomly call 1000 of them? Randomly here would mean grabbing phone numbers with no set plan, no method. We could just call the first 1000 on the list, right?

Oh, yes, we could. If our total information were that we have two candidates and that everyone on the list is going to vote, we could just call the first 1000 and have an accurate poll (ignoring the other error sources).
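That claim can be checked by simulation. If our total information really were just “two candidates, and everyone on the list will vote,” then list order tells us nothing about the vote, and the first 1000 names do as well as 1000 “randomly” drawn ones. A minimal sketch in Python (the electorate size and the 52% split are invented for illustration):

```python
import random

random.seed(42)

# Invented electorate: 1,000,000 actual voters, 52% for one candidate.
# Under the "total information" scenario, list order carries no
# information about the vote; a single shuffle models that ignorance.
electorate = [1] * 520_000 + [0] * 480_000
random.shuffle(electorate)

n = 1000

# Method A: call the first 1000 names on the list.
first_n = electorate[:n]

# Method B: draw 1000 names "randomly" from the same list.
random_n = random.sample(electorate, n)

print(sum(first_n) / n)   # close to 0.52
print(sum(random_n) / n)  # close to 0.52
```

Both fractions land within ordinary sampling error of the true 52%; neither method is privileged, because nothing we know connects list position to the vote.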

Why it feels wrong is that this is not our total information. We know more. We know that a small percentage of those who will actually vote will not have a phone (or, if they do, they will not all be at home when we call; however, given the first scenario of total information, non-answers are ignorable). We know that there are geographic clusters of kinds of voters, and that there are other clusters by sex, age, race, and so on.

That is, we have further positive information that we should use in a non-random fashion to ensure that our sample mimics the actual population who actually vote. We want the same percentage of cluster members in our sample as in the actual population.

In other words, we want anything but a random sample. We want a controlled sample. The only way to tell whether the PJM poll is well done beforehand is to check that the clusters it used match what we expect (via history) the actual clusters to be. After the fact, it will be easy to see what went wrong, or what went right.
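One way to sketch such a controlled sample is proportional allocation: fix each cluster’s share of the sample to match its expected share of the actual voting population. The cluster names and shares below are invented for illustration:

```python
import random

random.seed(0)

# Invented cluster shares of the actual voting population,
# e.g. taken from historical turnout.
cluster_share = {"urban": 0.45, "suburban": 0.35, "rural": 0.20}

# Invented sampling frame: each cluster's list of phone numbers.
frame = {c: [f"{c}-{i}" for i in range(10_000)] for c in cluster_share}

def controlled_sample(frame, cluster_share, n):
    """Draw a sample whose cluster proportions match the population's."""
    sample = []
    for cluster, share in cluster_share.items():
        k = round(n * share)  # this cluster's quota of the sample
        sample.extend(random.sample(frame[cluster], k))
    return sample

sample = controlled_sample(frame, cluster_share, 1000)
# 450 urban, 350 suburban, 200 rural: the sample mimics the population.
```

The draw within each cluster can be haphazard or systematic; what is controlled, and what matters, is that the cluster proportions match.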

Random merely means unknown. There is nothing spooky or mystical about it. Grabbing a sample by rolling dice does not guarantee better results than by systematically picking voters.

If you’ve understood this, you should be able to answer why, if our total knowledge was solely “that we have two candidates” etc., calling the first 1000 names on the list is just the same as “randomly” picking names from that list.

—————————————

1 I do this kind of thing so often that I had to indicate that these typos weren’t—note the apostrophe—my fault.



  1. Matt:
    Nice piece.
    The order of the names has an unknown (or unspecified) relationship with whether they will vote or not – which is the most critical factor in determining who should or should not be in the sample. It is therefore the equivalent of a random ordering of the individuals on the list. One creates a random list by assigning a number to each person and then ordering the numbers in some systematic but arbitrary way that is unconnected to whether someone will vote. One could argue that in relation to whether a person will or will not vote, the names of the individuals are essentially random numbers ordered in an arbitrary and irrelevant way.

  2. Will be looking for a brief explanation of “confounding error” and how some pollsters condemn their work before the first sample is taken.

  3. I usually lie to pollsters because I don’t think it is anybody’s business how I vote. Guess I should be up front and tell them that.

  4. A few questions

    1. Will the lead-in “Only a small percentage of all voters will cast a ballot ….” influence the answer to that question?

    2. “We want a controlled sample. The only way to tell whether the PJM poll is well done beforehand is to check that the clusters it used match what we expect (via history) the actual clusters to be. ” Was that done? Did the poll use a “controlled” or “random” sample?

    3. Can one correctly conduct a poll using a random sample of people and then weight the responses based on the cluster each person represents?

  5. Robert Burns,

    You know—off topic—I can’t stop thinking of a poem your namesake wrote whenever I see your comment. Title starts with “Nine inch”—and that’s as far as I can go on this family blog.

    1. Sure, it’s more likely that the fraction in the poll matches the fraction in the population as the sample approaches the population.

    2. I have no idea how much control they used. Polling organizations rarely, ever so rarely, release this information. It’s usually considered proprietary. The group that gets closest to the clusters does well, and therefore (ideally) their revenues increase.

    3. Yes, as long as you have a pretty good idea of the true percent of the clusters. But it’s fairly trivial, mathematically.
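    To make question 3 concrete: that reweighting is a sketch of what is usually called post-stratification. Compute the candidate’s fraction within each cluster, then combine those fractions using the clusters’ known population shares instead of their (possibly skewed) sample shares. All numbers below are invented:

```python
# Invented sample counts per cluster: respondents and votes for candidate A.
sample = {
    "urban":    {"for_A": 180, "n": 250},   # under-represented in the sample
    "suburban": {"for_A": 160, "n": 400},
    "rural":    {"for_A": 120, "n": 350},   # over-represented in the sample
}

# Known (historically estimated) population shares per cluster.
pop_share = {"urban": 0.45, "suburban": 0.35, "rural": 0.20}

# Raw, unweighted estimate: total votes for A over total respondents.
raw = sum(c["for_A"] for c in sample.values()) / sum(c["n"] for c in sample.values())

# Post-stratified estimate: within-cluster fraction times population share.
weighted = sum(pop_share[k] * sample[k]["for_A"] / sample[k]["n"] for k in sample)

print(f"raw: {raw:.3f}, weighted: {weighted:.3f}")  # raw: 0.460, weighted: 0.533
```

    Here the rural cluster is over-sampled and the urban cluster under-sampled, so the raw fraction understates candidate A’s support; weighting by the true cluster shares corrects it, provided those shares are themselves well estimated.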

    49erDweet, bob,

    I usually just hang up instantly. The hope for pollsters is that nonconformists like me vote in the same way as conformists. Or we don’t, but they have a way to estimate our eventual vote (based on past data).

    OregonGuy,

    Amen. Many ways to confound.

    Bernie,

    And we forgot to say (I forgot) that we should only talk to people eligible to vote; i.e., the registered voters. Of course, people can lie about that, too. Right 49er?

  6. If you don’t ask the cluster questions, and if people lie, you can’t do the clustering either.

    I have worked on a few survey analyses. Phone calls, especially in the middle of the day, reach housewives, children, and unemployed people to a greater extent than they reach people who go to work. Not a broad sampling method, and tough to stratify.

    Same with surveys at malls, on the Web, etc. Respondents are essentially self-selected or the sampling is biased toward one demographic or another.

    The big pro pollsters deal with these problems with various corrective algorithms. A lot of research has gone into the art, mainly by marketing interests rather than political interests. “Will you buy product X?” is a better funded question than “Will you vote for X?”.

    Even so, what people say they will do and what they end up actually doing are two different things. New Coke is the classic example of poor sampling and analysis, but there are zillions more.

    Even exit polling, standing outside the polling stations and asking people who come out, “Who did you vote for?” has problems. The best measure is to count the actual votes, but if the chads are hanging or the little circles poorly colored in, even that method can result in controversial uncertainty.

  7. You need to design your survey and sample to reflect the precision of the answers that you need. In general, people do not need really precise answers from surveys. This is just as well, because getting really precise and meaningful answers takes a very large amount of money. In addition, I am highly skeptical about the real meaning of the margin of error that is frequently cited when survey results are reported. It suggests the data are far more precise than is justified given the quality of the data and the unmeasured sources of error. The level of precision strived for is also a waste of resources. In reality, candidates need to know which of three states applies:
    I am so far behind that I should quit spending my husband’s/wife’s money and spend the weekend watching TV.
    I am close enough that it is worth taking out a second mortgage and working like a maniac.
    I am so far ahead that I should start finding those lobbyists who can help me recoup all the money I spent, and begin to figure out how to run my opponent out of the state for saying such nasty things about me.
    Given the above: if by the last week or so you are within 10% of your opponent, work like a maniac. If you are 15% behind, check out Monster.com. If you are 15% ahead, start lining up those lobbyists.
    The survey and sample design should be sufficient to determine where you stand. IMHO, Scott Brown and Martha Coakley are both looking for second mortgages and working like maniacs!

  8. The question that needs asking is why anyone bothers with these types of polls. It can be argued that election poll results can influence voters on polling day, so should they be done?
    And given that some/many people don’t tell the truth anyway, the results can be pretty meaningless.

  9. A couple of responses/questions for those statistically more savvy than I.

    1. On the question of those who lie to pollsters: assuming that there is no systematic bias amongst these thrill seekers, wouldn’t they essentially cancel each other out in a large enough poll?

    2. Robopolling seems to be a relatively inexpensive way to poll; I would assume that the major expense and art is in defining and validating core assumptions in the way the poll is conducted, not in the actual act of having the robot make the call. So why not use a larger “n”?

    3. I predict a relatively large margin of victory for Brown. I would make that fuzzy extrapolation
    a. not from any actual single poll, but from the trend lines and the rate of change therein.
    b. I think many voters who want something to change (talk of an overused word) are so unconfident of their vote making any impact that they stay home; this and the enthusiasm gap make it more likely that the Brown voters come out and vote as opposed to Coakley voters.
    c. Political correctness/spouting the appropriate progressive pablum has largely replaced “manners” among the genteel. It is fashionable to be this way (see Stuff White People Like). So expressing or even having unprogressive thoughts invites social and personal disapprobation. It is looking increasingly like this particular “Berlin wall” is breaking down.

  10. Vic,

    1. Without going into the deep theoretical math, the basic rule is that error aggregates and inflates the variance — it does not cancel itself out.

    2. The people who talk to robots are not necessarily representative of the rest of us.

    3. Wishful thinking is not science.
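    On point 1, there is a concrete way to see why even perfectly symmetric lying does not wash out. If the true fraction for a candidate is f and each respondent misreports with probability p, the observed fraction is f(1-p) + (1-f)p, which is pulled toward 50%, so the margin shrinks even though the lies are “balanced.” A simulation sketch with invented numbers:

```python
import random

random.seed(1)

true_frac = 0.58   # invented: 58% truly support candidate A
p_lie = 0.15       # invented: each respondent misreports with prob 0.15
n = 100_000

observed = 0
for _ in range(n):
    truth = random.random() < true_frac   # respondent's real preference
    lies = random.random() < p_lie        # does this respondent lie?
    observed += (not truth) if lies else truth

print(observed / n)  # near 0.58*(1-0.15) + 0.42*0.15 = 0.556
```

    The reported 58–42 race looks like roughly 56–44: the lies bias the estimate toward a tie rather than canceling, and no amount of extra sample size fixes that.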

  11. Speaking of lying to pollsters: when I was in my last year of medical school, it came time for us students to help select the top 10 students overall. So I decided to vote for the 10 least deserving, in my experience. Well, 5 of them ended up in the top 10 overall. Pretty good, eh? The funny thing was that the other 5 I would have expected to be in the true top 10.

  12. Several years ago, one of our local radio stations had its political horserace reporter commenting on the results of a poll conducted on a hotly contested gubernatorial race. At the end of the interview, the show’s host asked the reporter how the sample was selected. The reporter’s response is etched in my memory for its “groan” value: “I’m not sure, exactly. But I’m sure it was scientific. Whatever it was, I can assure you it wasn’t random.”

    Still worth a groan after all these years. 😉

  13. J Peden:
    Your “non-authentic” behavior is not unusual among survey respondents and is a major source of “error” that is very difficult to estimate and correct.

  14. All I can say is I’m glad I’ll be in town to vote for Brown! :ob I’ll gladly vote for him to annoy the folks around here who think it’s sacrilegious to vote non-Democrat for “Edward Kennedy’s seat”! Good grief! Get over it, folks, he’s long gone! Move on!

  15. I don’t know if you know the history of the MA senate seat, but apparently when Kerry was the Democrats’ 2004 presidential nominee, it was his candidacy that led Democratic lawmakers to “fix” the state’s law on how a senate seat is to be filled if the sitting senator must vacate it. You see, MA law on this issue used to be like NY law: it was left to the discretion of the state governor to select a senator to fill the seat for the remainder of the term. But in 2004, MA’s governor was Republican Mitt Romney! As this article explains:

    By now, everyone knows the partisan past that’s prologue here. Beacon Hill Democrats, urged on by none other than Kennedy himself, changed the law in 2004 to establish a special election. It was a power grab, motivated by their desire not to let Republican Mitt Romney appoint someone for the last two years of John Kerry’s Senate term if Kerry won the presidency.

    Democratic lawmakers changed the process by establishing a special election to fill a vacant Senate seat. They also voted down a GOP amendment to allow for an interim appointment while that election took place. In 2006, Democrats rejected further Republican attempts to provide for an interim appointment.

    As the article notes, this very law that Kennedy promoted became a problem for him last fall, as he knew he was near death prior to the vote on the healthcare bill (which, as you know, was his pet project). Because the special election that Democrats had established would not happen until today, the MA senate seat would have been left empty between Kennedy’s death and today. As you know, the healthcare vote was crucial and was ongoing even after Kennedy’s death. So, like hypocrites, the Democratic lawmakers and Democratic governor Patrick (very quickly, I may add!) amended the law (adopting the very GOP amendment that Democrats had voted down in 2004) to give the governor power to make an interim appointment before today’s election, which of course was filled by another Democrat. And because of this Democratic shenanigan with the state law (of their own making!), the healthcare bill passed on Dec. 24.

    So now we find ourselves voting today to fill that senate seat according to the law Democrats created back in 2004. But now that same law is going to bite its creators! Democrats were so certain in the fall of 2009 that a Democrat would be voted into that seat, but apparently the people of MA think differently! And with the healthcare bill still at stake, this event is priceless! This is what happens when you create laws that are clearly motivated by political scheming … it can surely bring about your own demise! I sure hope Brown will win tonight, as a lesson to the Democrats! 😀
