How To Use A Poll To Predict Election Winners

The Scott Brown, Martha Coakley race is over. The results are not surprising nor undesirable. This is because anybody who dares call another—close your eyes if you cannot stand foul language—a Yankee fan deserves to lose.

Incidentally, congratulations, Mr 41. Running for President, are we?

How did the pre-election polls stack up? Not too badly. They were useful and accurate on average. But what do “useful” and “accurate” mean?

Polls can be viewed two ways. The first is as a snapshot of opinion. That is, as a survey of voters’ thoughts given the information known at the time the poll was taken. It is understood that voters’ opinions might change if the information changes. For example, polls are often used to tell candidates what information to emphasize or to offer—or to withhold!

A poll is also a prediction. It is a guess of who will win—again, given the information available to the voters at the time the poll was taken. The information the voters know might, of course, be real or imagined.

This note explains how to use a poll’s, or multiple polls’, results to predict a winner. Before the election, of course. This is just a sketch, as a complete explanation could fill a book.

Start with an example. The PPP poll of 17 January showed a 51%/46% split for Brown/Coakley (the remaining 3% were undecided or for the third candidate). This did not mean that there was a 51% chance that Brown would win. It also did not mean that there was a 100% chance that Brown would win.

The real chance—given the poll results—of Brown winning was between 52% and 100%. How could we have guessed that chance?

First, each poll is accompanied by a “margin of error”, usually three to five percent. These numbers can largely be ignored if they are the result of theoretical equations in the statistical theory of finite population sampling. If they are the result of a model, then they can be informative (see below).

The “margin of error” in any poll is meant to convey something like this: there is a 95% chance that the actual fraction of people who vote for candidate A will be 51% plus or minus 4%, that is, between 47% and 55%. The “95%” is never spoken, but it is implicit. It can vary, say to “90%”, but it is never meant to be “100%”, which would mean the poll is guaranteed to be accurate within the margin of error.

What we wanted to know is this: What is the probability that Brown will have an actual vote fraction of 50% or greater? (In three- or more-party races, this fraction can change; this also ignores specialty rules of run-offs, etc.) Or, What is the probability Brown wins? This probability is always conditional on the information available to the voters at the time of the poll. And conditional on the mechanics of the poll, processes which we’ll ignore today (but see this link).
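As a rough illustration of that question, here is a minimal Python sketch. It assumes, purely for illustration, that the uncertainty in Brown's final share is normal, centered on the poll figure of 51%, with the 4% margin of error read as a 95% interval. The function name `win_probability` and the normality assumption are hypothetical choices for this sketch, not anything a pollster published.

```python
from statistics import NormalDist

def win_probability(poll_share, margin, threshold=0.50, coverage=0.95):
    """Chance the actual share exceeds `threshold`, under a normal assumption.

    `margin` is the half-width of an interval with the given `coverage`
    (an assumed reading of the published "margin of error").
    """
    # Convert the interval half-width into a standard deviation.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    sigma = margin / z
    # Probability mass of the assumed distribution above the threshold.
    return 1 - NormalDist(mu=poll_share, sigma=sigma).cdf(threshold)

# Poll figure and margin of error taken from the example in the text.
p = win_probability(0.51, 0.04)
print(f"{p:.2f}")  # about 0.69 under these assumptions
```

Under these assumptions the answer is neither 51% nor 100%, but something in between. A different assumed spread, or a 50% threshold that is wrong for a three-way race, would move the number.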

The only way to produce a probability of winning from a poll is to create a statistical model. The easiest way to do this is to combine polls from multiple sources, like the website 538.com did.

They first used a (proprietary) regression model, which gave a 15-point advantage to Coakley. They also created an average of third-party polls (like PPP, Rasmussen, etc.). This average (or mean) is also a model, albeit a simple one. They then combined their regression model with the mean model to produce a final mixed model, which was used to calculate a 74% chance of a Brown victory (on the day before the election).

The 538.com model also wisely used polls from the same sources (like Rasmussen) through time, which acknowledges that the information available to voters changes. The most recent polls were given the most weight.
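The idea of giving recent polls the most weight can be sketched in a few lines of Python. This is not 538.com's method, which is proprietary. The exponential-decay weighting, the `half_life_days` parameter, and all the poll figures below are made up for illustration.

```python
def weighted_poll_average(polls, half_life_days=7.0):
    """Average poll shares, halving a poll's weight every `half_life_days`.

    `polls` is a list of (days_before_election, share) pairs.
    """
    # Anchor ages to the most recent poll so it always gets weight 1.
    newest = min(days for days, _ in polls)
    weights = [0.5 ** ((days - newest) / half_life_days) for days, _ in polls]
    total = sum(w * share for w, (_, share) in zip(weights, polls))
    return total / sum(weights)

# Hypothetical polls for one candidate: (days before election, share).
polls = [(16, 0.45), (9, 0.48), (2, 0.51)]

simple_mean = sum(share for _, share in polls) / len(polls)
recency_mean = weighted_poll_average(polls)
print(f"simple mean:  {simple_mean:.3f}")
print(f"recency mean: {recency_mean:.3f}")
```

With a candidate who is gaining over time, the recency-weighted average sits closer to the latest poll than the simple mean does. That is the point of the weighting: older polls reflect older information.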

More difficult to model are the results of just one poll. If, for example, Rasmussen had several polls through time, then this is similar to the situation of having polls from multiple sources. All the polls are fed into the model, which can be recalculated as each new poll comes in. Presumably, the model improves as time progresses.

But if the organization only has one poll, they must rely on their past performance of similar polls to create a prediction model. Or they must incorporate information external to the poll; such as the unemployment rate, weather, or anything that is deemed probative.

In either case, several polls through time or just one poll, previous performance on similar polls must be used to create a model. A predictive model must take as input prior poll numbers married to their actual outcomes, plus information on how long before the election each poll was taken, the geography, and so forth. But it is the past performance (the difference between the poll and the actual fractions) that is most important. Incidentally, those past performances are what should be giving the margins of error.
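A bare-bones version of such a model might look like the Python sketch below. Each record pairs a past poll figure with the actual result, and the differences between them supply both an empirical margin of error and a win probability for a new poll. The records and function names are invented for illustration, and shares are in whole percentage points. A real model would also condition on time before the election, geography, and so forth.

```python
import math

def past_errors(records):
    """Actual share minus polled share for each (polled, actual) pair."""
    return [actual - polled for polled, actual in records]

def empirical_margin(errors, coverage=0.95):
    """Smallest half-width that contains `coverage` of past absolute errors."""
    ranked = sorted(abs(e) for e in errors)
    # Index of the error at or above the requested coverage level.
    k = min(len(ranked) - 1, math.ceil(coverage * len(ranked)) - 1)
    return ranked[k]

def empirical_win_probability(poll_share, errors, threshold=50):
    """Fraction of past errors that, applied to this poll, clear the threshold."""
    wins = sum(1 for e in errors if poll_share + e > threshold)
    return wins / len(errors)

# Hypothetical history: (polled share, actual share), in percentage points.
records = [(52, 50), (48, 49), (55, 51), (47, 47), (51, 54),
           (53, 52), (49, 46), (50, 51), (54, 55), (46, 48)]

errors = past_errors(records)
print(empirical_margin(errors))               # 4 points for this made-up history
print(empirical_win_probability(51, errors))  # 0.6 for a new poll showing 51
```

Here the margin of error comes from how far past polls actually missed, not from a sampling formula, which is the sense in which past performance "should be giving the margins of error".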

Finding the best information to create predictive models is an art, which is why most polling firms consider their processes proprietary. Given the importance of polling in modern politics, these processes can be extremely valuable.

Briggs

1. Doug M says:

These are places where punters can wager on the outcomes of elections. They do not offer any new information in themselves. They show how the marginal gambler has interpreted the available data.

Briggs writes:
The Scott Brown, Martha Coakley race is over. The results are not surprising nor undesirable. This is because anybody who dares call another—close your eyes if you cannot stand foul language—a Yankee fan deserves to lose.

Well, I’m in MA and I’m a Yankee fan. Be that as it may, she called Schilling, of all people, a Yankee fan?! This is the infamous Red Sox pitcher with the bloody sock in 2004, who single-handedly handed the Red Sox their first WS Championship after an 86-year drought. Granted, she’s not a sports expert, but you still need to get to know the likes and dislikes of the people you’re trying to con into voting for you (even if you have no interest, at least appear to!). Even those who are promoting your opponent! There’s a reason why MA is called “Red Sox Nation”. A question was posted yesterday: How would Democrats explain a loss of Ted Kennedy’s Massachusetts Senate seat?

Here’s one response:

Alex in Seattle writes:
No party nor any one family “owns” a US Senate seat. The Dems’ lackluster candidate and her low-energy campaign may very well lose. If this was such a “must-win” race she should have worked harder and hired a sports adviser.

😀 Coakley just thought she’d cakewalk this election; after all, historically MA is one of the Democrats’ strongholds. NOT!

On the flip side of the coin, there’s some truth in her assumption. Over a month ago, no MA resident could have foreseen the results of this election. History does have its effect. There were even those stating that it was unconscionable to fill what has been dubbed “Ted Kennedy’s Senate seat” with a Republican. I haven’t been in MA long enough to understand that logic, but prior to last night’s event, it did seem that history had a strong hold on Massachusettian devotion (look at how they never gave up on the Red Sox through an 86-year drought!). You admitted as much when you stated, “But if the organization only has one poll, they must rely on their past performance of similar polls to create a prediction model.” If we were to rely upon past performance, would not last night’s result be a shocker? But then again, how far into the past should we look to make our prediction? Here’s some interesting analysis from the NY Times. Who would have thought a historically blue state would turn red in a year’s time?!

3. 49erDweet says:

“Or they must incorporate information external to the poll; such as the unemployment rate, weather, or anything that is deemed probative.”

In my view Matt nailed the uncharacteristic MA voting result yesterday right there.  In addition to the assumptive candidate’s lackluster campaign performance, voter resentment of a legislative body seemingly more concerned with tuning and restringing their personal violins than putting out the nationwide fires of economic and unemployment disasters brought about a temporary “recoloring” of the MA body politic.  I doubt it lasts more than 60 days.

Hi 49erDweet,
the “recoloring” of MA body politics may not last for more than 60 days, but the message MA residents wanted to get across was received: enough is enough with the shenanigans in DC.

5. 49erDweet says:

Jade: Hope springs eternal. I pray you are spot on. But the short term memory span of national pols is notoriously brief. Tomorrow in some quarters it may well be, “What message”?
Matt: Thanks for the explanation. I’d always wondered about the “percentage chance of winning” guesstimates.

6. Based on their performance at polling for this race, Rasmussen is batting .000. That doesn’t mean they will never be right in the future, but it ought to give pause to potential purchasers of their service. ARG was also 0-for-2, but PPP was 2-for-2. PPP ought to raise their rates, assuming the market for polls is logical.

Whatever model 538.com is using, they should junk it. I do give them props for honesty, though. Many outfits in the prediction biz do their best to hide their failures and tout their (occasional) successes.

7. Whoops, I mis-read the chart. ARG was 2-for-2, and so was Cross Target.

8. http://classactioncause.com is seeking individuals frustrated with our current elected officials. We seek to unite in numbers to bring about significant change and to end corrupt practices in our governments. Are you tired of bi-partisan bickering? Do you feel that our system is broken and have little hope that we can change our leadership and the lack of integrity? Are you sick of tax dollars paying for elected officials while they campaign?
Go to http://classactioncause.com and let’s use our numbers as muscle to bring about change!

9. Briggs says:

Mike D.,

It’s funny, but you never see polling outfits touting their success rates.

We’ll have you converted yet.

10. VIC says: