I made a mistake when I wrote about wishcasting in the McCain-Obama presidential election.
I’d like to thank Patrick Hadley for keeping up his criticisms, which led me to see more clearly how to make the correct calculations. Mike D and JH also contributed, and Luis Dias put his finger on the problem. And I’d like to apologize to readers for misleading them.
The mistake I made can be illustrated with an example.
Let’s first remember that all probability is conditional on information or evidence. For example, I_{dice} = “This is a die, just 1 side is labeled 6” and if we want to calculate the probability of seeing a 6 on a throw, we have to condition on the evidence I_{dice}. Thus
Pr(See 6 | I_{dice}) = 1/6,
and Pr(See “not 6” | I_{dice}) = 5/6, where “not 6” means 1, or 2, etc.
Now suppose I ran a poll where I asked a bunch of people “What number do you think will show on the next roll?” It turns out there are two groups in my bunch: regular guys off the street (GOS) and degenerate gamblers. Both groups naturally know I_{dice} and should base their guess on that information. We should, therefore, see the following results if both groups used the information:
The probability of seeing an outcome given a group:


| Want show? | What shows: Not 6 | What shows: 6 |
|---|---|---|
| GOS | 83% | 17% |
| Gamblers | 83% | 17% |
That is, regardless of which group we consider, 83% of its members should guess a number from 1 to 5, and 17% should guess 6, if they used I_{dice}. (We can continue adding as many groups as we like, but they should all break down 83%/17%.)
But suppose when we ran the poll, we saw this table:


| Want show? | What shows: Not 6 | What shows: 6 |
|---|---|---|
| GOS | 83% | 17% |
| Gamblers | 50% | 50% |
Something has gone wrong with the gamblers. They have, as a group, evidently wishcasted the 6, to the tune of 50% – 17% = 33% bias.
The way we arrived at this number is the same way I arrived at the McCain/Obama numbers.
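The dice calculation can be sketched in a few lines of Python. This is only an illustration: the function name `wishcast_bias` and the dictionary layout are made up for this sketch, and the bias is taken to be the observed share guessing an outcome minus the probability implied by I_{dice}, as in the 50% – 17% subtraction above.

```python
from fractions import Fraction

# Probabilities implied by I_dice: just one side in six is labeled 6.
p_six = Fraction(1, 6)
implied = {"not 6": 1 - p_six, "6": p_six}

# Observed share of each group guessing each outcome (the second table).
observed = {
    "GOS": {"not 6": 0.83, "6": 0.17},
    "Gamblers": {"not 6": 0.50, "6": 0.50},
}


def wishcast_bias(group_shares, implied_probs, outcome):
    """Observed share guessing `outcome` minus the share implied by the information."""
    return group_shares[outcome] - float(implied_probs[outcome])


for group, shares in observed.items():
    bias = wishcast_bias(shares, implied, "6")
    print(f"{group}: bias toward 6 = {bias:+.0%}")
```

The point of passing `implied` in explicitly is that the bias changes whenever the conditioning information changes, even if the observed shares stay fixed.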
McCain-Obama
We can now see that to calculate the amount of group wishcasting requires knowing the conditioning information, like the I_{dice}. What did I use for the McCain/Obama information? I effectively used I_{79/21} = “The probability of McCain winning is 79%” because that was the overall vote in the sample we had (for why I used that, see below).
This information averages over both the McCain and Obama groups and perhaps does not adequately account for the amount of wishcasting in each group. I say “perhaps” because, of course, I_{79/21} might be correct. Why?
It would be better to write the information as I_{convention}, or I_{con} for short: “All the relevant information available immediately after, but not too long after, the convention.”
Now, if I_{79/21} = I_{con}, then everything reported Friday was correct. But it can be argued, persuasively I think, that I_{con} would imply something less extreme than the 79%/21% breakdown. How much less extreme?
I can remember having conversations at that time with Obama supporters (and in New York City where I live, that’s about all there were), and many at that time thought that McCain would win. I recall hearing a gloomy lecture from two docs. One said, “I think race will be a factor in this election.” I said, “Yes, many people will vote for Obama because he’s black.” They took the opposite view. Anyway, point is, these two button-wearing, rally-attending Obama supporters thought McCain would win. They were not atypical. Many people at the time of the conventions thought McCain had a good chance. That would all change, of course, as the campaign played out. Those two docs, for example, would later “forget” they had ever said McCain would win.
So what was the exact I_{con}? I don’t know, and nobody else does, either. But here is a graph of various possibilities. The x-axis shows the probability of a McCain victory implied by I_{con}: Obama’s probability is 100% minus this, of course. The red dots show the amount of wishcasting bias on the y-axis in the McCain group; the blue dots show the amount of wishcasting bias for the Obama supporters.
The vertical red line is at the sample we got, 79% (21% for Obama). It shows that the McCain group had a 10% wishcasting bias, and the Obama group had a 50% bias.
The blue line is at the 50%/50% split. The McCain people had a 40% bias at that point, and the Obama group had a 25% bias. But the 50%/50% split is probably too low for McCain, at least at the end of the Republican convention, when this poll took place (see below about national polls).
The black line is at the point where the two groups had about the same amount of bias: about 32% at the 57%/43% split. That point might be, I think it can be argued, a reasonable interpretation of I_{con}.
There are two points where no wishcasting bias would be present in the two groups: I_{89%/11%} would give 0% bias for the McCain group, and I_{25%/75%} would give 0% bias for the Obama group. I don’t think either information set can be defended.
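Here is a rough sketch of how the bias in each group moves as I_{con} changes. It assumes the bias is computed as in the dice example (observed share minus implied probability), and it takes the two zero-bias points above, 89% and 25%, as the share of each group predicting a McCain win. Those shares are inferred from the zero-bias points, not taken from the raw poll data, and the function names are made up for this sketch. Since the inputs are rounded, the outputs can differ by a few points from the figures quoted in the text.

```python
# Shares predicting a McCain win, inferred from the zero-bias points
# quoted in the text (an assumption of this sketch, not the raw data).
MCCAIN_GROUP_SHARE = 0.89  # McCain supporters predicting a McCain win
OBAMA_GROUP_SHARE = 0.25   # Obama supporters predicting a McCain win


def mccain_group_bias(p_mccain):
    """Bias toward McCain in the McCain group, given Pr(McCain wins | I_con)."""
    return MCCAIN_GROUP_SHARE - p_mccain


def obama_group_bias(p_mccain):
    """Bias toward Obama in the Obama group: (1 - 0.25) - (1 - p) = p - 0.25."""
    return p_mccain - OBAMA_GROUP_SHARE


def equal_bias_point():
    """Pr(McCain wins | I_con) at which both groups show the same bias."""
    return (MCCAIN_GROUP_SHARE + OBAMA_GROUP_SHARE) / 2


for p in (0.79, equal_bias_point(), 0.50):
    print(f"I_con = {p:.0%}/{1 - p:.0%}: "
          f"McCain group {mccain_group_bias(p):.0%}, "
          f"Obama group {obama_group_bias(p):.0%}")
```

Under these assumptions the two lines cross at the 57%/43% split with a bias of about 32% in each group, which agrees with the black line described above.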
So, overall, both groups wishcasted, and both probably did so by about the same amount. We can’t tell for certain because we can’t know or reliably estimate I_{con}: it is a matter of interpretation (I can well imagine my two doc friends now saying something like I_{con} = I_{5%/95%}; whereas at the time they were estimating I_{con} = I_{60%/40%}).
What stands?
The breakdowns by age, philosophy, and sex are still accurate in terms of direction or non-movement of the wishcasting bias. For example, Liberal Obama supporters, regardless of I_{con}, still had a larger bias than Conservative Obama supporters. As did older over younger Obama supporters. It is only the exact bias numbers that are uncertain.
How did it happen?
I wanted to get the results out quickly, and I was lazy. Plus, using the observed “marginal” I_{79/21} is standard practice in calculations of independence, so I used it and forgot about it. Then I got distracted by my real job and by some criticisms (well-meaning criticisms) that were wrong, and I figured if they were wrong I must be right. Not a good use of logic, that. This has taught me to take my time in the future.
A criticism that is wrong is that “the sample was skewed by McCain supporters.” That is false because it does not matter how many McCain supporters I had, as long as they were not wishcasting, just like in the gamblers/non-gamblers example. But given that some of the McCain supporters were wishcasting, I could not disentangle how much they were, or how much the Obama people were, or if either group used information other than I_{79/21}.
Some other criticisms focused on the claim that it is not possible to separate desire from prediction. However, this is certainly possible. Polls were mentioned. But polls are surveys where the question asked is “Who do you want to win?” and not “Who do you think will win?” So you cannot back out wishcasting from a poll, nor, for that reason, use its results.
Normally, one calculates wishcasting by taking a series of forecasts from one person, or from a group; but it’s the series that counts. Because if you have a series you can use it to back out what the information I should have been. Since you can estimate the information, you can estimate the wishcasting.
I will try, if I can find the time, to set up a series of polls, for non-political topics, to show how this can be done. Say, that gives me a good idea…