I will try to be as clear as I can.
The following statement would certainly be true:
- If vote miscounts favored either candidate, that candidate would have done better than he would otherwise have done.
That statement is so obvious, no-one would dispute it.
And we have a hypothesis:
- That the exit poll discrepancy was caused by vote miscounts.
If so, the vote miscounts that caused the exit poll discrepancy would also have caused Bush to do better than he would otherwise have done.
The problem is that we don't know how well he would otherwise have done, i.e. in the absence of miscounts. This is where the 2000 data become relevant to the problem.
Precinct-level discrepancies in 2000 were, on average, near zero, although, as in 2004, there was plenty of error in both directions. However, in 2004 the average precinct-level discrepancy was far from zero: the margin between the candidates in the exit poll at each precinct tended, on average, to favour Kerry more strongly than the margin in the precinct count. Jonathan Simon, I believe, coined the term "redshift" for this effect - the vote tended to be "redder" than the exit poll.
There was also an average "red-swing" in Bush's counted vote-share between the two elections. Precincts tended to be "redder" in the 2004 count than in the 2000 count, just as they tended to be "redder" in the 2004 count than in the exit poll.
If, therefore, some of the variance in "red-swing" was due to the same cause as some of the variance in "red-shift" - e.g. vote miscounts favouring Bush - the two phenomena will tend to be positively correlated. "Red-swing" in a fraudulent exit poll precinct will tend to be associated with "red-shift" in the count. This will not always happen, of course. You might get some precincts in which there was fraud favoring Bush, but not enough to improve his vote-share (say it was a precinct particularly disgusted with his performance). You might also get precincts in which there was no fraud, but Bush nonetheless did better than his average swing. However, if, overall, we found that red-swing was correlated with red-shift, that would be very suggestive of fraud. It would be telling us - aha! Bush is doing best where the exit poll is most discrepant! And he does worse where the exit poll is OK! It wouldn't prove fraud, but it would be strong support for the hypothesis.
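To make the logic concrete, here is a toy simulation. It is only a sketch: every number in it (the size of the exit poll error, the spread of the genuine swing, the share of precincts with fraud and the size of the fraud) is an assumption chosen for illustration, not an estimate from the real data. It simply adds the same "fraud" term to both red-shift and red-swing in some precincts and checks whether the two end up correlated.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_precincts, fraud_rate, fraud_size, seed=0):
    """Correlation between red-shift and red-swing in a toy set of precincts.

    All parameters are hypothetical, in percentage points of margin.
    """
    rng = random.Random(seed)
    shifts, swings = [], []
    for _ in range(n_precincts):
        # Fraud favouring Bush happens in a random fraction of precincts.
        fraud = fraud_size if rng.random() < fraud_rate else 0.0
        poll_error = rng.gauss(0.0, 6.0)  # assumed exit poll error
        true_swing = rng.gauss(2.0, 5.0)  # assumed genuine change since 2000
        shifts.append(poll_error + fraud)  # count minus exit poll
        swings.append(true_swing + fraud)  # 2004 count minus 2000 count
    return pearson(shifts, swings)


r_fraud = simulate(5000, fraud_rate=0.3, fraud_size=10.0)
r_clean = simulate(5000, fraud_rate=0.0, fraud_size=0.0)
print(f"with fraud in 30% of precincts: r = {r_fraud:.2f}")
print(f"with no fraud anywhere:         r = {r_clean:.2f}")
```

In this toy setup the shared fraud term produces a clearly positive correlation, while the no-fraud run gives a correlation near zero. The real analysis is of course more involved than this.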
However, in both Ohio and the nationwide sample of precincts, the two turn out NOT to be significantly correlated. This does not rule out fraud. The fraudsters might have got lucky, and happened to have executed their fraud where Bush was doing badly relative to his average anyway. Or they might have been extremely clever and done that on purpose. Or they might have executed exactly the same amount of fraud everywhere so that it had no variance. For various reasons I think these scenarios are unlikely, but they are theoretically possible.
Or, as I said, they might just have got lucky. In Ohio this certainly can't be ruled out, because the study does not have very much statistical power. All we can say is that the failure to find a correlation between redshift and redswing does not support the conclusion that a shared variable - fraud - was responsible for both. It might have done. And in any case there are many forms of fraud that might not have shown up in those 49 exit poll precincts.
However, in the nationwide sample there are so many precincts (1250) that the probability that more than a tiny proportion of variance in "redshift" is accounted for by variance in "redswing" - i.e. that the two phenomena shared a common cause, fraud - is very small. This still leaves open the possibility that some of the redshift was caused by fraud, perhaps in precincts where it also occurred in 2000, and also the possibility that some clever means of calibrating fraud precisely to the expected level of Bush's performance was used. However, in the modelling exercises that OTOH and I have done, we have not been able to find any way that this could have happened on a substantial scale unless the fraudsters were either extremely lucky (i.e. extremely low probability), or had control of a vast majority of precinct counts, which, for various technical reasons, also seems implausible (tabulator control won't work as most of these data are from the precinct counts).
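To give a rough feel for why 49 precincts say so little and 1250 say so much, here is a back-of-envelope sketch. It uses the standard Fisher z approximation for a correlation coefficient; that choice is my assumption for illustration and is not necessarily the method used in the analyses themselves. The function name is hypothetical. It asks how large a sample correlation would need to be to reach significance at the 5% level (two-sided) for a given number of precincts.

```python
import math


def critical_r(n, z_crit=1.96):
    """Smallest |r| reaching two-sided 5% significance with n observations.

    Uses the Fisher z approximation: atanh(r) has standard error
    1 / sqrt(n - 3) when the true correlation is zero.
    """
    return math.tanh(z_crit / math.sqrt(n - 3))


for n in (49, 1250):
    r = critical_r(n)
    print(f"n = {n:4d}: |r| must exceed {r:.3f} "
          f"(about {100 * r * r:.1f}% of variance shared)")
```

With 49 precincts the threshold is roughly 0.28, so a moderate real correlation could easily go undetected. With 1250 precincts it is roughly 0.055, which corresponds to well under 1% of shared variance.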
My own view of Kathy's take on these analyses is that she has misunderstood the nature of the hypotheses and the nature of the conclusions drawn. The fact that she misquotes both suggests this. She seems angry that Lindeman and I object to her paraphrases, presumably because she cannot see that her paraphrases and our actual statements are crucially different. However, they are!
(Edited to correct error)