
Question for the Math Wizards

This topic is archived.
Home » Discuss » Archives » General Discussion: Presidential (Through Nov 2009)
 
Liberal Gramma Donating Member (1000+ posts) Mon Oct-04-04 09:26 AM
Original message
Question for the Math Wizards
Can someone explain to me the least squares linear regression lines method of calculating the eventual electoral outcome on the Electoral Vote Predictor? The outcome predicted seems to be a landslide for *, something I can't see happening. Is there a flaw in their reasoning?
spooky3 Donating Member (1000+ posts) Mon Oct-04-04 09:27 AM
Response to Original message
1. how bout a link?
 
Liberal Gramma Donating Member (1000+ posts) Mon Oct-04-04 09:29 AM
Response to Reply #1
2. Here
 
Goldmund Donating Member (1000+ posts) Mon Oct-04-04 09:36 AM
Response to Reply #2
7. Highlight Rhode Island in that map
and you'll see how credible it is.
 
yellowcanine Donating Member (1000+ posts) Mon Oct-04-04 10:35 AM
Response to Reply #7
20. Good point - Bush 67, Kerry 31 in RHODE ISLAND? LOL.
 
Democrat 4 Ever Donating Member (1000+ posts) Mon Oct-04-04 09:30 AM
Response to Original message
3. BoyGeorge would have got into a brain fart with just the first
twelve words of the question. Never mind that he couldn't get any further, parse the sentence, or even begin to try to answer. That, in a nutshell, is what is wrong with Bushie.
 
htuttle Donating Member (1000+ posts) Mon Oct-04-04 10:49 AM
Response to Reply #3
22. I'm sure Bush could just have one of his advisors 'explain' it to him
They would helpfully explain that it means he should invade Syria.
 
JuniorPlankton Donating Member (1000+ posts) Mon Oct-04-04 09:31 AM
Response to Original message
4. They essentially assume
Edited on Mon Oct-04-04 09:32 AM by JuniorPlankton
that the trend will continue. That's the biggest problem.

You take the polls and try to fit them on a line. Based on the results of the last few months, the * numbers slope upward, Kerry's downward.
Therefore, they assume that if the trend continues, *'s edge will increase even further.

There are many problems with this approach:

1. It's not fair to assume the trend will continue.
2. The explanatory power of those regressions (R squared) must be very low.
3. I don't believe the raw poll numbers to begin with; we know a number
of biases embedded in them (Gallup, anyone?). Therefore, as the modelers say: Garbage In, Garbage Out.
 
Liberal Gramma Donating Member (1000+ posts) Mon Oct-04-04 09:34 AM
Response to Reply #4
5. Thanks
Now I feel better
 
theorist Donating Member (1000+ posts) Mon Oct-04-04 10:11 AM
Response to Reply #4
16. Also note that the gap is within the margin of error.
Another huge problem is that the errors are squared when the linear regression is calculated, so polls that sit far from the line pull on the fit disproportionately. In other words, this method can amplify any systematic errors in sampling, which have been discussed at length here on DU.
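
For reference, the usual 95% margin of error can be sketched as below. These are the textbook formulas for a simple random sample, not anything taken from the Electoral Vote Predictor, and the sample size and shares are made up. Real polls use weighting, so their true margins are typically somewhat wider.

```python
import math


def share_margin(p, n, z=1.96):
    """Approximate 95% margin of error for one candidate's share p (0..1)."""
    return z * math.sqrt(p * (1.0 - p) / n)


def gap_margin(p1, p2, n, z=1.96):
    """Approximate 95% margin of error for the gap p1 - p2 within one poll.

    Uses the multinomial variance of a difference of two shares:
    Var(p1 - p2) = (p1 + p2 - (p1 - p2)**2) / n.
    """
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)


# Hypothetical poll: 600 respondents, 48% to 46%.
n, p1, p2 = 600, 0.48, 0.46
print(f"margin on each share: +/- {100 * share_margin(p1, n):.1f} points")
print(f"margin on the gap:    +/- {100 * gap_margin(p1, p2, n):.1f} points")
print("gap within margin:", abs(p1 - p2) < gap_margin(p1, p2, n))
```

Note that the margin on the gap between two candidates is larger than the margin quoted for either share alone, so a lead can look bigger than the quoted margin and still be statistically indistinguishable from a tie.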
 
alarcojon Donating Member (1000+ posts) Mon Oct-04-04 09:35 AM
Response to Original message
6. Least Squares
Edited on Mon Oct-04-04 09:37 AM by alarcoeg
is just a method for selecting the "line of best fit" through a set of points. Specifically, it computes the line for which the sum of the squares of the vertical distances from the given points to the line is minimized.

on edit: JuniorPlankton explains well the problems with this approach with electoral polls. Garbage in, garbage out, no matter how sound the statistical methods applied.
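
The "line of best fit" definition above can be sketched in a few lines of Python. The function names and data points are hypothetical, chosen only for illustration; the formulas are the standard closed-form solution for a single predictor.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up points, purely for illustration.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.9, 5.1, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
best = sum_squared_error(xs, ys, slope, intercept)

# Nudging the slope or the intercept in any direction makes the total worse.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]
worse = [sum_squared_error(xs, ys, slope + ds, intercept + di) > best
         for ds, di in nudges]
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("every nudged line is worse:", all(worse))
```

The check at the end is the whole idea: among all straight lines, the fitted one has the smallest sum of squared vertical distances.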
 
RafterMan Donating Member (1000+ posts) Mon Oct-04-04 09:38 AM
Response to Original message
8. Okay
Draw a bunch of data points, scattered up and down the y-axis (vertical), with only one data point at each position on the x-axis (horizontal). For example, each x-axis increment represents a date, and each y-axis value represents a poll result on that date.

Now draw a straight line through the points. But, you say, the line must be straight and the points don't all fall in a line. So somehow, you want to draw the best-fitting straight line you can, even though you can't touch all the real sample points.

So construct a line where the sum of the squared vertical distances between the line you draw and the actual sampled points is a minimum. Using squared distance instead of regular distance removes the negatives and ensures a greater penalty for longer distances. You'll get something that looks like it might have been the "true" line, with the measured data points scattered randomly away from it. I'm sure you've seen pictures. You predict the trend by continuing the straight line beyond the points for which you have actual sampled data.

Not having seen what your data points relate to, I can't comment on how valid it is. But remember that whatever it's tracking, the method only predicts in one direction. You can only say "it's headed up" or "it's headed down". You can't predict a switchback or reversal, which of course can come easily in a campaign as we've seen.
 
spooky3 Donating Member (1000+ posts) Mon Oct-04-04 09:40 AM
Response to Original message
9. I am very confused--so the bottom line is I can't explain it and
Edited on Mon Oct-04-04 09:44 AM by spooky3
I am very suspicious of it. Here are some of my questions.

The author doesn't say what the X is. What is s/he using to predict the Y's? (The Y's are said to be the % a poll reports favoring a candidate--so what is predicting this?) And why not report all the equations, including the betas and the variance explained?

At what level are the data collected? Is s/he tracking Person X's opinion at time1, time2, etc. and his/her eventual vote? Or are the data aggregated somehow (into percents favoring Kerry for example)? It makes a big difference.

Once we establish that, my next question is, why isn't s/he using time series analysis?
 
archineas Donating Member (171 posts) Mon Oct-04-04 09:54 AM
Response to Reply #9
10. two thoughts
data that is strongly correlated will have a much better line of best-fit. let's consider for a moment the data being considered. the independent variable would be the date and then dependent on that date would be the polling number.

it doesn't make much sense logically to try and correlate those, so perhaps our line of best fit isn't a strong fit.

as a mathematician, i'd be more inclined to try fitting a curve (a much more difficult procedure...it's called a spline). i'd also look at changes over a much smaller period of time, as people's opinions in july are not necessarily what they are today. over what periods do we see great shifts? was bush's good august and september leading up to a weak october, while kerry is showing numbers which are strengthening?

i stopped checking out the electoral-vote.com site because i'm not confident in the polls being cited. i live in ohio, and recently gallup showed kerry up by 3-4 points while polls done for local news outlets show differing results.

the webmaster notes that the data is 'noisy', so in many places there is just no consistency to what you're seeing. if you look at a place like massachusetts, which has been pretty consistently kerry, it will probably stay that way. the same would hold true for *. with any state that's shown mild to severe variance over recent weeks, i'd be less inclined to believe any numbers until there's some consistency.

truthisall uses monte carlo simulations to determine a probability, and i would equate electoral-vote.com to a probability as well. what ends up happening at the end of the election will go a long way in determining which of the two has had the more sound scientific approach to determining the appropriate outcome.
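
how truthisall's simulations are actually set up isn't described in this thread, so the following is only a generic sketch of the monte carlo idea in Python. the function name, the three states, and every number in it are invented placeholders, and it assumes each state's poll error is independent and normally distributed, which real elections do not guarantee.

```python
import random


def win_probability(states, votes_needed, trials=10_000, seed=0):
    """Estimate the chance of reaching votes_needed electoral votes.

    states: list of (electoral_votes, poll_lead, poll_std_dev), where
    poll_lead is the candidate's lead in points and poll_std_dev is an
    assumed uncertainty on that lead. Each trial draws a lead for every
    state from a normal distribution and awards the state if it is positive.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _ in range(trials):
        total = sum(ev for ev, lead, sd in states if rng.gauss(lead, sd) > 0)
        if total >= votes_needed:
            wins += 1
    return wins / trials


# Three invented states; the numbers are placeholders, not real polls.
toy_states = [(20, 1.0, 4.0), (10, -2.0, 4.0), (15, 0.5, 4.0)]
print(win_probability(toy_states, votes_needed=23))
```

the output is a probability rather than a single predicted map, which is what makes it a different kind of claim from a projected trend line.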

j
 
alarcojon Donating Member (1000+ posts) Mon Oct-04-04 10:19 AM
Response to Reply #10
18. Welcome to DU
fellow mathematician!
 
Teaser Donating Member (1000+ posts) Mon Oct-04-04 10:02 AM
Response to Original message
11. I can't explain it to you...
without some serious calculus, but it basically relies on using the minimized squared error between a fitted line and the actual data.

Problem is, it is a linear projection. If the momentum shifts at any point, you need to recalculate your fit.

And folks, the momentum just shifted.
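
For the curious, the calculus is short. A sketch, where (x_i, y_i) are the n data points, a is the intercept, and b is the slope (this notation is an assumption for illustration, not the site's): write the squared error as a function of a and b, set both partial derivatives to zero, and solve.

```latex
\begin{aligned}
S(a,b) &= \sum_{i=1}^{n} \bigl(y_i - a - b x_i\bigr)^2 \\
\frac{\partial S}{\partial a} &= -2 \sum_{i=1}^{n} \bigl(y_i - a - b x_i\bigr) = 0 \\
\frac{\partial S}{\partial b} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - a - b x_i\bigr) = 0 \\
b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad a = \bar{y} - b\,\bar{x}
\end{aligned}
```

Here \bar{x} and \bar{y} are the averages of the x and y values. Every data point enters the sums, so one new poll after a momentum shift changes a and b, which is why the fit has to be recalculated.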
 
Guaranteed Donating Member (1000+ posts) Mon Oct-04-04 10:06 AM
Response to Reply #11
13. Exactly. nt
 
yellowcanine Donating Member (1000+ posts) Mon Oct-04-04 10:05 AM
Response to Original message
12. Yes, using linear regressions to predict an outcome beyond the last
data point is an invalid use of the technique. So using a regression over time to predict a future event is always invalid - for good reason. Trends can change. With the race as close as it is in many states, the true state of affairs is not known as many state polls are within the margin of error. Also, all of the polls are treated equally, yet we know that some polls must be more valid than others.
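
On the last point, one standard way to avoid treating all polls equally is weighted least squares. The Python sketch below is only an illustration: the function name and the polls are invented, and using sample size as the weight is an assumption, one of several possible choices.

```python
def weighted_fit_line(xs, ys, weights):
    """Return (slope, intercept) minimizing sum(w * (y - slope*x - intercept)**2)."""
    total_w = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total_w
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented polls: day, share (percent), and sample size used as the weight.
days = [1, 2, 3, 4]
shares = [46.0, 47.0, 48.0, 60.0]      # the last poll is an outlier
sizes_equal = [1, 1, 1, 1]             # every poll counts the same
sizes_trust = [1000, 1000, 1000, 50]   # the outlier came from a tiny sample

slope_equal, _ = weighted_fit_line(days, shares, sizes_equal)
slope_trust, _ = weighted_fit_line(days, shares, sizes_trust)
print("slope with equal weights:     ", round(slope_equal, 2))
print("slope weighted by sample size:", round(slope_trust, 2))
```

With equal weights the single outlier drags the slope sharply upward; down-weighting it keeps the line close to the three larger polls.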
 
BootinUp Donating Member (1000+ posts) Mon Oct-04-04 10:09 AM
Response to Reply #12
15. without a huge caveat
quite correct.
 
Longhorn Donating Member (1000+ posts) Mon Oct-04-04 10:07 AM
Response to Original message
14. I concur with the explanations; however,
the Votemaster himself has said that the predicted map won't make a lot of sense until October since there is a lot of "noise" -- wide variance in the polls -- beforehand.
 
Goldmund Donating Member (1000+ posts) Mon Oct-04-04 10:13 AM
Response to Reply #14
17. It's October isn't it?
 
RafterMan Donating Member (1000+ posts) Mon Oct-04-04 10:28 AM
Response to Reply #14
19. Still nonsense
The problem isn't "noise", it's that the data are non-trending.

It isn't noise in the polls, it's shifting in response to external factors. The poll for tomorrow only depends on the poll for today to the extent that polls are what's driving the election. Even the most cynical wouldn't say that accounts for the greatest portion of the day-to-day change. Even to the extent that polls *do* drive the election, it is only the most recent ones and they can only be projected so far without becoming nonsense.

You can draw a trendline over anything, but that doesn't make it valid.
 
pmbryant Donating Member (1000+ posts) Mon Oct-04-04 10:49 AM
Response to Reply #19
21. Yep. Total, absolute nonsense.
The fellow using this technique assumes that all poll movement is linear, at a constant rate. There is no basis for such an assumption. In fact, common sense indicates that this assumption is false.

Candidates go up and down in the polls over time, at various rates, not just because of noisy poll data, but because of real changes in popularity. I doubt anyone disagrees with this. No linear trend can model such back-and-forth changes. And it is statistical malpractice, in my view, to pretend otherwise.

--Peter
 
ISUGRADIA Donating Member (1000+ posts) Mon Oct-04-04 12:42 PM
Response to Reply #21
23. Any analysis which puts Bush getting only 53% in Kansas
and 55% in Maryland is questionable to say the least.