Polls Explained – Digby's Hullabaloo

A reader friend by the name of Sean Kelly, a scientist, seeing me flail about trying to deal with polling, offered this explanation and agreed to let me share it with you in case you are feeling the same way. It cleared a few things up for me:

So. Political polling. It is basically experts making data-informed guesses, and the reported margins of error are pretty much meaningless in context. Experts making informed guesses are a useful thing, and the best realistic option, but poll results are not scientifically rigorous things. Let me explain some of that context.

In reality, political polling usually reaches people who answer their phone when the call is from an unknown number. That skews the sample toward older people, as many people avoid scammers and random salespeople by not answering calls from unknown numbers. Older people grew up in a society where ignoring callers was impolite, so they are more likely to pick up: not certain, but more likely.

Alternatively, a small army of people can buttonhole people on the street, but that skews the poll to shoppers or business people that are out on the street. Other polls might try a combination of sampling techniques in a quest to be more representative, but they all have their strengths and weaknesses.

This problem falls under the heading ‘representative sample’. The hard math that pollsters imply by using the forms and jargon of statistics gets reduced to guesswork by the fudges each pollster applies to make the data conform to that pollster’s view of what a representative sample should look like.

Pollsters fudge their sample by ‘correcting’ the results to the population proportions. If the poll includes half the proportion of young people that the population contains, each young person’s choice is weighted twice as much. In the last presidential election, young people voted at a lower rate than older people, so a further factor is applied to represent that lower likelihood of voting. This is not a slight on pollsters: if properly executed, these fudges improve the results, and history shows that correcting is better than not correcting.
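The demographic correction can be sketched in a few lines of Python. This is a minimal illustration of the weighting idea only, not any pollster's actual method: the function name `poststratify`, the two age groups, and every number in it are made up for the example, and the turnout adjustment is left out.

```python
def poststratify(responses, population_share):
    """Weight each response so group proportions match the population.

    responses: list of (group, choice) tuples, one per respondent.
    population_share: dict mapping group -> share of the population (sums to 1).
    Returns (weights per group, weighted share per choice).
    """
    sample_size = len(responses)

    # Count how many respondents fall in each group.
    group_counts = {}
    for group, _choice in responses:
        group_counts[group] = group_counts.get(group, 0) + 1

    # Weight = population share / sample share. A group that shows up at
    # half its population proportion gets a weight of 2.
    weights = {
        group: population_share[group] / (count / sample_size)
        for group, count in group_counts.items()
    }

    # Add up the weighted votes for each choice and normalize.
    weighted_votes = {}
    for group, choice in responses:
        weighted_votes[choice] = weighted_votes.get(choice, 0.0) + weights[group]
    total = sum(weighted_votes.values())
    shares = {choice: votes / total for choice, votes in weighted_votes.items()}
    return weights, shares


# Hypothetical poll of 100 people: young people are 40% of the population
# but only 20% of the sample.
responses = (
    [("young", "A")] * 14 + [("young", "B")] * 6
    + [("older", "A")] * 32 + [("older", "B")] * 48
)
population_share = {"young": 0.40, "older": 0.60}

weights, shares = poststratify(responses, population_share)
raw_share_a = sum(1 for _g, c in responses if c == "A") / len(responses)

print(f"weights: {weights}")
print(f"raw share for A: {raw_share_a:.0%}, weighted share for A: {shares['A']:.0%}")
```

In this made-up sample the raw result has candidate A at 46%, and the weighted result has A at 52%. The weighting alone flips the apparent leader, which is why the choice of correction matters so much.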

A weakness is that many pollsters derive those corrections from past polls – comparing the past polls to the actual election results in the same period, and adjusting for changing demographics.

Older people in past elections were significantly more likely to vote than younger people, but there are reasons to suspect that margin may shrink this election, particularly among young women, who have obvious concerns over healthcare driven by the abortion bans in many states.

It is of note that Covid, statistically, killed more older voters in Republican states than in Democratic ones. Additionally, the proportion of young people is increasing relative to other age groups, in part because the baby boomers are inevitably dying off, and in part because the baby boomers had a lot of kids. Those demographic changes should be adjusted for, and the pollsters do that.

No poll of voters ever succeeds in being perfectly random in all respects, or perfectly corrected to reality, at least not without a sample size that approaches the entire voting population. The reported margin of error can understate the real poll error by a lot, depending on how those experts at voting predictions weight their results.
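A rough way to see how the real error can exceed the reported margin is to work out what a poll would show, on average, if one candidate's supporters simply answer the phone a bit more often. The sketch below is an illustration only: the function names and the response rates (10% versus 8%) are invented for the example, and it uses the standard margin formula for a simple random sample.

```python
import math


def expected_poll_share(true_share, response_rate_a, response_rate_b):
    """Average poll result for candidate A in a two-way race when A's and
    B's supporters respond to the pollster at different rates."""
    responders_a = true_share * response_rate_a
    responders_b = (1 - true_share) * response_rate_b
    return responders_a / (responders_a + responders_b)


def reported_margin(n, z=1.96):
    """Reported 95% margin of error for a poll of n people, assuming a
    perfectly random sample and a result near 50%."""
    return z * math.sqrt(0.25 / n)


# Hypothetical race that is truly tied, with made-up response rates.
true_share = 0.50
polled = expected_poll_share(true_share, response_rate_a=0.10, response_rate_b=0.08)
bias = polled - true_share
margin = reported_margin(1000)

print(f"true support for A: {true_share:.1%}")
print(f"expected poll result for A: {polled:.1%}")
print(f"built-in error: {bias:.1%}, reported margin: {margin:.1%}")
```

With these invented numbers, a tied race polls at about 55.6% for A. That is a built-in error of more than five points, against a reported margin of about three points for a 1,000-person poll, and no increase in sample size removes it.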

That is why the polls were so wrong about Hillary Clinton in 2016. The combination of sampling bias and incorrect corrections left the pollsters with egg on their face. The Trump electorate was unlike previous elections, so the model the pollsters used for their adjustments broke. Luckily, a poll that includes the entire voting population is done every few years, with nearly no significant errors. It is called an election.

Pollsters have coalesced on reporting a margin of error, or confidence interval, at 95% certainty (or 19 times out of 20). It is a mathematical construct that assumes a perfect sample. The construct states that, in a perfectly random poll, a value of about 2 standard deviations (or 2 sigma) on either side of the result gives that 95% margin.

For a result near 50%, one standard deviation of a poll result is 0.5/(square root of n), where n is the total number of people in the poll, so two standard deviations, the 95% margin, works out to about 1/(square root of n). A poll of 1,000 people gives a margin of roughly plus or minus 3 points. Three standard deviations would give a 99.7% margin. One standard deviation gives a 68% margin. Pick your poison. The polling industry standardized on a 95% margin because reasons. Some areas of science like to see 5 or more sigma before declaring something proven, but they have hard data in their data set.
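For reference, the textbook formula for the standard deviation of a polled proportion in a simple random sample is sqrt(p(1-p)/n), which is largest at p = 0.5, where two of them come to 1/sqrt(n). The sketch below just evaluates that formula. The function name `margin_of_error` and the sample sizes are chosen for illustration, and the whole thing assumes the perfect random sample that real polls do not have.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion p from a simple random sample of n.

    z is the number of standard deviations: about 1 for a 68% margin,
    1.96 (roughly 2) for 95%, and 3 for 99.7%.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error


for n in (500, 1000, 2000):
    print(
        f"n={n}: 68% margin {margin_of_error(n, z=1.0):.1%}, "
        f"95% margin {margin_of_error(n):.1%}, "
        f"99.7% margin {margin_of_error(n, z=3.0):.1%}, "
        f"1/sqrt(n) {1 / math.sqrt(n):.1%}"
    )
```

Because the margin shrinks with the square root of n, cutting it in half takes four times as many respondents, which is one reason most polls stop at around a thousand people.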

Once each pollster’s particular corrections are applied, the errors they add on top of that mathematically rigorous definition mean the reported margin of error is nothing of the kind in reality. The reported margin of error is a rigorous value calculated from a non-rigorous data set, full of mushy human responses.

Even if two candidates’ numbers are within that rigorously defined margin of error of each other, the poll says little about their relative election prospects. Additionally, in a rigorously applied situation, the error on the gap between the two candidates is roughly twice the reported margin of error, because a point gained by one is usually a point lost by the other. So in a close election like this one, the math says whatevs, even if the data set was perfect.
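The doubling can be checked with the standard formula for the gap between two shares measured in the same random sample, where the variance of the gap is (p1 + p2 - (p1 - p2)^2)/n. The sketch below is an illustration with made-up numbers (48% to 46% in a poll of 1,000), and the function names are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Reported margin of error for a single share p, simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


def margin_on_lead(n, share_a, share_b, z=1.96):
    """Margin of error on the gap (share_a - share_b) when both shares come
    from the same simple random sample of n respondents."""
    gap = share_a - share_b
    variance = (share_a + share_b - gap ** 2) / n
    return z * math.sqrt(variance)


# Hypothetical close race: A at 48%, B at 46%, 1,000 respondents.
n, share_a, share_b = 1000, 0.48, 0.46
single = margin_of_error(n)
lead = margin_on_lead(n, share_a, share_b)

print(f"reported margin on each share: {single:.1%}")
print(f"lead: {share_a - share_b:.1%}, margin on the lead: {lead:.1%}")
print(f"ratio: {lead / single:.2f}")
```

With these numbers a 2-point lead carries a margin of about 6 points, close to twice the roughly 3 points usually reported, so the lead is well inside the noise even before any sampling problems are considered.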

So what use are the polls? They are particularly useful to determine trends. Are a candidate’s chances increasing or decreasing? If a polling methodology is consistent from one polling event to the next, even with questionable results, the trend lines are likely correct. The direction of the change is likely right, even if the magnitude of the results is less certain.

The other use for polls involves those corrections different pollsters apply. If a great set of social scientists applies perfect corrections that are representative of the voting population in this election, their results might be bang on. But it means a poll is effectively a data-informed best guess at the results from a group of experts. Which poll you are looking at matters.

The trappings of mathematical certainty are used to dress up those guesses, misleading the public about what is really going on. The pollsters are experts in how poll results should be transformed into that informed guess, but the details are much fuzzier than the trappings imply. Expert opinion is a useful tool, but back to Hillary, it can be completely misleading.

There are reasons to suspect that the balance of error this time biases the Democrats low (mostly in that young people thing), so Harris might blow this thing out, as opposed to squeaking by. I think Florida and Texas might be close, so yay! I also really want Trump to lose, so I might be interpreting things to let me sleep at night, but whatevs.

Of course, the world has completely transformed a few times this election, so things may look very different at some later date…or many later dates. It’s been a wild ride so far.

Yes, I’m Canadian, but most of the world really wants Trump to lose, and it matters to us economically (tariffs, world trade), politically (Ukraine war, NATO, Israel), and morally (Trump in general). Canada tends to follow American trends with a few years’ delay. Trump losing lets Canada correct without going into as dark a hole. Please vote!! (Unless you plan to vote for Trump. Trump voters should stay home.)

It’s probably best to just stay away from polls right now. They will make you feel crazy. If you’re into writing postcards, phone banking or canvassing, now’s a good time to get offline and do that. If you have to refresh 538 several times a day, just do it knowing that it is all not much more than educated guesswork. And have a drink. It’s going to be a long month.