
Wednesday, July 31, 2019

New Poll Weighting Methodology

Update Aug. 26: This post discusses national weights. Regional weights are discussed here.

One of the main methodological changes that I have made to the model concerns the weighting of polls. In the old formula, poll weights depended on sample size, recency and whether the same pollster has a more recent poll. Going forward, they will also depend on the presence of ALL other polls.

The old formula consisted of a number of ad hoc discount factors based on what seemed sensible. It worked OK most of the time, but whenever poll numbers shifted, one could always wonder whether the formula was too slow or too quick to react.

The new weighting formula tries to approximate the optimal variance-minimizing (linear) formula under certain straightforward assumptions. That is, it is grounded in statistical theory rather than just being what seems reasonable to me. These assumptions will also allow me to propose approximate confidence intervals (to be explained in another post) not just for "what would happen if an election took place at the same time as the most recent poll," but also for Election Day.

Unfortunately, the new weights are obtained by solving a system of (linear) equations (one equation per poll), so I can't tell you exactly when a poll gets discounted by 30% versus 50%. Instead, I'll point out some notable effects of the changes (a sketch of the underlying math follows this list):

- An old poll will now be discounted more aggressively if there are many other old polls. This makes sense, as old polls' errors relative to current voting intentions are correlated through the change in voting intentions since they were conducted. The effect can be very significant: the 6/27-7/2 Mainstreet poll of 2,651 respondents would have a weight of over 50% of the most recent poll's (Forum, 7/26-28, 1,733 respondents) if they were the only two polls. But due to all the intervening polls, that Mainstreet poll currently counts less than 1/8 as much as the Forum poll.

As a result of this change, I will not need to change the formula to discount more aggressively when polls become more frequent: the model will automatically take care of this, and do it right.

- In certain cases, a poll can have negative weight! This appears surprising at first (my reflex was that I made a programming mistake), but it's actually not that hard to understand. Suppose there are two pollsters, A and B. Pollster A has conducted one recent poll and one old poll. Pollster B has conducted only one old poll. In order to guard against pollster A's potential in-house bias, the model wants to put significant weight on pollster B's poll. But that might put too much weight on an old data point, so it may be optimal to assign a slightly negative weight to pollster A's old poll.
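
The model's exact equations aren't written out here, but under the stated goal (the variance-minimizing linear combination of unbiased estimates), the textbook solution is the following. Writing Σ for the covariance matrix of the polls' errors (built from the error sources detailed below):

```latex
% Minimum-variance weights for combining estimates with error covariance
% matrix \Sigma: solve the linear system (one equation per poll), normalize.
\[
  w \;=\; \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}},
  \qquad
  \operatorname{Var}\!\Big(\sum_i w_i \hat{p}_i\Big)
  \;=\; \frac{1}{\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}}.
\]
```

Nothing forces every entry of Σ^{-1}1 to be positive; that is exactly where the negative weights come from when correlations between polls are strong.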

Here are the details. I assume that polls have four potential independent sources of error for estimating current support:
1. Sampling variance: this is the pure statistical error, which is what a poll's "margin of error" refers to.
2. Changes in public opinion since the field dates: I mostly assume that voting intentions follow a random walk (a slight adjustment is made for potential short-term momentum).
3. The specific pollster's in-house bias: pollsters vary in their methodology.
4. Bias common to all pollsters: pollsters' methodologies may have common flaws.

Of course, there's no way to reduce #4 through averaging polls, so the weighting formula ignores it. (But it is very important in getting the right confidence intervals.)

#2 implies that any two polls' errors have a common component: the evolution of public opinion since the more recent poll. The more a poll's error is correlated with other polls', the less informative it is, and the less it should be weighted. Therefore, all polls' weights should depend on each other (over and above other polls "changing the denominator"). How much this is the case depends on how fast one thinks public opinion evolves.
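
To make this concrete (the notation is mine; normalized ages are defined just below): if d_i denotes the change in support since poll i's field date and a_i denotes poll i's normalized age, then a random walk with independent increments gives

```latex
% Two polls' drift errors share exactly the opinion change since the more
% recent of the two polls:
\[
  \operatorname{Cov}(d_i, d_j) \;=\; V\big(\min(a_i, a_j)\big).
\]
```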

Voting intentions tend to be much more volatile during a campaign - especially late in one - than before a campaign. Therefore, I will compute each poll's normalized age a by valuing each calendar day since the poll's median field date (the poll's "calendar age" in days) as follows (updated with the tentative date of the first debate: October 7):
- Before September 1: 0.1
- September 1-20, before writs are issued: 0.2
- September 1-20, after writs are issued: 0.5
- September 21-30: 1
- October 1-7: 1.5
- October 8-21: 2
For example, a July 15 poll will be considered 3 days old on August 14. A September 15 poll will be considered 7.5 days old on September 25 (5*0.5 + 5*1), as writs must be issued by September 15. An October 10 poll will be considered 14 days old on October 17.
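
Here is a minimal sketch of this computation. Two details are my assumptions rather than the post's: the writs are taken to be issued on September 15 (the latest possible date, consistent with the example above), and each elapsed day is valued at the rate of the date it ends on, which is the convention needed to reproduce the 7.5 figure.

```python
from datetime import date, timedelta

WRIT_DATE = date(2019, 9, 15)  # assumed; the latest possible writ date

def daily_rate(d: date) -> float:
    """Value of one calendar day ending on date d, in normalized days."""
    if d < date(2019, 9, 1):
        return 0.1
    if d <= date(2019, 9, 20):
        return 0.5 if d > WRIT_DATE else 0.2
    if d <= date(2019, 9, 30):
        return 1.0
    if d <= date(2019, 10, 7):
        return 1.5
    return 2.0  # October 8-21

def normalized_age(median_field_date: date, today: date) -> float:
    """Sum the daily rates over the days elapsed since the median field date."""
    days = (today - median_field_date).days
    return sum(daily_rate(median_field_date + timedelta(days=k))
               for k in range(1, days + 1))

# The three examples from the post:
print(round(normalized_age(date(2019, 7, 15), date(2019, 8, 14)), 2))    # 3.0
print(round(normalized_age(date(2019, 9, 15), date(2019, 9, 25)), 2))    # 7.5
print(round(normalized_age(date(2019, 10, 10), date(2019, 10, 17)), 2))  # 14.0
```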

I assume that the variance of the evolution of public opinion over a normalized time period of length a is V(a) = 0.0001a. Assuming a normal distribution, this implies that, in a given week, for a given main party (i.e. 30%+ in the polls), there is approximately a 30% chance of a change in support exceeding:
- 0.8% before September 1
- 2.6% toward the end of September
- 3.7% in October, after the debate
I didn't do any formal analysis for this calibration, but to me, eyeballing how things evolved in the past two campaigns, this passes the smell test.
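
For the curious, these three thresholds appear to be exactly one standard deviation of a week's drift, sqrt(V(7r)) with r the period's daily rate; the link between "approximately a 30% chance" and one standard deviation (P(|Z| > 1) ≈ 32% for a normal Z) is my inference:

```python
from math import sqrt

# A week's drift has variance V(7r) = 0.0001 * 7r, where r is the period's
# daily rate; the thresholds above are one standard deviation of that drift.
for label, rate in [("before September 1", 0.1),
                    ("end of September", 1.0),
                    ("October, after the debate", 2.0)]:
    print(f"{label}: {100 * sqrt(0.0001 * 7 * rate):.1f}%")
# before September 1: 0.8%
# end of September: 2.6%
# October, after the debate: 3.7%
```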

This formula implies that changes in voting intentions are uncorrelated over time. On a weekly scale (which is the scale of my "eyeball calibration"), the assumption is not too crazy. But on a daily scale, it's a poor assumption: "momentum" is clearly a thing when key movements happen during a campaign. To partly correct for this:
- Before the writs are issued, I adjust a poll's calendar age down by 1 day if it's older than 1.5 days, and down by 2/3 if it's less than 1.5 days old.
- After the writs are issued, I adjust a poll's calendar age down by 2 days if it's older than 4.5 days, down by 1-2 days if it's 1.5-4.5 days old, and down by 2/3 if it's less than 1.5 days old.
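
In code, my reading of these rules is below. Two points are interpretation on my part: "down by 2/3" is read as "scaled to a third of its value", and the post-writ "down by 1-2 days" range is interpolated linearly; both readings make the adjustment continuous at the 1.5- and 4.5-day breakpoints, which suggests they are the intended ones.

```python
def adjusted_calendar_age(age_days: float, writs_issued: bool) -> float:
    """Momentum adjustment to a poll's calendar age, in days (my reading)."""
    if age_days < 1.5:
        # "Down by 2/3": scale the age to a third of its value (this meets
        # the flat 1-day reduction exactly at 1.5 days).
        return age_days / 3
    if not writs_issued:
        return age_days - 1
    if age_days > 4.5:
        return age_days - 2
    # "Down by 1-2 days": assumed linear between a 1-day reduction at
    # 1.5 days and a 2-day reduction at 4.5 days.
    return age_days - (1 + (age_days - 1.5) / 3)
```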

Finally, factor #3 is taken into account by assigning an individual-pollster variance of 0.0002 (updated per this Aug. 10 post; originally 0.0001), that is, a standard deviation of roughly 1.4%. I don't have any great confidence in this parameter value, so let me know if you think it's unreasonable. But remember that this does not take into account any sampling variance, so it's normal that polls (even those in the field at the same time) vary by much more than this implies. The effect of this is that a poll will have more weight if there are fewer other polls - especially if there are no more recent polls - by the same firm. (The old formula crudely approximated this by applying a 50% discount to any poll that is not the firm's most recent one.)
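
Putting the pieces together, here is a minimal sketch of the whole weighting computation under the four-source error model above. The parameter values are the post's; the poll list and the 35% support level used for the sampling variance are invented for illustration.

```python
import numpy as np

# Error-model parameters from the post; the polls themselves are invented.
DRIFT_VAR_PER_DAY = 0.0001  # variance of opinion change per normalized day (#2)
HOUSE_VAR = 0.0002          # individual-pollster variance, ~1.4% sd (#3)
P = 0.35                    # assumed support level, for sampling variance (#1)

# (pollster, normalized age in days, sample size)
polls = [("Forum", 0.0, 1733), ("Mainstreet", 3.0, 2651), ("Forum", 5.0, 1500)]

n = len(polls)
cov = np.zeros((n, n))
for i, (firm_i, age_i, size_i) in enumerate(polls):
    for j, (firm_j, age_j, _) in enumerate(polls):
        # #2: both errors contain the opinion drift since the more recent poll.
        cov[i, j] = DRIFT_VAR_PER_DAY * min(age_i, age_j)
        # #3: polls from the same firm also share its in-house bias.
        if firm_i == firm_j:
            cov[i, j] += HOUSE_VAR
    # #1: sampling variance appears on the diagonal only.
    cov[i, i] += P * (1 - P) / size_i

# #4 (bias common to all pollsters) affects every poll equally, so it drops
# out of the weighting. Minimum-variance weights: solve the linear system
# (one equation per poll), then normalize so the weights sum to one.
raw = np.linalg.solve(cov, np.ones(n))
weights = raw / raw.sum()
for (firm, age, _), w in zip(polls, weights):
    print(f"{firm} (normalized age {age}): weight {w:+.3f}")
```

With these invented inputs, the older Forum poll indeed comes out with a small negative weight, illustrating the phenomenon discussed earlier.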

What I'm NOT doing (and would ideally be doing if I had unlimited time)
- Except for the small adjustment mentioned above, I do not take into account serial correlation in changes in voting intention (in particular, I do not project forward from past trends), and the parameters I use are from "eyeballing" past patterns rather than a rigorous analysis.
- I am not adjusting pollsters' results for bias. That is, if pollster X consistently shows better results for Party A than other pollsters do, I am not adjusting X's numbers for A downward. Getting such adjustments right would require an analysis of recent years' polls for which I don't have time. Moreover, house effects should roughly cancel out across pollsters at crucial times in the campaign, when most pollsters publish a poll. At quieter times, though, this will cause the average to be a bit more topsy-turvy than it should be.
- I am not grading the quality of pollsters (and accordingly modifying the weights). As above, lack of time.
- I will maintain an ad hoc approach when incorporating provincial/regional polls. They contain useful information, so I won't ignore them, and will "eyeball" the weight they should receive; dealing with them systematically (in particular, separately computing the weights on the provincial numbers of national polls) doesn't seem worth the trouble. (Update: I now deal with regional polls systematically, though not quite optimally. Details in the regional weights post.)
- Riding polls will not factor into polling averages (though they can inform riding-level adjustments).
