It quickly became apparent that the problem was that I was using the national weights for regional breakdowns. National weights are derived using a statistical model that trades off sample size, recency and pollster diversity. For regional breakdowns, sample sizes are much smaller, which means that ideal weighting needs to work harder for sample size, and therefore sacrifice recency. Thus, older polls should be weighted more heavily for regional breakdowns than for national numbers. As a result, going forward, regional breakdowns should be less volatile, though the national volatility will remain the same.
(Small) Update Aug. 26: Separate set of weights for the BQ. Also I'm now adjusting for imbalanced national polls (polls that oversample in some regions and undersample in others): their effective sample size for the national numbers is smaller.
Update Sept. 25: Standalone regional polls are now weighted differently from the regional breakdowns from national polls. Details underlined at the end of this post.
I'll get to the details later in this post, but for now, here is the projection with the new methodology:
Projection as of the latest national poll (midpoint: August 21)
CON - 146.3 (
GRN - 3.7 (9.5%)
IND - 0.5
PPC - 0.5
If you're new to my 2019 projections, view key interpretive information here.
As you can see, the totals are little changed. The Bloc loses
- All 7 seats that switched in the Atlantic stay LIB.
- In MB, Winnipeg South stays LIB.
- NDP regains Edmonton Strathcona from the CONs.
- GRNs regain Victoria from the LIBs.
Now the details of the methodology change. How to deal with regional numbers is tricky: it depends on the correlation between how different regions move. Consider these extreme cases:
1. Regions move in lockstep: their difference with the national vote is fixed. In this case, the national average should be calculated with national weights. Regional averages should be the sum of the national average and each region's difference from the country. The latter should be estimated without discounting polls based on age, as the nation-region gap does not evolve.
2. Regions move completely independently. In this case, regional averages should be calculated with region-specific weights, and the national average should be derived as a weighted average of regional averages. These region-specific weights would reflect two opposite considerations:
(i) smaller regional samples mean that one should weight older polls more (as explained earlier), and
(ii) in order for independently evolving regions to generate a certain level of national volatility, regional volatility would have to be much larger, which means that one should weight older polls less.
Consideration (i) wins out for smaller regions, and consideration (ii) wins out for larger ones. On average, the amount of discounting should be similar to what national weights would imply.
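To make the tug-of-war between (i) and (ii) concrete, here is a rough sketch. It is not the model behind the projection: it assumes simple inverse-variance weighting, where a poll of age t gets weight proportional to 1 / (v/n + tau^2 * t), with v the sampling variance factor, n the sample size and tau^2 the drift variance per day. It also assumes that, in case #2, every region drifts with the same variance, and that regional samples are proportional to population. The population shares are round illustrative numbers, not official figures. Under these assumptions, the speed at which a region's polls should be discounted, relative to national polls, works out to the region's population share divided by the sum of squared shares.

```python
# Rough illustration of considerations (i) and (ii) in case #2.
# Assumptions (mine, for illustration only):
#   - inverse-variance weights: weight(age) ~ 1 / (v / n + tau2 * age)
#   - regional sample size n_r = share_r * n (proportional sampling)
#   - independent regions with equal drift variance tau2_r, so that
#     national drift variance tau2 = tau2_r * sum(share^2)
# Then the regional discount speed relative to the national one is
#   (n_r * tau2_r) / (n * tau2) = share_r / sum(share^2).

# Hypothetical round population shares (not official figures).
SHARES = {"ON": 0.39, "QC": 0.23, "BC": 0.13, "AB": 0.11, "MB/SK": 0.07, "ATL": 0.07}


def relative_discount_speed(shares):
    """Return, for each region, how fast old polls should be discounted
    relative to national polls (1.0 = same speed as national weights)."""
    sum_sq = sum(w * w for w in shares.values())
    return {region: w / sum_sq for region, w in shares.items()}


if __name__ == "__main__":
    for region, speed in relative_discount_speed(SHARES).items():
        verdict = "discount faster (ii wins)" if speed > 1 else "discount slower (i wins)"
        print(f"{region:6s} {speed:5.2f}  {verdict}")
```

A value below 1 means older polls keep more weight than they would nationally, and a value above 1 means they lose it faster. With these made-up shares, only the largest region ends up above 1.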
Reality is, of course, somewhere between these extreme cases. In both cases, the amount of discounting in optimal national weights remains suitable for the national numbers. Therefore, I do not make any change to how I derive national weights. For regional numbers, however, higher weight should be given to older polls for small regions (and also for large regions unless reality is very close to case #2).
Given this, I derive separate weights for each region using a procedure that is a blend of what would be called for in cases #1 and #2. The idea is for the regional weights to help estimate the difference between each party's regional and national support, rather than the regional support directly.
1. Regional weights are derived using regional sample sizes and a formula similar to that used for national weights, with the following modifications:
- The sample variance takes into account the fact that we're now thinking about the difference between regional and national vote (except, of course, for standalone regional polls).
- The time variability parameter is set assuming that the difference between the region and the rest of the country has half the variance of the national support level (so ~70.7% of the standard deviation), except in QC, where the two variances are the same. This is consistent with the observation that regional deviations from national movements tend to be smaller than the national movements themselves, except in QC, where they appear similar. An adjustment is made for the fact that the national average incorporates the regional movement (this matters mainly for ON).
- Update: A separate set of weights is computed for the BQ, with a time variability parameter that includes both the national and provincial variability that other parties have.
2. These weights are first applied to the difference between regional and national numbers in national polls with regional breakdowns. The weighted averages are added to the national averages (which also include national polls without regional breakdowns) to produce preliminary regional polling averages. These preliminary averages are adjusted upward or downward to match the national polling averages. This last step is necessary because regional weights no longer match national weights, and intuitively, it makes sense because recent movement in the rest of the country carries some information on recent movement in a given region, as they're correlated. [Of course, this paragraph does not apply to the BQ.]
3. Finally, standalone regional polls are incorporated using the regional weights to obtain the final (pre-turnout adjustment) regional polling averages. An adjustment is made to account for the fact that standalone polls need to be discounted more quickly, since, in addition to being affected by the region shifting relative to the country, they are also affected by national shifts. This adjustment takes the form of using a smaller sample size than the actual one for standalone polls, with the difference growing as time passes. The national polling averages are then obtained as population-weighted averages of the regional polling averages.
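For readers who think better in code, steps 2 and 3 can be sketched roughly as follows, for a single party. This is a simplified illustration under assumptions of mine, not the actual implementation: the regional weights are taken as given inputs (the weight formula itself is not reproduced here), the adjustment to match the national average is assumed to be a uniform shift, and the shrinking sample size for standalone polls is assumed to take the form n / (1 + n * drift * age / v). All function and parameter names are hypothetical.

```python
def weighted_mean(values, weights):
    """Plain weighted average."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def preliminary_regional_averages(polls, regional_weights, national_avg, pop_shares):
    """Step 2: average the region-minus-nation gap, add it to the national
    average, then shift so the regions are consistent with the national average.

    polls: list of {"national": float, "regions": {region: float}}
    regional_weights: {region: [one weight per poll]}
    """
    prelim = {}
    for region, weights in regional_weights.items():
        gaps = [p["regions"][region] - p["national"] for p in polls]
        prelim[region] = national_avg + weighted_mean(gaps, weights)
    # Assumed form of the adjustment: the same shift applied to every region.
    implied_national = sum(pop_shares[r] * prelim[r] for r in prelim)
    shift = national_avg - implied_national
    return {r: x + shift for r, x in prelim.items()}


def effective_sample_size(n, age_days, extra_drift_var_per_day, p=0.3):
    """Step 3: shrink a standalone poll's sample size as it ages, to reflect
    the extra (national) drift it is exposed to. Assumed functional form."""
    v = p * (1 - p)  # sampling variance factor for a share near p
    return n / (1 + n * extra_drift_var_per_day * age_days / v)


def national_from_regions(regional_avgs, pop_shares):
    """Final national average as a population-weighted average of regions."""
    return sum(pop_shares[r] * regional_avgs[r] for r in regional_avgs)


# Made-up example: two regions of equal size, two national polls.
demo_polls = [
    {"national": 0.30, "regions": {"A": 0.40, "B": 0.20}},
    {"national": 0.32, "regions": {"A": 0.44, "B": 0.20}},
]
demo_weights = {"A": [1.0, 1.0], "B": [1.0, 3.0]}
demo_shares = {"A": 0.5, "B": 0.5}
demo_regional = preliminary_regional_averages(demo_polls, demo_weights, 0.31, demo_shares)
```

In this sketch the uniform shift guarantees that the population-weighted average of the regional numbers reproduces the national average. A proportional adjustment would be another reasonable choice.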