The final published opinion polls from the 2017 UK general election gave the Conservatives an average lead of 8.5% over Labour. Individual polls from leading outlets gave the Conservatives leads of 10%, 12% and even 13%. The UK Parliamentary Election Forecast model, which combined data provided by the British Election Study with all the publicly released polls, gave a final projected Conservative majority (updated on polling day itself) of 82 seats. A Tory majority was projected to be "almost certain".
The final result was a Conservative lead over Labour of just 2.5%. Far from winning an 82-seat majority, the Tories fell short of a majority altogether.
Most forecasters and forecasting models failed spectacularly to see such an outcome coming - not for the first time in recent years. The UK election of 2015, the result of the EU referendum and the election of Donald Trump in 2016 were likewise big surprises to those trusting in the opinion polls.
So what has been going wrong and is there a better way of reading the polls than looking at the polling averages or political forecasting models? How can we navigate the polls coming out in the 2019 election campaign?
How a poll is produced
To understand this, it's important to grasp what the pollsters are doing when they put these pieces of work together.
They collect data by asking a sample of the voting population whether they intend to vote, and how they intend to vote if they do. There are different ways to do this. For example, ICM and YouGov conduct their polling online, while Ipsos MORI and Survation conduct surveys by telephone.
Different companies have different policies for attracting a representative sample. Many will weight their findings to take account of something called past vote recall - how the people surveyed remember voting in previous elections.
It's more common than one might expect, however, for people to wrongly recall how they voted in the past, and there's an important methodological debate about how to deal with this problem of "false recall". So, if 35% of the selected sample say that they voted Labour in 2017, and we know that 41% actually voted Labour in that election, how might we best adjust the raw Labour vote share?
There has also been an increasing focus in recent times on education level as one of the key demographic dividing lines in voting intention, and on getting this mix right in the polling sample.
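To make the adjustment concrete, here is a minimal sketch of one simple approach: weight each respondent by the ratio of the known past vote share to the share recalled in the sample. The 35% and 41% figures come from the example above; the respondent counts and current voting intentions are invented purely for illustration, and real pollsters use more elaborate schemes. Note that this simple version takes recalled votes at face value, which is exactly what the "false recall" debate calls into question.

```python
def past_vote_weights(sample_shares, target_shares):
    """Weight for each recalled-vote group = known share / share in the sample."""
    return {group: target_shares[group] / sample_shares[group]
            for group in sample_shares}


def weighted_share(rows, weights, party):
    """rows: (recalled 2017 vote, current intention, number of respondents)."""
    total = sum(weights[group] * n for group, _, n in rows)
    hits = sum(weights[group] * n for group, intent, n in rows if intent == party)
    return hits / total


# 35% of the sample recall voting Labour; 41% actually did (figures from the text).
sample_shares = {"Labour": 0.35, "Other": 0.65}
target_shares = {"Labour": 0.41, "Other": 0.59}
weights = past_vote_weights(sample_shares, target_shares)

# Hypothetical sample of 1,000 respondents (invented numbers).
rows = [
    ("Labour", "Labour", 300),
    ("Labour", "Other", 50),
    ("Other", "Labour", 40),
    ("Other", "Other", 610),
]

unweighted = {"Labour": 1.0, "Other": 1.0}
raw = weighted_share(rows, unweighted, "Labour")
adjusted = weighted_share(rows, weights, "Labour")

print(f"Raw Labour share:      {raw:.1%}")
print(f"Adjusted Labour share: {adjusted:.1%}")
```

In this invented sample, upweighting the under-represented 2017 Labour voters raises the headline Labour share by several points. If some of those who "forgot" voting Labour have genuinely moved away from the party, the same adjustment would overstate Labour support.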
Pollsters also have to think about the best way to measure potential turnout. Is it better to ask someone how likely they are to vote (and to weight their expressed vote accordingly) or to ask whether they voted in the previous election and weight according to that answer? Or should they do both?
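The choice matters because each approach can produce a different headline figure from the same responses. The sketch below is illustrative only: the five respondents, the 0-10 likelihood scale and the 0.5 weight for past non-voters are all assumptions, not any pollster's actual method.

```python
def turnout_weighted_shares(respondents, weight_fn):
    """Sum each party's turnout weights and normalise to shares."""
    totals = {}
    for r in respondents:
        totals[r["party"]] = totals.get(r["party"], 0.0) + weight_fn(r)
    grand_total = sum(totals.values())
    return {party: t / grand_total for party, t in totals.items()}


def by_stated_likelihood(r):
    # Respondent's own 0-10 rating of how likely they are to vote.
    return r["likelihood"] / 10


def by_past_vote(r):
    # Assumed weights: full weight for past voters, half for non-voters.
    return 1.0 if r["voted_last_time"] else 0.5


def by_both(r):
    return by_stated_likelihood(r) * by_past_vote(r)


# Hypothetical respondents (invented for illustration).
respondents = [
    {"party": "A", "likelihood": 10, "voted_last_time": True},
    {"party": "A", "likelihood": 9, "voted_last_time": True},
    {"party": "B", "likelihood": 8, "voted_last_time": False},
    {"party": "B", "likelihood": 10, "voted_last_time": False},
    {"party": "B", "likelihood": 5, "voted_last_time": True},
]

stated = turnout_weighted_shares(respondents, by_stated_likelihood)
past = turnout_weighted_shares(respondents, by_past_vote)
both = turnout_weighted_shares(respondents, by_both)

for label, shares in [("stated", stated), ("past vote", past), ("both", both)]:
    print(label, {party: round(share, 3) for party, share in shares.items()})
```

With these invented respondents, party B leads when weighting by stated likelihood, the parties tie when weighting by past turnout, and party A leads when both are combined. Enthusiastic first-time voters are precisely the group on which the methods disagree.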
Of all the issues they face, one of the biggest problems for opinion pollsters is getting turnout projections as accurate as possible. Elections turn on who actually does cast their vote as much as which party people would support if they were to cast their vote.
The 2017 election was a case in point. A significant aspect of what distinguished the final ICM poll (which gave the Conservatives a 12% lead) from the final Survation poll (which only gave them a 1% lead) was a very different projection of the share of young voters in the overall voting population. ICM thought it would be much lower than it turned out to be.
In the 2015 general election, Labour-inclined voters opted in unusually large numbers to stay at home. In 2016, poorer and less educated voters came out in unusually large numbers to vote for Brexit and Trump. In 2017, younger voters turned out in unusually high numbers. This was, in each case, critical in determining the outcome.
Finally, there is the issue of tactical voting. Many voters who express an intention to vote for a particular party will switch their vote tactically on polling day in favour of the party best placed to beat the candidate they least prefer. How to allow for this during the campaign stage of the election is another problem.
How to pick a poll
So, are there some polls that we should trust more than others? In the US, the web resource and blog, FiveThirtyEight, publishes pollster ratings, based on the historical accuracy and methodology of each firm's polls. There is no equivalent in the UK, but UK Polling Report and Wikipedia provide archives of past polls.
There is also increasing attention being paid to a modelling technique known as MRP (multilevel regression and poststratification). This is a way of producing estimates of opinion for defined areas (such as constituencies) based on their demographic characteristics and what we know about voting patterns in each area. The YouGov MRP model did better than most other projections in 2017, but was still out by a full 23 seats in its projected Conservative lead over Labour.
With so many polling methodologies and models to choose from, is there a better way to disentangle the threads? Perhaps the best option is to turn to the betting markets, on the basis that those trading the markets, especially those staking significant sums, will have taken note of the relative performance of each of the pollsters and the modellers and drawn appropriate conclusions, which they are willing to back with their purse. While the markets have, in the past few years, had almost as chequered a record as the polls, research I have undertaken using very large data sets indicates that over an extended period they have outperformed the pollsters and the models.
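The poststratification half of MRP can be sketched in a few lines. In the example below, the support figures for each demographic group are simply assumed; in a real model they would come from a multilevel regression fitted to survey data, which is not shown here. The two constituencies and their population counts are likewise hypothetical.

```python
def poststratify(cell_support, cell_counts):
    """Constituency estimate = population-weighted average of cell estimates."""
    total = sum(cell_counts.values())
    return sum(cell_support[cell] * n for cell, n in cell_counts.items()) / total


# Assumed support for one party in each (age, education) cell.
# In real MRP these would be outputs of the regression step.
cell_support = {
    ("18-34", "degree"): 0.25,
    ("18-34", "no degree"): 0.35,
    ("35+", "degree"): 0.40,
    ("35+", "no degree"): 0.55,
}

# Hypothetical electorates: a university town and an older, less graduate-heavy seat.
university_town = {
    ("18-34", "degree"): 30_000,
    ("18-34", "no degree"): 15_000,
    ("35+", "degree"): 20_000,
    ("35+", "no degree"): 15_000,
}
older_seat = {
    ("18-34", "degree"): 5_000,
    ("18-34", "no degree"): 15_000,
    ("35+", "degree"): 10_000,
    ("35+", "no degree"): 50_000,
}

town_estimate = poststratify(cell_support, university_town)
older_estimate = poststratify(cell_support, older_seat)

print(f"University town: {town_estimate:.1%}")
print(f"Older seat:      {older_estimate:.1%}")
```

The same national picture of who supports whom yields quite different estimates for the two seats, simply because their populations differ. This is what lets the technique say something about constituencies in which very few people were actually surveyed.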
Some words of warning, though - the markets do tend to overreact for a short time to any outlier polls being published. And if they say that a result is 70% probable, it does mean that there is a solid three in ten chance that it just won't happen.
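For readers who want to turn market prices into probabilities themselves, here is a minimal sketch. The decimal odds are invented, and dividing by the total to strip out the bookmaker's margin is only one of several possible adjustments, chosen here for simplicity.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities, scaled so that they sum to 1."""
    raw = {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}
    overround = sum(raw.values())  # exceeds 1 when the odds include a margin
    return {outcome: p / overround for outcome, p in raw.items()}


# Hypothetical odds for illustration only.
odds = {
    "Conservative majority": 1.4,
    "No overall majority": 3.5,
    "Labour majority": 21.0,
}

probs = implied_probabilities(odds)
for outcome, p in probs.items():
    print(f"{outcome}: {p:.1%}")

favourite = max(probs, key=probs.get)
print(f"Chance the favourite loses: {1 - probs[favourite]:.1%}")
```

On these made-up prices the favourite comes out at a little under 70%, which still leaves roughly a three in ten chance of some other result.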
Author: Leighton Vaughan Williams - Professor of Economics and Finance. Director of the Betting Research Unit and the Political Forecasting Unit at Nottingham Business School, Nottingham Trent University