The UK’s general election took place last week, on Thursday 7 May 2015. It was an election that had been hyped as ‘too close to call’. According to the polls, no party was likely to achieve a majority, and the next government would probably be a coalition of two or more parties. It could have gone either way.
Imagine the shock when the BBC announced the exit poll results: a decisive victory for a single party – the Conservatives.
Election polling companies are reliant on various types of data to come up with accurate predictions. Like any business, they must apply quality control to their data. They must cleanse it, eradicate errors and duplicates, and ensure their contact records are up to date. They need to ensure they don’t call the same person twice, and they must encourage people to give accurate data in response.
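To make that concrete, here is a minimal sketch of the kind of cleansing a contact list needs before it can be trusted. The field names, rules and records are invented for illustration; they are not any particular pollster’s or vendor’s process.

```python
# Minimal sketch of contact-list cleansing: normalise fields, drop exact
# duplicates, and skip records with obvious errors. Illustrative only.
import re

contacts = [
    {"name": "Alice Smith", "phone": "020 7946 0018", "email": "alice@example.com"},
    {"name": "alice smith", "phone": "02079460018",  "email": "alice@example.com"},
    {"name": "Bob Jones",   "phone": "not given",    "email": "bob@example"},
]

def normalise(record):
    """Lower-case the text fields and strip non-digits from the phone number."""
    return {
        "name": record["name"].strip().lower(),
        "phone": re.sub(r"\D", "", record["phone"]),
        "email": record["email"].strip().lower(),
    }

def is_valid(record):
    """Very rough checks: phone contains digits, email looks like an address."""
    return bool(record["phone"]) and re.match(r"[^@]+@[^@]+\.[^@]+", record["email"])

seen = set()
clean = []
for record in map(normalise, contacts):
    key = (record["name"], record["email"])   # same person, same email = duplicate
    if key in seen or not is_valid(record):
        continue                              # skip duplicates and bad records
    seen.add(key)
    clean.append(record)

print(clean)  # one Alice survives; Bob is dropped for the malformed email
```

Real cleansing tools go much further, with fuzzy matching, address validation and suppression of ‘gone away’ records, but the principle is the same: remove what you cannot trust before you draw conclusions from it.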
How could so many companies get it so badly wrong? And was the data at fault, or was there another gremlin in the machine?
Precedents in polls
This is not the first time that polling data has let down the public, politicians and press.
During the US election in 2012, opinion polls predicted a tough campaign for President Obama. He wound up winning comfortably. And 1992 offers a direct parallel with last week’s data disaster: a close race was predicted, and the Conservatives won comfortably then, too.
Even this year, in March, polling companies in Israel underestimated support for Netanyahu’s Likud party. He was, in the end, a clear winner.
Even though polls are not binding, they influence voters in the run-up to an election, and can even influence the policies that parties formulate as they seek to capture the mood of the electorate. It’s therefore critical that polls can be relied upon. And that makes it a matter of data quality.
How data is sourced
Election polling companies get their data from a variety of sources. YouGov has posted an excellent blog detailing how its surveys work.
In brief, YouGov (and similar organisations) collect data by asking a sample of people which party they prefer. Responses are gathered online and over the phone.
Initially, it’s tempting to think that different voters have different ways of answering polls: Labour voters are more likely to respond online, and Conservative voters by telephone. But YouGov already adjusts for this, and it says the data collected via each method was the same. So we can rule out ‘mode effects’.
Another possible culprit is turnout. Polling companies take a small sample and extrapolate the results to the wider electorate, weighting respondents by how likely they are to actually vote. If those assumptions are wrong, they could be filling in the blanks in the wrong way.
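To illustrate why those assumptions matter, here is a minimal sketch of how turnout weighting can shift a projected vote share. The respondent counts and turnout figures are invented for the example; they are not real polling data or any company’s actual model.

```python
# Toy illustration of how turnout assumptions change an extrapolated result.
# All figures are invented for the example.
sample = {
    "Conservative": {"respondents": 350, "assumed_turnout": 0.85},
    "Labour":       {"respondents": 350, "assumed_turnout": 0.75},
    "Other":        {"respondents": 300, "assumed_turnout": 0.70},
}

def projected_share(sample):
    """Weight each group's respondents by how likely they are to actually vote."""
    weighted = {party: d["respondents"] * d["assumed_turnout"] for party, d in sample.items()}
    total = sum(weighted.values())
    return {party: round(100 * w / total, 1) for party, w in weighted.items()}

print(projected_share(sample))
# A raw 35/35 tie between the two main parties becomes roughly 38.6 vs 34.1
# once turnout assumptions are applied. If those assumptions are wrong,
# so is the published poll.
```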
Dissecting data collection
YouGov says that there may be “methodological failure” in the way these polls are being conducted.
Interviewers may be asking questions in a loaded way, or influencing the answers by their mere presence.
In the US, polling companies are legally required to manually dial mobile telephone numbers. They do not have to manually dial landlines. This has led to the landline being favoured, and it’s possible that – culturally – this habit has persisted in the UK, too. Due to automated calls and marketing campaigns – both relatively new phenomena – some of us view landline telephone calls with suspicion. We want to get off the phone as quickly as possible, so perhaps the data we give is hurried, or we say what we think we should say to get it over with.
They may have asked the wrong questions: how people would vote on local issues, for example, rather than which leader or party they favoured at a national level.
But there may be a more mundane reason. When collecting data, you have to assume that the data is being provided truthfully and accurately. If someone gives you their email address, you need to trust that it will be genuine.
It may be that some people did not supply accurate data to the polling company in the first place: they “said one thing and did another”. This is what Peter Kellner, president of YouGov, thinks is most likely to be the problem. Marketers face this problem all the time: people provide fake information in order to avoid being added to marketing lists. Could it be that people are just less inclined to tell the truth?
The consequences of poor quality data
As providers of data quality software, we have a saying: ‘Junk in, junk out.’
Whether it’s a company balance sheet, a marketing report or an election poll, the outputs that are generated are only ever going to be as good as the data that was used to build them. This is why data quality is so critical to success and profitability.
Think of it in simple terms. If you compile a list of 1,000 contacts and send a marketing letter to all of them, you need a clean list: one free of mistakes, duplicates, misspellings and ‘gone away’ records.
If you get 500 letters ‘returned to sender’, you can safely assume that your mailing list is seriously decayed. Not only is that inconvenient, it effectively doubles the cost of every letter that actually arrives: half the effort was wasted.
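The arithmetic is simple enough to spell out; the postage figure below is an assumed cost, for illustration only.

```python
# The arithmetic behind the 'doubled cost' point, with illustrative figures.
list_size = 1000
cost_per_letter = 0.60          # assumed print-and-post cost per letter
returned = 500                  # letters that came back 'return to sender'

delivered = list_size - returned
total_cost = list_size * cost_per_letter
print(f"Cost per letter sent:      £{total_cost / list_size:.2f}")   # £0.60
print(f"Cost per letter delivered: £{total_cost / delivered:.2f}")   # £1.20, twice as much
```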
For polling companies, the entire business model is based on obtaining accurate data and cleansing it to produce a reliable snapshot. They take small, frequent samples and extrapolate them to understand wider trends, which means any mistakes are amplified, as they clearly were last week. Now, more than ever, data accuracy is the number one goal.
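For a sense of the scale involved, here is the standard margin-of-error calculation for a simple random sample. It shows how little statistical headroom a typical poll has before any systematic error, from untruthful answers or a skewed sample, is added on top.

```python
# Rough margin of error for a simple random sample at 95% confidence.
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of the 95% confidence interval for a proportion."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

for n in (500, 1000, 2000):
    print(f"n = {n}: ±{100 * margin_of_error(n):.1f} points")
# n = 500: ±4.4, n = 1000: ±3.1, n = 2000: ±2.2 points, and that is before
# any systematic bias in who answers, or how truthfully, is layered on top.
```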
If telephone polling is becoming less reliable, companies face a new era in which response rates and accuracy need to be higher. The British Polling Council is conducting an inquiry into the reasons behind last week’s failure. Data quality will undoubtedly be placed under the spotlight. Without it, we may never fully trust election polls again.