Choice models beat polling

Discrete Choice Modelling – Why it Beats Polling

I think it’s time for a comprehensive but (I hope) readable post on discrete choice modelling, using the timely application of opinion polling: why and how it works, given my success in the first proof of concept polling study using the most modern type of choice model – Best-Worst Scaling, for which I literally wrote the textbook.

To understand why conventional survey questions – especially the ‘who would you vote for?’ polling questions – can and must fail, it’s necessary to first understand the insoluble problem underlying them. For the statistics nerds out there the short (tl;dr) answer is ‘heteroscedasticity in limited dependent variable models’. For the rest of us, here’s the intuition, starting with a trivial model and adding some simple features that turn it into a realistic 21st century election.

Mr and Mrs Smith have broadly similar left-wing views – let’s assume a UK context and that they are usually Labour rather than Conservative supporters. Change this to ‘Democrat vs. Republican’ for the US if you like. A polling company obtains their answers on Monday: they both say ‘Labour’. On Friday, following a slow news week where nothing material has happened either nationally or to them, the exact same survey is run, but Mr Smith says Conservative whilst Mrs Smith says Labour again. What gives?

The answer is that most people are inconsistent on repeated occasions purely due to random factors – maybe by a lot, maybe by very little, but 90 years of research in mathematical psychology has established that outside some specialised contexts (like religion), you can rely on this, and use it. Now, suppose the true probability that Mr Smith will display ‘Labour party loyalty’ (leaving aside specific policy issues – we’re talking about Labour purely as a ‘brand’) is 60% whilst his wife’s figure is 90%. On average, if you leave aside all the important stuff – like individual policies(!) – you’ll correctly classify their preferred party. But that’s the thing – the poll is a one-shot snapshot and isn’t seeking merely to measure ‘party loyalty’ to the name of the ‘brand’ (party) per se. It cannot realise that Mr Smith IS basically a Labour supporter, but that there’s greater inconsistency in his level of support, compared to his wife. And this matters. A lot.

To understand why, let’s drill down a bit deeper into how a choice modeller would model an individual’s views. This focus on the INDIVIDUAL is crucial to understanding what is going on. (Please don’t ‘straw man’ me with arguments based on comparisons across people. WRONG PARADIGM.) An individual has some internal unmeasurable (‘latent’) scale, indicating ‘level of support for Labour vs. Conservative’. He/she (policy issues aside for the moment) places ‘the Labour brand’ at some point on this (higher up in both the Smiths’ cases) and conversely ‘the Conservative brand’ further down. These are the mean (average) positions on the scale. However, uncertainty/error means there is a measure of inconsistency around each position – it has a variance – there is ‘noise’ in the model. Thurstone, back in 1927, realised this and how we could capitalise on it. Essentially:


How often an individual chooses Labour over Conservative gives a quantitative estimate on a proper mathematical (e.g. ratio – probability) scale of how much (s)he values Labour over Conservative.


His insight was that provided the ‘noise’ is not too small, then a signal-to-noise model (Random Utility Theory) could allow us to estimate the position of ‘Labour’ and ‘Conservative’ on this latent scale by observing how OFTEN the INDIVIDUAL chose A over B (Labour over Conservative). Crucially, we DON’T ask people to try to place the parties on some numerical scale themselves – it has long been established that people are awful with such tasks and they don’t link to real life anyway – the TASK should be:

  • as close to the REAL-LIFE DECISION (a discrete vote) as possible AND
  • capable of providing real probabilities of support (e.g. odds ratios)

if we are to claim statistical validity (and prediction) in terms of “voting probabilities”.
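As a minimal sketch of Thurstone's idea, the snippet below turns how often an individual picks Labour into a distance between the two parties on the latent scale. It assumes normally distributed noise with a standard deviation fixed at one, which is an assumption made purely for illustration; the function name `latent_gap` is hypothetical.

```python
from statistics import NormalDist

def latent_gap(p_choose_labour: float) -> float:
    """Distance between Labour and Conservative on the latent scale,
    measured in units of the noise standard deviation (assumed to be 1)."""
    return NormalDist().inv_cdf(p_choose_labour)

mr_smith = latent_gap(0.60)   # chooses Labour 60% of the time
mrs_smith = latent_gap(0.90)  # chooses Labour 90% of the time

print(f"Mr Smith:  {mr_smith:.2f}")   # prints 0.25
print(f"Mrs Smith: {mrs_smith:.2f}")  # prints 1.28
```

Note that the result is only meaningful relative to the noise level: the unit standard deviation is an assumption, not something the choices themselves reveal.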

So, in repeated surveys we could establish that Mr Smith chooses Labour 60% of the time and Mrs Smith 90% of the time. However, whilst a step forward, this is where the real problems arise. There are actually an infinite number of ways in which we could observe Mr Smith’s 60% Labour support level and Mrs Smith’s 90% level. To understand why, let’s return to our latent Party Loyalty scale and consider two very different ways in which we could get these numbers.


Differences in Means

Perhaps both Smiths have the same variances – levels of uncertainty in their party loyalty. The 30 percentage point difference (90% vs. 60%) reflects a genuine weaker loyalty to Labour in Mr Smith compared to his wife. His ‘mean level’ is lower than hers.

Differences in Variances

Perhaps both Smiths actually have the SAME underlying level of party loyalty – their mean level for Labour (and thus conversely the Conservatives in a two-way fight) on the latent scale is the same. It’s just that Mrs Smith pays more attention to politics, she goes to Labour Branch meetings, is better informed, and is simply less likely to have random factors change her answer. Her variance is smaller and she will pick Labour more often. This difference in variances is called heteroscedasticity. Now, many statisticians (particularly in economics, but also health services research etc) deal not with ‘discrete outcomes’ (A, from a set of A, B & C…) but continuous ones (like level of GDP, systolic blood pressure etc). For continuous outcomes heteroscedasticity is simply an annoyance – a nuisance factor you need to make adjustments for so your standard errors are not wrong (i.e. you’ve got the right ‘margin of error’ to use a colloquial but not strictly accurate term). Your main estimates of effects (your ‘betas’ in your regression to explain how age, weight, sex etc affect blood pressure) are still unbiased (correct).
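The two explanations, differences in means and differences in variances, can be put side by side in a short sketch. It assumes a probit-style model in which the probability of choosing Labour is the normal CDF of (mean gap / noise standard deviation); the function name and the specific means and standard deviations are hypothetical, chosen only so that both 'worlds' reproduce 60% and 90%.

```python
from statistics import NormalDist

def p_labour(mean_gap: float, noise_sd: float) -> float:
    """Probability of choosing Labour when the latent Labour-minus-Conservative
    gap is mean_gap and the noise has standard deviation noise_sd."""
    return NormalDist().cdf(mean_gap / noise_sd)

z60 = NormalDist().inv_cdf(0.60)
z90 = NormalDist().inv_cdf(0.90)

# World 1: same variance (sd = 1), different means.
means_world = {"Mr Smith": (z60, 1.0), "Mrs Smith": (z90, 1.0)}

# World 2: same mean, different variances (Mr Smith is simply noisier).
variances_world = {"Mr Smith": (z90, z90 / z60), "Mrs Smith": (z90, 1.0)}

for label, world in [("means", means_world), ("variances", variances_world)]:
    for person, (gap, sd) in world.items():
        print(f"{label:9s} {person:9s} P(Labour) = {p_labour(gap, sd):.2f}")
```

Both worlds print 0.60 for Mr Smith and 0.90 for Mrs Smith, so the observed choice frequencies alone cannot say which one produced them.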

An absolutely crucial paper by Yatchew and Griliches in 1985 is where everything now goes to pot. They proved that in limited dependent variable models (where the outcome is discrete – such as party choice), heteroscedasticity is NOT ‘just’ an annoyance that affects standard errors – it is an absolutely critical problem because NOW your beta estimates of the (mean) effects (such as underlying party loyalty) are wrong. They are biased. Typically in an unknown direction and certainly with unknown magnitude.


What does this mean in practice?

Quite simply, as Ben-Akiva & Lerman pointed out at around the same time, you can’t aggregate across people with different variances – in this case Mr and Mrs Smith’s answers. At least, not until you have identified whether their differences are due to differences in variances and, if so, adjusted for this first.
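A toy calculation (not taken from the papers cited above; every number is hypothetical) illustrates the aggregation problem. Two voters share the same true effect of a policy on their vote but differ in noise. The sketch assumes a probit model, a single yes/no policy variable, and large samples with equal numbers of answers from each voter, in which case a pooled model with one assumed variance simply back-transforms the averaged choice probabilities.

```python
from statistics import NormalDist

std_normal = NormalDist()

def p_vote(policy_offered: int, effect: float, noise_sd: float) -> float:
    """Probability of voting Labour with (1) or without (0) the policy on offer."""
    return std_normal.cdf(effect * policy_offered / noise_sd)

TRUE_EFFECT = 1.0        # identical for both voters (hypothetical value)
NOISE_SDS = [1.0, 4.0]   # one consistent voter, one noisy voter (hypothetical)

# What each voter's own data would recover: the effect divided by their noise sd.
individual = [TRUE_EFFECT / sd for sd in NOISE_SDS]

# What pooling recovers: average the choice probabilities, then back-transform.
pooled_p1 = sum(p_vote(1, TRUE_EFFECT, sd) for sd in NOISE_SDS) / len(NOISE_SDS)
pooled_p0 = sum(p_vote(0, TRUE_EFFECT, sd) for sd in NOISE_SDS) / len(NOISE_SDS)
pooled_effect = std_normal.inv_cdf(pooled_p1) - std_normal.inv_cdf(pooled_p0)

print("Per-voter estimates:", [round(b, 2) for b in individual])
print(f"Pooled estimate: {pooled_effect:.2f}")
```

Under these assumptions the pooled figure matches neither voter, falls below the simple average of the two per-voter estimates (0.625), and would shift again if the mix of consistent and noisy voters changed, even though the true effect is the same for both.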


So what prevents us doing this?

Even if the pollsters got multiple comparisons from each of the Smiths, identifying that Mr Smith chose Labour less often than Mrs Smith, both of the above explanations for this – differences in means or differences in variances – are equally likely.

There’s nothing in Mr and Mrs Smith’s answers per se to tell us ‘which world’ we are in.

Thus, in any discrete choice model we have no way to distinguish which is the ‘real’ cognitive model producing these 60% and 90% support levels. For stats geeks, the ‘beta’ estimates from our model produced by any stats package are actually ‘beta multiplied by (some function of) the variance’ – the means and variances are perfectly confounded (inseparable). An analogy is:

y = 2xz (two times x times z, where x and z are both unknown parameters of interest); solve for x and z.

There are an infinite number of combinations of x and z that solve this. In a choice model there are an infinite number of mean-variance combinations that give us 60% (or 90%). The ‘all mean’ and ‘all variance’ effects described above merely give the two extremes – the actual cognitive model is probably somewhere in between and, even worse, we have no way of knowing if this model is the SAME for each person (in terms of variances). Statistics programs – and pollsters – ASSUME that differences in observed choice frequencies come from differences in the MEANS (absolute party loyalty levels). In their defence, they have to do *something* and setting the variance equal to one across all people is what they do; eminently OK until you think like a psychologist or choice modeller working in the real world.
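The confound can be checked numerically. The sketch below, assuming a probit choice model and hypothetical data (Labour chosen in 6 of 10 repeated surveys), scales the mean and the standard deviation by the same factor and recomputes the log-likelihood each time. The function name and the scaling factors are illustrative only.

```python
import math
from statistics import NormalDist

def log_likelihood(mean_gap, noise_sd, n_labour, n_conservative):
    """Log-likelihood of the observed answers under a probit choice model."""
    p = NormalDist().cdf(mean_gap / noise_sd)
    return n_labour * math.log(p) + n_conservative * math.log(1 - p)

# Hypothetical data: Mr Smith says Labour in 6 of 10 repeated surveys.
n_labour, n_conservative = 6, 4

base_gap = NormalDist().inv_cdf(0.60)
log_likelihoods = []
for k in (0.5, 1.0, 2.0, 10.0):
    # Scale the mean and the standard deviation together by the same factor k.
    ll = log_likelihood(k * base_gap, k * 1.0, n_labour, n_conservative)
    log_likelihoods.append(ll)
    print(f"mean = {k * base_gap:5.2f}, sd = {k:5.2f}, log-likelihood = {ll:.4f}")
```

Every row reports the same log-likelihood, which is why no amount of data of this kind can pick one mean-variance combination over another: only their ratio is identified.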

IF AND ONLY IF that assumption is correct, then the aggregate vote estimates are correct. If this assumption is violated then your estimates are wrong. And you’ve NO WAY to correct them afterwards. Already a plethora of studies (summarised here – typically where the researchers have constructed designs that they know are likely to muck around with people’s certainty) have established that variances change according to any/all of a whole host of factors, including education levels, age, context, information provided, task complexity, experience, etc.


So choice models tell us more than a poll does about the individual – but if they can’t separate mean and variance effects either, how does that move us forward?

Two reasons, one which makes our model into one that is recognisably a modern voting one, and a second, which recognises that choice modelling is part-art part-science.

  1. A realistic model of choices on offer.

A choice model gets a respondent to answer more than one question. It doesn’t just ask ‘Labour or Conservative’ ten times. That would be silly. Complete manifestos are offered (including leader!). In each of the (say) ten comparisons it varies the party manifestos in a systematic way (based on design theories perfected in recent years). We observe how the Labour-Conservative choice changes in response to changes in policy components of the manifestos. In other words, it typically breaks down the Labour and Conservative manifestos into the main policy areas, then manipulates these according to statistical designs in a series of comparisons to ascertain ‘how much each individual policy affects an INDIVIDUAL’s vote’. We can then break down a manifesto into its constituent parts – like a LEGO set – and construct ANY new potential manifesto to see how it fares against any other. However, a 60/40 predicted probability of support for two competing (one Labour, one Conservative) manifestos still doesn’t tell us if the individual has a ‘small but certain’ preference for Labour (mean effect) or a ‘large but uncertain’ preference (a variance effect).
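The LEGO analogy can be sketched in a few lines. The example assumes an additive utility model with a binary logit link, which is one common choice and not necessarily the specification used in any particular study; the policy areas, levels, and part-worth values are all invented for illustration.

```python
import math

# Hypothetical part-worths (utilities) for one individual; every number
# here is invented for illustration.
PART_WORTHS = {
    "rail": {"nationalise": 0.8, "keep private": 0.0},
    "tax": {"raise top rate": 0.3, "cut top rate": -0.1},
    "leader": {"Leader A": 0.2, "Leader B": 0.4},
}

def manifesto_utility(manifesto: dict) -> float:
    """Add up the part-worths of the policy levels in a manifesto."""
    return sum(PART_WORTHS[area][level] for area, level in manifesto.items())

def p_choose_first(first: dict, second: dict) -> float:
    """Binary logit probability that the individual picks the first manifesto."""
    diff = manifesto_utility(first) - manifesto_utility(second)
    return 1.0 / (1.0 + math.exp(-diff))

labour = {"rail": "nationalise", "tax": "raise top rate", "leader": "Leader A"}
conservative = {"rail": "keep private", "tax": "cut top rate", "leader": "Leader B"}

p_labour_manifesto = p_choose_first(labour, conservative)
print(f"P(Labour manifesto) = {p_labour_manifesto:.2f}")  # prints 0.73
```

Swapping any single level in either dictionary gives the predicted support for a new manifesto without collecting new data, although the part-worths themselves remain 'mean divided by noise' quantities.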

Now, in isolation, this helps to some extent – since we typically have a better idea about people’s views on individual policies than about their view of a complete manifesto – but the mean-variance confound is still there. It is (unlike many so-called ‘Laws’ in Economics) a real statistical ‘law’. But in conjunction with the second reason, we can really move forward.

  2. Come at your data from various directions, use experience, and knowledge.

What, essentially, the most modern choice models do is look for ‘patterns’ of preferences that are more likely to be consistent with mean effects or more likely to be consistent with variance effects. This is what drives statisticians of the ‘turn-the-crank’ variety mad – I have the scars to show for it from the review processes of my papers published from my previous long academic career.

One simple example, established early on, will illustrate. Split your data according to whether the respondent has a post (high) school education. Run the regression on each sample. Your stats program will assume a variance of one in each case, and your ‘betas’ then look much larger among the more educated group (suggesting larger means). Is that reasonable? Auxiliary questions to quantify numeracy and literacy strongly suggest that on average, those who went to university understand such tasks better and make fewer errors. Thus it’s more likely to be a variance effect – particularly if the pattern of betas (all their relative magnitudes) is maintained in both groups. That phenomenon is now – thankfully – fairly well acknowledged. However, you can’t apply it blindly. Areas like end-of-life care preferences often display smaller variances among older people (who’ve thought about the issue more) which can mess things up if they (as has historically been the case) have been less likely to attend university. However, in principle, we can do a lot to ‘rule out’ certain mean-variance combinations, identify ones that are ‘possible but unlikely’ and ones that ‘are likely given what we know from previous empirical work’.
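The 'pattern of betas is maintained' check can be sketched as a rough eyeball diagnostic. This is not a formal statistical test: the coefficient values, the policy names, and the 15% tolerance are all hypothetical, and the helper functions are illustrative only.

```python
# Hypothetical coefficient estimates from two separately fitted models;
# the policy names and numbers are invented for illustration.
betas_university = {"rail": 1.20, "tax": 0.60, "health": 1.80, "migration": -0.90}
betas_no_university = {"rail": 0.62, "tax": 0.29, "health": 0.88, "migration": -0.46}

def beta_ratios(group_a: dict, group_b: dict) -> dict:
    """Ratio of each coefficient in group_a to the same coefficient in group_b."""
    return {name: group_a[name] / group_b[name] for name in group_a}

def looks_like_scale_effect(ratios: dict, tolerance: float = 0.15) -> bool:
    """True when every ratio sits within `tolerance` (relative) of the average,
    i.e. the pattern of betas is preserved and only the overall size differs."""
    average = sum(ratios.values()) / len(ratios)
    return all(abs(r - average) / average <= tolerance for r in ratios.values())

ratios = beta_ratios(betas_university, betas_no_university)
for name, ratio in ratios.items():
    print(f"{name:10s} ratio = {ratio:.2f}")
print("Consistent with a variance (scale) effect:", looks_like_scale_effect(ratios))
```

Here every ratio is close to two, which is what a pure difference in variance would produce. Ratios that differ markedly from one policy to the next would point towards genuine differences in means instead, and, as the end-of-life care example shows, external knowledge is still needed to interpret either result.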

So interpretation becomes an art and a science. Even when using state-of-the-art software that attempts to perform such separation cleverly, I can count on one hand the number of discrete choice models where I have followed the ‘proper’ statistical rule on which of a number of competing models is ‘the most likely to have generated the data observed’.


So where does that leave us in running and interpreting choice models?

Choice models are a powerful tool. Unfortunately they are in danger of getting a bad rap due to people using ‘black box’ models that don’t force them to understand what are the possible COMPETING explanations for their data. A good choice modeller can reduce the number of possible explanations from infinity to a small number by:

  • Knowing the data inside out – using a series of tools and tricks that are frequently not part of the analyst’s ‘toolbox’ in order to gain insights into the effects of individual policies
  • Recognising certain patterns that are more indicative of mean or variance effects – something that is more art than science and typically is only learnt via experience
  • Using information that is external to the model – attitudes currently seem to offer the best hope in doing this (see below on YouGov).


Some implications for parties and their canvassers

Suppose a canvasser for the Conservatives knows (from information external to the choice model) that Mr Smith has a large preference (mean value) for renationalisation of the railways. Maybe Mr Smith was one of the engineers in British Rail in the 1970s who helped develop the tilting train used to deal with the notoriously non-straight intercity routes of the UK, which Mrs Thatcher cancelled and which was bought by the Italians for a song, then sold back to the UK (Virgin Trains). The canvasser knows that in the time available (s)he probably won’t be able to reverse Mr Smith’s view. However, by introducing more information (whether true or not) about ‘the problems nationalisation more widely causes’ it may be much easier to increase his variance. Hey presto, a formerly 90/10 chance of a nationalisation policy being supported becomes 60/40. The Conservatives have destabilised the Labour vote on that issue, and whilst Mr Smith may still be in favour of the policy, his uncertainty may be enough to cause him, if not to switch party, then maybe just to stay home on election day.

However, suppose Mr Smith has a small preference for nationalisation but is pretty sure of it. The canvasser would do better to move on to challenging another policy – after all, Mr Smith is unlikely to have his view changed, and unless real demonstrably correct new information is provided to change the mean positions of ‘nationalisation’ vs. ‘privatisation’ then time is better spent introducing uncertainty in his mind on something else.


A final note on attitudes and YouGov’s model.

A small number of us working on frontier issues in choice modelling have realised that ‘intrinsic attitudes’ can help both in separating means and variances, and, in crucial areas like polling, provide a robust estimate as to the likelihood that Mr Smith will turn out to vote at all. I’ve already written how ‘immigration’ appears to engender a positive attitude among people in my region (the East Midlands) but it didn’t directly factor in their decision on whether to vote REMAIN or LEAVE in the EU referendum at all. (It is clear that it did have an indirect effect, via the strain on essential services and real wages, but I’ve covered that already.)

Coincidentally, YouGov has reached a similar conclusion. Their ‘new model’ purports to proxy that ‘big unknown’ – turnout – via strength of attitudes. Plus they move closer to an ‘individual-level model’ by getting down to constituency level. Kudos – I agree with this diagnosis based on my own work when based in Sydney (2009-2015) when virtually all the world experts in the field were located together. However, though they have the correct diagnosis, their cure will ultimately fail.

YouGov’s problem is that they don’t measure attitudes in the right way. This goes back to the issue of the TASK matching the REAL WORLD CHOICE. People don’t express attitudes on a (Likert) category rating scale, or by simply answering yes/no to whether they agree with an attitude. Neither method tells us HOW OFTEN they’d (for instance) choose ‘more money for public services’ over ‘a balanced budget’ when those two policy objectives compete and when they can’t both be satisfied.

Choice models have already shown their worth in quantifying attitudes. Ultimately YouGov’s model will fail, because the ‘importance’ of key attitudes – towards European objectives and others – is in flux and unlike in the heyday of polling, key issues don’t all lie on a simple ‘left-right spectrum’ (Europe for starters). My modelling has illustrated how I know that support in the UK is shifting towards SOFT BREXIT. For example, I know that, when looking at two policies that CANNOT co-exist in any political reality – the SINGLE MARKET and CONTROL over MIGRATION – the former is now winning when competing head-to-head in a choice model. Thus I know how things have changed and are continuing to change since last year’s referendum.

Policy-makers should take note. Choice modelling got Professor Dan McFadden the 2000 ‘Nobel’ Prize for Economics for successful prediction. I won at the bookies for also doing so regarding BREXIT and to what extent it would help ‘Corbyn’s army’ turn out to deprive May of her majority.

The proof of the pudding is in the eating. And companies such as BOSE are full to bursting thanks to the method – they refused to let our Sydney unit mention them when marketing ourselves – they were our biggest client. They keep very quiet about it – after all, why tell your competitors how you ate their lunch?


Copyright 2017 Terry N Flynn – TF Choices LTD