Electronic Voting Machines and the Election

By Thomas Cooley, Ben Griffy, and Peter Rupert

Three states are facing or currently undergoing a recount of votes cast, after a number of computer scientists reported some evidence of problems with electronic voting. This finding was heavily disputed in the media, and seemingly little evidence was produced to support the conclusion that there was malfeasance in counties with electronic voting. Indeed, following the initial media response, the lead computer scientist backed away from initial reports, saying that there are flaws in electronic voting that could be easily exploited, and that an audit is important, but there isn’t direct evidence. We use our data to explore the claim that counties with electronic voting exhibited different voting patterns than their paper peers. What we find is definitely troubling: in some of the swing states, and specifically in states that were projected to vote Democratic at the top of the ticket, counties with electronic voting had a decrease in the percent of the total vote going for the Clinton-Kaine campaign, and an increase for the Trump-Pence campaign. We try to determine if this is spurious by checking for patterns in other places with electronic voting, as well as during the 2012 election. We only find this correlation for swing states during the 2016 election.


We use the American Community Survey (5-year) for demographics (race, age, gender, education), data from the BLS on unemployment (October 2016 preliminary estimate), and data from the BEA on personal income (2015 estimates; more recent estimates include many fewer counties). We use data from Verified Voting for voting machine type (here), which lists type, make, and model of voting machine by county for all states. Finally, we use voting data from Politico for the 2012 and 2016 elections, as well as data from CNN for the 2008 election. We have updated our data slightly since our last post, and the updated file is available here.


First, we graphically explore areas where various attributes (i.e. race, gender, education, income, unemployment, population size) do a good job explaining election outcomes, and areas where they do a worse job explaining the outcome. We started this in a previous post, and continue along those same lines. We find how much of the shift in voting patterns can be explained by these attributes by running a regression (including state fixed effects). We then use these predictions to assess how far each county is from its predicted outcome. Graphically, these differences are as follows:
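For readers who want to see the mechanics, here is a minimal sketch of the idea: fit a regression with state fixed effects, then take each county's actual outcome minus its prediction. This is not our Stata specification. It assumes a single covariate and made-up numbers, and it handles the fixed effects by demeaning within each state.

```python
from collections import defaultdict

# Hypothetical toy data: (state, covariate, change in Democratic vote share).
counties = [
    ("PA", 10.0, -4.0),
    ("PA", 20.0, -9.0),
    ("PA", 30.0, -11.0),
    ("WI", 10.0, -1.0),
    ("WI", 20.0, -5.0),
    ("WI", 30.0, -9.0),
]

def fit_fixed_effects(rows):
    """One-covariate OLS with state fixed effects, via within-state demeaning."""
    by_state = defaultdict(list)
    for state, x, y in rows:
        by_state[state].append((x, y))
    # State means of the covariate and of the outcome.
    means = {
        s: (sum(x for x, _ in v) / len(v), sum(y for _, y in v) / len(v))
        for s, v in by_state.items()
    }
    # The slope is estimated from deviations around each state's own mean.
    sxy = sum((x - means[s][0]) * (y - means[s][1]) for s, x, y in rows)
    sxx = sum((x - means[s][0]) ** 2 for s, x, _ in rows)
    slope = sxy / sxx
    # Each state's intercept is its fixed effect.
    effects = {s: my - slope * mx for s, (mx, my) in means.items()}
    return slope, effects

slope, effects = fit_fixed_effects(counties)

# Residual = actual outcome minus predicted outcome; this is what the map colors.
residuals = [y - (effects[s] + slope * x) for s, x, y in counties]
for (state, _, _), r in zip(counties, residuals):
    print(state, round(r, 2))
```

A positive residual corresponds to a “blue” county in the map (the Democratic ticket did better than predicted), a negative one to a “red” county.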


A “blue” county is one in which the Clinton campaign outperformed what would be predicted by the county’s demographic and economic characteristics, while a “red” county is one in which the campaign underperformed. The set of attributes does a good job explaining the election outcomes, with more than 90 percent of the counties falling less than 3 percentage points above or below our prediction. There do appear to be geographic patterns, however, in the over- or underperformance. Now, here’s the map of counties with electronic voting machines:


Green counties signify counties that exclusively employ paper balloting methods, while yellow counties are ones that employed either a mix of paper and electronic voting, or electronic voting exclusively. It’s worth noting that only 76 counties in the entire country use only electronic voting machines, with nearly all of these located in Pennsylvania. Now, as a visual explanation of what we will do, compare the two maps above. If you focus on the swing states (Wisconsin, Pennsylvania, North Carolina, and Florida), what you see is a pattern emerging in which our model underpredicts Democratic support in counties where paper ballot methods are prevalent, and overpredicts Democratic support in counties where electronic voting methods are prevalent. In other words, counties with electronic voting machines are (visually) less likely to vote for Clinton than we would expect given their demographic makeup. Importantly, this pattern does not appear to be present in states that were never considered swing states, i.e. Texas, California, Washington, Illinois, where there is visually no correlation between voting methods and support. Focusing on Wisconsin, Pennsylvania, North Carolina, and Florida, we see:


Here, we remove all counties with only paper voting, and focus on four key states that employ a mix of electronic and paper voting. Yellow counties are those with electronic voting that disproportionately voted for the Republican ticket when compared to their county demographics. Key areas, specifically population centers in each state, appear to have voted less frequently for the Democratic ticket than would be predicted by their characteristics. But of course, visual inspection can be deceiving, so we now turn to more robust analysis.

To assess whether there were inconsistencies in swing states for counties with electronic voting, we use the same specification as above, but include an indicator variable for whether a county is in one of Florida, North Carolina, Pennsylvania, or Wisconsin, as well as an indicator for whether the county employs electronic voting machines (EVM in the table below).


The coefficient of interest is the last one: This says that being in a swing state and having electronic voting in a county was associated with a 0.8 percentage point decrease in support for the Clinton campaign relative to support for the Obama campaign in 2012, after controlling for the attributes. This result is statistically significant, meaning that electronic voting machines in a county, or things that might be correlated with electronic voting machines in a county, are able to explain some of the results in these states. Ok, sorry, but here is a little “techy” stuff: we include state fixed effects (i.e., we account for how the overall state changed its vote during the election), employ clustered standard errors, and weight the counties by their population. This result is not limited to these four swing states (it is a larger effect if you include states that were considered swing states, but went Democratic, like Colorado). Our code and data are available here: code, data for those who wish to explore this result. We look at these four states because they were predicted to go Democratic before the election, and because exit polling the night of the election also put them squarely in the Democratic column:
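To make the specification concrete, here is a rough sketch of a population-weighted regression with state dummies, an EVM indicator, and a swing-state-by-EVM interaction. It is not our Stata code. The counties, populations, state effects, and the "true" coefficients used to build the synthetic outcome are all invented, the other covariates are left out, and clustered standard errors are not computed. The only point is to show which coefficient the 0.8 percentage point figure corresponds to.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def weighted_ols(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i . beta)^2."""
    k = len(X[0])
    xtwx = [[sum(wi * xi[p] * xi[q] for xi, wi in zip(X, w)) for q in range(k)]
            for p in range(k)]
    xtwy = [sum(wi * xi[p] * yi for xi, yi, wi in zip(X, y, w)) for p in range(k)]
    return solve(xtwx, xtwy)

# Hypothetical counties: (state, is_swing, has_evm, population in thousands).
counties = [
    ("PA", 1, 1, 1500.0), ("PA", 1, 0, 80.0),
    ("WI", 1, 1, 950.0), ("WI", 1, 0, 40.0),
    ("TX", 0, 1, 4000.0), ("TX", 0, 0, 25.0),
    ("CA", 0, 1, 3000.0), ("CA", 0, 0, 60.0),
]
states = ["PA", "WI", "TX", "CA"]
state_effect = {"PA": -3.0, "WI": -5.0, "TX": -1.0, "CA": 2.0}  # invented
TRUE_EVM, TRUE_INTERACTION = 0.1, -0.8                          # invented

X, y, w = [], [], []
for state, swing, evm, pop in counties:
    # State dummies absorb the swing-state indicator, so only EVM and swing x EVM remain.
    X.append([1.0 if state == s else 0.0 for s in states] + [float(evm), float(swing * evm)])
    y.append(state_effect[state] + TRUE_EVM * evm + TRUE_INTERACTION * swing * evm)
    w.append(pop)

beta = weighted_ols(X, y, w)
print("EVM coefficient:", round(beta[-2], 3))
print("Swing x EVM coefficient:", round(beta[-1], 3))
```

Because the synthetic outcome is built without noise, the fit simply recovers the invented coefficients; with real data the estimate would come with a standard error, which is where clustering matters.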


If we expand our group of states to include other “swing states,” these results continue to hold as well. One notable exception is Ohio, whose counties exhibited a positive association between electronic voting and difference in voting patterns. For Ohio, it’s important to note that a large number of votes (over 20%) were cast by mail prior to the election, and that polls as early as October 28th were suggesting that the state would move to the Republican column. This may not be entirely satisfactory, but we wouldn’t necessarily expect to detect an effect if large numbers of ballots were cast in advance. Our exit poll data were obtained from TDMS Research, and are “unadjusted (night of)” exit polls; Edison Research alters its exit polls after the election to better reflect the electorate that it believes voted. It’s worth noting that these unadjusted exit polls have been shown to be unreliable in the past.

Of course, what we find could simply be spurious correlation, or simply a correlation between the placement of electronic voting machines and some underlying factor that was correlated with additional support for the Republican ticket. We can’t directly discount these explanations, but we can explore the variation in voting patterns among states that were never considered swing states. If these “non swing states” exhibit the same type of pattern, i.e. electronic voting machines implied fewer votes for the Democratic ticket, then we would think that electronic voting machines are more common in places that changed their votes in the election for some other reason. We first explore this for four strongly Republican states: Arkansas, Missouri, West Virginia, and Kansas. The counties in these states exhibited approximately the same average change in support for the Democratic ticket when compared with the swing states: -6.6 percentage points on average for counties in swing states, and -7.4 percentage points for counties in the strongly Republican states. They also have about the same prevalence of electronic voting machines, with 53% of swing counties having electronic voting, and 50% of strongly conservative counties having electronic voting. The results are as follows:


Unlike before, there is no correlation between electronic voting and a change in support for either party. Note that we can include larger strongly conservative states like Texas, and the results still hold. Now, is there any pattern in strongly Democratic-leaning states, like California, Illinois, Washington, and Virginia?


Again, we find no correlation. Note that we use Virginia because it contains variation in electronic voting, though it is arguably still a swing state.

This is pretty strong evidence (we believe) that counties in swing states with electronic voting are different in some important way that isn’t captured by some underlying correlation across the country. If we thought that there was some non-random placement of electronic voting machines across the country, we would expect the pattern from the swing states to hold up nationwide. It does not, which suggests that these differences are limited to places that were expected to be close during the election.

Finally, we repeat the same exercise for swing states during the 2012 election. Data on electronic voting for the 2012 election is also available from Verified Voting, and is included in our data for analysis. For this, we choose Florida, North Carolina, Virginia, and Ohio, states that were expected to be close during the 2012 election and also contain counties with and without electronic voting. What we find is the following:


For the 2012 election, no correlation arises between electronic voting and states that were expected to swing the election. This again suggests that our results for the 2016 election are not simply spurious correlations.

It’s also worth noting that even if we assigned all counties in the country paper voting, the size of the effect is not large enough to change the election:


But, it’s hard to tell what the real size of the effect would be without more detailed data.
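As a back-of-the-envelope illustration of that counterfactual, the sketch below shifts the Clinton share in counties with electronic voting by the 0.8 percentage point estimate and recomputes a state margin. The two counties and their vote totals are invented, and the sketch assumes, purely for illustration, that the shift moves one-for-one between the two major-party tickets; it is not the calculation behind our map.

```python
EFFECT_PP = 0.8  # swing-state EVM association from the post, in percentage points

# Hypothetical counties: (state, has_evm, total_votes, clinton_pct, trump_pct).
counties = [
    ("PA", True, 600_000, 46.0, 50.0),
    ("PA", False, 400_000, 50.0, 47.0),
]

def state_margin(rows, undo_evm_effect=False):
    """Clinton minus Trump votes; optionally shift EVM counties by EFFECT_PP."""
    margin = 0.0
    for _, has_evm, votes, clinton, trump in rows:
        if undo_evm_effect and has_evm:
            # Assumption: the shift moves one-for-one between the two tickets.
            clinton += EFFECT_PP
            trump -= EFFECT_PP
        margin += votes * (clinton - trump) / 100.0
    return margin

actual = state_margin(counties)
counterfactual = state_margin(counties, undo_evm_effect=True)
print(f"Actual margin: {actual:,.0f} votes")
print(f"Margin with the EVM shift removed: {counterfactual:,.0f} votes")
```

In this invented example the margin narrows but does not change sign, which is the same qualitative conclusion as above.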

It’s tough to draw precise conclusions as to what these correlations mean. It’s still possible that there are other factors driving our results, other than electronic voting. But, what we do know is that results in key swing states differ in counties with electronic voting. Further, the patterns in these counties are not exhibited by other similar but not electorally important counties across the country. Additionally, we find no such correlation with electronic voting in swing states during the 2012 election. Taken together, it seems tough to dismiss the correlations that we have found in the data. While we don’t know how to interpret the findings practically, they certainly lend credence to the efforts to initiate recounts in several of the swing states.


uncleaned data: link

cleaned data: link

Stata code: link

github code (note, some of this code is mildly out of date; will update soon): link

Interactive maps:

Unexplained Variation map: link

Voting Machines map: link

Exit Polls map: link

Outcome with no Electronic Voting map: link



Download Our Election Data

In order to facilitate broader discussion of the election, we have written a set of Python scripts to download and organize data relating to the election. There is still a fairly high barrier to obtaining election results, so we wanted to make a clean source available for those interested. In addition to the series discussed in our previous post (here), we have included data on voting machines for those who wish to explore questions related to the recount. The code will download election results, graph them, and merge them into a .csv for statistical analysis.

We have made them available through the following sources:

  1. github: here
  2. dropbox: here

To run it, install Python (we suggest Anaconda), edit the options, then open a terminal and run the “Main.py” program from the folder into which it was downloaded. It is likely that with a fresh installation of Python, additional modules will be necessary. These can be installed by opening a terminal and typing “pip install <module name>” without quotes, with the required module substituted for <module name>.
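If you would rather check for missing modules before running anything, a short script along these lines may help. The module list here is only a placeholder (the last entry is a deliberately fake name to show what a miss looks like); replace it with whatever “Main.py” actually imports.

```python
import importlib.util

# Placeholder list: replace with the modules that Main.py actually imports.
REQUIRED_MODULES = ["csv", "json", "a_module_that_is_not_installed"]

def missing_modules(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

for name in missing_modules(REQUIRED_MODULES):
    print(f"Missing: {name} (try: pip install {name})")
```

Note that the name used in an import statement is not always the name used with pip, so treat the suggested command as a starting point.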

Series available (County-level):

  1. 2016 Election (President, House, Senate, Governor)
  2. 2012 Election (President, House, Senate, Governor)
  3. 2008 Election (President, House, Senate, Governor)
  4. 2004 Election (President, House, Senate, Governor)
  5. Economic Statistics (unemployment, income, establishments, industries)
  6. Demographics (race, age, gender, education)

Series provided but not merged (County-level):

  1. 2002 industry composition
  2. Voting Rights Act coverage
  3. Voting machine type (paper, electronic, etc.)

The available options are explained and edited in the “Main.py” file. We will be gradually updating our code to include options for more series, as well as merging the “extra” series currently not merged.
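Until that merge is built in, the “extra” series can be attached by hand. The sketch below assumes each file carries a county identifier column (called `fips` here) and uses invented column names and values; check the headers of the actual files before adapting it.

```python
import csv
import io

# Hypothetical file contents and column names, keyed on a county FIPS code.
merged_csv = "fips,clinton_pct\n42001,30.1\n42003,56.4\n"
machines_csv = "fips,machine_type\n42003,electronic\n42001,paper\n"

def left_merge(main_text, extra_text, key="fips"):
    """Attach the columns of extra_text to each row of main_text, matching on key."""
    extra = {row[key]: row for row in csv.DictReader(io.StringIO(extra_text))}
    merged = []
    for row in csv.DictReader(io.StringIO(main_text)):
        combined = dict(row)
        # Counties missing from the extra series keep their original columns only.
        combined.update(extra.get(row[key], {}))
        merged.append(combined)
    return merged

rows = left_merge(merged_csv, machines_csv)
for row in rows:
    print(row)
```

Keeping the identifier as text rather than converting it to a number preserves any leading zeros, which matters when matching county codes across sources.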

Any coding contributions or comments are much appreciated.

Q3 GDP Revised Up

By Thomas Cooley, Ben Griffy, and Peter Rupert

Today’s second estimate of real GDP from the Bureau of Economic Analysis shows an increase of 3.2% for Q3. The advance estimate for Q3 had an increase of 2.9%. The final estimate for Q2 was also revised up to 1.4% from 1.1%. The year over year change (blue line) had been trending down for the past 5 quarters or so.

The overall rise in real GDP was led by a 2.8% increase in real personal consumption expenditures (PCE) that contributed 1.9 percentage points to the gain in GDP. Compared to other recoveries this one is now quite mature, yet continues to grow at a steady pace.
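A quick sanity check on those two PCE numbers: a component's contribution to growth is approximately its share of GDP times its own growth rate. The BEA computes contributions with a chain-weighted formula, so this is only an approximation, but it shows how the figures fit together.

```python
pce_growth = 2.8        # percent, annualized (from the report)
pce_contribution = 1.9  # percentage points of GDP growth (from the report)

# contribution ~ share x growth, so the implied share is contribution / growth.
implied_share = pce_contribution / pce_growth
print(f"Implied PCE share of GDP: {implied_share:.1%}")
```

The implied share of roughly 68 percent is in line with consumption making up about two thirds of the US economy.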



There was also a large rise in exports, up 10.1%, and imports increased slightly, up 2.1%. Overall, net exports contributed 0.87 percentage points to GDP growth. Investment, on the other hand, continues to be weak in both nonresidential (up 0.1%) and residential fixed investment (down 4.4%). Spending on equipment has declined in 6 of the last 8 quarters.


This GDP report certainly provides enough support for the FOMC to raise rates during their December 13-14 meeting. Friday’s jobs report is expected to reinforce the view that the economy is on a stable path and that monetary policy can be normalized.

How Trump Won

By Thomas Cooley, Ben Griffy, and Peter Rupert

At the start of Nov. 8th, most pundits would have been as shocked by a Donald Trump victory as they would have been by Harry Truman rising from his grave clutching a newspaper celebrating his 1948 electoral victory. Almost universally, onlookers predicted a large, if not resounding, victory for Hillary Clinton. And now a week later, many of those pundits have begun to acknowledge their own hubris in their predictions.

We take the opportunity to explore this and the past several elections, to see what differences might have driven such an unexpected outcome. What we find is interesting: Once we control for the level of education and unemployment in a county, the proportion of white men in a county was not predictive of a higher likelihood of voting for Trump. Counties with higher unemployment and less education were much less likely to vote for the Democratic ticket than they were in 2012, while all race and gender groups appear to have been more likely to increase their vote for Clinton once demographics were controlled for. Additionally, counties with heavy manufacturing employment closer to the enactment of NAFTA swung their vote away from Clinton and may have decided the election.

To do this analysis, we combined county-level election results for the previous two elections, 2012 and 2016, with a number of characteristics of those counties, including race and gender, education, unemployment, and employment by industry (2-digit), for the most recent years available (most often 2015). We also include the percent of the county employed in manufacturing jobs for the year 2002 (the earliest year available at the county level) to assess whether a narrative about NAFTA and trade may have had a role in determining the outcome of the election. We further merge information on the counties previously covered by the Voting Rights Act (prior to the Shelby decision, 2013) to see what impact lifting the pre-clearance requirement may have had on the election.

As one might expect, there is a strong geographic component to the outcome of the election. The coasts strongly supported Clinton, while the center overwhelmingly supported Trump:



An interactive version of the maps presented here, as well as instructions on how to use them, is available at the bottom of the post. There are subtle, but important differences between the geographic distribution of votes in these two elections. Notably, Democratic losses were concentrated in areas that were strongholds as recently as 2012:


What drove these differences? There’s no doubt that the results are, at least to some degree, a consequence of an undertow of racism, sexism, and homophobia that voters were able to exercise from the privacy of the voting booth. It’s also true that Hillary Clinton was an historically unpopular candidate, exceeding only her rival in popularity among presidential candidates. But it also seems that the economically dispossessed were willing to overlook these flaws to support Trump. The table below reports the marginal effect that a one percentage point change in a set of covariates had on the support for Clinton relative to Obama. We measure this change in support as the percent in a county voting for Clinton in 2016 minus the percent in that county that voted for Obama in 2012. The covariates are all the same scale, between 0 and 100, meaning that a 10 percentage point increase in the unemployment rate in a county implies a 5 percentage point decrease in support for Clinton relative to Obama in 2012 (see the number corresponding to unemployment in the table below). We also use state fixed effects, meaning that these results are relative to the average change in the state.
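The arithmetic behind that reading of the table is simple enough to write down. In the sketch below, the unemployment coefficient of -0.5 is the value implied by the example in the text (10 points of unemployment, 5 points of support); the county vote shares are made up.

```python
def change_in_support(clinton_2016_pct, obama_2012_pct):
    """Dependent variable: Clinton's 2016 share minus Obama's 2012 share, in points."""
    return clinton_2016_pct - obama_2012_pct

# Implied by the example in the text: 10 points of unemployment -> 5 points less support.
UNEMPLOYMENT_COEF = -0.5

def predicted_shift(delta_unemployment_pts, coef=UNEMPLOYMENT_COEF):
    """Predicted change in support for a given change in the unemployment rate."""
    return coef * delta_unemployment_pts

print(change_in_support(45.0, 52.0))  # a hypothetical county that moved 7 points away
print(predicted_shift(10.0))          # the 10-point example from the text
```

Because of the state fixed effects, a predicted shift of this kind is measured relative to the average change in the county's state, not relative to zero.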


A quick read of this table reveals some interesting, and potentially surprising statistics. Counties with higher percentages of Hispanic and Latino voters turned out for Clinton, while counties with higher unemployment aligned with Trump’s populist message. The African American vote did not seem to improve Clinton’s outcomes, and we discuss some causes for this below. As has already been widely reported, counties with higher percentages of white men were more likely to support Trump, relative to 2012, which is shown by the cross-term in row 3 (remember that each variable is 0 to 100, so the cross-term ranges from 0 to 10000, potentially). There is an important subtlety here: once county-level demographic and economic characteristics are controlled for, counties with white men actually increased their vote for Clinton relative to how they voted in 2012, for almost all the combinations of percent male and percent white in the dataset. However, the cross-term in row 3 indicates that as either the percent white or percent male in a county increased, the margin got smaller, suggesting that highly white or male counties were less likely to vote for Clinton than their more diverse peers. Still, for all but the most white counties in the dataset, our model would predict that they would increase their vote for Clinton, relative to Obama in 2012. The gap between what we see in our dataset and what we observed in the election arises because the places that were overwhelmingly white and changed their votes to Trump also have higher rates of unemployment and higher percentages of residents with a high school degree or less. Nationally, the distribution of white males is shown below (counties in gray did not have the relevant data):
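To see how a cross-term produces this pattern, consider the sketch below. The three coefficients are invented for illustration and are not the values from our table; the point is only that with a negative cross-term the combined contribution can stay positive while shrinking as a county becomes more white.

```python
# Invented coefficients for illustration only; not the values from the table.
B_WHITE, B_MALE, B_CROSS = 0.20, 0.20, -0.005

def marginal_effect_white(pct_male):
    """Change in predicted support per extra point of percent white, given percent male."""
    return B_WHITE + B_CROSS * pct_male

def combined_contribution(pct_white, pct_male):
    """Joint contribution of percent white, percent male, and their cross-term."""
    return B_WHITE * pct_white + B_MALE * pct_male + B_CROSS * pct_white * pct_male

print(marginal_effect_white(50.0))         # negative: the margin shrinks as percent white rises
print(combined_contribution(80.0, 50.0))   # still positive
print(combined_contribution(100.0, 50.0))  # positive, but smaller
```

The cross-term is on a 0 to 10000 scale, which is why a coefficient that looks tiny can still offset the two main effects.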


This seems at least geographically consistent with the narrative that white men swung the election for Trump. Our interpretation is that covariates that might be strongly correlated with certain geographic regions, like unemployment and education, are strongly correlated with support for Trump. As shown above, those with a high school education or less strongly decreased their support for Clinton relative to their support for Obama in 2012. That distribution geographically is displayed here:


Again, it appears that these groups are concentrated in states that had a substantial impact on the election, though not as densely as one might expect. Somewhat surprisingly, repeating the analysis above with a variable that represents the percent of a county’s population made up of white men with a high school education or less does not yield significant results. That is at least partially suggestive that the most common narrative following the election, that low-education white male voters swung states from Clinton to Trump, isn’t consistent with the data. Again, this is probably because there is a strong correlation between these groups and other characteristics. The result that we find most interesting comes from the variable labeled “Percent Employed in Manufacturing (2002),” the earliest year for which employment by sector is available at the county level. This means that counties with higher percentages of their workers employed in manufacturing sectors in 2002 were substantially less likely to vote for Clinton than they were for Obama just four years before. This could of course simply be correlation, but it’s also possible that these workers still hold the Clintons responsible for declining job prospects as a result of NAFTA in 1994. Where were these industries located? See below:


We see that the percentage of individuals employed in manufacturing is fairly evenly distributed among states in the Midwest and the South. Remember that several key states in the election, Pennsylvania, Wisconsin, and Michigan, were decided by about 1.5 percent or less of the total vote, meaning that the shift in voting in these manufacturing-heavy counties could have played a large role. Just as important as the percent employed in manufacturing is the number of potential voters who were employed in these industries, and where they were located:


Unfortunately, many counties lack data on share of manufacturing from the 2002 data source. From the data we can obtain, among counties that switched votes from Democratic to Republican, those in the Midwest had higher percentages of their workforce employed in manufacturing, and larger numbers employed in those industries as well. Furthermore, these industries were highly concentrated in the “Rust Belt,” the states closest to the Great Lakes. These states had been traditional Democratic strongholds, but swung to the Republicans for the first time in several elections. With this data, we can only conjecture about whether this was a cause, but it does appear that counties with jobs that were more likely to leave following the adoption of NAFTA shifted their votes in large quantities to the Republican ticket.

Another interesting and important narrative in this election is the removal of the Voting Rights Act as a protection against impeding voter participation. Could this also have played a role in swinging the election? Prior to Shelby County v. Holder (2013), which ruled the pre-clearance requirement unconstitutional, there were a number of jurisdictions under the purview of Section 5 of the Voting Rights Act (link). When we repeat the same exercise as before, predicting the percent change in Democratic support within a county between 2016 and 2012, we come to an interesting and perhaps counter-intuitive conclusion: support for Clinton was higher in previously covered counties than for Obama in 2012, at least as a percentage of those voting. Doing the same analysis as above, but including an indicator variable for counties that were covered by the Voting Rights Act yielded the following:


What this suggests is that voters in counties that had previously been under the protection of the Voting Rights Act increased their support for Clinton by 2 percentage points relative to 2012. Some of this could be a result of much of the negative rhetoric on the Republican side being targeted at the minority groups that were previously protected by the Voting Rights Act.


Note that this is not the difference in the total number of ballots cast for the two candidates in this election, but the change in the number of ballots cast in total between 2012 and 2016. Given that the average number of ballots cast in a county was around 40,000, this decrease in counties that had been covered by the VRA is substantial. Overall, the number of ballots cast increased by an average of about 1,000 per county between 2012 and 2016, suggesting that turnout was substantially depressed in counties that were previously covered by the Voting Rights Act, though per our analysis, this didn’t seem to translate into a higher percentage of votes for the conservative on the ticket. And, at least graphically, it doesn’t seem like these differences could have swung the election:


Given the geographic location of these covered counties, it seems unlikely that it directly played a role in shaping the presidential election, though it may have impacted North Carolina, and probably did have an impact in down-ballot races.

It’s still not entirely clear what drove such an unexpected result, but we think that the narrative needs some clarification. Having delved into the data, it appears that a long-standing disaffection with free trade may have driven a lot of Midwest voters to switch party allegiances they held as recently as 2012 and vote Republican. In places that determined the outcome of the election, states like Wisconsin, Ohio, Pennsylvania, and Michigan, a disproportionate number of people had been employed in industries (in 2002) that were most likely to be impacted by NAFTA.

Interactive Maps: To use these maps, click on the corresponding link. You will either automatically download, or be prompted to download, an HTML document. After downloading this document, either double click it or drag-and-drop it into your internet browser. This will open the interactive data. This slightly convoluted process is necessary because we cannot embed the graphics in WordPress.

Clinton Voting Distribution: link

Obama Voting Distribution: link

Change 2012 to 2016: link

Manufacturing (Percent): link

Manufacturing (Levels): link

Race and Gender: link

Education: link

Voting Rights Act: link

GDP Report Shows Modest Gains

By Thomas Cooley, Ben Griffy, and Peter Rupert

There has been a rash of recent data showing the U.S. economy growing stronger. Rising incomes, stronger domestic demand, and rising net exports are signaling a more robust economy. The upward revisions to second quarter GDP support that view, and the hope and reasonable expectation is that the third quarter will be stronger still, setting the table for a Fed rate hike later this year. The third estimate for second quarter GDP growth, released this morning by the BEA (link), estimates that the US economy grew at an annualized rate of 1.4% in the second quarter, up from a previous estimate of 1.1%, and up from the first quarter, in which the rate was 0.8%.


These gains came largely from increases in personal consumption expenditures, which increased at an annualized rate of 4.3%, exports, which grew at a rate of 1.8%, and fixed nonresidential investment, which grew at a rate of 1.0%. Offsetting these gains were declines in residential investment (7.7%) and government spending (federal fell 0.4%, while state and local fell 2.5%), along with imports, which rose 0.2% and therefore subtract from GDP. It should be noted that the levels of the categories that grew dwarf the levels of those that declined, leading to the overall increase in headline numbers.



As noted, personal consumption expenditures (PCE) played the largest role in GDP growth last quarter. This bucked a trend over the previous four quarters in which percent growth in the component had fallen, while still remaining positive. Growth in the component more than doubled over the Q1 figure, and was nearly at its highest rate of growth since the Great Recession. This is a strong indication that US consumers are confident in the direction of the US economy. For comparison, many European economies have not seen consumption growth of 4.3% in total since the beginning of the Great Recession:

This shows the importance that the US economy has as a global driver of growth. In total, changes in PCE would have added 2.88 percentage points to GDP growth, all else equal.



Residential and non-residential investment swapped growth trends, with non-residential investment ticking positive for the first time in three quarters, and residential investment turning negative for the first time since 2013Q4. Nonresidential investment was driven up by increases in expenditures on industrial equipment and intellectual property products, with the latter increasing by 9.0% at an annualized rate. Overall, Gross Private Domestic Investment subtracted 1.34 percentage points from GDP growth, holding all other components constant.

The report should be taken with cautious optimism that the US has absorbed some of the global economic uncertainty in stride. Even more cause for optimism is that GDP growth would have been 1.7% annualized if the contribution of government spending had been removed; government spending is a highly variable series that has limited forecasting ability on the health of the economy, particularly in an election year. These figures are perhaps inauspicious in comparison to the halcyon heights of US economic history, but in the context of our times, they continue to reflect the strength of the US economy in a period of great uncertainty.
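The component figures quoted in this post can be tied together with a little arithmetic, since contributions to growth sum to the headline rate. The sketch below uses only numbers from the text; the government and net export contributions are backed out as implied values, so they inherit the rounding in the headline figures and should be read as approximate.

```python
headline = 1.4       # percent, annualized (third estimate for Q2)
ex_government = 1.7  # headline growth with the government contribution removed
pce = 2.88           # contribution of PCE, percentage points
investment = -1.34   # contribution of gross private domestic investment

# Contributions add up to the headline rate, so the missing pieces can be implied.
government = round(headline - ex_government, 2)
net_exports = round(headline - pce - investment - government, 2)

print(f"Implied government contribution: {government:+.2f}")
print(f"Implied net exports contribution: {net_exports:+.2f}")
```

Seen this way, consumption alone more than accounts for the headline rate, with investment and government spending pulling it back down.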