Last week we started experimenting with methods for sentiment analysis and setting up frameworks for that kind of work. This week we take a different approach and look at an election model. I’m actively working on election-focused, prompt-based training for large language models to get better predictions. Right now I have access to Bard, ChatGPT, and Llama 2 to complete that training. That type of training requires feeding election models in written form as a prompt for replication. I have been including the source data and the written-out logic as part of the prompt as well.
Party registration drives the signal. Everything else is noise. That is what I expected to see within this model. It was the headline that could have been, but sadly could not be written. It turns out that this hypothesis could be tested. You can pretty easily view the results as a March Madness college basketball style bracket, accepting that chalk happens or, to put it more bluntly, that the higher ranked seeds normally win. Within the NCAA tournament things are more sporting and sometimes major upsets occur. Brackets are always getting busted. That is probably why they ended up branding it as March Madness. Partisan politics are very different in that the chalk is a lot more consistent. Still, sentiment can change over time and sometimes voter registration does not accurately predict the outcome.
We are going to move into the hypothesis testing part of the process. This model accepts a bimodal, two-party representation of political parties with the assumption that the other parties are generally irrelevant to predicting the outcome. The chalk model for predicting elections based on registration reads like this: the predicted winner = max{D,R}, where D = registered Democrats and R = registered Republicans at the time of the election. For example, for the State of Colorado in December of 2020 that equates to max{1127654,1025921}, where registered Democrats outnumber registered Republicans [1]. This equation accurately predicted the result in the State of Colorado during the 2020 presidential election. Thirty states report voter statistics by party with accessible 2020 archives. Using the power of hindsight we can test the chalk model for predicting elections against the results of the 2020 presidential election.
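The max{D,R} rule is small enough to write down directly. What follows is a minimal sketch in Python, not the exact form used in the prompts; the function name chalk_prediction and the decision to raise an error on an exact tie are my own assumptions, since the model as stated does not say what happens when D equals R.

```python
def chalk_prediction(democrats: int, republicans: int) -> str:
    """Return "D" or "R" for the party with more registered voters.

    Hypothetical helper: the name and the tie handling are assumptions,
    not part of the chalk model as described in the text.
    """
    if democrats == republicans:
        raise ValueError("registration tie: the chalk model gives no prediction")
    return "D" if democrats > republicans else "R"


# Colorado, December 2020 registration counts from the example above.
colorado = chalk_prediction(1_127_654, 1_025_921)
print(colorado)  # D
```

Only the comparison matters here, so the size of the registration gap plays no part in the prediction.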
Several internet searches were performed using Google with the search, “(state name) voter registration by party 2020.” Links to the referenced data are provided for replication and/or verification of the data. Be prepared to spend a little time on a verification effort, as searching out the registered voter metric for each of the states took about 3 hours in total. It will go much faster if you use the links rather than redoing the search from scratch. Data from November of 2020 was selected when possible. Otherwise, the closest available data was used.
Alaska max{78664,142266}, predicted R victory accurately [2]
Arizona max{1378324,1508778}, predicted R victory in error [3,4]
California max{10170317,5334323}, predicted D victory accurately [5]
Colorado max{1127654,1025921}, predicted D victory accurately [6]
Connecticut max{850083,480033}, predicted D victory accurately [7]
Delaware max{353659,206526}, predicted D victory accurately [8]
Florida max{5315954,5218739}, predicted D victory in error [9] * The data here might have been lagging actual voter sentiment. By 2021 the registration numbers would have been accurate at max{5080697,5123799}, predicting R victory.
Idaho max{141842,532049}, predicted R victory accurately [10]
Iowa max{699001,719591}, predicted R victory accurately [11]
Kansas max{523317,883988}, predicted R victory accurately [12]
Kentucky max{1670574,1578612}, predicted D victory in error [13] * The data here might have been lagging actual voter sentiment. The June 2023 numbers flipped to max{1529360,1593476}.
Louisiana max{1257863,1020085}, predicted D victory in error [14,15]
Maine max{405087,321935}, predicted D victory accurately [16]
Maryland max{2294757,1033832}, predicted D victory accurately [17]
Massachusetts max{1534549,476480}, predicted D victory accurately [18]
Nebraska max{370494,606759}, predicted R victory accurately [19]
Nevada max{689025,448083}, predicted D victory accurately [20]
New Hampshire max{347828,333165}, predicted D victory accurately [21]
New Jersey max{2524164,1445074}, predicted D victory accurately [22]
New Mexico max{611464,425616}, predicted D victory accurately [23]
New York max{6811659,2965451}, predicted D victory accurately [24]
North Carolina max{2627171,2237936}, predicted D victory in error [25,26]
Oklahoma max{750669,1129771}, predicted R victory accurately [27]
Oregon max{1043175,750718}, predicted D victory accurately [28]
Pennsylvania max{4228888,3543070}, predicted D victory accurately [29]
Rhode Island max{327791,105780}, predicted D victory accurately [30]
South Dakota max{158829,277788}, predicted R victory accurately [31]
Utah max{250757,882172}, predicted R victory accurately [32]
West Virginia max{480786,415357}, predicted D victory in error [33]
Wyoming max{48067,184698}, predicted R victory accurately [34]
This model, predicting a winner with max{D,R}, produced incorrect predictions in 6 states during the 2020 presidential election cycle: Arizona, Florida, Kentucky, Louisiana, North Carolina, and West Virginia. In 5 of these states the voter registration data pointed to a D victory that did not happen in practice. Arizona went the other direction. Some of these states have clearly shifted in voter registration since then, and I have added notes to show those changes in Kentucky and Florida. It is possible that in both of those states voter registration was a lagging indicator compared to the sentiment of votes cast. The chalk model for predicting elections ended up being 24/30 or 80% accurate.
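The tally can be reproduced from the list above. The sketch below, in Python, is one way to do that. The actual winner column is not taken from a separate results source; it is inferred from the "accurately" and "in error" notes on each state, and the names STATES, chalk_prediction, and misses are hypothetical.

```python
# (state, registered D, registered R, actual 2020 winner)
# The winner column is an assumption inferred from the "accurately" /
# "in error" annotations in the list above.
STATES = [
    ("Alaska", 78664, 142266, "R"),
    ("Arizona", 1378324, 1508778, "D"),
    ("California", 10170317, 5334323, "D"),
    ("Colorado", 1127654, 1025921, "D"),
    ("Connecticut", 850083, 480033, "D"),
    ("Delaware", 353659, 206526, "D"),
    ("Florida", 5315954, 5218739, "R"),
    ("Idaho", 141842, 532049, "R"),
    ("Iowa", 699001, 719591, "R"),
    ("Kansas", 523317, 883988, "R"),
    ("Kentucky", 1670574, 1578612, "R"),
    ("Louisiana", 1257863, 1020085, "R"),
    ("Maine", 405087, 321935, "D"),
    ("Maryland", 2294757, 1033832, "D"),
    ("Massachusetts", 1534549, 476480, "D"),
    ("Nebraska", 370494, 606759, "R"),
    ("Nevada", 689025, 448083, "D"),
    ("New Hampshire", 347828, 333165, "D"),
    ("New Jersey", 2524164, 1445074, "D"),
    ("New Mexico", 611464, 425616, "D"),
    ("New York", 6811659, 2965451, "D"),
    ("North Carolina", 2627171, 2237936, "R"),
    ("Oklahoma", 750669, 1129771, "R"),
    ("Oregon", 1043175, 750718, "D"),
    ("Pennsylvania", 4228888, 3543070, "D"),
    ("Rhode Island", 327791, 105780, "D"),
    ("South Dakota", 158829, 277788, "R"),
    ("Utah", 250757, 882172, "R"),
    ("West Virginia", 480786, 415357, "R"),
    ("Wyoming", 48067, 184698, "R"),
]


def chalk_prediction(democrats: int, republicans: int) -> str:
    """Pick the party with more registered voters (a tie would fall to R)."""
    return "D" if democrats > republicans else "R"


# States where the registration-based pick differs from the actual winner.
misses = [
    name for name, d, r, actual in STATES
    if chalk_prediction(d, r) != actual
]
correct = len(STATES) - len(misses)
accuracy = correct / len(STATES)

print(f"{correct}/{len(STATES)} correct ({accuracy:.0%})")
print("Misses:", ", ".join(misses))
```

Keeping the data as plain tuples makes it easy to paste the same rows into a prompt next to the written-out logic.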
You can imagine that I was expecting a much more accurate prediction of elections out of this chalk model. Calling back to March Madness again, registration should have given one party a clear path to victory in each state, but it did not always work out that way. That is why we tested the chalk model hypothesis. You can see here that it is accurate most of the time, but not all the time. It’s something that we will continue to dig into as I look at other models and run other tests with voter data while we explore how elections intersect with AI/ML. The next step here would be to see if a model can be developed with enough agency through plugins to conduct this effort in an automated way, without intervention, based on a single prompt.
Footnotes:
[1] https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020/December/VotersByPartyStatus.pdf or https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020VoterRegNumbers.html
[3] https://azsos.gov/sites/default/files/State_Voter_Registration_2020_General.pdf
[4] https://azsos.gov/elections/results-data/voter-registration-statistics
[5] https://elections.cdn.sos.ca.gov/ror/15day-gen-2020/county.pdf
[6] https://www.sos.state.co.us/pubs/elections/VoterRegNumbers/2020/December/VotersByPartyStatus.pdf
[8] https://elections.delaware.gov/reports/e70r2601pty_20201101.shtml
[10] https://sos.idaho.gov/elections-division/voter-registration-totals/
[11] https://sos.iowa.gov/elections/pdf/VRStatsArchive/2020/CoNov20.pdf
[12] https://sos.ks.gov/elections/22elec/2022-11-01-Voter-Registration-Numbers-by-County.pdf
[13] https://elect.ky.gov/Resources/Pages/Registration-Statistics.aspx
[14] https://www.sos.la.gov/ElectionsAndVoting/Pages/RegistrationStatisticsStatewide.aspx
[15] https://electionstatistics.sos.la.gov/Data/Registration_Statistics/statewide/2020_1101_sta_comb.pdf
[16] https://www.maine.gov/sos/cec/elec/data/data-pdf/r-e-active1120.pdf
[17] https://elections.maryland.gov/pdf/vrar/2020_11.pdf
[18] https://www.sec.state.ma.us/divisions/elections/download/registration/enrollment_count_20201024.pdf
[20] https://www.nvsos.gov/sos/elections/voters/2020-statistics
[24] https://www.elections.ny.gov/EnrollmentCounty.html
[25] https://vt.ncsbe.gov/RegStat/
[26] https://vt.ncsbe.gov/RegStat/Results/?date=11%2F14%2F2020
[28] https://sos.oregon.gov/elections/Documents/registration/2020-september.pdf
[30] https://datahub.sos.ri.gov/RegisteredVoter.aspx
[32] https://vote.utah.gov/current-voter-registration-statistics/
[33] https://sos.wv.gov/elections/Documents/VoterRegistrationTotals/2020/Feb2020.pdf
[34] https://sos.wyo.gov/Elections/Docs/VRStats/2020VR_stats.pdf
What’s next for The Lindahl Letter?
Week 135: Polling aggregation
Week 136: Econometric models
Week 137: Time-series analysis
Week 138: Prediction markets
Week 139: Machine learning election models
If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. New editions arrive every Friday. Thank you and enjoy the week ahead.