arXiv.org’s 10 biggest breakout papers
Prompted by a recent Stanford HAI report on the limits of agentic AI teamwork, this edition explores the raw power of open-access research. We trace the massive growth of arXiv.org and its biggest breakout papers.
Thank you for reading this edition of the Lindahl Letter publication. This week the topic under consideration for the Lindahl Letter is, “arXiv.org’s 10 biggest breakout papers.”
Earlier this week, I was reading a recent paper on CooperBench by Hao Zhu and his coauthors. It was announced by the Stanford Institute for Human-Centered Artificial Intelligence (HAI) with the note that “AI Coding Agents Fail at Teamwork: Two models working together perform worse than one alone, exposing a critical gap in artificial intelligence capabilities” [1]. Yeah, that is a pretty interesting headline, and it is probably going to capture your attention. It certainly caught mine. The 33-page paper can be found here:
Khatua, A., Zhu, H., Tran, P., Prabhudesai, A., Sadrieh, F., Lieberwirth, J. K., Yu, X., Fu, Y., Ryan, M. J., Pei, J., & Yang, D. (2026). CooperBench: Why coding agents cannot be your teammates yet (arXiv:2601.13295). arXiv. https://doi.org/10.48550/arXiv.2601.13295
This paper actually cites one of the 10 biggest breakout papers listed below, “Attention Is All You Need.” Because I sorted the list alphabetically by first author, you will find it at the end if you want to skip ahead.
Ok, let’s zoom back out to the 30,000-foot view of what we are evaluating today. Professor Paul Ginsparg moved to Cornell University back in 2001, and arXiv moved with him. That move set the stage for how the preprint repository has operated up to its upcoming shift to becoming a nonprofit on July 1, 2026. That shift is very interesting, and we will look at it more closely in a future Lindahl Letter. The announcement of the change included this note: “arXiv now hosts over 2.9 million scholarly articles across eight subject areas. It received 284,486 submissions in 2025, representing a 17% year-over-year increase. Since surpassing the 2 million-article threshold in 2022, arXiv has seen nearly a 50% increase, with total submissions now just under 3 million” [2]. This week, I started thinking about which of those nearly 3 million papers broke out after submission to arXiv and ended up being widely read and influential.
Seriously, the ability to share preprints and make research openly available is amazing. It shifted the dynamic from journal paywalls to easy access and publication. Keep in mind that a paper cited more than 1,000 times has been widely read, and scholars are responding to it by citing and amplifying it in their own work. That is a powerful signal within the academic community. I’m still not able to submit directly to arXiv. That is partly a function of being a pracademic who is not working on a publication with somebody at an institution or company, where getting onto arXiv involves very little friction. For now, to publish something there, I would have to partner with somebody who already has access to the club so that I could be included. Maybe that will happen one day. It’s certainly possible.
Here are the 10 breakout papers, in alphabetical order by first author, with their links and citation counts.
Abbott, B. P., Abbott, R., Abbott, T. D., Abernathy, M. R., Acernese, F., Ackley, K., Adams, C., Adams, T., Addesso, P., Adhikari, R. X., Adya, V. B., Affeldt, C., Agathos, M., Agatsuma, K., Aggarwal, N., Aguiar, O. D., Aiello, L., Ain, A., Ajith, P., Allen, B., Allocca, A., Altin, P. A., Anderson, S. B., Anderson, W. G., Arai, K., et al. (LIGO Scientific Collaboration and Virgo Collaboration). (2016). Observation of gravitational waves from a binary black hole merger (arXiv:1602.03837). Physical Review Letters, 116(6), Article 061102. https://doi.org/10.1103/PhysRevLett.116.061102 (Citations: 21,490+)
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding (arXiv:1810.04805). arXiv. https://doi.org/10.48550/arXiv.1810.04805 (Citations: 171,114+)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2020). Generative adversarial networks (arXiv:1406.2661). Communications of the ACM, 63(11), 139–144. https://doi.org/10.1145/3422622 (Citations: 95,900+)
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition (arXiv:1512.03385). arXiv. https://doi.org/10.48550/arXiv.1512.03385 (Citations: 323,725+)
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network (arXiv:1503.02531). arXiv. https://doi.org/10.48550/arXiv.1503.02531 (Citations: 32,722+)
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization (arXiv:1412.6980). arXiv. https://doi.org/10.48550/arXiv.1412.6980 (Citations: 245,721+)
Maldacena, J. (1998). The large N limit of superconformal field theories and supergravity (arXiv:hep-th/9711200). Advances in Theoretical and Mathematical Physics, 2(2), 231–252. https://doi.org/10.4310/atmp.1998.v2.n2.a1 (Citations: 28,856+)
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space (arXiv:1301.3781). arXiv. https://doi.org/10.48550/arXiv.1301.3781 (Citations: 53,640+)
Perelman, G. (2002). The entropy formula for the Ricci flow and its geometric applications (arXiv:math/0211159). arXiv. https://doi.org/10.48550/arXiv.math/0211159 (Citations: ~1,500+)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need (arXiv:1706.03762). arXiv. https://doi.org/10.48550/arXiv.1706.03762 (Citations: 251,931+)
This view of breakout arXiv papers stands in stark contrast to the essays shared in the last edition of the Lindahl Letter.
It’s always interesting to read contemporary writing alongside academic work and see the contrast between what somebody can openly write and share and what appears in more rigorous, fully cited work. My guess is that some of the observations from the essays of Altman and Amodei will end up being quoted in papers throughout the next decade.
Did I miss your favorite arXiv preprint? What other papers would you expect to see on this list? Feel free to reach out and let me know which papers I need to add to my breakout paper reading list.
What’s next for the Lindahl Letter? New editions arrive every Friday. If you are still reading at this point and enjoyed this content, please take a moment to share it with a friend. If you are new to the Lindahl Letter, please consider subscribing. Make sure to stay curious, stay informed, and enjoy the week ahead!
Footnotes:
[1] https://hai.stanford.edu/news/ai-coding-agents-fail-at-teamwork

