We are facing a new reality: the tipping point has been passed, and a majority of the content being generated online is now synthetic. Organic content creation simply cannot keep up with the flood of synthetic content. The bot farms creating that content never sleep. They just churn out material and pretend it remains evergreen. My consideration of this topic started back on Wednesday, July 31, 2024, when I was invited to access SearchGPT from OpenAI [1]. Using that platform made me think a lot about how we access information and the ways that will change going forward. People are now getting summaries and completing searches that go well beyond Googling something. You are probably well aware by now that I’m deeply concerned about how facts and knowledge are going to be stored and curated from here on out.
Going forward, whoever owns the stores of facts or knowledge will effectively own history and how it is presented, which is truly a watershed change in our shared understanding of the world. Individual voices and publications will be overshadowed by these collections. Owning the datastores that provide definitive facts or knowledge will be the cornerstone of whatever emerges next, and the future value of that ownership should not be underestimated. I don’t think the ownership of facts will become commoditized and open sourced. I really do think it will be privatized and tightly controlled. Somebody who wanted to pivot our understanding of a particular point of inquiry could simply start serving up an alternative perspective. Instead of funding think tanks to ultimately change the messaging, the next step will be funding content farms to flood the message delivery. Keep in mind that people generally are not reading books anymore [2]. That means reasoning through the interpretation of information may be a diminishing skill set.
Organically written original content still exists online, but synthetically generated content has been on the rise. A lot of bots are scraping content for model training, and figuring out what is organic and what is synthetic has become increasingly difficult. One of the hallmarks of my writing efforts has been the originality, or maybe the novelty, of my research. Within the broader academy of thought, original contributions are what build that body of content and strengthen it overall. Derivative, dilutive, and otherwise mediocre publications just flood the academic community. It’s perfectly fine to write a paper and decide it was not a significant contribution. Instead of holding back those lesser works, however, they are now freely shared in online archives and, unfortunately, a new generation of journals. The increase in AI-related publications has been astonishing: from 2010 to 2022 the number of publications nearly tripled [3].
Now we have poorly written articles mixing with synthetically generated ones to create a truly problematic future for consuming content. I’m considering web traffic at the moment, but the overall storage of facts and knowledge is certainly in scope. We reached a tipping point around 2016, when more traffic came from mobile devices than from desktop browsers [4]. OpenAI has now launched SearchGPT, and beyond the dichotomy between mobile and desktop traffic we are about to see the rise of LLM-interpreted results, where people may never actually leave the landing page or interface of the search engine. It’s possible that dichotomy will fade away and the majority of traffic will come from bots scraping things to share within the newly powered search interfaces. People may very well interact with the grand volume of online information through applications using APIs that are completely disconnected from the open internet people once surfed and experienced. We have gone from reading the thoughts of a single writer to interpreting the output of the largest language models ever created. Things are changing at an incredibly rapid pace.
Now it’s time for a brief editorial note. My writing output over the last few years became over-indexed on artificial intelligence and machine learning. Going down that rabbit hole was good at first, and it was an effort truly focused on depth and breadth within the subject. Unfortunately, my focus lingered, and instead of writing research notes about technological innovation, civil society, and the intersection of technology and modernity, that pesky over-indexing occurred. Now, thanks to a moment of reflective practice, I’m breaking out of that pattern and returning to what I consider a better balance of writing topics. Thank you for being along for that journey and the upcoming course correction.
Footnotes:
[1] https://chatgpt.com/search
[2] https://www.pewresearch.org/short-reads/2021/09/21/who-doesnt-read-books-in-america/
[3] https://arxiv.org/abs/2405.19522 or https://aiindex.stanford.edu/wp-content/uploads/2024/04/HAI_AI-Index-Report-2024_Chapter1.pdf
[4] “>50% of web traffic comes from mobile.” Google Analytics Data, U.S., Q1 2016. https://www.thinkwithgoogle.com/marketing-strategies/app-and-mobile/mobile-web-traffic-statistics/
What’s next for The Lindahl Letter? I’m just going to sit down and write weekly, or more likely bi-weekly, so topics won’t be planned out five weeks ahead of publication.
If you enjoyed this content, then please take a moment and share it with a friend. If you are new to The Lindahl Letter, then please consider subscribing. Stay curious, stay informed, and enjoy the week ahead!