Doing machine learning work
Maybe it is time to get a little dangerous and start searching Google for, “Why do machine learning projects fail?” I’m sure that is going to cause all sorts of odd advertisements to come my way over the next few days. You will no doubt get pushed toward a study from Gartner on the subject.[1] Perhaps the detail to focus on from that one is that the press release came out in 2018. The reason people are still focused on it after three years is this one quote: “Gartner predicts that through 2022, 85 percent of AI projects will deliver erroneous outcomes due to bias in data, algorithms or the teams responsible for managing them.” For rather obvious reasons, that one metric has remained at the forefront of people’s thoughts on machine learning projects.[2] In my opinion, two divergent trends have been playing out over the years. First, the practice of organizations standing up large in-house machine learning divisions is evaporating. Second, the number of outside practices has grown rapidly. You can see this in the number of startups working in the machine learning space and the number of major organizations offering teams of consultants to help.
I’m still sitting here reading articles about why machine learning projects fail.[3] For the most part they all blend together into a few common observations: production data is dirty, models are not dynamic, and operationalizing a model requires sustained investment. This is one of the reasons that API delivered machine learning solutions tend to outlast other deployments in the wild. Major companies that provide the care and feeding for the API models they serve up are delivering both operational sustainment and ongoing maintenance as a service embedded in the price of the API transactions. Those efforts are for the most part very specific to the thing you are trying to accomplish with the API call, and a lot of machine learning use cases work out well enough that way.
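To make that concrete, here is a minimal sketch of what consuming one of those API delivered models looks like. The endpoint URL, the API key, and the payload shape are all hypothetical placeholders, not any specific vendor’s API, but the pattern holds across most hosted offerings: you send data over, and the provider worries about the model, the retraining, and the serving infrastructure.

```python
import requests

# Hypothetical hosted sentiment-analysis endpoint; the URL, key, and
# payload shape below are placeholders for whatever provider you pick.
API_URL = "https://api.example.com/v1/sentiment"
API_KEY = "your-api-key-here"


def score_sentiment(text: str) -> dict:
    """Send text to the hosted model and return the provider's response.

    All of the model maintenance, retraining, and serving infrastructure
    lives on the provider's side of this one call.
    """
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(score_sentiment("This newsletter keeps getting better."))
```

That single function call is the entire operational footprint on the consumer’s side, which is exactly why these deployments tend to stick around.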
Based on my research, an audience of about 15,000 to 20,000 people is actively consuming machine learning content online. You could go look at the average views of Yannic Kilcher on YouTube and get a sense of the audience size.[4] A potentially larger audience of academics and practitioners is publishing papers in record numbers. Consider that ICML received 5,513 paper submissions in 2021 vs. 1,037 in 2015.[5] That gives you a line of sight into active academic work and professional endeavors. Mix in the next layer of the audience and all of a sudden you are up to about the previously mentioned number. Consider that when you start to wonder about doing machine learning work and the overall audience of people really working within the space. A lot more people are merely adjacent to machine learning, in that they have elements of it or connections to it in the applications and services being provided. That localizes the actual teams and spreads the sales and hype out a few layers within organizations.
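If you want to sanity check the growth behind that submission claim, the arithmetic is quick. Here is a back-of-the-envelope sketch using only the two ICML counts cited above (from the acceptance-rate repository in footnote [5]); the annualized figure assumes smooth growth across those six years, which is a simplification.

```python
# Back-of-the-envelope growth in ICML paper submissions,
# using only the two counts cited above (see footnote [5]).
submissions_2015 = 1037
submissions_2021 = 5513

growth_factor = submissions_2021 / submissions_2015
annualized = growth_factor ** (1 / 6) - 1  # six years between the two counts

print(f"Submissions grew {growth_factor:.1f}x from 2015 to 2021")
print(f"That is roughly {annualized:.0%} growth per year")
```

That works out to about a 5.3x jump, or roughly 32 percent growth per year, which is the kind of curve that tells you the academic side of this audience is still expanding fast.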
Go check out https://discuss.tensorflow.org/; more on that later…
Links and thoughts:
All right, this one from Yannic Kilcher is pretty interesting: “Efficient and Modular Implicit Differentiation (Machine Learning Research Paper Explained)”
I watched about 30 minutes of the WAN Show with Linus and Luke. This is my guilty pleasure computer hardware content each week. I’m not sure the show has been improving over time, but I keep watching it.
If you want to hear William Shatner read Ray Bradbury, then here you go…
Top 5 Tweets of the week:
Footnotes:
[1] Here is that Gartner press release: https://www.gartner.com/en/newsroom/press-releases/2018-02-13-gartner-says-nearly-half-of-cios-are-planning-to-deploy-artificial-intelligence
[2] https://iiot-world.com/industrial-iot/connected-industry/why-85-of-machine-learning-projects-fail/ and a lot of other examples exist.
[3] https://www.kdnuggets.com/2021/01/top-5-reasons-why-machine-learning-projects-fail.html
[4] https://www.youtube.com/c/YannicKilcher/videos sorted by newest videos… look at the counts…
[5] https://github.com/lixin4ever/Conference-Acceptance-Rate
What’s next for The Lindahl Letter?
Week 22: Machine learning graphics
Week 23: Fairness and machine learning
Week 24: Evaluating machine learning
Week 25: Teaching kids ML
Week 26: Machine learning as a service
I’ll try to keep the “what’s next” list forward-looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.