Doing machine learning work
Maybe it is time to get a little dangerous and start searching Google for, “Why do machine learning projects fail?” I’m sure that is going to cause all sorts of odd advertisements to come my way over the next few days. You will no doubt get pushed toward a study from Gartner on the subject.[1] Perhaps the detail to focus on from that one is that the press release came out in 2018. The reason people are still focused on it after three years is this one quote: “Gartner predicts that through 2022, 85 percent of AI projects will deliver erroneous outcomes due to bias in data, algorithms or the teams responsible for managing them.” For rather obvious reasons, that one metric has remained at the forefront of people’s thoughts on machine learning projects.[2] In my opinion, two divergent trends have been playing out over the years. First, the practice of organizations standing up large in-house machine learning divisions is evaporating. Second, the number of outside practices has grown rapidly. You can see this in the number of startups working in the machine learning space and the number of major organizations offering teams of consultants to help.
I’m still sitting here reading articles about why machine learning projects fail.[3] For the most part they all blend together into a few common observations: production data is dirty, models are not dynamic, and operationalizing a model requires sustained investment. This is one of the reasons that API delivered machine learning solutions tend to outlast other deployments in the wild. Major companies that provide the care and feeding for the API models they serve up are delivering both operational sustainment and ongoing maintenance as a service embedded in the price of the API transactions. Those efforts are for the most part very specific to the thing you are trying to accomplish with the API call, and a lot of machine learning use cases work out well enough that way.
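To make that concrete, here is a minimal sketch of what consuming one of those API delivered models looks like. The endpoint URL, the API key, and the payload shape are all hypothetical placeholders, not any specific vendor’s API, but the pattern holds across most hosted offerings: you send data over, and the provider worries about the model, the retraining, and the serving infrastructure.

```python
import requests

# Hypothetical hosted sentiment-analysis endpoint; the URL, key, and
# payload shape below are placeholders for whatever provider you pick.
API_URL = "https://api.example.com/v1/sentiment"
API_KEY = "your-api-key-here"


def score_sentiment(text: str) -> dict:
    """Send text to the hosted model and return the provider's response.

    All of the model maintenance, retraining, and serving infrastructure
    lives on the provider's side of this one call.
    """
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"text": text},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(score_sentiment("This newsletter keeps getting better."))
```

That single function call is the entire operational footprint on the consumer’s side, which is exactly why these deployments tend to stick around.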
Based on my research, an audience of about 15,000 to 20,000 people is actively consuming machine learning content online. You could go look at the average views of Yannic Kilcher on YouTube and get a sense of the audience size.[4] A potentially larger audience of academics and practitioners is publishing papers in record numbers. Consider that ICML received 5,513 paper submissions in 2021 vs. 1,037 in 2015.[5] That gives you a line of sight into active academic work and professional endeavors. Mix in the next layer of the audience and all of a sudden you are up to about the previously mentioned number. Consider that when you start to wonder about doing machine learning work and the overall audience of people really working within the space. A lot more people are merely adjacent to machine learning, in that they have elements of it or connections to it in the applications and services being provided. That localizes the actual teams and spreads the sales and hype out a few layers within organizations.
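If you want to sanity check the growth behind that submission claim, the arithmetic is quick. Here is a back-of-the-envelope sketch using only the two ICML counts cited above (from the acceptance-rate repository in footnote [5]); the annualized figure assumes smooth growth across those six years, which is a simplification.

```python
# Back-of-the-envelope growth in ICML paper submissions,
# using only the two counts cited above (see footnote [5]).
submissions_2015 = 1037
submissions_2021 = 5513

growth_factor = submissions_2021 / submissions_2015
annualized = growth_factor ** (1 / 6) - 1  # six years between the two counts

print(f"Submissions grew {growth_factor:.1f}x from 2015 to 2021")
print(f"That is roughly {annualized:.0%} growth per year")
```

That works out to about a 5.3x jump, or roughly 32 percent growth per year, which is the kind of curve that tells you the academic side of this audience is still expanding fast.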
Go check out https://discuss.tensorflow.org/; more on that later…
Links and thoughts:
All right, this one from Yannic Kilcher is pretty interesting: “Efficient and Modular Implicit Differentiation (Machine Learning Research Paper Explained)”
I watched about 30 minutes of the WAN Show with Linus and Luke. This is my guilty pleasure computer hardware content each week. I’m not sure the show has been improving over time, but I keep watching it.
If you want to hear William Shatner read Ray Bradbury, then here you go…
Top 5 Tweets of the week:
Footnotes:
[1] Here is that Gartner press release: https://www.gartner.com/en/newsroom/press-releases/2018-02-13-gartner-says-nearly-half-of-cios-are-planning-to-deploy-artificial-intelligence
[2] https://iiot-world.com/industrial-iot/connected-industry/why-85-of-machine-learning-projects-fail/ and a lot of other examples exist.
[3] https://www.kdnuggets.com/2021/01/top-5-reasons-why-machine-learning-projects-fail.html
[4] https://www.youtube.com/c/YannicKilcher/videos sorted by newest videos… look at the counts…
[5] https://github.com/lixin4ever/Conference-Acceptance-Rate
What’s next for The Lindahl Letter?
Week 22: Machine learning graphics
Week 23: Fairness and machine learning
Week 24: Evaluating machine learning
Week 25: Teaching kids ML
Week 26: Machine learning as a service
I’ll try to keep the “what’s next” list forward-looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.