Valuing ML use cases based on scale
This week we are jumping right into valuing ML use cases based on scale. Most of the time, scale is everything when it comes to adding value: being able to quickly scale a machine learning solution within a workflow is generally where the benefits begin to accelerate.[1] Very few examples exist in the wild where only a handful of interactions deliver the majority of the value. In those cases, machine learning may not be ready to handle such a critical part of your business; sometimes applying machine learning to a use case is simply not a viable path to success.[2] You have to evaluate and invest in machine learning use cases with high potential return on investment, because the value derived from the work will line up directly with ROI.
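The ROI framing above can be sketched as a quick back-of-the-envelope calculation. This is a minimal illustration with made-up numbers, not figures from any real deployment:

```python
def roi(value_generated, total_cost):
    """Simple return on investment: net gain relative to cost."""
    return (value_generated - total_cost) / total_cost

# Illustrative numbers only: a use case touching 1M transactions,
# each worth $0.02 in savings, against $15k of total cost.
value = 1_000_000 * 0.02   # $20,000 in value
cost = 15_000
print(f"ROI: {roi(value, cost):.0%}")  # → ROI: 33%
```

The point of the scale argument is visible in the math: value grows with transaction volume while much of the cost is fixed, so ROI improves as the solution scales.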
Conceptually I have been breaking down the categories of ML use cases into three buckets:
“Things you can call” e.g. external API services
“Places you can be” e.g. ecosystems where you can build out your footprint (AWS, GCP, Azure, and many others that are springing up for MLOps delivery)
“Building something yourself” e.g. open source and self-tooled solutions
That categorization helps me think about where the work for a use case is going to happen. It is a tactical question rather than a strategic one, and each of the three buckets will probably receive more coverage later in this series. API services let you scale very quickly: you can go out right now and get access to several machine learning API solutions on AWS, Azure, or GCP without any real barriers. The friction involved in adopting these externally hosted solutions is relatively low, and they sit on highly scalable infrastructure on the back end. People are using them all over the place right now.[3]
A fork in the road exists here, where valuing ML use cases splits into two camps. On one side, you have a ton of options that are ready to go out of the box with a little bit of configuration and setup, served from models behind a cloud API endpoint. Sure, you have to get connected and establish a business relationship, but for the most part these are the easiest way to get started in the machine learning space at scale. These solutions are not highly customized deployments, and they have a certain amount of per-transaction cost baked in from the start.
Alternatively, you can build out and serve your own models without going to a cloud-based machine learning API. That creates a natural tension between out-of-the-box solutions that are ready to scale and highly customized deployments that require a certain degree of initial and ongoing investment.
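That tension between per-transaction API pricing and a fixed self-hosting investment can be reduced to a simple break-even calculation. A minimal sketch, using hypothetical costs (none of these figures come from a real provider):

```python
def break_even_calls(fixed_cost, api_cost_per_call, self_cost_per_call):
    """Number of calls at which building and serving your own model
    becomes cheaper than paying a per-transaction cloud API."""
    if api_cost_per_call <= self_cost_per_call:
        raise ValueError("no break-even point: the API is already cheaper per call")
    return fixed_cost / (api_cost_per_call - self_cost_per_call)

# Hypothetical figures: $50k to build and run your own serving stack,
# $0.001 per call on the cloud API, $0.0002 per call self-hosted.
n = break_even_calls(50_000, 0.001, 0.0002)
print(f"Break-even at roughly {n:,.0f} calls")
```

Below the break-even volume the ready-to-go API wins; well above it, the customized deployment starts to pay for itself. That is the scale question in its simplest form.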
Links and thoughts:
You can now watch the Stanford University EE104 class videos for “Introduction to Machine Learning”
If that set of videos was not enough machine learning joy for you to watch, then maybe consider this 80 video collection from Cornell Tech CS 5787 (Fall 2020) https://www.youtube.com/playlist?list=PL2UML_KCiC0UlY7iCQDSiGDMovaupqc83
I have been digging into general ML use cases and produced the chart below. I’m going to update it a few times this year, so you may see it again in a few weeks. Feel free to let me know if something obvious got missed or if you disagree with the placement between scale and maturity. All feedback is appreciated when it comes to chart evaluation and evolution.
I watched maybe 30 minutes of “NVIDIA gets Destroyed - WAN Show March 19, 2021” from Linus Tech Tips
I watched this Machine Learning Street talk episode with Andy Smith
If those links were not enough content for you, then go check out what is trending on GitHub right now… https://github.com/trending (this is a great place to get a sense for what is actively being worked in the open source community)
With a few hours of free time blocked off this week I started this new course “ML Pipelines on Google Cloud” and finished up week 1 https://www.coursera.org/learn/ml-pipelines-google-cloud/home/welcome
Top 5 Tweets of the week:
MLOps Spotlight #2:
Company: Open Source Project
Product: Kubeflow
Website: https://www.kubeflow.org/
Offering: This is an open source community driven project related to ML stack management on Kubernetes. The pipelines repository is super interesting. The base toolkit has over 10,000 GitHub stars. People are using this tool.
Footnotes:
[1] Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners by Jared Dean (2014)
[2] When not to use machine learning or AI (Towards Data Science) this one is by Cassie Kozyrkov, who shares some great content on decision making here: https://decision.substack.com/
[3] Cloud API Market to Reach $1.78 Billion by 2026 (Wired Release)
What’s next for The Lindahl Letter?
Week 10: Model extensibility for few shot GPT-2
Week 11: What is ML scale? The where and the when of ML usage
Week 12: Confounding within multiple ML model deployments
Week 13: Building out your ML Ops
Week 14: My Ai4 Healthcare NYC 2019 talk revisited
I’ll try to keep the what’s next list forward looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.