Valuing ML use cases based on scale
This week we are jumping right into valuing ML use cases based on scale. Most of the time, scale is everything when it comes to adding value: being able to quickly scale a machine learning solution within a workflow is generally where the benefits begin to accelerate.[1] Very few examples exist in the wild where only a handful of interactions deliver the majority of the value. In those cases, machine learning may not be ready to handle such a critical part of your business; sometimes applying machine learning to a use case is simply not a viable path to success.[2] You have to evaluate and invest in machine learning use cases with high potential return on investment, because the value derived from the work will line up directly with ROI.
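The ROI framing above can be sketched as a quick back-of-the-envelope calculation. This is a minimal illustration with made-up numbers, not figures from any real deployment:

```python
def roi(value_generated, total_cost):
    """Simple return on investment: net gain relative to cost."""
    return (value_generated - total_cost) / total_cost

# Illustrative numbers only: a use case touching 1M transactions,
# each worth $0.02 in savings, against $15k of total cost.
value = 1_000_000 * 0.02   # $20,000 in value
cost = 15_000
print(f"ROI: {roi(value, cost):.0%}")  # → ROI: 33%
```

The point of the scale argument is visible in the math: value grows with transaction volume while much of the cost is fixed, so ROI improves as the solution scales.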
Conceptually I have been breaking down the categories of ML use cases into three buckets:
“Things you can call” e.g. external API services
“Places you can be” e.g. ecosystems where you can build out your footprint (AWS, GCP, Azure, and many others that are springing up for MLOps delivery)
“Building something yourself” e.g. open source and self-tooled solutions
That categorization helps me think about where the work for a use case is going to happen. It is a tactical question rather than a strategic one, and each of the three buckets will probably receive more coverage later in this series. API services let you scale very quickly: you can go out right now and get access to several machine learning API solutions on AWS, Azure, or GCP without any real barriers. The friction involved in adopting these externally hosted solutions is relatively low, and they sit on highly scalable infrastructure on the back end. People are using them all over the place right now.[3]
A fork in the road exists here, where valuing ML use cases splits into two camps. On one side, you have a ton of options that are ready to go out of the box with a little bit of configuration and setup, served from models behind a cloud API endpoint. Sure, you have to get connected and establish a business relationship, but for the most part these are the easiest way to get started in the machine learning space at scale. These solutions are not highly customized deployments, and they have a certain amount of per-transaction cost baked in from the start.
Alternatively, you can build out and serve your own models without going to a cloud-based machine learning API. That creates a natural tension between out-of-the-box solutions that are ready to scale and highly customized deployments that require a certain degree of initial and ongoing investment.
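That tension between per-transaction API pricing and a fixed self-hosting investment can be reduced to a simple break-even calculation. A minimal sketch, using hypothetical costs (none of these figures come from a real provider):

```python
def break_even_calls(fixed_cost, api_cost_per_call, self_cost_per_call):
    """Number of calls at which building and serving your own model
    becomes cheaper than paying a per-transaction cloud API."""
    if api_cost_per_call <= self_cost_per_call:
        raise ValueError("no break-even point: the API is already cheaper per call")
    return fixed_cost / (api_cost_per_call - self_cost_per_call)

# Hypothetical figures: $50k to build and run your own serving stack,
# $0.001 per call on the cloud API, $0.0002 per call self-hosted.
n = break_even_calls(50_000, 0.001, 0.0002)
print(f"Break-even at roughly {n:,.0f} calls")
```

Below the break-even volume the ready-to-go API wins; well above it, the customized deployment starts to pay for itself. That is the scale question in its simplest form.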
Links and thoughts:
You can now watch the Stanford University EE104 class videos for “Introduction to Machine Learning”
If that set of videos was not enough machine learning joy for you to watch, then maybe consider this 80 video collection from Cornell Tech CS 5787 (Fall 2020) https://www.youtube.com/playlist?list=PL2UML_KCiC0UlY7iCQDSiGDMovaupqc83
I have been digging into general ML use cases and produced the chart below. I’m going to update it a few times this year, so you may see it again in a few weeks. Feel free to let me know if something obvious got missed or if you disagree with the placement between scale and maturity. All feedback is appreciated when it comes to chart evaluation and evolution.
I watched maybe 30 minutes of “NVIDIA gets Destroyed - WAN Show March 19, 2021” from Linus Tech Tips
I watched this Machine Learning Street talk episode with Andy Smith
If those links were not enough content for you, then go check out what is trending on GitHub right now… https://github.com/trending (this is a great place to get a sense for what is actively being worked in the open source community)
With a few hours of free time blocked off this week I started this new course “ML Pipelines on Google Cloud” and finished up week 1 https://www.coursera.org/learn/ml-pipelines-google-cloud/home/welcome
Top 5 Tweets of the week:
MLOps Spotlight #2:
Company: Open Source Project
Product: Kubeflow
Website: https://www.kubeflow.org/
Offering: This is an open source community driven project related to ML stack management on Kubernetes. The pipelines repository is super interesting. The base toolkit has over 10,000 GitHub stars. People are using this tool.
Footnotes:
[1] Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners by Jared Dean (2014)
[2] When not to use machine learning or AI (Towards Data Science) this one is by Cassie Kozyrkov, who shares some great content on decision making here: https://decision.substack.com/
[3] Cloud API Market to Reach $1.78 Billion by 2026 (Wired Release)
What’s next for The Lindahl Letter?
Week 10: Model extensibility for few shot GPT-2
Week 11: What is ML scale? The where and the when of ML usage
Week 12: Confounding within multiple ML model deployments
Week 13: Building out your ML Ops
Week 14: My Ai4 Healthcare NYC 2019 talk revisited
I’ll try to keep the what’s next list forward looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.