Ongoing ML cloud costs
People in the business world make decisions every day about where to run machine learning workloads. Previously, we have talked about how that ongoing, expense-based cost of being in the cloud is shifting toward APIs from Azure, AWS, and GCP. Back in my Substack Week 11 post, “What is ML scale? The where and the when of ML usage,” I introduced a concept that needs to be referenced here: breaking the categories of ML use cases down into three buckets:
Bucket 1: “Things you can call” e.g. external API services
Bucket 2: “Places you can be” e.g. ecosystems where you can build out your footprint (AWS, GCP, Azure, and many others that are springing up for MLOps delivery)
Bucket 3: “Building something yourself” e.g. open source and self-tooled solutions
Those three buckets are a useful construct for thinking about the ongoing ML cloud costs you might face. Those ongoing, expense-based costs are going to fall into one of those buckets, and each bucket presents very differently.
Bucket 1: “Things you can call” e.g. external API services
This is an exciting and growing place to be. You have found a use case that works for you, and you are ready to go work with an API from Azure, AWS, or GCP. A ton of these services exist, and they can present an answer for a wide range of workflows.
You get a lot of benefits from using an API service to support your machine learning use case. Most of those APIs have supported, and will continue to support, a multitude of different efforts. That means the model delivery, while not specialized to your use case, is proven over time. You also get to skip all the training and ongoing maintenance of models. If you are running a smaller company, then you get the added benefit of all that other real-world data conditioning and grounding the model. Those API services are not going to be highly specialized, and you sort of get what you get when you take this route to stand up your machine learning use case. The cost is predictable per call or, in most cases, per batch of calls. That means you have to forecast how many times you are going to call the API each month to be able to predict cost and sustain your return on investment models.
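To make that concrete, here is a minimal sketch of that kind of monthly forecast. The tier sizes and per-call prices are made-up placeholder numbers for illustration, not the published pricing of any particular Azure, AWS, or GCP service.

```python
# Minimal sketch: forecast monthly API cost from expected call volume.
# The tiers and prices below are hypothetical placeholders, not real vendor pricing.

PRICING_TIERS = [
    (1_000_000, 0.0010),    # first 1M calls at $0.0010 per call
    (9_000_000, 0.0008),    # next 9M calls at $0.0008 per call
    (float("inf"), 0.0005), # everything beyond 10M calls
]

def forecast_monthly_cost(expected_calls: int) -> float:
    """Estimate monthly spend under a graduated, per-call pricing model."""
    remaining = expected_calls
    cost = 0.0
    for tier_size, price_per_call in PRICING_TIERS:
        calls_in_tier = min(remaining, tier_size)
        cost += calls_in_tier * price_per_call
        remaining -= calls_in_tier
        if remaining <= 0:
            break
    return cost

if __name__ == "__main__":
    for calls in (250_000, 2_500_000, 25_000_000):
        print(f"{calls:>12,} calls -> ${forecast_monthly_cost(calls):,.2f} per month")
```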
Bucket 2: “Places you can be” e.g. ecosystems where you can build out your footprint
Some cloud cost forecasting calculators are better than others. When you start building out your footprint in the cloud, you are going to find that compute and storage combine forces and introduce costs to your organization. The more data you have, and the more compute you have to run against that data, the more your cost modeling is going to change. The first few sentences of this paragraph might not have said machine learning, but they include it by implication. The totality of the data you need to work with, pipeline, transform, modify, or train models on is going to introduce costs within your cloud footprint. Somebody who gets really excited about training on data in the cloud can rent a ton of cool hardware to make it happen faster. That might seem in the moment like a most excellent use of the cloud; it can also end up being expensive. Maybe it was worth it as a short burst to get going and get that machine learning use case and model deployed to production. In that case, the whole training setup needs to be turned off as soon as possible to constrain the costs. Leaving it on and training will just keep adding to the bill. You will find that you can get some pretty epic hardware in the cloud dedicated to your machine learning efforts. Some of it is priced in a much more reasonable way than other parts.
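Here is a minimal sketch of how storage, everyday compute, and a burst of rented training hardware might combine into a monthly estimate. All of the rates are hypothetical placeholders rather than real provider prices; the point is just to show how quickly a training setup left running keeps adding to the bill.

```python
# Minimal sketch: rough monthly cloud cost model combining storage and compute.
# All rates are hypothetical placeholders, not actual Azure, AWS, or GCP prices.

STORAGE_PER_GB_MONTH = 0.023   # assumed object-storage rate
CPU_INSTANCE_PER_HOUR = 0.40   # assumed general-purpose instance rate
GPU_INSTANCE_PER_HOUR = 12.00  # assumed multi-GPU training instance rate

def monthly_footprint_cost(storage_gb: float, cpu_hours: float, gpu_training_hours: float) -> float:
    """Combine storage, everyday compute, and burst GPU training into one estimate."""
    storage = storage_gb * STORAGE_PER_GB_MONTH
    compute = cpu_hours * CPU_INSTANCE_PER_HOUR
    training_burst = gpu_training_hours * GPU_INSTANCE_PER_HOUR
    return storage + compute + training_burst

if __name__ == "__main__":
    # A short, deliberate burst: 48 hours of GPU training, then shut it down.
    print(f"Burst month:  ${monthly_footprint_cost(5_000, 720, 48):,.2f}")
    # The same setup accidentally left running around the clock for a month.
    print(f"Left running: ${monthly_footprint_cost(5_000, 720, 720):,.2f}")
```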
Bucket 3: “Building something yourself” e.g. open source and self-tooled solutions
Depending on your setup, this bucket might not have any appreciable cloud costs. A lot of startups and other organizations might train on a small physical machine or rack. When you are building something yourself, you just have to figure out what that hardware would cost, including any associated costs like where it resides, how it is backed up, power, and internet.
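Here is a minimal sketch of that kind of back-of-the-envelope math for a self-hosted machine. Every figure is a hypothetical assumption you would replace with your own hardware price, power draw, and local rates.

```python
# Minimal sketch: monthly cost of owning a small training machine or rack.
# Every figure here is a hypothetical assumption for illustration only.

def self_hosted_monthly_cost(
    hardware_price: float,            # upfront purchase price
    amortization_months: int,         # months the purchase is spread over
    power_watts: float,               # average draw of the machine
    electricity_per_kwh: float,       # local electricity rate
    internet_monthly: float,          # connectivity
    space_and_backup_monthly: float,  # where it resides plus backups
) -> float:
    """Amortize the hardware and add the ongoing costs of running it yourself."""
    hardware = hardware_price / amortization_months
    power = (power_watts / 1000.0) * 24 * 30 * electricity_per_kwh
    return hardware + power + internet_monthly + space_and_backup_monthly

if __name__ == "__main__":
    estimate = self_hosted_monthly_cost(
        hardware_price=8_000, amortization_months=36,
        power_watts=600, electricity_per_kwh=0.12,
        internet_monthly=80, space_and_backup_monthly=50,
    )
    print(f"Estimated monthly cost: ${estimate:,.2f}")
```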
Links and thoughts:
I watched (and enjoyed) this video from Yannic Kilcher this week called “Involution: Inverting the Inherence of Convolution for Visual Recognition (Research Paper Explained)”
I watched a video from Microsoft Developer this week called, “New to Anomaly Detector: Multivariate Capabilities”
Check this out: “Scaling Laws for Language Transfer Learning | Christina Kim | OpenAI Scholars Demo Day 2021.” Predicting machine learning performance seems interesting.
Top 5 Tweets of the week:
What’s next for The Lindahl Letter?
Week 17: Figuring out ML readiness
Week 18: Could ML predict the lottery?
Week 19: Fear of missing out on ML
Week 20: The big Lindahl Letter recap edition
Week 21: Doing machine learning work
I’ll try to keep the what’s next list forward-looking with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.