Figuring out ML readiness
Starting out on your ML journey can be an interesting and exciting time. People are probably trying to sell you on the benefits of a multitude of things. You can find academic interpretations and checklists of readiness for ML in production.[1] Plenty of checklists and other rubrics exist. At the start of any ML journey, the very first thing I question is the basic use cases the company handles. You need to start with some back-of-the-envelope sketches of what exactly the company is trying to do, or hopefully is actually doing. Maybe they are selling something or providing some type of service. After getting that initial sketch of what is happening, the next question is: tell me about your data storage, KPIs, and objective measures. Mentally, I’m sketching out the use case the company is trying to achieve and how they are reporting on and monitoring that use case. Your ML readiness is going to unfold in the space between those two things. Maybe a straightforward API integration with AWS, GCP, or Azure could get you going without a ton of effort. That could well be the answer, and getting going might happen very quickly.
Beyond basic API-driven integrations for your use cases, the ML space really starts to open the door to some fun and fantastic efforts that are highly specialized and very custom. That is where things get harder, and your data availability is going to determine your ML readiness. Building something with internal tooling, either on your own or by following an established pattern, is going to lead to data exploration followed by MLOps planning.
Figuring out your ML readiness can be boiled down to these five questions to get you started:
What are you actually trying to do? (the use case question)
How are you collecting and storing data?
Are your use case and data collection/storage aligned?
Should you partner with a vendor or are you ready internally to drive this forward?
Have you considered the ROI, the full set of KPIs, and the budget-level investments that are going to be required?
Links and thoughts:
I watched about 30 minutes of Linus and Luke hosting the WAN Show this week. I’m never really able to watch the whole show. After about 20 minutes, I tend to start wandering off to view other things online.
I watched this video from Sebastian Raschka “L19.5.2.7: Closing Words -- The Recent Growth of Language Transformers”
Yannic Kilcher was in rare form this week with the video “Research Conference ICML drops their acceptance rate | Area Chairs instructed to be more picky”
You could also watch Yannic talking about “DDPM - Diffusion Models Beat GANs on Image Synthesis (Machine Learning Research Paper Explained)”
Footnotes:
[1] Check out The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
What’s next for The Lindahl Letter?
Week 18: Could ML predict the lottery?
Week 19: Fear of missing out on ML
Week 20: The big Lindahl Letter recap edition
Week 21: Doing machine learning work
Week 22: Machine learning graphics
I’ll try to keep the what’s next list forward-looking, with at least five weeks of posts in planning or review. If you enjoyed reading this content, then please take a moment and share it with a friend.