Luigi Patruno on writing ML in Production, thinking from a business perspective, and the future of software (PLUS: 2 open source ML tools and a new online course)

Looks like you missed last week’s ML Engineered newsletter, so I wanted to send it again in case it got lost in your inbox.

Have a great week!

Luigi Patruno: ML in Production, Adding Business Value with Data Science, "Code 2.0"

In this week’s episode, I interviewed Luigi Patruno, the man behind ML in Production, my favorite blog on the topic of building machine learning systems for the real world.

He discusses best practices for putting ML into production, how to make sure your data science efforts are actually adding business value, and what the future of building software might be (“Code 2.0”).

I wrote out the best quotes and takeaways from the episode in this Twitter thread. Check it out and like/retweet if you find it helpful!

Click here to listen to the episode, or find it in your podcast player of choice: https://www.mlengineered.com/listen

Two New Open Source Tools for ML Engineers

People online like to argue about how many years ML tooling lags behind traditional software. But whether we’re 5 or 15 years behind, everyone can agree that we’ve got a looooong way to go.

Which is why I’m so excited whenever I see a new tool come out targeted specifically at people actually using ML in the real world, especially when it’s open source!

So today I’m highlighting two of the most recent ones I’ve seen released that I’ll be trying out when the use-case comes up.

Replicate: Version Control for ML

I’ve tried using various experiment tracking tools before (Comet, DVC), but came to the same conclusion as Replicate’s creators, Ben and Andreas: they were too heavyweight and inflexible. It’s great to see that instead of accepting that experiments will always be tracked in a spreadsheet (guilty!), they decided to do something about it.

They have the basic functionality working already and are building the rest of it with the community’s input. Ben ran community meetings when working at Docker and has started to do the same here. They have an open Discord and are very responsive to feedback!

Check it out here: https://replicate.ai/

Evidently: Data Drift Analysis Reports

One of the recurring themes on the podcast and in this newsletter is the need to monitor both data and models in an ML pipeline. Evidently's first release deals with the former, tackling the issue of knowing when your production data has drifted away from your training data.

Their tool takes two pandas dataframes as input (reference and test) and produces an interactive report, either as a notebook cell or as a stand-alone HTML page. If you deal with numerical or categorical features, this is bound to be extremely useful!
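To make the "reference vs. test" idea concrete, here's a rough sketch of what a drift check can look like under the hood. To be clear, this is not Evidently's API or its actual method: the function names, the choice of statistics (a two-sample Kolmogorov-Smirnov statistic for numerical columns, total variation distance for categorical ones), and the 0.3 threshold are all my own assumptions for illustration. It uses plain Python dicts of lists instead of pandas dataframes so it runs without any dependencies.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(reference, test):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref_sorted = sorted(reference)
    test_sorted = sorted(test)
    largest_gap = 0.0
    # The largest gap always occurs at one of the observed values.
    for value in set(ref_sorted) | set(test_sorted):
        ref_cdf = bisect_right(ref_sorted, value) / len(ref_sorted)
        test_cdf = bisect_right(test_sorted, value) / len(test_sorted)
        largest_gap = max(largest_gap, abs(ref_cdf - test_cdf))
    return largest_gap


def total_variation(reference, test):
    """Half the summed gap in category frequencies (0 = identical, 1 = disjoint)."""
    ref_counts = Counter(reference)
    test_counts = Counter(test)
    categories = set(ref_counts) | set(test_counts)
    return 0.5 * sum(
        abs(ref_counts[c] / len(reference) - test_counts[c] / len(test))
        for c in categories
    )


def drift_report(reference, test, threshold=0.3):
    """Score every column; `threshold` is an arbitrary cutoff chosen for this sketch."""
    report = {}
    for column in reference:
        ref_values, test_values = reference[column], test[column]
        numeric = all(isinstance(v, (int, float)) for v in ref_values + test_values)
        if numeric:
            score = ks_statistic(ref_values, test_values)
        else:
            score = total_variation(ref_values, test_values)
        report[column] = {"score": round(score, 3), "drifted": score > threshold}
    return report


# Made-up data: ages shift upward in "production", the plan mix stays the same.
reference_data = {
    "age": [20, 25, 30, 35, 40],
    "plan": ["free", "free", "pro", "free", "pro"],
}
test_data = {
    "age": [50, 55, 60, 65, 70],
    "plan": ["free", "pro", "free", "free", "pro"],
}

for column, result in drift_report(reference_data, test_data).items():
    print(column, result)
# age {'score': 1.0, 'drifted': True}
# plan {'score': 0.0, 'drifted': False}
```

A real tool would use proper statistical tests with p-values and render the results visually, which is exactly the part I'd rather not build myself.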

Check out their release blog post here: https://evidentlyai.com/blog/evidently-001-open-source-tool-to-analyze-data-drift

They've also written an excellent series of articles on ML monitoring here: https://evidentlyai.com/blog/machine-learning-monitoring-what-it-is-and-how-it-differs

A New Online Course for Applied ML

Goku Mohandas released Made With ML and it quickly made a splash in the community, with over 20k people signing up within months. Since then, he has made the difficult decision to pivot away from the project-sharing platform it started as.

The first project he’s working on is a free online course, “Applied ML in Production”.

There’s a dearth of online courses for practical ML, especially from people who’ve done it before, with Full Stack Deep Learning being the only exception. This pivot certainly gets my 👍!

Goku’s released the videos for the first two sections and they’re phenomenally useful. Check it out here: https://madewithml.com/courses/applied-ml-in-production/

