A brief housekeeping note: the podcast will be taking a holiday break for the next two weeks and will be back on Jan 5. Keep an eye out for a special compilation episode being released then!
Onto this week's newsletter:
In this week's episode, I interviewed Andreas Jansson, co-founder of Replicate, a version control tool for machine learning. He holds a PhD in Music Informatics from City, University of London and was previously a machine learning engineer at Spotify.
Andreas discusses the state of ML research for music information retrieval, the future of tools for data science and ML engineering, and Replicate, his recent project aiming to solve version control for ML models.
As I mentioned in the last newsletter, Ben and Andreas are building in public and soliciting input from the community. They have an active Discord and are holding an open discussion forum over Zoom on Friday (tomorrow!) at 9am PT. You can sign up for that by clicking here.
Click here to listen to the episode, or find it in your podcast player of choice: https://www.mlengineered.com/listen
If you're an entrepreneur, you've no doubt heard of the Lean Canvas, a 1-page business plan template adored by university e-ship centers, business competitions, and startup studios.
Louis Dorard adapted the idea and created the Machine Learning Canvas:
I LOVE this idea and can't believe I hadn't come across it sooner. The magic of a canvas is that it displays the minimum amount of information needed for teams to execute all on one page. Check it out here.
Google has historically been the canary in the coal mine for ML adoption, and it recently revealed that it uses BERT on nearly all English queries. I suspect we'll see this continue to trickle down into other companies, making the ability to put large models into production an increasingly valuable skill.
As someone who has only the slightest idea of what Roblox is, I was quite surprised to see an article on their tech blog detailing how they scaled BERT to serve 1B+ daily requests (!). It was enormously helpful when I was facing a similar situation recently (though not nearly to the same scale). Check it out here.
Also see this post for a slightly different use-case of BERT in production: Serving Google BERT in Production using Tensorflow and ZeroMQ
Every time something adverse happens in the course of my work, I try to reflect on why it happened and how it could have been prevented. After a while of doing this, some patterns start to emerge, the most prevalent of which is the importance of communication.
I've found that over half the time something goes awry, the root cause is under-communication, or more specifically, someone making an assumption (usually me) that turns out not to be true. If no one catches and verifies that assumption, you end up building on top of a shaky foundation.
An example: there was a dataset being streamed into our data lake, and I was given the task of figuring out which country each datum originated from. I was also given a rough estimate of the distribution of countries we would expect to see. I dove right in, creating a simple, interpretable model from the categorical features and then refining it with a language-detection library on the text features. This took about a week.
When I ran it on the full dataset, the distribution looked NOTHING like what we had expected. There were entire continents that we expected to see that weren't there. What's going on?
I scheduled a meeting with the team that owned the data and quickly figured out what had gone wrong: they had a filter so that only data from specific countries was sent to the data lake. And because the country of origin had never mattered before, the country field was omitted entirely. Sigh. Not only were we not receiving all the data, but the feature I was trying to predict was something we already had ground truth for!
I had ASSUMED that we were getting all the data available, both in terms of quantity and features. Both turned out wrong and it cost me a week and some embarrassment.
In the project I conducted immediately afterwards, I made sure to start by meeting with the data team to verify I wasn't making erroneous assumptions. If I had been working with numerical or categorical features, I would also have set up data monitoring, which is the desired end state of making assumptions explicit. Alas, I was dealing with images in this case, so monitoring is not straightforward (to say the least).
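For categorical features, that kind of monitoring can be very simple. Here's a minimal sketch of what it might look like for the country example; the country codes, expected shares, and tolerance are all hypothetical, not from the actual project:

```python
from collections import Counter

# Hypothetical expected share of records per country, e.g. from the rough
# distribution estimate the data team provides up front.
EXPECTED_SHARE = {"US": 0.40, "GB": 0.25, "DE": 0.20, "FR": 0.15}

def check_country_distribution(countries, expected=EXPECTED_SHARE, tolerance=0.10):
    """Return warnings for countries whose observed share of records
    deviates from the expected share by more than `tolerance`."""
    total = len(countries)
    counts = Counter(countries)
    warnings = []
    # Flag expected countries that are missing or have drifted.
    for country, expected_share in expected.items():
        observed_share = counts.get(country, 0) / total
        if abs(observed_share - expected_share) > tolerance:
            warnings.append(
                f"{country}: expected ~{expected_share:.0%}, observed {observed_share:.0%}"
            )
    # Flag countries we never expected to see in the feed at all.
    for country in counts:
        if country not in expected:
            warnings.append(f"{country}: unexpected country in feed")
    return warnings
```

Running a check like this on each new batch would have surfaced the missing continents on day one instead of after a week of modeling.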
Ever since making this particularly salient mistake, I've done my best to list out the assumptions I'm making and really question whether they hold. A quick Slack message, or a meeting in the worst case, goes a long way towards making sure you don't stray from the critical path.
This is probably a fairly obvious point if you're further ahead in your career than I am, but I hope this can save at least one person from repeating my mistake.