A brief housekeeping note: the podcast will be taking a holiday break for the next two weeks and will be back on Jan 5. Keep an eye out for a special compilation episode being released then!
Onto this week's newsletter:
In this week's episode, I interviewed Andreas Jansson, co-founder of Replicate, a version control tool for machine learning. He holds a PhD in Music Informatics from City, University of London and was previously a machine learning engineer at Spotify.
Andreas discusses the state of ML research for music information retrieval, the future of tools for data science and ML engineering, and Replicate, his recent project aiming to solve version control for ML models.
As I mentioned in the last newsletter, Ben and Andreas are building in public and soliciting input from the community. They have an active Discord and are holding an open discussion forum over Zoom on Friday (tomorrow!) at 9am PT. You can sign up for that by clicking here.
Click here to listen to the episode, or find it in your podcast player of choice: https://www.mlengineered.com/listen
If you're an entrepreneur, you've no doubt heard of the Lean Canvas, a 1-page business plan template adored by university e-ship centers, business competitions, and startup studios.
Louis Dorard adapted the idea and created the Machine Learning Canvas:
I LOVE this idea and can't believe I hadn't come across it sooner. The magic of a canvas is that it displays the minimum amount of information needed for teams to execute all on one page. Check it out here.
Google has historically been the canary in the coal mine for ML adoption, and it recently revealed that it uses BERT on nearly all English queries. I suspect we'll see this continue to trickle down into other companies, making the ability to put large models into production an increasingly valuable skill.
As someone who has only the slightest idea of what Roblox is, I was quite surprised to see an article on their tech blog detailing how they scaled BERT to serve 1B+ daily requests (!). It was enormously helpful when I was facing a similar situation recently (though not nearly to the same scale). Check it out here.
Also see this post for a slightly different use-case of BERT in production: Serving Google BERT in Production using Tensorflow and ZeroMQ
Every time something adverse happens in the course of my work, I try to reflect on why it happened and how it could have been prevented. After a while of doing this, some patterns start to emerge, the most prevalent of which is the importance of communication.
I've found that over half the time something goes awry, the root cause is under-communication, or more specifically, someone making an assumption (usually me) that turns out not to be true. If no one catches and verifies that assumption, you end up building on top of a shaky foundation.
An example: there was a dataset being streamed into our data lake, and I was given the task of figuring out which country each datum originated from. I was also given a rough estimate of the distribution of countries we would expect to see. I dove right in, creating a simple, interpretable model from the categorical features and then refining it with a language-detection library on the text features. This took about a week.
When I ran it on the full dataset, the distribution looked NOTHING like what we had expected. There were entire continents that we expected to see that weren't there. What's going on?
I scheduled a meeting with the team that owned the data and quickly figured out what had gone wrong: they had a filter so that only data from specific countries was sent to the data lake. And because the country of origin had never mattered before, the country field was omitted entirely. Sigh. Not only were we not receiving all the data, but the feature I was trying to predict was something we already had ground truth for!
I had ASSUMED that we were getting all the data available, both in terms of quantity and features. Both turned out wrong and it cost me a week and some embarrassment.
In the project I conducted immediately afterwards, I made sure to start by meeting with the data team to verify I wasn't making erroneous assumptions. If I had been working with numerical or categorical features, I would also have set up data monitoring, which is the desired end state of making assumptions explicit. Alas, I was dealing with images in this case, so monitoring is not straightforward (to say the least).
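For categorical features, that kind of monitoring can be very simple. Here's a minimal sketch of what it might look like for the country example; the country codes, expected shares, and tolerance are all hypothetical, not from the actual project:

```python
from collections import Counter

# Hypothetical expected share of records per country, e.g. from the rough
# distribution estimate the data team provides up front.
EXPECTED_SHARE = {"US": 0.40, "GB": 0.25, "DE": 0.20, "FR": 0.15}

def check_country_distribution(countries, expected=EXPECTED_SHARE, tolerance=0.10):
    """Return warnings for countries whose observed share of records
    deviates from the expected share by more than `tolerance`."""
    total = len(countries)
    counts = Counter(countries)
    warnings = []
    # Flag expected countries that are missing or have drifted.
    for country, expected_share in expected.items():
        observed_share = counts.get(country, 0) / total
        if abs(observed_share - expected_share) > tolerance:
            warnings.append(
                f"{country}: expected ~{expected_share:.0%}, observed {observed_share:.0%}"
            )
    # Flag countries we never expected to see in the feed at all.
    for country in counts:
        if country not in expected:
            warnings.append(f"{country}: unexpected country in feed")
    return warnings
```

Running a check like this on each new batch would have surfaced the missing continents on day one instead of after a week of modeling.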
Ever since making this particularly salient mistake, I've done my best to list out the assumptions I'm making and really question whether they hold. A quick Slack message, or a meeting in the worst case, goes a long way towards making sure you don't stray from the critical path.
This is probably a fairly obvious point if you're further ahead in your career than I am, but I hope this can save at least one person from repeating my mistake.