Erik Bernhardsson

When machine learning matters

I joined Spotify in 2008 to focus on machine learning and music recommendations. It’s easy to forget, but Spotify’s key differentiator back then was the low-latency playback. People would say that it felt like they had the music on their own hard drive. (The other key differentiator was licensing — until early 2009 Spotify basically just had all kinds of weird stuff that employees had uploaded. In 2009 after a crazy amount of negotiation the music labels agreed to try it out as an experiment. But I’m getting off topic now.)

Music distribution is a trivial problem now. Put everything on a CDN and you’re done. The cost of bandwidth and storage has gone down by an order of magnitude, not to mention the labor cost needed to build and maintain it.

Anyway, at some point in 2009 we realized that we had far bigger challenges at Spotify than building a music recommendation system. So instead, I switched gears and ran the “Analytics team” for 2 years. We did the first A/B tests, ad delivery optimizations, provided data points crucial to bizdev deals, etc.

Not until 2013 did we feel like it was time to focus on music recs. So I switched back and built up a team around that. The feeling was that we had already solved the “tablestakes” problems around music distribution and music management. Those problems had become easy for anyone to solve. The next differentiator would be more advanced features that deliver user value and are harder for competitors to copy. So we focused a lot on ML again.

Which brings me to this conclusion:

[Image: scarface]

In the majority of all products, machine learning will not be a key differentiator in the first five years.

Most machine learning is sprinkles on the top

The first few years of product iteration are about getting the “tablestakes” out of the way. The ROI of those is just vastly bigger. I lead the tech team at a startup and we are nowhere near using any kind of sophisticated machine learning, two years into the process. There are a few promising opportunities where we want to use it. I absolutely think it’s going to be a huge competitive advantage for us. But right now far simpler things matter. Spending a few days working on the conversion funnel is guaranteed to deliver far more business value.

Rarely is machine learning the fundamental enabler of a product. It’s often an enhancer. This unfortunately means that the machine learning team isn’t the team that creates the core business value or has a crucial strategic role. It will be the team that comes in after 5-10 years once the “basic” features have been built and then squeezes out another 10% MAU by A/B testing the crap out of the product. Despite the current AI hype, most of the big shops focus on relatively mundane things. Google is trying to get you to click on more ads, Facebook to use the newsfeed more. It’s all incremental improvements on top of a product that has already existed for 10 years.

[Image: enhance]

Obviously the image above has nothing to do with this post. I just thought it was funny. Sorry.

Pick your competitive advantage

How can we get around this? How can we build a company that’s founded on machine learning first? I suspect ML in itself is very rarely a competitive advantage. Any machine learning company needs to find a sustainable non-ML advantage. Do you have a fantastic set of image filters? Great, use that tiny head start, launch an app and build a social network. Do you have a really good fraud detection system? Go out and sign up enterprise customers that feed data back to you.

Machine learning can be a first mover advantage. But there’s a high likelihood whatever insight you have will be independently discovered and published at the next NIPS/KDD/ICML. You need to turn it into something sustainable — having data, or lots of users, or very sticky enterprise contracts, or something else.

Besides the core machine learning, other technology can definitely be a competitive advantage: building super nasty integrations with vendors, or figuring out the control engineering of the suspension system of a self-driving car. Those are proprietary assets where there’s little open research. For pure machine learning, I think we’ll see a separate force of commoditization, where the technological differential between companies converges towards zero. Knowing how to build a convolutional neural network will not be a valuable asset. Hooking it up to a surveillance system and building a video distribution system could be a really key piece of technology.

Don’t underestimate the power of data. Scraping the web doesn’t create a valuable asset. But if you can obtain highly valuable unique data then that’s a huge competitive advantage. Another type of data I think people underestimate is in people’s heads: learnings from real production usage. E.g., Netflix has iterated on movie recommendations for 10 years. They know their shit. It’s hard to build a better recommender system even if you magically had ten times the data that Netflix has.

What seems to happen in reality is that the human capital becomes the real asset. Here’s a list of some acquisitions. It’s clear to me these acquisitions were 90% acqui-hire, about human capital being redeployed to something else. Google and other big players have shown that they are willing to pay a huge premium for smart teams (throwing out a fun conspiracy theory just for the sake of it: Google is going to acqui-hire any team with smart people just to create a talent monopoly). These companies had all built some cool tech, but the price paid really represented the scarcity of skills. I expect that scarcity to vanish gradually.

Erik Bernhardsson

... is the CTO at Better, which is a startup changing how mortgages are done. I write a lot of code, some of which ends up being open sourced, such as Luigi and Annoy. I also co-organize the NYC Machine Learning meetup. Here are some more facts about me.