New benchmarks for approximate nearest neighbors

UPDATE(2018-06-17): There is a later blog post with newer benchmarks! One of my super nerdy interests is approximate algorithms for nearest neighbors in high-dimensional spaces. The problem is simple. You have, say, 1M points in some high-dimensional space.

Are data sets the new server rooms?

This blog post, “Data sets are the new server rooms”, makes the point that a bunch of companies raise a ton of money to go get really awesome proprietary data as a competitive moat. Because once you have the data, you can build a better product, and no one can copy it (at least not very cheaply).

When machine learning matters

I joined Spotify in 2008 to focus on machine learning and music recommendations. It's easy to forget, but Spotify's key differentiator back then was the low-latency playback. People would say that it felt like they had the music on their own hard drive.

My issue with GPU-accelerated deep learning

I've spent several hundred bucks renting GPU instances on AWS over the last year. The speedup from a GPU is awesome and hard to deny. GPUs have taken over the field. Maybe following in the footsteps of Bitcoin mining, there's some research on using FPGAs (I know very little about this).

Analyzing 50k fonts using deep neural networks

For some reason I decided one night I wanted to get a bunch of fonts. A lot of them. An hour later I had a bunch of scrapy scripts pulling down fonts and a few days later I had more than 50k fonts on my computer.

Installing TensorFlow on AWS

Curious about Google's newly released TensorFlow? I don't have a beefy GPU machine, so I spent some time getting it to run on EC2. The steps to reproduce it are pretty brutal and I wouldn't recommend going through them unless you want to waste five hours of your life.

Interview with a Data Scientist: Erik Bernhardsson

I was featured in Peadar Coyle's interview series interviewing various “data scientists” – which is kind of arguable since (a) all the other ppl in that series are much cooler than me and (b) I'm not really a data scientist.

Presentations about Spotify music recommendations

A couple of people on my old team have been going around talking about how Spotify does music recommendations and have put together some quite good presentations. The first one is Neville Li's presentation about Scala Data Pipelines @ Spotify:

Black Box Machine Learning in the Cloud

There's a bunch of companies working on machine learning as a service. Some old companies like Google, but now also Amazon and Microsoft. Then there's a ton of startups: PredictionIO ($2.7M funding), BigML ($1.6M funding), Clarifai, etc, etc.