Where do locals go in NYC?

2014-06-17

One obvious thing to anyone living in NYC is how tourists cluster in certain areas. I was curious about the larger patterns around this, so I spent some time looking at data. The thing I wanted to understand is: what areas are dominated by tourists? Or conversely, what areas are dominated by locals?

How to build up a data team (everything I ever learned about recruiting)

2014-06-08

During my time at Spotify, I’ve reviewed thousands of resumes and interviewed hundreds of people. Lots of them were rejected but lots of them also got offers. Finally, I’ve also had my share of offers rejected by the candidate.

The power of ensembles

2014-04-24

From my presentation at MLConf, one of the points I think is worth stressing again is how extremely well combining different algorithms works.

In this case, we’re training machine learning algorithms on different data sets (playlists, play counts, sessions) and different objectives (least squares, max likelihood). Then we combine all the models using gradient boosted decision trees training on a smaller but higher quality data set. Finally, we validate on a third data set, in this case looking at recall for a ground truth data set of related artists.

MLConf 2014

2014-04-12

Just spent a day at MLConf where I was talking about how we do music recommendations. There was a whole range of great speakers (actually almost 2/3 women which was pretty cool in itself).

Here are my slides:

Music recommendations using cover images (part 1)

2014-04-01

Scrolling through the Discover page on Spotify the other day it occurred to me that the album is in fact a fairly strong visual proxy for what kind of content you can expect from it. I started wondering if the album cover can in fact be used for recommendations. For many obvious reasons this is a kind of ridiculous idea, but still interesting enough that I just had to explore it a bit. So, I embarked on a journey to see how far I could get in a few hours.

Luigi success

2014-03-22

So Luigi, our open sourced workflow engine in Python, just recently passed 1,000 stars on Github, then shortly after passed mrjob as (I think) the most popular Python package to do Hadoop stuff. This is exciting!

Welcome Echo Nest!

2014-03-22

In case you missed it, we just acquired a company called Echo Nest in Boston. These people have been obsessed with understanding music for the past 8 years since it was founded by Brian Whitman and Tristan Jehan out of MIT Medialab.

Momentum strategies

2014-03-03

Haven’t posted anything in ages, so here’s a quick hack I threw together in Python on a Sunday night. Basically I wanted to know whether momentum strategies work well for international stock indexes. I spent a bit of time putting together a strategy that buys the stock index if the return during the previous n days was positive, otherwise doesn’t do anything. I ran this strategy for a basket of approximately 20 stock markets.

Ratio metrics

2014-01-23

We run a ton of A/B tests at Spotify and we look at a ton of metrics. Defining metrics is a little bit of an art form. Ideally you want to define success metrics before you run a test to avoid cherry picking metrics. You also want to define a metric that has as high signal to noise ratio. And of course, most importantly, your metric should ideally correlate to high level business impact as much as possible.

Benchmarking nearest neighbor libraries in Python

2014-01-12

Radim Rehurek has put together an excellent summary of approximate nearest neighbor libraries in Python. This is exciting, because one of the libraries he’s covering, annoy, was built by me.

After introducing the problem, he goes through the list of contestants and sticks with five remaining ones. Finally, the benchmarks pits annoy against FLANN. Although FLANN seems to have roughly 4x better performance, somewhat surprisingly, Radim concludes annoy is the “winner”. Yay!

Erik Bernhardsson

About Top posts

Where do locals go in NYC?

How to build up a data team (everything I ever learned about recruiting)

The power of ensembles

MLConf 2014

Music recommendations using cover images (part 1)

Luigi success

Welcome Echo Nest!

Momentum strategies

Ratio metrics

Benchmarking nearest neighbor libraries in Python

Erik Bernhardsson

Want to get blog posts over email?

Erik Bernhardsson