Interview with a Data Scientist: Erik Bernhardsson

2015-10-28

I was featured in Peadar Coyle’s interview series with various “data scientists” – which is kind of arguable since (a) all the other people in that series are much cooler than me and (b) I’m not really a data scientist. Anyway, reposting the full interview:

Nearest neighbors and vector models – epilogue – curse of dimensionality

2015-10-20

This is another post based on my talk at NYC Machine Learning. The previous two parts covered most of the interesting material, but there are still some topics left to discuss. To go back and read the meaty stuff, check out

Nearest neighbors and vector models – part 2 – algorithms and data structures

2015-10-01

This is a blog post rewritten from a presentation at NYC Machine Learning on Sep 17. It covers Annoy, a library I built that helps you do nearest neighbor queries in high-dimensional spaces. In the first part, I went through some examples of why vector models are useful. In this second part, I will explain the data structures and algorithms that Annoy uses to do approximate nearest neighbor queries.
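As a rough illustration of the tree-based idea behind Annoy, here is a minimal pure-Python sketch: pick two random points, split the set by the hyperplane equidistant from them, recurse until each leaf is small, and build several such trees. All function names here (`build_tree`, `find_leaf`, `build_forest`, `approx_nearest`) are hypothetical and are not Annoy's API. The real library is written in C++ and also explores several branches per tree using a priority queue; this sketch only looks at one leaf per tree.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def build_tree(points, indices, leaf_size, rng):
    """Split `indices` recursively by the hyperplane equidistant from two random points."""
    if len(indices) <= leaf_size:
        return ("leaf", indices)
    i, j = rng.sample(indices, 2)
    p, q = points[i], points[j]
    # Normal vector is p - q; the plane passes through the midpoint of p and q.
    normal = [a - b for a, b in zip(p, q)]
    offset = -dot(normal, [(a + b) / 2 for a, b in zip(p, q)])
    left = [k for k in indices if dot(normal, points[k]) + offset > 0]
    right = [k for k in indices if dot(normal, points[k]) + offset <= 0]
    if not left or not right:
        # Degenerate split (e.g. duplicate points): stop instead of recursing forever.
        return ("leaf", indices)
    return ("split", normal, offset,
            build_tree(points, left, leaf_size, rng),
            build_tree(points, right, leaf_size, rng))

def find_leaf(tree, x):
    """Descend one tree to the leaf whose region contains x."""
    while tree[0] == "split":
        _, normal, offset, left, right = tree
        tree = left if dot(normal, x) + offset > 0 else right
    return tree[1]

def build_forest(points, n_trees=10, leaf_size=10, seed=0):
    rng = random.Random(seed)
    indices = list(range(len(points)))
    return [build_tree(points, indices, leaf_size, rng) for _ in range(n_trees)]

def approx_nearest(forest, points, x, k):
    """Union the leaves x falls into across all trees, then rank those candidates exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(find_leaf(tree, x))
    return sorted(candidates, key=lambda i: sq_dist(points[i], x))[:k]

if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(1000)]
    forest = build_forest(data, n_trees=10, leaf_size=10)
    print(approx_nearest(forest, data, data[0], k=5))
```

More trees mean more candidates per query, which raises the chance of finding the true neighbors at the cost of extra memory and query time; that is the basic knob this kind of index exposes.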

Nearest neighbor methods and vector models – part 1

2015-09-24

This is a blog post rewritten from a presentation at NYC Machine Learning last week. It covers Annoy, a library I built that helps you do (approximate) nearest neighbor queries in high-dimensional spaces. I will be splitting it into several parts. This first part talks about vector models, how to measure similarity, and why nearest neighbor queries are useful.
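One common way to measure similarity between vectors is cosine similarity (Annoy's "angular" metric is derived from it). Below is a minimal brute-force sketch, assuming items are stored as plain Python lists in a dict; the function names and the toy "song" vectors are hypothetical and only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction, 0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

def most_similar(query, vectors, k):
    """Exhaustive nearest neighbor search: score every item, keep the top k."""
    ranked = sorted(vectors, key=lambda name: cosine_similarity(query, vectors[name]),
                    reverse=True)
    return ranked[:k]

if __name__ == "__main__":
    # Hypothetical 3-d item vectors, e.g. from a collaborative filtering model.
    items = {
        "song_a": [0.9, 0.1, 0.0],
        "song_b": [0.8, 0.2, 0.1],
        "song_c": [0.0, 0.1, 0.9],
    }
    print(most_similar([1.0, 0.0, 0.0], items, k=2))  # → ['song_a', 'song_b']
```

Scoring every item costs time linear in the number of items per query, which is fine for thousands of vectors but becomes the bottleneck at millions; that cost is what approximate methods like Annoy trade accuracy to avoid.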

Presentations about Spotify music recommendations

2015-09-22

A couple of people from my old team have been giving talks about how Spotify does music recommendations, and they put together some quite good presentations.

The first one is Neville Li’s presentation about Scala Data Pipelines @ Spotify:

Antipodes

2015-09-08

I was playing around with D3 last night and built a silly visualization of antipodes and how our intuitive understanding of the world sometimes doesn’t make sense. Check out the visualization at bl.ocks.org!

Basically, the idea is that Beijing and Buenos Aires are nearly antipodal, so if you fly from one to the other, you can have a layover at almost any point on the Earth’s surface and it won’t make the trip much longer.
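Why this works: for two exact antipodes, every route through a layover X has length d(A, X) + d(X, antipode of A), which always equals half the Earth's circumference. A small sketch assuming a perfectly spherical Earth (mean radius 6371 km) and the haversine formula; this is not the code behind the D3 visualization, and the city coordinates are approximate and only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a perfectly spherical Earth

def antipode(lat, lon):
    """Point diametrically opposite (lat, lon), in degrees, longitude in [-180, 180)."""
    return -lat, lon % 360.0 - 180.0

def great_circle_km(p, q):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    a = min(1.0, max(0.0, a))  # guard against rounding just outside [0, 1]
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))

def trip_via(start, layover, end):
    """Total distance when flying start -> layover -> end along great circles."""
    return great_circle_km(start, layover) + great_circle_km(layover, end)

if __name__ == "__main__":
    beijing = (39.9, 116.4)        # approximate
    buenos_aires = (-34.6, -58.4)  # approximate
    print("Antipode of Beijing:", antipode(*beijing))
    print("Beijing -> Buenos Aires (km):", round(great_circle_km(beijing, buenos_aires)))
    # With an exact antipode, every layover gives the same total distance.
    far_side = antipode(*beijing)
    for layover in [(0.0, 0.0), (51.5, -0.1), (-33.9, 151.2)]:
        print(layover, round(trip_via(beijing, layover, far_side)))
```

Because Buenos Aires is a few hundred kilometers off Beijing's exact antipode, real layovers add a little distance, but nothing like what the flat-map intuition suggests.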

Software Engineers and Automation

2015-08-16

Every once in a while when talking to smart people the topic of automation comes up. Technology has made lots of occupations redundant, so what’s next?

Switchboard operator, a long time ago

What about software engineers? Every year technology replaces parts of what they do. Eventually surely everything must be replaced? I just ran into another one of these arguments: Software Engineers will be obsolete by 2060.

coin2dice

2015-07-24

Here’s a problem that I used to give to candidates. I stopped using it seriously a long time ago since I don’t believe in puzzles, but I think it’s kind of fun.

1. Let’s say you have a function that simulates a random coin flip. It returns “H” or “T”. This is the only random generator available. How can you write a new function that simulates a random dice roll (1…6)?
2. Is there any method that guarantees that the second function returns in finite time?
3. Let’s say you want to do this $$ n $$ times where $$ n \to \infty $$. What’s the most efficient way to do it? Efficient in terms of using the fewest coin flips.

The first part is old, I think. The second and third parts are follow-up questions that I came up with.
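For the first part, one standard answer (not necessarily the one the post has in mind) is rejection sampling: three flips give a uniform number in 0..7, and you retry on 6 or 7. A minimal sketch; `coin_flip` stands in for the given coin function and both names are hypothetical.

```python
import random

def coin_flip():
    """Stand-in for the given fair coin: the only source of randomness used below."""
    return random.choice("HT")

def dice_roll():
    """Fair 1..6 from coin flips via rejection sampling.

    Three flips encode a uniform integer in 0..7 (binary digits); 6 and 7 are
    rejected and we start over, so each of 0..5 is equally likely.
    """
    while True:
        n = 0
        for _ in range(3):
            n = 2 * n + (1 if coin_flip() == "H" else 0)
        if n < 6:
            return n + 1

if __name__ == "__main__":
    print([dice_roll() for _ in range(10)])
```

Each attempt uses 3 flips and succeeds with probability 6/8, so this uses 3 / (3/4) = 4 flips per roll on average. Note that the loop has no worst-case bound on the number of flips, which is what the second question is poking at.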

Benchmark of Approximate Nearest Neighbor libraries

2015-07-04

Annoy is a library I wrote that supports fast approximate nearest neighbor queries. Say you have a high-dimensional (1-1000 dimensions) space with points in it, and you want to find the nearest neighbors to some point. Annoy gives you a way to do this very quickly. The points could be locations on a map, but also word vectors in a latent semantic representation or latent item vectors in collaborative filtering.
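Benchmarks of approximate nearest neighbor libraries typically compare each library's answers against exact brute-force results, and plot that quality against query speed. A minimal sketch of the quality half, assuming recall@k as the metric (a common choice; this excerpt doesn't say exactly how the benchmark measured it). `subsample_knn` is a hypothetical, deliberately crude stand-in for a real library.

```python
import random

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def exact_knn(points, query, k):
    """Ground truth via brute force: indices of the k closest points."""
    return sorted(range(len(points)), key=lambda i: sq_dist(points[i], query))[:k]

def recall_at_k(approx, exact):
    """Fraction of the true k nearest neighbors that the approximate method found."""
    return len(set(approx) & set(exact)) / len(exact)

def subsample_knn(points, query, k, fraction, rng):
    """Crude 'approximate' search: only look at a random subset of the points."""
    subset = rng.sample(range(len(points)), int(len(points) * fraction))
    return sorted(subset, key=lambda i: sq_dist(points[i], query))[:k]

if __name__ == "__main__":
    rng = random.Random(0)
    points = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(2000)]
    queries = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(50)]
    for fraction in (0.1, 0.5, 1.0):
        recalls = [recall_at_k(subsample_knn(points, q, 10, fraction, rng),
                               exact_knn(points, q, 10)) for q in queries]
        print(fraction, sum(recalls) / len(recalls))
```

Searching a smaller fraction is faster but finds fewer of the true neighbors; a real benchmark sweeps each library's own parameters the same way to trace out a speed vs. recall curve.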

More Luigi alternatives

2015-07-02

The workflow engine battle has intensified with some more interesting entries lately! Here are a couple I encountered in the last few days. I love that at least two of them are direct references to Luigi!


Erik Bernhardsson