Nearest neighbors and vector models – part 2 – algorithms and data structures

This is a blog post rewritten from a presentation at NYC Machine Learning on Sep 17. It covers a library called Annoy that I have built that helps you do nearest neighbor queries in high dimensional spaces. In the first part, I went through some examples of why vector models are useful. In the second part I will be explaining the data structures and algorithms that Annoy uses to do approximate nearest neighbor queries.

Read more…

coin2dice

Here’s a problem that I used to give to candidates. I stopped using it seriously a long time ago since I don’t believe in puzzles, but I think it’s kind of fun.

  1. Let’s say you have a function that simulates a random coin flip. It returns “H” or “T”. This is the only random generator available. How can write a new function that simulates a random dice roll (1…6)?
  2. Is there any method that guarantees that the second function returns in finite time?
  3. Let’s say you want to do this $$ n $$ times where $$ n \to \infty $$ . What’s the most efficient way to do it? Efficient in terms of using the fewest amount of coin flips.

The first part is old, I think. The second and third part are follow up questions that I came up with.

Read more…

Benchmark of Approximate Nearest Neighbor libraries

Annoy is a library written by me that supports fast approximate nearest neighbor queries. Say you have a high (1-1000) dimensional space with points in it, and you want to find the nearest neighbors to some point. Annoy gives you a way to do this very quickly. It could be points on a map, but also word vectors in a latent semantic representation or latent item vectors in collaborative filtering.

Read more…

3D in D3

I have spent some time lately with D3. It’s a lot of fun to build interactive graphs. See for instance this demo (will provide a longer writeup soon).

D3 doesn’t have support for 3D but you can do projections into 2D pretty easily. It’s just old school computer graphics. I ended up adding an animated background to this blog based on an experiment. The math is simple.

Read more…