Erik Bernhardsson

Annoy – now without Boost dependencies and with Python 3 Support


Annoy is a C++/Python package I built for fast approximate nearest neighbor search in high-dimensional spaces. Spotify uses it a lot to find similar items. First, matrix factorization gives a low-dimensional representation of each item (artist/album/track/user), so that every item is a k-dimensional vector, where k is typically 40-100. These vectors are then loaded into an Annoy index and used for a number of things: fast similar items, personal music recommendations, etc.
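To make "similar items" concrete, here is a minimal sketch of the exact, brute-force version of the problem that an approximate index like Annoy speeds up. It is not Annoy's algorithm or API. The item names, the 3-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(item_vectors, query_id, n=2):
    """Return the ids of the n items closest to query_id by scanning every item."""
    query = item_vectors[query_id]
    scored = [
        (cosine_similarity(query, vec), item_id)
        for item_id, vec in item_vectors.items()
        if item_id != query_id
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:n]]


# Hypothetical item vectors; real ones would have k = 40-100 dimensions
# and come out of matrix factorization.
item_vectors = {
    "artist_a": [1.0, 0.0, 0.0],
    "artist_b": [0.9, 0.1, 0.0],
    "artist_c": [0.0, 1.0, 0.0],
    "artist_d": [0.0, 0.0, 1.0],
}

nearest = most_similar(item_vectors, "artist_a", n=1)
```

This scan touches every item on every query, which is what becomes too slow with millions of items. An approximate index trades a little accuracy to avoid it.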

Annoy stands for Approximate Nearest Neighbors something something and was originally open sourced back in 2013, although it wasn’t entirely well-supported until last year when I fixed a couple of crucial bugs. Subsequently, Dirk Eddelbuettel released RcppAnnoy, an R version of Annoy.

The key feature of Annoy is that it supports file-based indexes that can be mmapped very quickly. This makes it very easy to share indexes across multiple processes, load/save indexes, etc.
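The general idea behind file-based, mmapped data can be sketched with nothing but the Python standard library. This is only an illustration of the technique, not Annoy's file format or API: the record layout, the `MappedVectors` class, and the file name are hypothetical.

```python
import mmap
import os
import struct
import tempfile

DIM = 3  # hypothetical vector length
RECORD = struct.Struct("<%df" % DIM)  # DIM little-endian 32-bit floats per vector


def save_vectors(path, vectors):
    """Write vectors back to back as fixed-size binary records."""
    with open(path, "wb") as f:
        for vec in vectors:
            f.write(RECORD.pack(*vec))


class MappedVectors:
    """Read-only view of a vector file, backed by mmap instead of a full read."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // RECORD.size

    def get(self, i):
        """Decode the i-th vector directly from the mapped bytes."""
        return RECORD.unpack_from(self._map, i * RECORD.size)

    def close(self):
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
save_vectors(path, [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])

# Two independent readers of the same file, standing in for two processes.
reader_one = MappedVectors(path)
reader_two = MappedVectors(path)
```

Opening the file is cheap because nothing is parsed or copied up front, and when several processes map the same file the operating system can back them with the same physical pages.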

I built the original version of Annoy using Boost Python but a bunch of people have complained that it’s pretty hard to install. Additionally, Boost Python doesn’t support Python 3.

Last weekend I decided to fix it. I have something to confess: I’ve been meaning to address the Boost dependency for a long time, but never found the time to do it. Finally I just put up an ad on oDesk and outsourced the whole project. I found a great developer who built it all in a few hours.

It might seem ironic to outsource open source projects since I don’t get paid for it. But I spend time working on open source projects because it gives me things back in many ways: networking, recognition, some sort of fuzzy altruistic feeling of having contributed. I don’t mind spending a few bucks on it, the same way I don’t mind spending time on it.

The result is that Annoy no longer depends on Boost and now has Python 3 support. Grab the new version from GitHub/PyPI and please let me know if you run into any issues!

Erik Bernhardsson

... is the CTO at Better, which is a startup changing how mortgages are done. I write a lot of code, some of which ends up being open sourced, such as Luigi and Annoy. I also co-organize the NYC Machine Learning meetup. You can follow me on Twitter or see some more facts about me.