Annoy – now without Boost dependencies and with Python 3 Support


Annoy is a C++/Python package I built for fast approximate nearest neighbor search in high-dimensional spaces. Spotify uses it a lot to find similar items. First, matrix factorization gives a low-dimensional representation of each item (artist/album/track/user), so that every item is a k-dimensional vector, where k is typically 40-100. These vectors are then loaded into an Annoy index and used for a number of things: fast similar-item lookups, personalized music recommendations, etc.
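To make the idea concrete, here is a toy, pure-Python sketch of the forest-of-random-projection-trees approach Annoy is built around. Each tree recursively splits the item vectors with a hyperplane equidistant from two randomly sampled items; a query collects the items in the leaf it lands in for every tree, then ranks those candidates by exact distance. This is a simplified illustration, not Annoy's implementation: the names `build_tree`, `ToyForest`, `n_trees`, and `leaf_size` are hypothetical, it uses Euclidean distance only, and it visits just one leaf per tree, whereas the real library searches more broadly across its trees.

```python
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(vectors, ids, leaf_size, rng):
    """Recursively split `ids` with random hyperplanes until leaves are small."""
    if len(ids) <= leaf_size:
        return ids  # leaf: a plain list of item ids
    a, b = rng.sample(ids, 2)
    va, vb = vectors[a], vectors[b]
    # Hyperplane equidistant from the two sampled items: its normal is va - vb
    # and it passes through their midpoint.
    normal = [x - y for x, y in zip(va, vb)]
    offset = _dot(normal, [(x + y) / 2 for x, y in zip(va, vb)])
    left = [i for i in ids if _dot(normal, vectors[i]) >= offset]
    right = [i for i in ids if _dot(normal, vectors[i]) < offset]
    if not left or not right:
        return ids  # degenerate split (e.g. duplicate vectors): stop here
    return (normal, offset,
            build_tree(vectors, left, leaf_size, rng),
            build_tree(vectors, right, leaf_size, rng))


def query_tree(node, v):
    """Descend one tree to the leaf whose region contains `v`."""
    while not isinstance(node, list):
        normal, offset, left, right = node
        node = left if _dot(normal, v) >= offset else right
    return node


class ToyForest:
    """Hypothetical minimal index: several independent random-projection trees."""

    def __init__(self, vectors, n_trees=10, leaf_size=8, seed=0):
        self.vectors = vectors
        rng = random.Random(seed)
        ids = list(range(len(vectors)))
        self.trees = [build_tree(vectors, ids, leaf_size, rng)
                      for _ in range(n_trees)]

    def nearest(self, v, n):
        """Approximate n nearest item ids: union of leaves, ranked exactly."""
        candidates = set()
        for tree in self.trees:
            candidates.update(query_tree(tree, v))
        return sorted(candidates, key=lambda i: _sq_dist(self.vectors[i], v))[:n]
```

More trees give each query more candidates, which improves recall at the cost of memory and query time. With the actual `annoy` package, the same workflow is roughly `AnnoyIndex(k, 'euclidean')`, then `add_item(i, vector)` for each item, `build(n_trees)`, and `get_nns_by_vector(vector, n)`.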


The relationship between commit size and commit message size

[Image: screenshot of the tweet]

Wow, I guess it was more than a year ago that I tweeted this. Crazy how time flies. Anyway, here’s my rationale:

  • When I update one line of code, I feel like I have to write a long explanation about its side effects, why it’s fully backward compatible, and why it fixes some issue #xyz.
  • When I refactor 500 lines of code, I get too lazy to write anything sensible, so I just put “refactoring FooBarController”. Note: don’t try this at home!

I decided to plot the relationship for Luigi:

[Plot: commit size vs. commit message size for Luigi]
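One way to reproduce this kind of analysis (not necessarily how the plot above was made) is to pull per-commit stats out of `git log`. The sketch below assumes that commit size means lines inserted plus lines deleted, and that message size means characters in the full commit message. `parse_log`, `median_message_by_size`, and `load_repo` are hypothetical helpers, and bucketing by order of magnitude is just one simple way to summarize the relationship without a plotting library.

```python
import re
import statistics
import subprocess
from collections import defaultdict

# git's --shortstat summary line, e.g. " 3 files changed, 10 insertions(+), 2 deletions(-)".
# Either the insertions or the deletions part may be missing.
SHORTSTAT = re.compile(
    r"^\s*\d+ files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?\s*$"
)


def parse_log(text):
    """Return (lines_changed, message_chars) pairs from `git log` output.

    Expects the output of `git log --shortstat --format=%x1e%B`, where every
    commit starts with the ASCII record separator 0x1e. Commits without a
    shortstat line (e.g. merge commits) are skipped.
    """
    pairs = []
    for record in text.split("\x1e"):
        size = None
        message_lines = []
        for line in record.splitlines():
            match = SHORTSTAT.match(line)
            if match:
                size = int(match.group(1) or 0) + int(match.group(2) or 0)
            else:
                message_lines.append(line)
        if size is not None:
            pairs.append((size, len("\n".join(message_lines).strip())))
    return pairs


def median_message_by_size(pairs):
    """Median message length per order of magnitude of lines changed.

    Bucket 1 holds 0-9 changed lines, bucket 10 holds 10-99, and so on.
    """
    buckets = defaultdict(list)
    for size, msg_len in pairs:
        buckets[10 ** (len(str(size)) - 1)].append(msg_len)
    return {b: statistics.median(v) for b, v in sorted(buckets.items())}


def load_repo(path="."):
    """Run git in the repository at `path` and parse its history."""
    out = subprocess.run(
        ["git", "-C", path, "log", "--shortstat", "--format=%x1e%B"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_log(out)
```

Calling `median_message_by_size(load_repo(path))` on a local clone gives one median per size bucket; if the rationale above holds, the medians should shrink, or at least not grow, as commits get larger.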
