Erik Bernhardsson    About

Annoying blog post

I spent a couple of hours this weekend going through some pull requests and issues to Annoy, which is an open source C++/Python library for Approximate Nearest Neighbor search.

I set up Travis-CI integration and spent some time on one of the issues that multiple people had reported. At the end of the day, it turns out the issue was actually caused by a bug in GCC 4.8. Some crazy compiler optimization introduced between 4.6 and 4.8 caused this loop to be removed:

if (indices.size() <= (size_t)_K) {
  for (size_t i = 0; i < indices.size(); i++)
    m->children[i] = indices[i];
  return item;

Replacing it with std::copy turned out to do the trick

if (indices.size() <= (size_t)_K) {
  copy(indices.begin(), indices.end(), m->children);
  return item;

It’s still bizarre, but I probably deserved it, given how Annoy is abusing C++. The m->children array is declared as being only 2 elements long, but I deliberately overflow the array because I allocate extra space after it. I think this might cause GCC to unroll the loop to run twice.

I always feel a bit more comfortable when it turns out that the compiler is introducing bugs rather than my code. Made me think of the Jeff Dean joke: Jeff Dean builds his code before committing it, but only to check for compiler and linker bugs.

Anyway, after fixing this in three separately places, it seems like it’s finally working. Dirk Eddelbuettel is working on an R implementation of Annoy which is fun to see.

I haven’t spent much time with Annoy in a year or two and looking around it seems like there’s some new competitors on the block. Panns is one of them, another one is the LSHForest pull request for scikit-learn. I haven’t looked at them thoroughly, but they are both Python-only and claim some advantages over Annoy. None of them implement mmap as a method to load indexes, which imho is Annoy’s killer feature.

There’s a performance benchmark featuring Annoy, LSHForest, and FLANN, written by the author of LSHForest. Annoy performs horribly in the benchmark, getting its ass severely kicked by the other two. After re-running the benchmark myself, I think what happened is that the bug I mentioned above was present for Annoy and that’s why it performed so bad. Re-running the benchmark (thanks for making it easily reproducible!) yields very different results.

It’s extremely hard to compare all trade-offs between index building time, index size, query performance, and accuracy. So please don’t take this super seriously. The only thing I changed in the benchmark was (1) I added Panns, for good measure (2) I reduced the number of trees for Annoy (and Panns) to 10 instead of using n_features. Without reducing the number of trees for Annoy, it gets pretty much 100% accuracy for all data sets, but takes several minutes to build each index. So to emphasize approximate aspect of ANN, I decided to sacrifice some accuracy to gain performance.

Pardon the lousy graphics, but here’s the result in all its glory:




In my layman terms:

  • Annoy and Panns outperform LSHF and FLANN significantly on accuracy.
  • Index building process is fast for LSHF and FLANN. Annoy takes a lot more time, but Panns is 10x slower than Annoy
  • FLANN is faster than Annoy for queries. Annoy is 10x faster than LSHF. Panns is super duper slow.

And with my severely biased conclusions:

  • If you want to use mmap for fast index loading, use Annoy
  • If you want to minimize file size at any cost, use Panns
  • If you want fast query times at any cost, use FLANN
  • If you want a pure Python solution, use LSHF
  • For anything else, use Annoy. Or am I going too far promoting my own projects now…?

Btw, I would love it if someone could help me reimplement the algo used by Panns in Annoy, since it seems pretty good.

For another comparison, check out Radim Řehůřek’s Performance Shootout of Nearest Neighbours.

All metrics below:

Time building index (s) Average query time (ms) Average accuracy
n_samples: 1000 n_features: 100
LSHF 0.02 5.46 0.59
Annoy 0.15 0.27 0.98
Flann 0.00 0.17 0.60
Panns 3.27 66.48 0.93
n_samples: 1000 n_features: 500
LSHF 0.10 7.11 0.61
Annoy 0.39 0.75 0.98
Flann 0.01 0.24 0.62
Panns 9.94 140.60 0.96
n_samples: 10000 n_features: 100
LSHF 0.25 8.01 0.61
Annoy 3.17 0.45 0.98
Flann 0.02 0.20 0.62
Panns 55.34 71.12 0.96
n_samples: 10000 n_features: 500
LSHF 1.29 9.50 0.15
Annoy 10.46 1.14 0.50
Flann 0.07 0.24 0.13
Panns 154.58 139.83 0.54
n_samples: 10000 n_features: 1000
LSHF 2.70 13.74 0.16
Annoy 18.28 2.32 0.49
Flann 0.11 0.32 0.12
Panns 278.21 257.45 0.49

Erik Bernhardsson

... is the CTO at Better, which is a startup changing how mortgages are done. I write a lot of code, some of which ends up being open sourced, such as Luigi and Annoy. I also co-organize NYC Machine Learning meetup. You can follow me on Twitter or see some more facts about me.