Articles tagged with Approximate Nearest Neighbors

New approximate nearest neighbor benchmarks

2018-06-17

As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I’m the author of Annoy, a library with 3,500+ stars on Github as of today. It offers fast approximate search for nearest neighbors with the additional benefit that you can load data super fast from disk using mmap. I built it at Spotify to use for music recommendations where it’s still used to power millions (maybe billions) of music recommendations every day.

New benchmarks for approximate nearest neighbors

2018-02-15

UPDATE(2018-06-17): There are is a later blog post with newer benchmarks!

One of my super nerdy interests include approximate algorithms for nearest neighbors in high-dimensional spaces. The problem is simple. You have say 1M points in some high-dimensional space. Now given a query point, can you find the nearest points out of the 1M set? Doing this fast turns out to be tricky.

Annoy 1.10 released, with Hamming distance and Windows support

2017-11-26

I’ve been a bit bad at posting things with a regular cadence lately, partly because I’m trying to adjust to having a toddler, partly because the hunt for clicks has caused such a high bar for me that I feel like I have to post something Pulitzer-worthy. But things are always cooking, so let’s break this pattern with a quick notice on something I’ve been working on!

Approximate nearest news

2016-06-02

As you may know, one of my (very geeky) interests is Approximate nearest neigbor methods, and I’m the author of a Python package called Annoy.

I’ve also built a benchmark suite called ann-benchmarks to compare different packages. Annoy was the world’s fastest package for a few months, but two things happened.

Nearest neighbors and vector models – epilogue – curse of dimensionality

2015-10-20

This is another post based on my talk at NYC Machine Learning. The previous two parts covered most of the interesting parts, but there are still some topics left to be discussed. To go back and read the meaty stuff, check out

Nearest neighbors and vector models – part 2 – algorithms and data structures

2015-10-01

This is a blog post rewritten from a presentation at NYC Machine Learning on Sep 17. It covers a library called Annoy that I have built that helps you do nearest neighbor queries in high dimensional spaces. In the first part, I went through some examples of why vector models are useful. In the second part I will be explaining the data structures and algorithms that Annoy uses to do approximate nearest neighbor queries.

Nearest neighbor methods and vector models – part 1

2015-09-24

This is a blog post rewritten from a presentation at NYC Machine Learning last week. It covers a library called Annoy that I have built that helps you do (approximate) nearest neighbor queries in high dimensional spaces. I will be splitting it into several parts. This first talks about vector models, how to measure similarity, and why nearest neighbor queries are useful.

Benchmark of Approximate Nearest Neighbor libraries

2015-07-04

Annoy is a library written by me that supports fast approximate nearest neighbor queries. Say you have a high (1-1000) dimensional space with points in it, and you want to find the nearest neighbors to some point. Annoy gives you a way to do this very quickly. It could be points on a map, but also word vectors in a latent semantic representation or latent item vectors in collaborative filtering.

Better precision and faster index building in Annoy

2015-05-26

Sometimes you have these awesome insights. A few days ago I got an idea for how to improve index building in Annoy.

For anyone who isn’t acquainted with Annoy – it’s a C++ library with Python bindings that provides fast high-dimensional nearest neighbor search.

Annoy – now without Boost dependencies and with Python 3 Support

2015-05-03

Annoy is a C++/Python package I built for fast approximate nearest neighbor search in high dimensional spaces. Spotify uses it a lot to find similar items. First, matrix factorization gives a low dimensional representation of each item (artist/album/track/user) so that every item is a k-dimensional vector, where k is typically 40-100. This is then loaded into an Annoy index for a number of things: fast similar items, personal music recommendations, etc.

Erik Bernhardsson

About Top posts