# Stuff that bothers me: “100x faster than Hadoop”

The simple way to get featured on big data blog these days seem to be Build something that does 1 thing super well but nothing else Benchmark it against Hadoop Publish stats showing that it’s 100x faster than Hadoop \$ Spark...

I like the editing!

# Being data driven

I picked up an issue of Foreign Affairs while flying back to NYC from SFO. It features this long interview with U.S. General Stanley McChrystal and I thought it was pretty interesting how striking some of the similarities are between fig...

# Annoy

Annoy is a simple package to find approximate nearest neighbors (ANN) that I just put on Github. I’m not trying to compete with existing packages, but Annoy has a couple of features that makes it pretty useful. Most importantly, it uses ...

# More Luigi!

Elias Freider just talked about Luigi at PyData 2013: The presentation above is much better than one I put together a few weeks ago. In case anyone is interested I’ll include it too: