Erik Bernhardsson

hdfs2cass

Just open sourced hdfs2cass, a Hadoop job (written in Java) for efficient Cassandra bulk loading. The nice thing is that it queries Cassandra for its topology and uses that to partition the data so that each reducer can upload d...
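To make the partitioning idea concrete, here is a minimal Python sketch. It is not hdfs2cass's actual code (that is Java, and it gets the topology from a live cluster). The token function `toy_token`, the ring size, and the helper names are all assumptions made up for illustration. The only thing it tries to show is the general token-ring rule: a node owns every token between the previous node's token (exclusive) and its own (inclusive), so routing a row to "the reducer for the owning node" is a sorted lookup with wrap-around.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space, for illustration only


def toy_token(key):
    """Map a row key to a ring token (a stand-in for the cluster's real partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def reducer_for_token(token, node_tokens):
    """Return the index of the node (and so the reducer) that owns this token.

    node_tokens must be sorted ascending. Node i owns (node_tokens[i-1], node_tokens[i]];
    tokens larger than the last node's token wrap around to node 0.
    """
    idx = bisect.bisect_left(node_tokens, token)
    return idx % len(node_tokens)


def partition(keys, node_tokens):
    """Group row keys into one bucket per node, keyed by node index."""
    buckets = {i: [] for i in range(len(node_tokens))}
    for key in keys:
        owner = reducer_for_token(toy_token(key), node_tokens)
        buckets[owner].append(key)
    return buckets
```

With one reducer per bucket, each reducer only ever holds rows destined for a single node, which is the property the post is describing.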

NoDoc

We had an unconference at Spotify last Thursday and I added a semi-trolling semi-serious topic about abolishing documentation. Or NoDoc, as I’m going to call this movement. This was meant to be mostly a thought experiment, but I don’t se...

Wikiphilia

I’ve been obsessed with Wikipedia for the past ten years. Occasionally I find some good articles worth sharing and that’s why I created the wikiphilia Twitter handle. Just a long stream of stuff that for one reason or another may be inte...

Spotify’s Discovery page

The Discovery page, the new start page in Spotify, is finally out to a fairly significant percentage of all users. Really happy since we have worked on it for the past six months. Here’s a screenshot: Some cool features: Artist/al...

Fermat’s principle

I was browsing around on the Internet and the physics geek in me started reading about Fermat’s principle. And suddenly something came back to me that I’ve been trying to suppress for many years – how I never understood why there’s anyth...

Snakebite

Just promoting Spotify stuff here: check out the Snakebite repo on GitHub, written by Wouter de Bie. It’s a super fast tool to access HDFS over CLI/Python, by accessing the namenode directly over sockets/protobuf. Spotify’s developer bl...

Stuff that bothers me: “100x faster than Hadoop”

The simple way to get featured on big data blogs these days seems to be:

1. Build something that does 1 thing super well but nothing else
2. Benchmark it against Hadoop
3. Publish stats showing that it’s 100x faster than Hadoop
4. $$$

Spark...