A brief history of Hadoop at Spotify

2014-12-20

I was talking with some data engineers at Spotify and had a moment of nostalgia.

2008

I was writing my master's thesis at Spotify and had to run a Hadoop job to extract some data from the logs. Every time I started running the job, I kept hearing this subtle noise. I kept noticing the correlation for a few days but I was too intimidated to ask. Finally people started cursing that their machines had gotten really slow lately, and I realized we were running Hadoop on the developers' desktop machines. No one had told me. I think back then we had only gigabytes of log data. I remember running less on the log and recognizing half the usernames because they were my friends.

2009

We took a bunch of machines and put them on a pallet in the foosball room. It was a super hot Swedish summer and I kept running this matrix factorization job in Hadoop that would fail halfway through. The node on the top of the pile would crash and you had to reboot it. I suspected overheating. We had a fan running in the room but it wasn't helping. Finally I realized the problem was that the sun was shining in through the window.

Erik Bernhardsson
