Erik Bernhardsson


Calculating cosine similarities using dimensionality reduction

2012-12-05 This was posted on the Twitter Engineering blog a few days ago: Dimension Independent Similarity Computation (DISCO). I just glanced at the paper, and there's some cool stuff going on from a theoretical perspective. What I'm curious about is why they didn't use dimensionality reduction to solve such a big problem.
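
To make the dimensionality-reduction idea concrete, here is a minimal sketch under stated assumptions. It swaps in a Gaussian random projection, one of the simplest forms of dimensionality reduction; the post itself may well have a different method in mind, such as a matrix factorization. Sparse vectors over a large hypothetical feature space (10,000 songs) are mapped to k dense dimensions, where cosine similarity is approximately preserved. The function names `random_projection`, `cosine` and `sparse_cosine`, and the song ids, are all hypothetical.

```python
import math
import random

def random_projection(sparse_vectors, k, seed=0):
    """Map sparse {feature: weight} vectors to dense k-dim vectors.

    Each feature gets its own row of k i.i.d. Gaussian entries, drawn lazily
    from a seeded RNG, so all vectors in one call share the same projection.
    The usual 1/sqrt(k) scaling is omitted because it cancels in cosine.
    """
    rng = random.Random(seed)
    rows = {}  # feature -> its row of the random projection matrix
    dense = []
    for vec in sparse_vectors:
        out = [0.0] * k
        for feature, weight in vec.items():
            if feature not in rows:
                rows[feature] = [rng.gauss(0.0, 1.0) for _ in range(k)]
            row = rows[feature]
            for j in range(k):
                out[j] += weight * row[j]
        dense.append(out)
    return dense

def cosine(a, b):
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def sparse_cosine(a, b):
    """Exact cosine similarity of two sparse {feature: weight} vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical data: two users who each played 100 of 10,000 songs, 50 in common.
data_rng = random.Random(1)
songs = [f"song_{i}" for i in range(10_000)]
picked = data_rng.sample(songs, 150)
user_a = {s: 1.0 for s in picked[:100]}
user_b = {s: 1.0 for s in picked[50:150]}

za, zb = random_projection([user_a, user_b], k=1024, seed=42)
print(sparse_cosine(user_a, user_b))  # exact: 50 / 100 = 0.5
print(cosine(za, zb))                 # estimate from the 1024-dim projection
```

The design choice is that k stays fixed no matter how many features there are, so all-pairs comparisons run on short dense vectors; the price is an approximation error that shrinks roughly like 1/sqrt(k).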

Tumblr's awesome project names

2012-11-18 Not sure how I managed to miss this, but I'm watching this Tumblr presentation, and they talk about their projects named after Arrested Development references: Gob, Parmesan, Buster, Jetpants, Oscar, George and Motherboy. Still, the best software project name is probably Apple's BHA.

A neat little trick with time decay

2012-10-29 Something that comes up pretty frequently is implementing time decay, especially when you have recursive chains of jobs. For instance, say you want to keep track of a popularity score. You calculate today's output by reading yesterday's output, discounting it by $$ \exp(-\lambda \Delta T) $$ and then adding today's hit count.
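
The recursion described above can be sketched in a few lines. This is a minimal illustration, assuming one score update per time step of length `delta_t` and per-step hit counts; it does not reproduce the specific trick the post goes on to describe. The helper names `decayed_scores` and `closed_form` are hypothetical. The second function checks that unrolling the recursion gives a sum of hits, each discounted by $$ \exp(-\lambda \Delta T) $$ once per elapsed step.

```python
import math

def decayed_scores(daily_hits, lam, delta_t=1.0):
    """Apply score_t = score_{t-1} * exp(-lam * delta_t) + hits_t step by step."""
    decay = math.exp(-lam * delta_t)  # constant per-step discount factor
    score = 0.0
    history = []
    for hits in daily_hits:
        score = score * decay + hits
        history.append(score)
    return history

def closed_form(daily_hits, lam, delta_t=1.0):
    """Unrolled recursion: each step's hits decayed by the steps elapsed since."""
    last = len(daily_hits) - 1
    return sum(h * math.exp(-lam * delta_t * (last - t))
               for t, h in enumerate(daily_hits))

lam = math.log(2) / 7  # half-life of 7 steps
hits = [10, 0, 0, 0, 0, 0, 0, 0]
print(decayed_scores(hits, lam)[-1])  # ≈ 5.0, half of the original 10 hits
```

Because the recursion only needs yesterday's score and today's hits, each job in the chain stays cheap regardless of how much history it summarizes.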

Luigi: complex pipelines of tasks in Python

2012-10-21 I'm shamelessly promoting my first major open source project. Luigi is a Python module that helps you build complex pipelines of batch jobs, handle dependency resolution, and create visualizations to help manage multiple workflows. It also comes with Hadoop support built in (because that's really where its strength becomes clear).
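
As a rough illustration of what dependency resolution means for a pipeline of batch jobs, here is a toy, standard-library sketch. It does not use Luigi: in Luigi itself you subclass `luigi.Task` and define `requires()`, `output()` and `run()`. The function `run_pipeline` and the task names are hypothetical, and the model is simplified: every incomplete task runs after its upstream tasks, and tasks whose output already exists are skipped. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

def run_pipeline(deps, run_task, is_complete):
    """Run every incomplete task after all of its upstream dependencies.

    deps maps each task name to the set of task names it requires.
    Tasks that are already complete (say, because their output file exists)
    are skipped. Returns the list of tasks actually run, in order.
    """
    ran = []
    for task in TopologicalSorter(deps).static_order():
        if is_complete(task):
            continue  # output already exists; nothing to redo
        run_task(task)
        ran.append(task)
    return ran

# Hypothetical pipeline: two daily log fetches feed an aggregate, which feeds a report.
deps = {
    "report": {"aggregate"},
    "aggregate": {"fetch_logs_day1", "fetch_logs_day2"},
}
done = {"fetch_logs_day1"}  # pretend this output already exists
print(run_pipeline(deps, done.add, lambda t: t in done))
# → ['fetch_logs_day2', 'aggregate', 'report']
```

Checking completeness before running is what makes re-running a whole pipeline cheap: only the missing pieces get recomputed.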


Erik Bernhardsson

... is the founder of Modal Labs, which is working on some ideas in the data/infrastructure space. I used to be the CTO at Better. A long time ago, I built the music recommendation system at Spotify. You can follow me on Twitter or see some more facts about me.