NYC Machine Learning meetup

2013-01-22

From the NYC Machine Learning talk I gave last week:

I’ve only looked at it briefly so far. Unfortunately, the quality isn’t the best.

Momentum and mean reversion might just be volatility bias

2013-01-13

The Economist just published an article called The best, the worst and the ugly. By looking at historical performance for mutual funds, they find strong support for momentum and mean reversion. Picking the best or the worst fund over the previous five years gives great returns over the next five years.

Calculating cosine similarities using dimensionality reduction

2012-12-05

This was posted on the Twitter Engineering blog a few days ago: Dimension Independent Similarity Computation (DISCO)

I’ve only glanced at the paper, and there’s some cool stuff going on from a theoretical perspective. What I’m curious about is why they didn’t use dimensionality reduction to solve such a big problem. The benefit of dimensionality reduction is that it scales much better (linear in input data size) and produces much better results. The drawback is that it’s much harder to implement.
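To make the idea concrete, here is a minimal sketch using Gaussian random projection, which is just one simple form of dimensionality reduction. It is an assumption picked for illustration, not the method from the DISCO paper and not necessarily the technique I have in mind above. The helper names (`cosine`, `random_projection`, `project`) and the sizes `dim` and `k` are made up for this example. Each vector is projected once, so the work is linear in the input size, and cosine similarities are then computed in the much smaller space.

```python
import math
import random


def cosine(u, v):
    """Exact cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_projection(dim, k, seed=0):
    """Build a k x dim matrix of independent standard normal entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]


def project(matrix, vec):
    """Map a dim-dimensional vector down to k dimensions."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


# Hypothetical toy data: b is a noisy copy of a, c is unrelated to a.
rng = random.Random(1)
dim, k = 1000, 256
a = [rng.random() for _ in range(dim)]
b = [x + 0.5 * rng.random() for x in a]
c = [rng.gauss(0.0, 1.0) for _ in range(dim)]

matrix = random_projection(dim, k)
pa, pb, pc = project(matrix, a), project(matrix, b), project(matrix, c)

exact_ab, approx_ab = cosine(a, b), cosine(pa, pb)
exact_ac, approx_ac = cosine(a, c), cosine(pa, pc)
print("a vs b: exact %.3f, reduced %.3f" % (exact_ab, approx_ab))
print("a vs c: exact %.3f, reduced %.3f" % (exact_ac, approx_ac))
```

The reduced-space similarities are only approximations. The error shrinks roughly like one over the square root of `k`, so `k` trades accuracy against cost.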

Tumblr's awesome project names

2012-11-18

Not sure how I managed to miss this, but I’m watching this Tumblr presentation and they talk about their projects named after Arrested Development topics: Gob, Parmesan, Buster, Jetpants, Oscar, George and Motherboy.

Still, the best software project name is probably Apple’s BHA.

A neat little trick with time decay

2012-10-29

Something that pops up pretty frequently is to implement time decay, especially where you have recursive chains of jobs. For instance, say you want to keep track of a popularity score. You calculate today’s output by reading yesterday’s output, discounting it by $$ \exp(-\lambda \Delta T) $$ and then adding some hit count for today. Typically you choose $$ \lambda $$ so that $$ \exp(-\lambda \Delta T) = 0.95 $$ for a day or something like that. We do this to generate popularity scores for every track at Spotify.
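The recursive update described above can be sketched in a few lines. This is only an illustration: the function names (`decay_rate`, `update_score`) and the hit counts are made up, and the time step is assumed to be one day.

```python
import math


def decay_rate(daily_factor, delta_t_days=1.0):
    """Solve exp(-lambda * delta_t) = daily_factor for lambda."""
    return -math.log(daily_factor) / delta_t_days


def update_score(previous_score, hits_today, lam, delta_t_days=1.0):
    """Discount yesterday's score, then add today's hit count."""
    return previous_score * math.exp(-lam * delta_t_days) + hits_today


lam = decay_rate(0.95)  # so that one day of decay multiplies the score by 0.95

# Hypothetical daily hit counts for a single track.
daily_hits = [100, 80, 120]

score = 0.0
for hits in daily_hits:
    score = update_score(score, hits, lam)

print(round(score, 2))  # 286.25 = (100 * 0.95 + 80) * 0.95 + 120
```

Each job only needs yesterday's output and today's hits, which is why this fits a recursive chain of daily jobs.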

Luigi: complex pipelines of tasks in Python

2012-10-21

I’m shamelessly promoting my first major open source project. Luigi is a Python module that helps you build complex pipelines of batch jobs, handle dependency resolution, and create visualizations to help manage multiple workflows. It also comes with Hadoop support built in (because that’s really where its strength becomes clear).

Erik Bernhardsson
