Articles tagged with Math



Mortality statistics and Sweden's "dry tinder" effect

We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”. But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicate it through data. Basically the claim is that Sweden had an exceptionally “good” year in 2019 in terms of influenza deaths, causing there to be more deaths “overdue” in 2020.

How to set compensation using commonsense principles

Compensation has always been one of the most confusing parts of management to me. Getting it right is obviously extremely important. Compensation is what drives our entire economy, and you could look at the market for labor as one gigantic resource-allocating machine in the same way as people look at the stock market as a gigantic resource-allocating machine for investments.

How to hire smarter than the market: a toy model

Let’s consider a toy model where you’re hiring for two things that are equally valuable. It’s not very important what those are, so let’s just call them “thing A” and “thing B” for now. For one set of abilities, the scatter plot looks like this:

Headcount goals, feature factories, and when to hire those mythical 10x people

When I started building up a tech team for Better, I made a very conscious decision to pay at the high end to get people. I thought this made more sense: they cost a bit more money to hire, but the output usually more than compensates for it. Many fellow CTOs went for the other end of the spectrum. This was a mystery to me, until it all made sense.

Data architecture vs backend architecture


A modern tech stack typically involves at least a frontend and backend but relatively quickly also grows to include a data platform. This typically grows out of the need for ad-hoc analysis and reporting but possibly evolves into a whole oil refinery of cronjobs, dashboards, bulk data copying, and much more. What pushes things into the data platform is generally that a number of things are

The hacker's guide to uncertainty estimates

It started with a tweet:

Why? Because I’ve been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of widgets is going up or down, or whether widget method X is more productive than widget method Y. For almost any graph, quantifying the uncertainty seems useful, so I started trying. A few months later:

Interviewing is a noisy prediction problem

I have done roughly 2,000 interviews in my life. When I started recruiting, I had so much confidence in my ability to assess people. Let me just throw a couple of algorithm questions at a candidate and then I’ll tell you if they are good or not!

Why conversion matters: a toy model


There are often close relationships between top level business metrics. For instance, it’s well known that retention has a super strong impact on the valuation of a subscription business. Or that the % of occupied seats is super important for an airline. A fun little toy model that I came up with generates a curious relationship between conversion rates and revenue.

Fun with trigonometry: the world's most twisted coastline

I just spent a few days in Italy, on the Ligurian coast. Even though we were on the west side of Italy, the Mediterranean sea was to the east, because the house was situated on a long bay. But zooming in even more, there were parts of the coast that were even more twisted – to the point where it had turned a full 360 degrees so you ended up having the sea to the west again.

The eigenvector of "Why we moved from language X to language Y"

I was reading yet another blog post titled “Why our team moved from <language X> to <language Y>” (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate an N × N contingency table of moving from language X to language Y?
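
As a sketch of the kind of thing I have in mind (the languages and counts below are made up, and this is not the original post’s code): count the “moved from X to Y” posts into a matrix, row-normalize it into a transition matrix, and take the principal (stationary) eigenvector as a ranking of where everyone ends up.

```python
import numpy as np

# Hypothetical counts of "we moved from X to Y" posts (rows = from, cols = to).
languages = ["Python", "Go", "Rust", "Java"]
counts = np.array([
    [0, 3, 2, 1],
    [1, 0, 4, 0],
    [1, 1, 0, 0],
    [5, 2, 3, 0],
], dtype=float)

# Row-normalize into a Markov transition matrix P[x, y] = P(move to y | currently on x).
P = counts / counts.sum(axis=1, keepdims=True)

# The stationary distribution is the left eigenvector of P with eigenvalue 1.
eigenvalues, eigenvectors = np.linalg.eig(P.T)
stationary = np.real(eigenvectors[:, np.argmax(np.real(eigenvalues))])
stationary /= stationary.sum()

for lang, p in sorted(zip(languages, stationary), key=lambda t: -t[1]):
    print(f"{lang}: {p:.2f}")
```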

Language pitch

Here’s a fun analysis that I did of the pitch (aka. frequency) of various languages. Certain languages are simply pronounced with lower or higher pitch. Whether this is a feature of the language or more a cultural thing is a good question, but there are some substantial differences between languages.

Pareto efficiency

Pareto efficiency is a useful concept I like to think about. It often comes up when you compare items on multiple dimensions. Say you want to buy a new TV. To simplify it, let’s assume you only care about two factors: price and quality. We don’t know what you are willing to pay for quality – but we know that, all else being equal:
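
The excerpt cuts off here, but to make the concept concrete, here’s a minimal sketch of mine (the TVs and numbers are made up): a TV is on the Pareto frontier if no other TV is both cheaper and at least as good.

```python
# Each TV is (name, price, quality): lower price is better, higher quality is better.
tvs = [
    ("A", 300, 6.0),
    ("B", 450, 7.5),
    ("C", 500, 7.0),   # dominated by B: more expensive and lower quality
    ("D", 900, 9.0),
]

def pareto_frontier(items):
    """Keep items for which no other item is cheaper-or-equal AND better-or-equal (with one strict)."""
    frontier = []
    for name, price, quality in items:
        dominated = any(
            (p <= price and q >= quality) and (p < price or q > quality)
            for _, p, q in items
        )
        if not dominated:
            frontier.append((name, price, quality))
    return frontier

print(pareto_frontier(tvs))  # A, B and D survive; C is dominated by B
```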

Subway waiting math

Why does it suck to wait for things? In a previous post I analyzed a NYC subway dataset and found that at some point, quite early, it’s worth just giving up.

This isn’t a proof that the subway doesn’t run on time – in fact it might actually prove that the subway runs really well. The numbers indicate that it’s not worth waiting after 10 minutes, but that’s a rare event and usually involves something extraordinary like a multi-hour delay. You should roughly give up after some point related to the normal train frequency, and 10 minutes is not a lot at all. Conversely, if the trains ran hourly, it probably would have been worth waiting an hour or more. My analysis gave me a lot of respect for the job the MTA is doing.

Dollar cost averaging

(I accidentally published an unfinished draft of this post a few days ago – sorry about that).

There are a lot of sources preaching the benefits of dollar cost averaging, or the practice of investing a fixed amount of money regularly. The alleged benefit is that when the price goes up, well, then your stake is worth more, but if the price goes down, then you get more shares for the same amount of money. According to Wikipedia, it “minimises downside risk”, about.com says it “drastically reduces market risk”, and an article on Nasdaq.com claims that it’s a “smart investment strategy”.
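
To illustrate the mechanics with my own toy numbers (not from the post): buying a fixed dollar amount each month means your average cost per share works out to the harmonic mean of the prices, which is always at or below their arithmetic mean.

```python
# Toy example: invest $100/month at fluctuating prices.
prices = [10.0, 5.0, 20.0, 10.0]
budget_per_month = 100.0

shares = sum(budget_per_month / p for p in prices)   # 10 + 20 + 5 + 10 = 45 shares
total_spent = budget_per_month * len(prices)         # $400

avg_cost_per_share = total_spent / shares            # harmonic mean of prices: ~8.89
avg_price = sum(prices) / len(prices)                # arithmetic mean of prices: 11.25

print(f"average cost per share: {avg_cost_per_share:.2f}")
print(f"average price:          {avg_price:.2f}")
```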

NYC subway math

Apparently the MTA (the company running the NYC subway) has a real-time API. My fascination with the subway takes on autistic proportions and so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here’s some relevant code for how to use the API:
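
(The code itself is cut off in this excerpt. As a hedged sketch of mine, not the post’s code: the MTA feeds are standard GTFS-realtime protobufs, so something along these lines should work – the feed URL, header name, and key handling are assumptions, and you need the `gtfs-realtime-bindings` package.)

```python
import requests
from google.transit import gtfs_realtime_pb2  # pip install gtfs-realtime-bindings

# Hypothetical feed URL and auth header -- check the MTA developer docs for the real ones.
FEED_URL = "https://api-endpoint.mta.info/Dataservice/mtagtfsfeeds/nyct%2Fgtfs"
headers = {"x-api-key": "YOUR_API_KEY"}

response = requests.get(FEED_URL, headers=headers)
feed = gtfs_realtime_pb2.FeedMessage()
feed.ParseFromString(response.content)

# Print predicted arrival times per trip and stop.
for entity in feed.entity:
    if entity.HasField("trip_update"):
        trip = entity.trip_update.trip
        for stu in entity.trip_update.stop_time_update:
            if stu.HasField("arrival"):
                print(trip.route_id, stu.stop_id, stu.arrival.time)
```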

My issue with GPU-accelerated deep learning

I’ve been spending several hundred bucks renting GPU instances on AWS over the last year. The speedup from a GPU is awesome and hard to deny. GPUs have taken over the field. Maybe, following in the footsteps of Bitcoin mining, there’s some research on using FPGAs (I know very little about this).

Nearest neighbors and vector models – part 2 – algorithms and data structures

This is a blog post rewritten from a presentation at NYC Machine Learning on Sep 17. It covers a library called Annoy that I have built that helps you do nearest neighbor queries in high dimensional spaces. In the first part, I went through some examples of why vector models are useful. In the second part I will be explaining the data structures and algorithms that Annoy uses to do approximate nearest neighbor queries.

Nearest neighbor methods and vector models – part 1

This is a blog post rewritten from a presentation at NYC Machine Learning last week. It covers a library called Annoy that I have built that helps you do (approximate) nearest neighbor queries in high dimensional spaces. I will be splitting it into several parts. This first talks about vector models, how to measure similarity, and why nearest neighbor queries are useful.
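
For context, here’s a minimal usage sketch of Annoy (the dimensionality, data, and tree count below are made up for illustration):

```python
import random
from annoy import AnnoyIndex  # pip install annoy

f = 40  # dimensionality of the vectors
index = AnnoyIndex(f, "angular")  # "angular" is essentially cosine distance

# Add some random vectors (stand-ins for item embeddings).
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(f)])

index.build(10)  # 10 trees; more trees = better recall, bigger index

# The 10 approximate nearest neighbors of item 0.
print(index.get_nns_by_item(0, 10))
```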

coin2dice

Here’s a problem that I used to give to candidates. I stopped using it seriously a long time ago since I don’t believe in puzzles, but I think it’s kind of fun.

  1. Let’s say you have a function that simulates a random coin flip. It returns “H” or “T”. This is the only random generator available. How can you write a new function that simulates a random dice roll (1…6)?
  2. Is there any method that guarantees that the second function returns in finite time?
  3. Let’s say you want to do this $$ n $$ times where $$ n \to \infty $$. What’s the most efficient way to do it? Efficient in terms of using the fewest coin flips.

The first part is old, I think. The second and third parts are follow-up questions that I came up with.
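
A sketch of the standard answer to the first two parts (my write-up, not from the post): flip three coins to get a uniform number in 0..7 and reject 6 and 7. This terminates with probability 1, but no method can put a hard bound on the number of flips, since 6 does not divide any power of 2. For the third part, batching many rolls against one long bit string lets the amortized cost approach log2(6) ≈ 2.58 flips per roll.

```python
import random

def coin():
    """The only primitive available: a fair coin flip."""
    return random.choice("HT")

def dice():
    """Roll a fair die using only coin flips (rejection sampling)."""
    while True:
        # Three flips give a uniform integer in 0..7.
        bits = [coin() == "H" for _ in range(3)]
        value = bits[0] * 4 + bits[1] * 2 + bits[2]
        if value < 6:
            return value + 1  # uniform in 1..6
        # value is 6 or 7: reject and flip again (happens with probability 1/4)

print([dice() for _ in range(10)])
```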

3D in D3

I have spent some time lately with D3. It’s a lot of fun to build interactive graphs. See for instance this demo (will provide a longer writeup soon).

D3 doesn’t have support for 3D but you can do projections into 2D pretty easily. It’s just old school computer graphics. I ended up adding an animated background to this blog based on an experiment. The math is simple.
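
The kind of projection I mean boils down to a rotation followed by a perspective divide. Here’s a minimal sketch of that math in Python (the D3 part – binding the projected points to SVG elements – is omitted, and the angles and distances are arbitrary):

```python
import math

def project(point, yaw, pitch, distance=4.0, scale=200.0):
    """Rotate a 3D point around the y- and x-axes, then do a simple perspective projection."""
    x, y, z = point

    # Rotate around the y-axis (yaw).
    x, z = x * math.cos(yaw) + z * math.sin(yaw), -x * math.sin(yaw) + z * math.cos(yaw)
    # Rotate around the x-axis (pitch).
    y, z = y * math.cos(pitch) + z * math.sin(pitch), -y * math.sin(pitch) + z * math.cos(pitch)

    # Perspective divide: points further away (larger z) get pulled toward the center.
    f = scale / (z + distance)
    return x * f, y * f

# Project the corners of a unit cube onto the 2D plane.
cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
print([project(p, yaw=0.5, pitch=0.3) for p in cube])
```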

Deep learning for… Go

This is the last post about deep learning for chess/go/whatever. But this really cool paper by Christopher Clark and Amos Storkey was forwarded to me by Michael Eickenberg. It’s about using convolutional neural networks to play Go. The authors of the paper do a much better job than I would ever have done of modeling move prediction in Go and show that their model beat certain Go engines.

Deep learning for… chess (addendum)

My previous blog post about deep learning for chess blew up and made it to Hacker News and a couple of other places. One pretty amazing thing was that the Github repo got 150 stars overnight. There were also lots of comments on the Hacker News post that I thought were really interesting. (See this skeptical comment for instance).

Deep learning for… chess

I’ve been meaning to learn Theano for a while and I’ve also wanted to build a chess AI at some point. So why not combine the two? That’s what I thought, and I ended up spending way too much time on it. I actually built most of this back in September but not until Thanksgiving did I have the time to write a blog post about it.

Optimizing things: everything is a proxy for a proxy for a proxy

Say you build a machine learning model, like a movie recommender system. You need to optimize for something. You have 1-5 stars as ratings so let’s optimize for mean squared error. Great.

Then let’s say you build a new model with even lower mean squared error. You deploy it and roll it out to users, and the metrics are tanking. Crap! Ok, so maybe mean squared error isn’t the right thing to optimize for.

Detecting corporate fraud using Benford's law

Note: This is a silly application. Don’t take anything seriously.

Benford’s law describes a phenomenon where the numbers in many data series exhibit a pattern in their first digit. For instance, if you took a list of the 1,000 longest rivers of Mongolia, or the average daily calorie consumption of mammals, or the wealth distribution of German soccer players, you will on average see that these numbers start with “1” about 30% of the time. I won’t attempt to prove this, but essentially it’s a result of scale invariance. It doesn’t apply to all numerical series, like IQ or shoe size, but this pattern turns out to pop up in a lot of places.
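
A quick way to see the pattern (a sketch of mine; the data below is synthetic): generate numbers spanning several orders of magnitude and compare the first-digit frequencies against Benford’s prediction log10(1 + 1/d).

```python
import math
import random
from collections import Counter

# Synthetic scale-spanning data: log-uniform between 1 and 10^6.
data = [10 ** random.uniform(0, 6) for _ in range(100_000)]

first_digits = Counter(int(str(x)[0]) for x in data)

for d in range(1, 10):
    observed = first_digits[d] / len(data)
    expected = math.log10(1 + 1 / d)  # Benford's law
    print(f"{d}: observed {observed:.3f}, expected {expected:.3f}")
```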

Recurrent Neural Networks for Collaborative Filtering

I’ve been spending quite some time lately playing around with RNNs for collaborative filtering. RNNs are models that predict a sequence of something. The beauty is that this something can be anything really – as long as you can design an output gate with a proper loss function, you can model essentially anything.

More recommender algorithms

I wanted to share some more insight into the algorithms we use at Spotify. One matrix factorization algorithm we have used for a while assumes that we have user vectors $$ \mathbf{a}_u $$ and item vectors $$ \mathbf{b}_i $$. The next track $$ i $$ for a user is now given by the relation

Bagging as a regularizer

One thing I encountered today was a trick using bagging as a way to go beyond a point estimate and get an approximation for the full distribution. This can then be used to penalize predictions with larger uncertainty, which helps reduce false positives.
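
A rough sketch of the idea (my code, with made-up data): train the same model on bootstrap resamples, read the spread of the per-model predictions as an uncertainty estimate, and penalize predictions where that spread is large.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy training data: y = x + noise.
X = rng.uniform(0, 10, size=(200, 1))
y = X[:, 0] + rng.normal(0, 1, size=200)

# Train one tree per bootstrap resample (this is the "bagging" part).
models = []
for _ in range(100):
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeRegressor(max_depth=4).fit(X[idx], y[idx]))

X_new = np.array([[2.0], [5.0], [9.5]])
preds = np.stack([m.predict(X_new) for m in models])  # shape (n_models, n_points)

mean = preds.mean(axis=0)
std = preds.std(axis=0)

# Penalize uncertain predictions, e.g. score = mean - k * std.
k = 1.0
print(mean - k * std)
```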

Optimizing over multinomial distributions

Sometimes you have to maximize some function $$ f(w_1, w_2, \ldots, w_n) $$ where $$ w_1 + w_2 + \ldots + w_n = 1 $$ and $$ 0 \le w_i \le 1 $$. Usually, $$ f $$ is concave and differentiable, so there’s one unique global maximum and you can solve it by applying gradient ascent. The presence of the constraint makes it a little tricky, but we can solve it using the method of Lagrange multipliers. In particular, since the surface $$ w_1 + w_2 + \ldots + w_n = 1 $$ has the normal $$ (1, 1, \ldots, 1) $$, the following optimization procedure works:
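
The procedure itself is cut off in this excerpt, but a sketch of what the normal-vector observation buys you (my code, with an example objective of my choosing): take a gradient step with the gradient’s mean subtracted, so the step stays on the plane where the weights sum to 1, and handle the box constraint crudely by clipping and renormalizing.

```python
import numpy as np

# Example concave objective: f(w) = sum(c_i * log(w_i)), maximized at w_i = c_i / sum(c).
c = np.array([1.0, 2.0, 3.0, 4.0])

def grad_f(w):
    return c / w

w = np.full(len(c), 1.0 / len(c))  # start at the uniform distribution
lr = 1e-3

for _ in range(10_000):
    g = grad_f(w)
    # Remove the component along the normal (1, 1, ..., 1) so the step stays on sum(w) = 1.
    step = g - g.mean()
    w = w + lr * step
    # Crude handling of the box constraint 0 <= w_i <= 1: clip and renormalize.
    w = np.clip(w, 1e-9, 1.0)
    w /= w.sum()

print(w)              # should be close to c / c.sum()
print(c / c.sum())    # [0.1, 0.2, 0.3, 0.4]
```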

Stuff that bothers me: “100x faster than Hadoop”

The simple way to get featured on big data blogs these days seems to be:

  1. Build something that does 1 thing super well but nothing else
  2. Benchmark it against Hadoop
  3. Publish stats showing that it’s 100x faster than Hadoop
  4. $$$

Spark claims they’re 100x faster than Hadoop and there are a lot of stats showing Redshift is 10x faster than Hadoop. There are a bunch of papers with similar claims. I spent five minutes Googling “Xx faster than Hadoop” and found a ton of other stats.

Calculating cosine similarities using dimensionality reduction

This was posted on the Twitter Engineering blog a few days ago: Dimension Independent Similarity Computation (DISCO)

I just glanced at the paper, and there’s some cool stuff going on from a theoretical perspective. What I’m curious about is why they didn’t decide to use dimensionality reduction to solve such a big problem. The benefit of this approach is that it scales much better (linear in input data size) and produces much better results. The drawback is that it’s much harder to implement.
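
As a sketch of what I mean by the dimensionality-reduction route (this is not the DISCO algorithm, and the data below is random): factorize the sparse interaction matrix into low-dimensional item vectors and compute cosines between those.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize

# Fake user x item interaction matrix (sparse, 10k users x 1k items).
interactions = sparse_random(10_000, 1_000, density=0.01, format="csr", random_state=0)

# Reduce each item to a 50-dimensional vector.
svd = TruncatedSVD(n_components=50, random_state=0)
item_vectors = svd.fit_transform(interactions.T)  # shape (n_items, 50)

# Cosine similarity between items is just a dot product of the normalized vectors.
item_vectors = normalize(item_vectors)
similarities = item_vectors @ item_vectors.T
print(similarities[0, :5])
```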

A neat little trick with time decay

Something that pops up pretty frequently is the need to implement time decay, especially where you have recursive chains of jobs. For instance, say you want to keep track of a popularity score. You calculate today’s output by reading yesterday’s output, discounting it by $$ \exp(-\lambda \Delta T) $$ and then adding some hit count for today. Typically you choose $$ \lambda $$ so that $$ \exp(-\lambda \Delta T) = 0.95 $$ for a day or something like that. We do this to generate popularity scores for every track at Spotify.
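
A sketch of the recursion (my code; the half-life and play counts are made up): today’s score is yesterday’s score times a decay factor plus today’s hits, so the whole history folds into a single number you carry forward from job to job.

```python
import math

# Decay so that a score loses 5% per day, i.e. exp(-lambda * 1 day) = 0.95.
decay_per_day = 0.95
lam = -math.log(decay_per_day)

def update_score(yesterdays_score, todays_hits, days_elapsed=1.0):
    """One step of the recursive job: discount the previous output, add today's hits."""
    return yesterdays_score * math.exp(-lam * days_elapsed) + todays_hits

# Made-up daily play counts for one track.
score = 0.0
for hits in [120, 80, 0, 0, 300, 150]:
    score = update_score(score, hits)
    print(round(score, 1))
```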