Articles tagged with math

σ-driven project management: when is the optimal time to give up?

Hi! It's your friendly project management theorician. You might remember me from blog posts such as Why software projects take longer than you think, which is a blog post I wrote a long time ago positing that software projects completion time follow a log-normal distribution.

Mortality statistics and Sweden's "dry tinder" effect

We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”. But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicated it through data.

How to set compensation using commonsense principles

Compensation has always been one of the most confusing parts of management to me. Getting it right is obviously extremely important. Compensation is what drives our entire economy, and you could look at the market for labor as one gigantic resource-allocating machine in the same way as people look at the stock market as a gigantic resource-allocating machine for investments.

How to hire smarter than the market: a toy model

Let's consider a toy model where you're hiring for two things and that those are equally valuable. It's not very important what those are, so let's just call them “thing A” and “thing B” for now.

Data architecture vs backend architecture

A modern tech stack typically involves at least a frontend and backend but relatively quickly also grows to include a data platform. This typically grows out of the need for ad-hoc analysis and reporting but possibly evolves into a whole oil refinery of cronjobs, dashboards, bulk data copying, and much more.

The hacker's guide to uncertainty estimates

It started with a tweet: New years resolution: every plot I make during 2018 will contain uncertainty estimates — Erik Bernhardsson (@bernhardsson) January 7, 2018 Why? Because I've been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of widgets is going up or down, or whether widget method X is more productive than widget method Y.

Interviewing is a noisy prediction problem

I have done roughly 2,000 interviews in my life. When I started recruiting, I had so much confidence in my ability to assess people. Let me just throw a couple of algorithm questions at a candidate and then I'll tell you if they are good or not!

Why conversion matters: a toy model

There are often close relationships between top level business metrics. For instance, it's well known that retention has a super strong impact on the valuation of a subscription business. Or that the % of occupied seats is super important for an airline.

Optimizing for iteration speed

I've written before about the importance of iterating quickly but I didn't necessarily talk about some concrete things you can do. When I've built up the tech team at Better, I've intentionally optimized for fast iteration speed above almost everything else.

The eigenvector of "Why we moved from language X to language Y"

I was reading yet another blog post titled “Why our team moved from <language X> to <language Y>” (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate a N * N contingency table of moving from language X to language Y?

Language pitch

Here's a fun analysis that I did of the pitch (aka. frequency) of various languages. Certain languages are simply pronounced with lower or higher pitch. Whether this is a feature of the language or more a cultural thing is a good question, but there are some substantial differences between languages.

Pareto efficency

Pareto efficiency is a useful concept I like to think about. It often comes up when you compare items on multiple dimensions. Say you want to buy a new TV. To simplify it let's assume you only care about two factors: price and quality.

Subway waiting math

Why does it suck to wait for things? In a previous post I analyzed a NYC subway dataset and found that at some point, quite early, it's worth just giving up. This isn't a proof that the subway doesn't run on time – in fact it might actually proves that the subway runs really well.

Approximate nearest news

As you may know, one of my (very geeky) interests is Approximate nearest neigbor methods, and I'm the author of a Python package called Annoy. I've also built a benchmark suite called ann-benchmarks to compare different packages.

Dollar cost averaging

(I accidentally published an unfinished draft of this post a few days ago – sorry about that). There's a lot of sources preaching the benefits of dollar cost averaging, or the practice of investing a fixed amount of money regularly.

NYC subway math

Apparently MTA (the company running the NYC subway) has a real-time API. My fascination for the subway takes autistic proportions and so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here's some relevant code for how to use the API:

My issue with GPU-accelerated deep learning

I've been spending several hundred bucks renting GPU instances on AWS over the last year. The speedup from a GPU is awesome and hard to deny. GPUs have taken over the field. Maybe following the footsteps of Bitcoin mining there's some research on using FPGA (I know very little about this).

Nearest neighbor methods and vector models – part 1

This is a blog post rewritten from a presentation at NYC Machine Learning last week. It covers a library called Annoy that I have built that helps you do (approximate) nearest neighbor queries in high dimensional spaces.


Here's a problem that I used to give to candidates. I stopped using it seriously a long time ago since I don't believe in puzzles, but I think it's kind of fun. Let's say you have a function that simulates a random coin flip.

3D in D3

I have spent some time lately with D3. It's a lot of fun to build interactive graphs. See for instance this demo (will provide a longer writeup soon). D3 doesn't have support for 3D but you can do projections into 2D pretty easily.

The lane next to you is more likely to be slower than yours

Saw this link on Hacker News the other day: The Highway Lane Next to Yours Isn’t Really Moving Any Faster The article describes a phenomenon unique to traffic where cars spread out when they go fast and get more compact when they go slow.

Deep learning for&#8230; Go

This is the last post about deep learning for chess/go/whatever. But this really cool paper by Christopher Clark and Amos Storkey was forwarded to me by Michael Eickenberg. It's about using convolutional neural networks to play Go.

Deep learning for&#8230; chess (addendum)

My previous blog post about deep learning for chess blew up and made it to Hacker News and a couple of other places. One pretty amazing thing was that the Github repo got 150 stars overnight.

Deep learning for... chess

I've been meaning to learn Theano for a while and I've also wanted to build a chess AI at some point. So why not combine the two? That's what I thought, and I ended up spending way too much time on it.

Detecting corporate fraud using Benford's law

Note: This is a silly application. Don't take anything seriously. Benford's law describes a phenomenon where numbers in any data series will exhibit patterns in their first digit. For instance, if you took a list of the 1,000 longest rivers of Mongolia, or the average daily calorie consumption of mammals, or the wealth distribution of German soccer players, you will on average see that these numbers start with “1” about 30% of the time.

Recurrent Neural Networks for Collaborative Filtering

I’ve been spending quite some time lately playing around with RNN’s for collaborative filtering. RNN’s are models that predict a sequence of something. The beauty is that this something can be anything really – as long as you can design an output gate with a proper loss function, you can model essentially anything.

More recommender algorithms

I wanted to share some more insight into the algorithms we use at Spotify. One matrix factorization algorithm we have used for a while assumes that we have user vectors $$ bf{a}_u $$ and item vectors $$ bf{b}_i $$ .

Bagging as a regularizer

One thing I encountered today was a trick using bagging as a way to go beyond a point estimate and get an approximation for the full distribution. This can then be used to penalize predictions with larger uncertainty, which helps reducing false positives.

Optimizing over multinomial distributions

Sometimes you have to maximize some function $$ f(w_1, w_2, ldots, w_n) $$ where $$ w_1 + w_2 + ldots + w_n = 1 $$ and $$ 0 le w_i le 1 $$ . Usually, $$ f $$ is concave and differentiable, so there's one unique global maximum and you can solve it by applying gradient ascent.

Stuff that bothers me: &#8220;100x faster than Hadoop&#8221;

The simple way to get featured on big data blog these days seem to be Build something that does 1 thing super well but nothing else Benchmark it against Hadoop Publish stats showing that it's 100x faster than Hadoop $$$ Spark claims their 100x faster than Hadoop and there's a lot of stats showing Redshift is 10x faster than Hadoop.

Calculating cosine similarities using dimensionality reduction

This was posted on the Twitter Engineering blog a few days ago: Dimension Independent Similarity Computation (DISCO) I just glanced at the paper, and there's some cool stuff going on from a theoretical perspective. What I'm curious about is why they didn't decide to use dimensionality reduction to solve such a big problem.

A neat little trick with time decay

Something that pops up pretty frequently is to implement time decay, especially where you have recursive chains of jobs. For instance, say you want to keep track of a popularity score. You calculate today's output by reading yesterday's output, discounting it by $$ exp(-lambda Delta T) $$ and then adding some hit count for today.