Articles tagged with statistics



Mortality statistics and Sweden's "dry tinder" effect

We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”. But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicated it through data.

How to hire smarter than the market: a toy model

Let's consider a toy model where you're hiring for two things and that those are equally valuable. It's not very important what those are, so let's just call them “thing A” and “thing B” for now.

The hacker's guide to uncertainty estimates

It started with a tweet: New years resolution: every plot I make during 2018 will contain uncertainty estimates — Erik Bernhardsson (@bernhardsson) January 7, 2018 Why? Because I've been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of widgets is going up or down, or whether widget method X is more productive than widget method Y.

Learning from users faster using machine learning

I had an interesting idea a few weeks ago, best explained through an example. Let's say you're running an e-commerce site (I kind of do) and you want to optimize the number of purchases. Let's also say we try to learn as much as we can from users, both using A/B tests but also using just basic slicing and dicing of the data.

The half-life of code & the ship of Theseus

As a project evolves, does the new code just add on top of the old code? Or does it replace the old code slowly over time? In order to understand this, I built a little thing to analyze Git projects, with help from the formidable GitPython project.

Subway waiting math

Why does it suck to wait for things? In a previous post I analyzed a NYC subway dataset and found that at some point, quite early, it's worth just giving up. This isn't a proof that the subway doesn't run on time – in fact it might actually proves that the subway runs really well.

NYC subway math

Apparently MTA (the company running the NYC subway) has a real-time API. My fascination for the subway takes autistic proportions and so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here's some relevant code for how to use the API:

More MCMC – Analyzing a small dataset with 1-5 ratings

I've been obsessed with how to iterate quickly based on small scale feedback lately. One awesome website I encountered is Usability Hub which lets you run 5 second tests. Users see your site for 5 seconds and you can ask them free-form questions afterwards.

MCMC for marketing data

The other day I was looking at marketing spend broken down by channel and wanted to compute some simple uncertainty estimates. I have data like this: <th> Total spend </th> <th> Transactions </th> Channel A <td> 2292.

It's called Berkson's paradox!

As noted by multiple tweets, my previous post describes a phenomenon denoted Berkson's paradox. Here's another example: Why Are Handsome Men Such Jerks?