Articles tagged with Popular



Storm in the stratosphere: how the cloud will be reshuffled


Here’s a theory I have about cloud vendors (AWS, Azure, GCP):

  1. Cloud vendors will increasingly focus on the lowest layers in the stack: basically leasing capacity in their data centers through an API.
  2. Other pure-software providers will build all the stuff on top of it. Databases, running code, you name it.

We currently have cloud vendors that offer end-to-end solutions from the developer experience down to the hardware:

Building a data team at a mid-stage startup: a short story

I guess I should really call this a parable.

The backdrop is: you have been brought in to grow a tiny data team (4 people) at a mid-stage startup ($10M annual revenue), although this story could take place at many different types of companies.

Software infrastructure 2.0: a wishlist

Software infrastructure (by which I include everything ending with *aaS, or anything remotely similar to it) is an exciting field, in particular because (despite what the neo-luddites may say) it keeps getting better every year! I love working with something that moves so quickly.

Never attribute to stupidity that which is adequately explained by opportunity cost

Hanlon’s razor is a classic aphorism I’m sure you have heard before: Never attribute to malice that which can be adequately explained by stupidity.

I’ve found that neither malice nor stupidity is the most common reason when you don’t understand why something is a certain way. Instead, the root cause is probably just that they didn’t have time yet. This happens all the time at startups (maybe a bit less at big companies, for reasons I’ll get back to).

Why software projects take longer than you think: a statistical model

Anyone who has built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take when the work itself is fundamentally about solving something. One pet theory I’ve had for a really long time is that some of this is really just a statistical artifact.
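One way such a statistical artifact could arise, sketched under assumptions that are mine and not stated in the excerpt: suppose the ratio of actual time to estimated time is log-normally distributed with a median of 1. The estimate is then right "on median", yet the average ratio is well above 1 because the overruns are much bigger than the underruns. The function name and the choice of `sigma=1.0` are purely illustrative.

```python
import math
import random
import statistics


def simulate_blowup(sigma=1.0, n=100_000, seed=0):
    """Draw n ratios (actual / estimated) from a log-normal with median 1.

    Assumption for illustration only: log(ratio) ~ Normal(0, sigma).
    Returns (median ratio, mean ratio).
    """
    rng = random.Random(seed)
    ratios = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
    return statistics.median(ratios), statistics.fmean(ratios)


median_ratio, mean_ratio = simulate_blowup()

# For a log-normal with mu = 0, the theoretical mean is exp(sigma^2 / 2).
theoretical_mean = math.exp(1.0 ** 2 / 2)

print(f"median ratio: {median_ratio:.2f}")
print(f"mean ratio:   {mean_ratio:.2f} (theory: {theoretical_mean:.2f})")
```

Under this assumption the typical project lands close to its estimate, while the sum of many projects still overruns, since the mean is what adds up.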

The hacker's guide to uncertainty estimates

It started with a tweet:

Why? Because I’ve been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of widgets is going up or down, or whether widget method X is more productive than widget method Y. For almost any graph, quantifying the uncertainty seems useful, so I started trying. A few months later:
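One common way to quantify that kind of uncertainty is a bootstrap confidence interval. The sketch below is a minimal illustration and not necessarily the approach the post ends up with; the function name and the widget counts are made up.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical daily widget counts.
widgets_per_day = [12, 15, 9, 14, 11, 13, 17, 10, 12, 16]
lo, hi = bootstrap_ci(widgets_per_day)
print(f"mean = {statistics.fmean(widgets_per_day):.1f}, 95% CI = [{lo:.1f}, {hi:.1f}]")
```

If the intervals for two months overlap heavily, debating whether the number went up or down is probably not a good use of a meeting.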

I don't want to learn your garbage query language

This is a bit of a rant but I really don’t like software that invents its own query language. There’s a trillion different ORMs out there. Another trillion databases with their own query language. Another trillion SaaS products where the only way to query is to learn some random query DSL they made up.

The software engineering rule of 3

Here’s a dumb extremely accurate rule I’m postulating* for software engineering projects: you need at least 3 examples before you solve the right problem.

This is what I’ve noticed:

  1. Don’t factor out shared code between two classes. Wait until you have at least three.
  2. The two first attempts to solve a problem will fail because you misunderstood the problem. The third time it will work.
  3. Any attempt at being smart earlier will end up overfitting to coincidental patterns.

(Note that #1 and #2 are actually pretty different implications. But let’s get back to that later.)

The eigenvector of "Why we moved from language X to language Y"

I was reading yet another blog post titled “Why our team moved from <language X> to <language Y>” (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate an N × N contingency table of moving from language X to language Y?
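As a minimal sketch of what such a table could look like in code: given a list of (from, to) pairs, count each pair and lay the counts out as an N × N grid. The pairs below are invented for illustration, and `contingency_table` is a hypothetical helper, not the code behind the post.

```python
from collections import Counter


def contingency_table(moves):
    """Build an N x N table where table[i][j] counts moves from
    languages[i] to languages[j]."""
    counts = Counter(moves)
    languages = sorted({lang for pair in moves for lang in pair})
    table = [[counts[(src, dst)] for dst in languages] for src in languages]
    return languages, table


# Made-up (from, to) pairs, e.g. extracted from blog post titles.
moves = [
    ("Python", "Go"),
    ("Ruby", "Go"),
    ("Python", "Go"),
    ("Java", "Kotlin"),
    ("Go", "Rust"),
]

languages, table = contingency_table(moves)
for src, row in zip(languages, table):
    print(f"{src:>8}: {row}")
```

Rows are the language moved away from and columns are the language moved to, so the diagonal stays at zero.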

The half-life of code & the ship of Theseus


As a project evolves, does the new code just add on top of the old code? Or does it replace the old code slowly over time? In order to understand this, I built a little thing to analyze Git projects, with help from the formidable GitPython project. The idea is to go back in history and run a git blame at historical points in time (making this somewhat fast was a bit nontrivial, as it turns out, but I’ll spare you the details, which involve some opportunistic caching of files, picking historical points spread out in time, using git diff to invalidate changed files, etc.).
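The core counting step can be sketched without any of the caching. The tool described above uses GitPython; the sketch below instead assumes you already have the text output of `git blame --line-porcelain <file>` for one file at one point in history, and simply groups the surviving lines by the year they were authored. `lines_by_year` and the sample text are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def lines_by_year(porcelain_output):
    """Count blamed lines per authoring year.

    With --line-porcelain, git repeats the commit headers for every line,
    including one `author-time <unix timestamp>` header per blamed line.
    File content lines start with a tab, so they never match the prefix.
    """
    counts = Counter()
    for line in porcelain_output.splitlines():
        if line.startswith("author-time "):
            timestamp = int(line.split()[1])
            year = datetime.fromtimestamp(timestamp, tz=timezone.utc).year
            counts[year] += 1
    return counts


# Shortened, synthetic blame output (real output has more header lines).
sample = "\n".join([
    "1111111111111111111111111111111111111111 1 1 2",
    "author-time 1435708800",
    "\told line one",
    "1111111111111111111111111111111111111111 2 2",
    "author-time 1435708800",
    "\told line two",
    "2222222222222222222222222222222222222222 3 3 1",
    "author-time 1593561600",
    "\tnew line",
])

print(dict(lines_by_year(sample)))
```

Running this for every file at several points in time gives, for each snapshot, how many lines from each year are still alive.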

Subway waiting math

Why does it suck to wait for things? In a previous post I analyzed a NYC subway dataset and found that at some point, quite early, it’s worth just giving up.

This isn’t a proof that the subway doesn’t run on time. In fact, it might actually prove that the subway runs really well. The numbers indicate that it’s not worth waiting after 10 minutes, but a wait that long is a rare event and usually involves something extraordinary like a multi-hour delay. You should roughly give up after some point related to the normal train frequency, and 10 minutes is not a lot at all. Conversely, if the trains ran hourly, it probably would have been worth waiting an hour or more. My analysis gave me a lot of respect for the job the MTA is doing.
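The reasoning can be sketched as a conditional expectation: given that you have already waited some number of minutes, how much longer do you expect to wait? The wait times below are invented to mimic "mostly short waits plus one rare long delay" and are not from the MTA dataset; `expected_remaining_wait` is a hypothetical helper.

```python
import statistics


def expected_remaining_wait(total_waits, already_waited):
    """Average remaining wait among the waits that lasted longer than
    `already_waited`. Returns None if no observed wait was that long."""
    remaining = [w - already_waited for w in total_waits if w > already_waited]
    return statistics.fmean(remaining) if remaining else None


# Hypothetical total wait times in minutes, with one multi-hour delay.
waits = [2, 3, 4, 5, 6, 7, 8, 9, 12, 120]

for t in (0, 5, 10):
    print(f"waited {t:>2} min -> expect {expected_remaining_wait(waits, t):.1f} more")
```

With this made-up data the expected remaining wait goes up, not down, once you have waited past the usual train interval, because the only waits that long are the extraordinary ones.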

NYC subway math

Apparently MTA (the company running the NYC subway) has a real-time API. My fascination for the subway takes autistic proportions and so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here’s some relevant code for how to use the API:

Exploding offers are bullshit


I do a lot of recruiting and have given maybe 50 offers in my career. Although many companies do, I never put a deadline on any of them. Unfortunately, I’ve often ended up competing with other companies who do, and I feel really bad that this usually tricks younger developers into signing offers. On numerous occasions, I’ve gotten an email halfway through the interview process

Analyzing 50k fonts using deep neural networks

For some reason I decided one night I wanted to get a bunch of fonts. A lot of them. An hour later I had a bunch of scrapy scripts pulling down fonts and a few days later I had more than 50k fonts on my computer.

I believe in the 10x engineer, but...

  • The easiest way to be a 10x engineer is to make 10 other engineers 2x more efficient. Someone can be a 10x engineer if they do nothing for 364 days and then convince the team to switch to a programming language that is 2x more productive.
  • A motivated 10x engineer in one team could be a demotivated 0.5x engineer in another team (and vice versa).
  • An average 1x engineer could easily become a 5x engineer if surrounded by 10x engineers. Engagement and work ethic are contagious.
  • The cynical reason why 10x engineers aren’t paid 10x more salary is that there is no way for the new employer to know. There is no “10x badge”.
  • …but also, a 10x engineer can go to a new company and become a 1x engineer because of bad focus / bad engagement / tech stack mismatch.
  • So unfortunately there’s less economic rationality for companies to pay 10x salaries to 10x engineers (contrary to what Google or Netflix say).
  • There’s no such thing as a 10x engineer spending time on something that never ends up delivering business value. If something doesn’t deliver business value, it’s 0x.
  • If you build something that the average engineer would not have been able to build, no matter how much time they had, that can make you 100x or 1000x, or ∞x. Quoting Scott Alexander: There is no number of ordinary eight-year-olds who, when organized into a team, will become smart enough to beat a grandmaster in chess.
  • Most of the 10x factor is most likely explained by team and company factors (process, tech stack, etc) and applies to everyone in the team/company. Intra-team variation is thus much smaller than 10x (even controlling for the fact that companies tend to attract people of equal caliber). Nature vs nurture…
  • I’ve never met the legendary “10x jerk”. Anecdotally the outperforming engineers are generally nice and humble.
  • Don’t get hung up on the exact numbers here, they’re just for illustration purposes. E.g. someone introduced a bug in the trading system of Knight Capital that made them lose $465M in 30 minutes. Did that make them a -1,000,000x engineer? (And btw it had more to do with company culture.) The numbers aren’t meant to be taken literally.
