Erik Bernhardsson

Iterate or die

Here’s a conclusion I’ve made building consumer products for many years: the speed at which a company innovates is limited by its iteration speed.

I don’t even mean throughput here. I just mean the cycle time. By Little’s law, for a given throughput, cycle time is also tied to the total inventory of features that have not been deployed yet.
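To make the Little’s law remark concrete, here is a minimal sketch. Little’s law says average inventory equals throughput times average cycle time, so cycle time is inventory divided by throughput. The function name and all the numbers below are made up for illustration; they are not measurements from any real team.

```python
def cycle_time_weeks(features_in_flight: float, features_per_week: float) -> float:
    """Little's law rearranged: average cycle time W = L / lambda."""
    if features_per_week <= 0:
        raise ValueError("throughput must be positive")
    return features_in_flight / features_per_week


# Hypothetical numbers: both teams finish 10 features per week.
# The continuously deploying team has about 2 undeployed features at any time;
# the quarterly team has, on average, about 65 waiting for the next release.
continuous = cycle_time_weeks(features_in_flight=2, features_per_week=10)
quarterly = cycle_time_weeks(features_in_flight=65, features_per_week=10)

print(f"continuous deploys: {continuous:.1f} weeks average cycle time")
print(f"quarterly releases: {quarterly:.1f} weeks average cycle time")
```

Same throughput, very different cycle time: the only thing that changed is how much work is sitting around undeployed.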

In a hypothetical scenario, take a team of engineers, clone it, and call the two identical teams A and B. Actually, clone B another nine times so that it is 10x larger. Give both teams the exact same tools and the same problems to solve, but team B can only deploy code every 3 months whereas A deploys multiple times per day. I bet team A will outperform B in terms of delivering business value, even though it is 10 times smaller.

I don’t have a proof for this; it’s really just a grumpy coder speculating. The way I visualize it is as applying stochastic gradient descent to a function that keeps on changing. If you’ve used SGD, you know that sometimes a lower learning rate can help you converge faster, especially if you can evaluate the function a lot more often.
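The SGD picture can be turned into a toy simulation. This is only a sketch under made-up assumptions, not a proof: the "function" is a quadratic whose optimum drifts as a random walk, each observation of the gradient is noisy, and the names and parameters (`tracking_error`, `period`, `drift`, `noise`) are invented for this example. It also ignores the 10x difference in team size. Team A takes a small step every tick; team B takes one full-size step every 90 ticks.

```python
import random


def tracking_error(period, learning_rate, steps=9000, drift=1.0, noise=1.0, seed=0):
    """Mean squared distance to a drifting optimum when updating every `period` ticks."""
    drift_rng = random.Random(seed)      # same target path for every configuration
    noise_rng = random.Random(seed + 1)  # separate stream for observation noise
    target = 0.0
    estimate = 0.0
    total_sq_error = 0.0
    for step in range(1, steps + 1):
        target += drift_rng.gauss(0.0, drift)  # the function keeps on changing
        if step % period == 0:
            # Noisy gradient of 0.5 * (estimate - target) ** 2
            gradient = (estimate - target) + noise_rng.gauss(0.0, noise)
            estimate -= learning_rate * gradient
        total_sq_error += (target - estimate) ** 2
    return total_sq_error / steps


team_a = tracking_error(period=1, learning_rate=0.1)   # small steps, every tick
team_b = tracking_error(period=90, learning_rate=1.0)  # one big step per "quarter"
print(f"team A mean squared error: {team_a:.1f}")
print(f"team B mean squared error: {team_b:.1f}")
```

In this toy setup the frequent, cautious updater stays much closer to the moving target, because the infrequent one spends most of its time acting on stale information.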

Iterate or die

A large set of companies do not have iteration in their DNA. I’m particularly excited about this because I’m trying to compete in the banking space.

Let’s consider how a big bank builds a new product. First, they would conduct customer research. Then, come up with a set of requirements. Then, make up a big ass Gantt chart and put 100s of engineers on it. 1-2 years after the project is started, they launch it, together with a huge spend on marketing. During the year that has passed, nothing has been learned that wasn’t known from the start. Every assumption that was made at the outset will take a year to validate.

It’s like looking at a map and then trying to run through a forest for ten hours. Inevitably you are going to be very far from your target.

A modern day software shop is not a manufacturing plant. It probably resembles applied research a lot more. You can’t plan research. You need to embrace uncertainty and iterate based on new information all the time. Forget about the six month plan you have.

Sometimes it’s good to stop and look at the map. Your total speed is lower, but you can adjust your directions all the time.

Apparently the army realized this and has invented a bunch of methods like F3EA: “Contemporary warfare challenges the practice of targeting and the philosophy of its purpose, promoting a shift from targeting for effect to targeting to learn.” It’s interesting that the US army has learned this the hard way: a small group of insurgents changing tactics very quickly is a serious threat to the 10x larger American war machine.

Building modern software

What kind of things do you learn? There are a lot of schools of thought here. Highly anecdotal qualitative feedback can be great directional evidence. Once you get closer to the optimum, you need to shift to A/B tests. Some people preach that the product and tech team should do customer support. Other people think the CEO’s job is to email the first 1000 customers. At my company, we spend a lot of time looking at user sessions in FullStory and it’s been a fantastic tool to see issues. I also talk to customers over our chat feature. Sometimes quantitative methods work, sometimes qualitative.

The other day a customer told me he ran into a bug, and within an hour I had deployed a fix. This is literally a 5000x faster iteration cycle than a big bank (5000 hours is a bit more than 6 months). It’s the equivalent of walking through the forest, constantly checking the surroundings and making sure they match the map. We might walk a bit slower (because our tech team is 10x smaller than most banks’), but we learn new things at a much higher rate.

Image caption: Seriously, if I had more time I would start a Tumblr featuring people drawing horizontally flipped content on glass.

I think most failed IT projects can be traced back to this. It reminds me of Webvan’s epic failure. Instead of iterating quickly and learning, the company embarked on a “scale at any cost” strategy. Or the new system for the Swedish police (in Swedish) that had to be shut down after wasting $1B of taxpayer money. Or the epic IT project to reform the UK health records that spent about £12B before shutting down.

What I’m saying isn’t exactly inter-universal Teichmüller theory, but interestingly even modern companies struggle with this. Spotify generally iterates pretty well but has had its fair share of big bang projects that would go on for a year without learning anything from real users. The product team would keep hypothesizing and developing an even more elaborate model for the bet.

But Erik. What about Agile?

A process where you deploy code at best every 3 weeks is really mini-waterfall. I like the basic tenets but I find that it encourages cargo cult behavior and distracts from the focus on the end result – creating a tight feedback loop and building a learning machine. Many companies use Agile as a way to deliver software often, but not as a way to learn quickly. If you’re not constantly monitoring usage and adjusting your path then you are not learning.

Why is my team moving slowly?

One antipattern is to compensate for a lack of productivity with hiring. This is likely to get throughput up marginally, but it makes cycle time a lot worse. I have seen companies with broken organization models, like having an “ML theory” team in NYC and an “ML implementation” team in another state. Good luck iterating quickly and learning fast. An organization has to be designed to minimize information latency.

Another antipattern is to attribute project failure to lack of planning, and add even more planning. This 2005 article, “Why software fails”, is a great example of the wrong conclusion. The authors of the article should obviously have asked me: projects rarely fail because of bad planning. They fail because planning might have been futile in the first place. There’s only so much you can prepare for.

Erik? Why so many strong feelings?

I think a lot about this because it’s the only way my company can win. We’re a bank and I’m pretty sure we are the only bank in the world that does continuous deployment, deploying code 20-30 times every day.

The word “paradigm” has an annoying ring to it, but I actually think there is a new one for how to build a consumer product. Tech companies spent the last 10 years figuring out how to build a learning machine. If your company is based on scale advantages, that’s great, but unless you learn how to iterate quickly, you’re going to be dead.