The relationship between commit size and commit message size

[Screenshot of the tweet]

Wow I guess it was more than a year ago that I tweeted this. Crazy how time flies by. Anyway, here’s my rationale:

  • When I update one line of code I feel like I have to put in a long explanation about its side effects, why it’s fully backwards compatible, and why it fixes some issue #xyz.
  • When I refactor 500 lines of code, I get too lazy to write anything sensible, so I just put “refactoring FooBarController”. Note: don’t do this at home!

I decided to plot the relationship for Luigi:

[Plot: commit size vs. commit message size for Luigi]

The plot is clickable! Check it out! It’s an old-school image map, which is pretty pathetic since no one has used one since 1997, but it was just so much easier for this task. Hover over any point to see the commit message and click on it to jump to the commit on GitHub.

As you can see, there’s essentially no relationship between the two values. Not as spectacular as I was hoping for, but still kind of weird/interesting.

Code is here if you’re curious!
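Separately from the linked code, here is a minimal sketch of how the two numbers could be collected for any repository. It assumes `git` is on the path and reads commit size as lines added plus lines deleted; the function names and the control-character separators are my own choices for illustration, not necessarily what the original script does.

```python
import subprocess

# %x1e marks the start of each commit; %x1f separates hash, message and the stats that follow.
LOG_FORMAT = "--format=%x1e%H%x1f%B%x1f"


def read_git_log(repo_path="."):
    """Return the raw `git log --numstat` output for the repository at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", LOG_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_numstat_log(text):
    """Turn the log text into (hash, lines_changed, message_length) tuples."""
    rows = []
    for record in text.split("\x1e"):
        if not record.strip():
            continue
        sha, message, stats = record.split("\x1f", 2)
        changed = 0
        for line in stats.splitlines():
            parts = line.split("\t")
            # Binary files show up as "-\t-\tpath" and are skipped here.
            if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
                changed += int(parts[0]) + int(parts[1])
        rows.append((sha.strip(), changed, len(message.strip())))
    return rows


# Hand-written sample in the same shape, so the parser can be tried without a repository.
sample = (
    "\x1eabc123\x1fFix typo\n\x1f\n\n1\t1\tREADME.rst\n"
    "\x1edef456\x1frefactoring FooBarController\n\x1f\n\n300\t200\tfoo.py\n-\t-\tlogo.png\n"
)
rows = parse_numstat_log(sample)
```

Calling `parse_numstat_log(read_git_log("path/to/repo"))` gives one row per commit, which can then go straight into a scatter plot.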

My favorite management failures

For most people straight out of school, work life is a bit of a culture shock. For me it was an awesome experience, but a lot of the constraints were different and I had to learn to optimize for different things. It wasn’t necessarily the technology that I struggled with. The hardest part was how to manage my own projects and my time, as well as how to grow and make impact as an engineer. I’ve listed some of my biggest mistakes, which are also mistakes I see other (mostly junior) engineers make.

 

Having the wrong scope

How do you know what’s the right amount of work to spend on a project? I had horrible intuition about this coming out of school. One thing I think is helpful is to think of the relationship between time spent and impact. For a given project, it looks something like this:

time impact

It usually ends up being a concave function.

How do you pick a point on this curve? If you only have one task then it’s usually pretty easy because you have some constraint on total time or total impact. In school usually you work on some task until it hits a certain y value (problem is solved) or until it hits a certain x value (time to turn in what you have).

The problem is that in real life you need to pick not just one point on one curve but a point on each of many curves. Actually an infinite number of curves. And you need to pick these points so that you get the maximum value for the time invested.

time impact 2 copy

 

This is a much harder problem! It means the amount of time we spend on task A is actually determined not just by how hard task A is but how hard an infinite number of other tasks are.

Let’s get mathematical here: for this concave optimization problem you can show that the marginal impact of each task should be identical. (I really want to write a book some day called The Mathematics of Project Management)
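A short sketch of why that holds, under assumptions of my own: a finite number n of tasks, a fixed time budget T, and a differentiable concave impact curve f_i for each task (none of these symbols appear in the figures).

```latex
\begin{align*}
&\max_{t_1, \dots, t_n \ge 0} \; \sum_{i=1}^{n} f_i(t_i)
  \quad \text{subject to} \quad \sum_{i=1}^{n} t_i = T \\
&\mathcal{L}(t, \lambda) = \sum_{i=1}^{n} f_i(t_i) - \lambda \left( \sum_{i=1}^{n} t_i - T \right) \\
&\frac{\partial \mathcal{L}}{\partial t_i} = f_i'(t_i) - \lambda = 0
  \quad \Longrightarrow \quad f_i'(t_i) = \lambda \quad \text{for every task with } t_i > 0
\end{align*}
```

Tasks whose marginal impact at zero is already below \lambda get no time at all (t_i = 0). Every task that does get time ends up at the same slope \lambda.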

time impact 3

This means: recognize when the marginal impact of spending more time on a project starts to get low and you get more marginal impact elsewhere. Or just think: is this already good enough to deliver user value? Then take a break and look at the whole portfolio of possible tasks: ignoring what I have done so far, what’s the highest impact next thing I can do?
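The "what’s the highest impact next thing" rule can be sketched as a greedy loop: hand out time in small slices, always to the task with the largest marginal impact right now. The three impact curves and the 13-hour budget below are made up purely for illustration.

```python
import math


def allocate(tasks, budget, step=0.1):
    """Give out `budget` in slices of `step`, each to the task with the best marginal impact."""
    spent = {name: 0.0 for name in tasks}

    def marginal(name):
        impact, t = tasks[name], spent[name]
        return impact(t + step) - impact(t)

    for _ in range(round(budget / step)):
        best = max(tasks, key=marginal)
        spent[best] += step
    return spent


# Hypothetical concave impact curves: each flattens out, but at a different scale.
tasks = {
    "A": lambda t: 10 * math.log1p(t),
    "B": lambda t: 5 * math.log1p(t),
    "C": lambda t: 1 * math.log1p(t),
}
spent = allocate(tasks, budget=13)
```

With these curves the loop ends up with roughly 9 hours on A, 4 on B and none on C, which is where the slopes 10/(1+t) and 5/(1+t) meet. How much time A deserves depends on how B and C look, not just on A.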

Focusing only on the things you are supposed to focus on

This might sound weird. What are you supposed to do at work? Most of the time you should probably do what your team/manager told you to do. But guess what? Your team/manager is not an all-seeing all-knowing oracle. Sometimes you might actually have a better idea of what to do.

Your sole purpose of working somewhere is to deliver value for the company. Completing a task from the backlog is a great proxy for that. But it’s still a proxy and as such has no intrinsic value. In many cases there might be even higher leverage things that no one will tell you to do. For instance, look around you. Is the team struggling with some old framework? Can you help someone get unblocked?

I like people to come in every morning and ask themselves: what is the highest impact thing I can do for the company today? And do that. If you think about it, a task backlog is a completely artificial construct, needed because we don’t have perfect information.

This gets especially important if you are interested in management roles. The higher up you get, the fewer people are going to tell you what to do.

Silly obligatory visualization:

things to do

Focusing only on low-leverage activities

There’s only so much leverage you get by being an individual contributor. Even if you’re a 10x engineer. Look around you for things with a force multiplier built in. Usually that means applying something to the entire team. Are you using the wrong language for the tool? Spend a few days investigating something else, introduce it to the team, and watch the whole team move twice as fast.


I used to work with Neville Li at Spotify, who was a genius at finding these opportunities. He also never did what you told him to. Instead, he would spend days reading blogs and trying new technologies. Every few months he would find something that made the whole team 2x as productive. Then he would organize a workshop, introduce it to the team, and move on.

Not realizing technology isn’t just a job

This is probably my most cynical note, or optimistic, depending on how you look at it.

The truth is, software engineering isn’t just a normal job. It’s a lifestyle. It’s also a field that keeps changing from year to year. If you want to be successful, you need to stay up to date. If you want to be above average, you need to do things like:

  • Working on side projects
  • Reading tech blogs
  • Following influencers on Twitter
  • Going to meetups
  • Reading papers
  • Etc

Being a software engineer is a fantastic career in many ways. With lots of freedom comes a lot of responsibilities. If you want to stay fresh, you need to invest a fair amount of your spare time.

Not drawing diagrams on glass walls

This is a no-brainer. Everyone knows that solid software engineers draw everything on glass walls. And they also write everything flipped horizontally because it’s cooler.

cloud

Summary

I love technology. Go write some kick ass code now.

 

Leaving Spotify

February 6 was my last day at Spotify. In total I spent more than six years at Spotify and it was an amazing experience.

I joined Spotify in Stockholm in 2008, mainly because a bunch of friends from programming competitions had joined already. Their goal to change music consumption seemed ridiculous at that point, but six years later I think it’s safe to say they actually succeeded.

Back in the early days, my job was to do almost anything related to data. I think the range of tasks that I was responsible for has now grown into the work of 100+ people at Spotify. My day-to-day tasks were all over the map: Hadoop maintenance, PowerPoint presentations, label reporting, running A/B tests, optimizing ad delivery, forecasting ad delivery, building music recommendations, and much more (for most of that time we were actually three people though, not just me).

It was an amazing learning experience to see a company grow this way. I think a company goes through different challenges at every stage, both technically and organizationally (honestly a lot more of the latter compared to the former).

Pushing the button, launching Spotify to the world (late 2008)

 

Figuring out the cable situation (2009)

I’ve been craving to go back and go through the same journey again, so I’ve joined a small startup in NYC as the head of engineering. I will share more details soon. Hopefully this time will be an opportunity to apply all those things I learned at Spotify.

Oskar Stål, the CTO of Spotify and a great mentor, would always tell me that I have to decide between machine learning and the “CTO ladder” at some point. I made a conscious decision right now to focus more on management and building teams. I think this might be the topic of some future blog post, but not now.

What’s going to happen to my open source projects such as Luigi and Annoy? Nothing should change, except I will have a lot less time to spend on them.

Stay tuned for more updates!

 

Everything I learned about technical debt

I just made it to Sweden suffering from jet lag induced insomnia, but this blog post will not cover that. Instead, I will talk a little bit about technical debt.

The concept of technical debt always resonated with me, partly because I always like the analogy with “real” debt. If you take the analogy really far, there are some curious implications. I always like to think of the “interest rate” of software development. Debt is really just borrowing from the future, with some interest rate. You are getting a free lunch right now, but you need to pay back 1.2 free lunches in a few months. That’s the interest rate. In a software project the equivalent could be to pick a database that will have scalability issues later, or to make all member variables of some class public. You are doing it because it makes it easier to do things now but you will have to pay the cost of that later.

A recent paper from Google stretches the analogy in its title: Machine Learning: The High-Interest Credit Card of Technical Debt. It focuses specifically on machine learning, but definitely read it if you are interested. A recent blog post questions whether tech debt is really “debt” in the strict sense (you borrow a fixed amount and pay back slightly more) or whether it has a more complicated structure: Bad code isn’t Technical Debt, it’s an Unhedged Call Option.

I like the blog post because it brings up something I have noticed many times. A lot of developers have this intuitive aversion towards tech debt and always want to fix anything that’s perceived as “hacky”. FooBarController is a 1,000 line mayhem that no one understands, we need to refactor it! But say FooBarController is a well separated component that you have no intention of ever modifying, then there’s really no reason to fix it. It’s almost always a waste of time to try to fix bad code or bad architecture unless you have at least some idea of why it helps you in the future.

So in some cases it makes sense not to fix technical debt. In other cases, it makes sense to take on tech debt deliberately. Back to the interest rate analogy: if the interest rate is lower than the return on investment, you should borrow money from the bank. It’s fine to ship a product a year earlier with hacky code if that lets you make a lot of money and hire a ton of developers to clean it up. The concept of interest rate applies both to financing and software engineering.
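To put numbers on the borrowing rule, here is a tiny sketch. The 20% interest is the "1.2 free lunches" rate from earlier; the principal and the two rates of return are made up for illustration.

```python
def net_gain(principal, return_pct, interest_pct):
    """Profit left after repaying the loan; both rates are given in percent."""
    earned = principal * return_pct / 100
    interest = principal * interest_pct / 100
    return earned - interest


# Return above the interest rate: borrowing pays off.
worth_it = net_gain(1000, return_pct=30, interest_pct=20)

# Return below the interest rate: the debt costs more than it brings in.
not_worth_it = net_gain(1000, return_pct=10, interest_pct=20)
```

The sign of the result depends only on whether the return beats the interest rate, which is the whole argument in one subtraction.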

In my experience, the biggest issue isn’t whether you take on technical debt or not. As long as you make a conscious decision to take on tech debt, and everyone agrees it’s tech debt that you might need to fix later, you’re in the clear. You will run into problems if you build up technical debt without acknowledging it. I made a chart to make it clear:

Columns: Do you think you are taking on tech debt? Rows: Are you taking on tech debt?

    | No | Yes
No  | Ok, cool | Don’t worry so much!
Yes | (picture: Tony Soprano at your door) | (picture: you at the bank)

The bottom left picture is Tony Soprano knocking on your door because he’s here to collect the debt you owe him. What happened is, you saw this investment (real estate?) that you thought would appreciate 10% year on year. You borrowed money from Tony, but you never realized you might have to pay it back. It turns out the interest rate was a lot heftier than you thought, and now he wants it back a year later with 50% interest.

The bottom right picture is you going to the bank because you want to buy real estate. You examine the interest rates and make a decision to get a mortgage.

These pictures might not illustrate the point super well, because the bottom right also covers this situation: borrowing at a high interest rate because the return on investment is even higher. Maybe you know of this boxing match that’s already rigged, and it’s 5:1 odds. You won’t be able to borrow money from the bank, so you go to Tony Soprano and borrow it for a few days. Next week, you pay it back with some interest, but you still made a ton of money.

Back to software engineering. The example above is like shipping the v2.0 of your web shop on time, and it turns out to be much better for users. You sell twice as much now! But you also have a bunch of scripts you have to run manually every day. You clearly should automate those scripts later, and it might be really messy to do so, but it’s also clear that you can do that later. You made a deliberate decision to borrow some resources from the future, because the return on your investment was really high.

 

A brief history of Hadoop at Spotify

I was talking with some data engineers at Spotify and had a moment of nostalgia.

2008

I was writing my master’s thesis at Spotify and had to run a Hadoop job to extract some data from the logs. Every time I started running the job, I kept hearing this subtle noise. I kept noticing the correlation for a few days but I was too intimidated to ask. Finally people started cursing that their machines had gotten really slow lately and I realized we were running Hadoop on the developers’ desktop machines. No one had told me. I think back then we had only gigabytes of log data. I remember running less on the log and I would recognize half the usernames because they were my friends.

2009

We took a bunch of machines and put them on a pallet in the foosball room. It was a super hot Swedish summer and I kept running this matrix factorization job in Hadoop that would fail halfway through. The node on the top of the pile would crash and you had to reboot it. I suspected overheating. We had a fan running in the room but it wasn’t helping. Finally I realized the problem was the sun was shining in through the window.

Jon Åslund with our Hadoop cluster

I found a big sheet or blanket and some nails and a hammer and put it up over the window. I was finally able to run my matrix factorization job to completion after doing this. This is probably going to be my favorite bug fix until the day I die.

In the summer of 2009, we installed a 30-node Hadoop cluster in our data center in Stockholm. Finally a “real” cluster.

2011

More and more people started using Hadoop so we decided to move to Elastic MapReduce. I uploaded all our logs to S3 and we put together some tooling so that you could run things on our own Hadoop cluster or on EC2 using the same source code. It was pretty beautiful but the performance wasn’t super great compared to how much we were paying.

Later in 2011 we had grown even more. We decided to move back to our own data center. We installed 500 nodes in our data center in London, later upgrading it to 700 and then 900 nodes.

Our fifth Hadoop cluster

I also implemented Luigi as a workflow engine with MapReduce support in late 2011.

2012

There was this long-standing assumption (at least I had it) that Hadoop jobs were I/O bound and thus the language didn’t matter. We were using Python for probably 95% of all jobs, with some stuff in Hive by the analytics team. From 2012 onward, we started realizing Python isn’t the ideal language, both from a performance and usability point of view. Eventually we would end up switching to Crunch and Scalding. We still use Luigi as the workflow engine to glue everything together.

This is a super simplified history of everything that took place. Josh Baer and Rafal Wojdyla are talking about the Evolution of Hadoop at Spotify at Strata in February for the rest of the story!

Deep learning for… Go

This is the last post about deep learning for chess/Go/whatever. But this really cool paper by Christopher Clark and Amos Storkey was forwarded to me by Michael Eickenberg. It’s about using convolutional neural networks to play Go. The authors of the paper do a much better job than I would ever have done of modeling move prediction in Go and show that their model beats certain Go engines.

The fascinating thing about this paper is that when playing against other Go engines, they just plug in their move prediction function, with no deep search beyond one level. That means the total time it spends is a fraction of its opponents’. Still, the fact that it plays so well speaks for its strength.

So what would happen if we could plug this into a deep search framework? The authors suggest doing exactly that in the conclusion. State-of-the-art Go engines actually use Monte Carlo tree search rather than minimax, but other than that it’s the same principle.
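For reference, this is roughly what "plugging an evaluation function into a deep search" looks like in the minimax case, written in negamax form. The toy tree and leaf scores are invented for illustration. In a real engine `children` would generate legal moves and `evaluate` would be the learned function; a Go engine would more likely use Monte Carlo tree search.

```python
def negamax(node, depth, children, evaluate):
    """Value of `node` from the point of view of the player about to move."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    # What is good for the opponent is bad for us, hence the minus sign.
    return max(-negamax(kid, depth - 1, children, evaluate) for kid in kids)


# Hypothetical two-ply game tree with scores at the leaves.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
LEAF_SCORES = {"a1": 3, "a2": 5, "b1": -2, "b2": 9}


def children(node):
    return TREE.get(node, [])


def evaluate(node):
    return LEAF_SCORES.get(node, 0)


best_value = negamax("root", 2, children, evaluate)
best_move = max(children("root"), key=lambda kid: -negamax(kid, 1, children, evaluate))
```

Here move "b" has the tempting 9 behind it, but the opponent would answer with "b1", so the search prefers "a". A one-level move predictor has to learn that kind of foresight implicitly.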

I talked a bit with the authors and the main thing that you have to change is to switch from move prediction to an evaluation function. For my chess experiments, I found a (hacky) way to train a function that does both at the same time. There are essentially two terms in my objective function: one is comparing the actual move with a random move, using a sigmoid:

\frac{P(q)}{P(q) + P(r)} = \frac{\exp(f(q))}{\exp(f(q)) + \exp(f(r))}.

If you extend that to all possible random moves you actually get a full probability distribution (a softmax) over all possible next moves.

P(p \rightarrow q) = \frac{\exp(f(q))}{\sum_r \exp(f(r))}.
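A small numeric sketch of the two formulas, using plain Python and made-up scores f for three candidate moves. It is only meant to show that the pairwise term is a sigmoid of the score difference and that it agrees with the softmax restricted to two moves.

```python
import math


def pairwise_prob(f_q, f_r):
    """exp(f(q)) / (exp(f(q)) + exp(f(r))), which equals sigmoid(f(q) - f(r))."""
    return 1.0 / (1.0 + math.exp(f_r - f_q))


def softmax(scores):
    """Probability distribution over all candidate moves."""
    top = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 0.5, -1.0]  # hypothetical f values for three possible next moves
probs = softmax(scores)
pair = pairwise_prob(scores[0], scores[1])
```

`pair` comes out the same as `probs[0] / (probs[0] + probs[1])`: comparing the actual move against one random move is the softmax with every other move ignored.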

Now, how do you “convert” that into an evaluation function? That’s the second term, which tries to fit the negative parent score to the current score. We penalize the quantity f(p) + f(q) by throwing in two more sigmoids. It’s a “soft constraint” that has absolutely no probabilistic interpretation. This is a hacky solution, but here’s how I justify it:

  1. Note that the evaluation functions are unique up to a monotonic transform, so we can actually mangle it quite a lot.
  2. The softmax distribution has one degree of freedom in how it chooses the quantities, so (I’m speculating) the artificial constraint does not change the probabilities.
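The degree of freedom in the second point can be made explicit. Adding the same constant c (a symbol I’m introducing here) to every score leaves the softmax untouched:

```latex
\frac{\exp(f(q) + c)}{\sum_r \exp(f(r) + c)}
  = \frac{e^{c} \exp(f(q))}{e^{c} \sum_r \exp(f(r))}
  = \frac{\exp(f(q))}{\sum_r \exp(f(r))}
  = P(p \rightarrow q)
```

So for each position the move probabilities only pin down the scores up to a shift. That slack is what the extra penalty on f(p) + f(q) could use. Whether it uses only that slack is the speculative part.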

I think you could do the exact same thing with their Go engine. In fact I’m willing to bet a couple of hundred bucks that if you did that, you would end up with the best Go engine in the world.

Btw, another fun thing is that they plot some of the filters and they seem as random as the ones I learn for chess. But a clever trick enforcing symmetry seems to help the model quite a lot.