How to build up a data team (everything I ever learned about recruiting)

During my time at Spotify, I've reviewed thousands of resumes and interviewed hundreds of people. Lots of them were rejected, but lots of them also got offers. I've also had my share of offers turned down by candidates.

Recruiting is one of those things where the Dunning-Kruger effect is the most pronounced: the more you do it, the more you realize how bad you are at it. Every time I look back a year, I realize 10 things I did wrong. Extrapolating this, I know that in another year I'll realize all the stupid mistakes I'm making right now. Anyway, that being said, here are some things I learned from recruiting.

Getting the word out

Depending on where you work, people might have no clue about your company. Why would they work for something they have never heard of? Or alternatively – something they know of, but don't necessarily associate with cutting edge tech? There are a million companies out there doing cool stuff, so make sure people know why your company stands out. Blog, talk at meetups, open source stuff, go to conferences. I honestly don't know what works – I don't have any numbers. But you need to hedge your bets by attacking from all angles at the same time.

I think developers have a hard time justifying this because success is not easily quantifiable – this is a branding exercise, and it's super hard to find out if you're doing the right thing. But over time, if you do this right, you will get anecdotal feedback from candidates coming in saying they saw your presentation or read this cool story on Hacker News, or whatnot.

Finding the people

I don't think there's anything magic about this – just go through external recruiters, internal recruiters, job postings, connections, whatever.

Presenting the opportunity

I think most people in the industry are fed up with bad bulk messages over email/LinkedIn. Ideally, the hiring manager should reach out and introduce themselves, and for more senior roles, more senior people should reach out (all the way up to the CTO). If a recruiter is reaching out, it's super important that they can include a quick note on what's interesting about the team and why it's a good fit.

Finding the right candidates

Recruiting is some crazy type of active learning problem with an observation bias: you only see how well the people you hire are doing. In particular, there was a lot of discussion a while back when Google claimed there was no correlation between test scores or GPA and job performance. I think there totally are really strong correlations on a macro scale, but if you are already filtering out people based on those criteria, obviously you will weaken the correlation you observe among the people you hire, or even reverse it. Not that I claim to have found any magic criteria. I do however think the two most successful traits that I've observed are (at the risk of sounding cheesy):

  1. Programming fluency (10,000 hour rule or whatever) – you need to be able to visualize large codebases, and understand how things fit together. I strongly believe that data engineers need to understand the full stack from idea, to machine learning algorithm, to code running in production. I've seen other companies with a “throw it over the fence” attitude, with one team brainstorming algorithms and another team in another city implementing them. I think that's a flawed approach if you want a tight learning cycle. In particular, I'm hesitant to hire candidates who are strong on the theoretical side, but have little experience writing code. That's why I really avoid the “data science” label – most people within this group are generally lacking on the core programming side. I don't think this necessarily means candidates have to have a solid understanding of the CAP theorem and the Linux page cache. The most important thing is that they have written a lot of code, can work with nontrivial code bases, and can write clean, maintainable code. There is nothing magic to this – but a person who has only written MATLAB scripts will probably have a harder time adjusting.
  2. Understanding the big picture – going from a vision to a set of tasks, to a bunch of code being written. People have to be able to go from an idea (“analyze this data set and build an algorithm that uses it as a signal for the recommender system”) to code, without someone having to hold their hand through every single step. People who need to be given small tasks rather than the underlying problem will never understand why we're working on things, and will inevitably end up doing the wrong thing.
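
As a side note on the observation bias mentioned earlier: the way filtering on a criterion weakens its apparent correlation with performance can be sketched with a toy simulation. This is not real hiring data. The model (performance = 0.5 × test score + noise), the 10% hire rate, and the names `pearson` and `simulate` are all made up for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_candidates=20000, hire_fraction=0.1, seed=0):
    """Return (correlation among all candidates, correlation among hires)."""
    rng = random.Random(seed)
    # Assumed model: performance is partly driven by the test score, plus noise.
    scores = [rng.gauss(0, 1) for _ in range(n_candidates)]
    performance = [0.5 * s + rng.gauss(0, 1) for s in scores]

    # Hire only the top slice by score; these are the only people
    # whose performance we would ever get to observe.
    ranked = sorted(zip(scores, performance), reverse=True)
    hired = ranked[: int(n_candidates * hire_fraction)]
    hired_scores = [s for s, _ in hired]
    hired_performance = [p for _, p in hired]

    return pearson(scores, performance), pearson(hired_scores, hired_performance)


if __name__ == "__main__":
    r_all, r_hired = simulate()
    print(f"all candidates: r = {r_all:.2f}")
    print(f"hired only:     r = {r_hired:.2f}")
```

Under these assumptions the correlation among hires comes out clearly lower than across the whole candidate pool, even though the score is a real signal. The sketch only shows the weakening; an outright reversal needs a more involved selection rule than a single cutoff.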