<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Home on Erik Bernhardsson</title>
    <link>https://erikbern.com/index.html</link>
    <description>Recent content in Home on Erik Bernhardsson</description>
    <generator>Hugo</generator>
    <language>en</language>
    <atom:link href="https://erikbern.com/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Software companies buying software: a story of ecosystems and vendors</title>
      <link>https://erikbern.com/2026/02/25/software-companies-buying-software-from-software-companies.html</link>
      <pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2026/02/25/software-companies-buying-software-from-software-companies.html</guid>
      <description>&lt;p&gt;Why are new startups growing so fast?&#xA;Why is the wage distribution getting larger for software engineers?&#xA;Why do I love infrastructure?&#xA;Is open source dead?&lt;/p&gt;&#xA;&lt;p&gt;I think a lot of these things, and some more things, all have their roots in a big shift in how we build software.&#xA;Or rather, how we &lt;em&gt;buy&lt;/em&gt; software.&#xA;Software development today is a lot about using the right vendors rather than building technology yourself.&lt;/p&gt;</description>
    </item>
    <item>
      <title>It&#39;s hard to write code for computers, but it&#39;s even harder to write code for humans</title>
      <link>https://erikbern.com/2024/09/27/its-hard-to-write-code-for-humans.html</link>
      <pubDate>Fri, 27 Sep 2024 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2024/09/27/its-hard-to-write-code-for-humans.html</guid>
      <description>&lt;p&gt;Writing code for a computer is hard enough.&#xA;You take something big and fuzzy, some large vague business outcome you want to achieve.&#xA;Then you break it down recursively and think about all the cases until you have clear logical statements a computer can follow.&#xA;Computers are very good at following logical statements.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Predicting solar eclipses with Python</title>
      <link>https://erikbern.com/2024/04/07/predicting-solar-eclipses-with-python.html</link>
      <pubDate>Sun, 07 Apr 2024 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2024/04/07/predicting-solar-eclipses-with-python.html</guid>
      <description>&lt;p&gt;As I am en route to see my first total solar eclipse, I was curious how hard it would be to compute eclipses in Python.&#xA;It turns out, ignoring some minor coordinate system head-banging, I was able to get something half-decent working in a couple of hours.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Simple sabotage for software</title>
      <link>https://erikbern.com/2023/12/13/simple-sabotage-for-software.html</link>
      <pubDate>Wed, 13 Dec 2023 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2023/12/13/simple-sabotage-for-software.html</guid>
      <description>&lt;p&gt;CIA produced a fantastic book during the peak of World War 2 called &lt;a href=&#34;https://www.cia.gov/static/5c875f3ec660e092cf893f60b4a288df/SimpleSabotage.pdf&#34;&gt;Simple Sabotage&lt;/a&gt;. It laid out various ways for infiltrators to ruin productivity of a company. Some of the advice is timeless, for instance the section about &amp;ldquo;General interference with Organizations and Production&amp;rdquo;:&lt;/p&gt;</description>
    </item>
    <item>
      <title>What I have been working on: Modal</title>
      <link>https://erikbern.com/2022/12/07/what-ive-been-working-on-modal.html</link>
      <pubDate>Wed, 07 Dec 2022 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2022/12/07/what-ive-been-working-on-modal.html</guid>
      <description>&lt;p&gt;&lt;em&gt;Long story short:&lt;/em&gt; I&amp;rsquo;m working on a super cool tool called &lt;a href=&#34;https://modal.com&#34;&gt;Modal&lt;/a&gt;. Please check it out — it lets you run things in the cloud without having to think about infrastructure. Scaling out, scheduling, containerization, using GPUs, setting up webhooks, and all kinds of other stuff. It&amp;rsquo;s primarily meant for data teams. We aren&amp;rsquo;t &lt;em&gt;quite&lt;/em&gt; live, but you can sign up for our waitlist.&lt;/p&gt;</description>
    </item>
    <item>
      <title>We are still early with the cloud: why software development is overdue for a change</title>
      <link>https://erikbern.com/2022/10/19/we-are-still-early-with-the-cloud.html</link>
      <pubDate>Wed, 19 Oct 2022 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2022/10/19/we-are-still-early-with-the-cloud.html</guid>
      <description>&lt;p&gt;This is in many respects a successor to a&#xA;&lt;a href=&#34;https://erikbern.com/2021/04/19/software-infrastructure-2.0-a-wishlist.html&#34;&gt;blog post I wrote last year&lt;/a&gt;&#xA;about what I want from software infrastructure, but the ideas morphed in my head into something sort of wider.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-genesis&#34;&gt;The genesis&lt;/h2&gt;&#xA;&lt;p&gt;I encountered AWS in 2006 or 2007 and remember thinking that it&amp;rsquo;s crazy: why would anyone want to put their stuff in someone else&amp;rsquo;s data center?&#xA;But only a couple of years later, I was running a bunch of stuff on top of AWS.&lt;/p&gt;</description>
    </item>
    <item>
      <title>σ-driven project management: when is the optimal time to give up?</title>
      <link>https://erikbern.com/2022/04/05/sigma-driven-project-management-when-is-the-optimal-time-to-give-up.html</link>
      <pubDate>Tue, 05 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2022/04/05/sigma-driven-project-management-when-is-the-optimal-time-to-give-up.html</guid>
      <description>&lt;p&gt;Hi! It&amp;rsquo;s your friendly project management theoretician. You might remember me from blog posts such as &lt;a href=&#34;https://erikbern.com/2019/04/15/why-software-projects-take-longer-than-you-think-a-statistical-model.html&#34;&gt;Why software projects take longer than you think&lt;/a&gt;, which is a blog post I wrote a long time ago positing that software project completion times follow a log-normal distribution.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Storm in the stratosphere: how the cloud will be reshuffled</title>
      <link>https://erikbern.com/2021/11/30/storm-in-the-stratosphere-how-the-cloud-will-be-reshuffled.html</link>
      <pubDate>Tue, 30 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2021/11/30/storm-in-the-stratosphere-how-the-cloud-will-be-reshuffled.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/cloud-atmosphere-layers.jpeg&#34; alt=&#34;cloud atmosphere layers&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;Here&amp;rsquo;s a theory I have about cloud vendors (AWS, Azure, GCP):&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Cloud vendors&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt; will increasingly focus on the lowest layers in the stack: basically leasing capacity in their data centers through an API.&lt;/li&gt;&#xA;&lt;li&gt;Other pure-software providers will build all the stuff on top of it. Databases, running code, you name it.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;We currently have cloud vendors that offer end-to-end solutions from the developer experience down to the hardware:&lt;/p&gt;</description>
    </item>
    <item>
      <title>What is the right level of specialization? For data teams and anyone else.</title>
      <link>https://erikbern.com/2021/07/23/what-is-the-right-level-of-specialization.html</link>
      <pubDate>Fri, 23 Jul 2021 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2021/07/23/what-is-the-right-level-of-specialization.html</guid>
      <description>&lt;p&gt;This isn&amp;rsquo;t as much of a blog post as an elaboration of a tweet I posted the other day:&lt;/p&gt;&#xA;&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;I think this specialization of data teams into 99 different roles (data scientist, data engineer, analytics engineer, ML engineer etc) is generally a bad thing driven by the fact that tools are bad and too hard to use&lt;/p&gt;</description>
    </item>
    <item>
      <title>Building a data team at a mid-stage startup: a short story</title>
      <link>https://erikbern.com/2021/07/07/the-data-team-a-short-story.html</link>
      <pubDate>Wed, 07 Jul 2021 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2021/07/07/the-data-team-a-short-story.html</guid>
      <description>&lt;p&gt;I guess I should really call this a parable.&lt;/p&gt;&#xA;&lt;p&gt;The backdrop is: you have been brought in to grow a tiny data team (~4 people) at a mid-stage startup (~$10M annual revenue), although this story could take place at many different types of companies.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Software infrastructure 2.0: a wishlist</title>
      <link>https://erikbern.com/2021/04/19/software-infrastructure-2.0-a-wishlist.html</link>
      <pubDate>Mon, 19 Apr 2021 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2021/04/19/software-infrastructure-2.0-a-wishlist.html</guid>
      <description>&lt;p&gt;Software infrastructure (by which I include everything ending with *aaS, or anything remotely similar to it) is an exciting field, in particular because (despite what the neo-luddites may say) it keeps getting better every year! I love working with something that moves so quickly.&lt;/p&gt;</description>
    </item>
    <item>
      <title>What&#39;s Erik up to?</title>
      <link>https://erikbern.com/2021/04/01/whats-erik-up-to.html</link>
      <pubDate>Thu, 01 Apr 2021 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2021/04/01/whats-erik-up-to.html</guid>
      <description>&lt;p&gt;I joined &lt;a href=&#34;https://better.com&#34;&gt;Better&lt;/a&gt; in early 2015 because I thought the team was crazy enough to actually change one of the largest industries in the US. For six years, I ran the tech team, hiring 300+ people, probably doing 2,000+ interviews, and according to GitHub I added 646,941 lines of code and removed 339,164. But I also got married, had two kids, bought an apartment and renovated it! From time to time, there were some &lt;em&gt;intense&lt;/em&gt; periods of hard work.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Giving more tools to software engineers: the reorganization of the factory</title>
      <link>https://erikbern.com/2020/12/16/giving-more-tools-to-software-engineers-the-reorganization-of-the-factory.html</link>
      <pubDate>Wed, 16 Dec 2020 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2020/12/16/giving-more-tools-to-software-engineers-the-reorganization-of-the-factory.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/power-loom.jpeg&#34; alt=&#34;power loom&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;It&amp;rsquo;s a popular attitude among developers to rant about our tools and how broken things are. Maybe I&amp;rsquo;m an optimistic person, because my viewpoint is the complete opposite! I had my first job as a software engineer in 1999, and in the last two decades I&amp;rsquo;ve seen software engineering changing in ways that have made us orders of magnitude more productive. Just some examples from things I&amp;rsquo;ve worked on or close to:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Developer experience as a competitive advantage</title>
      <link>https://erikbern.com/2020/10/06/developer-experience-as-a-competitive-advantage.html</link>
      <pubDate>Tue, 06 Oct 2020 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2020/10/06/developer-experience-as-a-competitive-advantage.html</guid>
      <description>&lt;p&gt;I spent a ton of time looking at different software providers, both as a CTO, and as a &lt;del&gt;nerd&lt;/del&gt; &amp;ldquo;advanced&amp;rdquo; consumer who builds stuff in my spare time. In the last 10 years, there has been an order of magnitude more products that cater directly to developers, through APIs, SDKs, and tooling. I&amp;rsquo;m pretty psyched about this trend. As the cost of building software goes down, that drives up the demand for software engineers. That then drives up the demand for even more products built &lt;em&gt;for software engineers&lt;/em&gt;. That then drives down the cost of building software even more!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Mortality statistics and Sweden&#39;s &#34;dry tinder&#34; effect</title>
      <link>https://erikbern.com/2020/09/23/mortality-statistics-and-swedens-dry-tinder-effect.html</link>
      <pubDate>Wed, 23 Sep 2020 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2020/09/23/mortality-statistics-and-swedens-dry-tinder-effect.html</guid>
      <description>&lt;p&gt;We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that &amp;ldquo;club&amp;rdquo;. But I read &lt;a href=&#34;https://www.aier.org/article/swedens-high-covid-death-rates-among-the-nordics-dry-tinder-and-other-important-factors/&#34;&gt;something about COVID-19 deaths&lt;/a&gt; that I thought was interesting and wanted to see if I could replicate it through data. Basically the claim is that Sweden had an exceptionally &amp;ldquo;good&amp;rdquo; year in 2019 in terms of influenza deaths, causing there to be more deaths &amp;ldquo;overdue&amp;rdquo; in 2020.&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to set compensation using commonsense principles</title>
      <link>https://erikbern.com/2020/06/08/how-to-set-compensation-using-commonsense-principles.html</link>
      <pubDate>Mon, 08 Jun 2020 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2020/06/08/how-to-set-compensation-using-commonsense-principles.html</guid>
      <description>&lt;p&gt;Compensation has always been one of the most confusing parts of management to me. Getting it right is obviously &lt;em&gt;extremely&lt;/em&gt; important. Compensation is what drives our entire economy, and you could look at the market for labor as one gigantic resource-allocating machine in the same way as people look at the stock market as a gigantic resource-allocating machine for investments.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Never attribute to stupidity that which is adequately explained by opportunity cost</title>
      <link>https://erikbern.com/2020/03/10/never-attribute-to-stupidity-that-which-is-adequately-explained-by-opportunity-cost.html</link>
      <pubDate>Tue, 10 Mar 2020 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2020/03/10/never-attribute-to-stupidity-that-which-is-adequately-explained-by-opportunity-cost.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Hanlon%27s_razor&#34;&gt;Hanlon&amp;rsquo;s razor&lt;/a&gt; is a classic aphorism I&amp;rsquo;m sure you have heard before: &lt;em&gt;Never attribute to malice that which can be adequately explained by stupidity.&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;I&amp;rsquo;ve found that neither malice nor stupidity is the most common reason when you don&amp;rsquo;t understand why something is in a certain way. Instead, the root cause is probably just that &lt;em&gt;they didn&amp;rsquo;t have time yet&lt;/em&gt;. This happens all the time at startups (maybe a bit less at big companies, for reasons I&amp;rsquo;ll get back to).&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to hire smarter than the market: a toy model</title>
      <link>https://erikbern.com/2020/01/13/how-to-hire-smarter-than-the-market-a-toy-model.html</link>
      <pubDate>Mon, 13 Jan 2020 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2020/01/13/how-to-hire-smarter-than-the-market-a-toy-model.html</guid>
      <description>&lt;p&gt;Let&amp;rsquo;s consider a toy model where you&amp;rsquo;re hiring for two things and that those are equally valuable. It&amp;rsquo;s not very important what those are, so let&amp;rsquo;s just call them &amp;ldquo;thing A&amp;rdquo; and &amp;ldquo;thing B&amp;rdquo; for now. For one set of abilities, the scatter plot looks like this:&lt;/p&gt;</description>
    </item>
    <item>
      <title>What can startups learn from Koch Industries?</title>
      <link>https://erikbern.com/2019/12/19/what-can-startups-learn-from-koch-industries.html</link>
      <pubDate>Thu, 19 Dec 2019 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2019/12/19/what-can-startups-learn-from-koch-industries.html</guid>
      <description>&lt;p&gt;I recently finished the excellent book &lt;a href=&#34;https://www.amazon.com/Kochland-History-Industries-Corporate-America/dp/1476775389&#34;&gt;Kochland&lt;/a&gt;. This isn&amp;rsquo;t my first interest in Koch: I read &lt;a href=&#34;https://www.amazon.com/Science-Success-Market-Based-Management-Largest/dp/0470139889/ref=asc_df_0470139889/&#34;&gt;The Science of Success&lt;/a&gt; by Charles Koch himself a couple of years ago.&lt;/p&gt;&#xA;&lt;p&gt;Charles Koch inherited a tiny company in 1967 and turned it into one of the world&amp;rsquo;s largest ones. That&amp;rsquo;s impressive! Just a quick disclaimer to get it out of the way. You may know the Koch brothers as the climate deniers who funded the Tea Party. I don&amp;rsquo;t understand this disconnect between being so brilliant in one field, and extremely ignorant in another. But my curiosity tells me there&amp;rsquo;s something worth learning from most notable people, &lt;em&gt;despite what I may think of their opinions&lt;/em&gt; and Koch Industries turns out to be a particularly interesting case study.&lt;/p&gt;</description>
    </item>
    <item>
      <title>We&#39;re hiring at Better</title>
      <link>https://erikbern.com/2019/12/09/hiring-at-better.html</link>
      <pubDate>Mon, 09 Dec 2019 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2019/12/09/hiring-at-better.html</guid>
      <description>&lt;p&gt;Just a quick note that my team is always hiring at &lt;a href=&#34;http://better.com/&#34;&gt;Better&lt;/a&gt;. A lot of new people have been joining the team here in NYC lately—the tech team has actually grown from 35 to 60 in just ~3 months. We are primarily looking for senior software engineers and/or engineering managers. But we would love to talk if you have less experience too! Our main tech stack is mostly TypeScript and Python.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Buffet lines are terrible, but let&#39;s try to improve them using computer simulations</title>
      <link>https://erikbern.com/2019/10/16/buffet-lines-are-terrible.html</link>
      <pubDate>Wed, 16 Oct 2019 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2019/10/16/buffet-lines-are-terrible.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://better.com&#34;&gt;My company&lt;/a&gt; has a buffet every Friday, and the lines grow to epic proportions when the food arrives. I&amp;rsquo;ve suspected for &lt;em&gt;years&lt;/em&gt; that the &amp;ldquo;classic&amp;rdquo; buffet line system is a deeply flawed and inefficient method, and every time I&amp;rsquo;m stuck in the line has made me more convinced.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Miscellaneous unsolicited (and possibly biased) career advice</title>
      <link>https://erikbern.com/2019/09/26/misc-unsolicited-career-advice.html</link>
      <pubDate>Thu, 26 Sep 2019 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2019/09/26/misc-unsolicited-career-advice.html</guid>
      <description>&lt;p&gt;No one asked for this, but I&amp;rsquo;m something like ~12 years into my career and have had my fair share of mistakes and luck so I thought I&amp;rsquo;d share some.&lt;/p&gt;&#xA;&lt;p&gt;Honestly, I feel like I&amp;rsquo;ve mostly benefitted from luck. Some of the things I did on a whim turned out to be excellent choices many years later. Some of the things were clear blind spots in hindsight. If I could give my 12 years younger self a bunch of career advice, here are some of those things.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Modeling conversion rates using Weibull and gamma distributions</title>
      <link>https://erikbern.com/2019/08/05/modeling-conversion-rates-using-weibull-and-gamma-distributions.html</link>
      <pubDate>Mon, 05 Aug 2019 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2019/08/05/modeling-conversion-rates-using-weibull-and-gamma-distributions.html</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a blog post originally featured on the &lt;a href=&#34;https://better.engineering&#34;&gt;Better engineering blog&lt;/a&gt;. If you want to link to this article or share it, please go to the &lt;a href=&#34;https://better.engineering/2019/07/29/modeling-conversion-rates-and-saving-millions-of-dollars-using-kaplan-meier-and-gamma-distributions/&#34;&gt;original post URL&lt;/a&gt;! Separately, I&amp;rsquo;m sorry it&amp;rsquo;s been so long with no posts on this blog. Between kids, moving, and being a startup CTO, I&amp;rsquo;ve been busy. I have a few posts coming down the pipe though, so stay tuned&amp;hellip;&lt;/em&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why software projects take longer than you think: a statistical model</title>
      <link>https://erikbern.com/2019/04/15/why-software-projects-take-longer-than-you-think-a-statistical-model.html</link>
      <pubDate>Mon, 15 Apr 2019 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2019/04/15/why-software-projects-take-longer-than-you-think-a-statistical-model.html</guid>
      <description>&lt;p&gt;Anyone who has built software for a while knows that estimating how long something is going to take is &lt;em&gt;hard&lt;/em&gt;.&#xA;It&amp;rsquo;s hard to come up with an unbiased estimate of how long something will take, when fundamentally the work in itself is about &lt;em&gt;solving&lt;/em&gt; something.&#xA;One pet theory I&amp;rsquo;ve had for a really long time is that some of this is really just a statistical artifact.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Headcount goals, feature factories, and when to hire those mythical 10x people</title>
      <link>https://erikbern.com/2019/02/21/headcount-targets-feature-factories-and-when-to-hire-those-mythical-10x-people.html</link>
      <pubDate>Thu, 21 Feb 2019 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2019/02/21/headcount-targets-feature-factories-and-when-to-hire-those-mythical-10x-people.html</guid>
      <description>&lt;p&gt;When I started building up a tech team for &lt;a href=&#34;https://better.com&#34;&gt;Better&lt;/a&gt;, I made a very conscious decision to pay at the high end to get people. I thought this made more sense: they cost a bit more money to hire, but output usually more than compensates for it. Some fellow CTOs went for the other side of the spectrum. This was a mystery to me, until it all made sense.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Data architecture vs backend architecture</title>
      <link>https://erikbern.com/2019/01/10/data-architecture-vs-backend-architecture.html</link>
      <pubDate>Thu, 10 Jan 2019 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2019/01/10/data-architecture-vs-backend-architecture.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/refinery.jpeg&#34; alt=&#34;refinery&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;A modern tech stack typically involves at least a frontend and backend but relatively quickly also grows to include a data platform. This typically grows out of the need for ad-hoc analysis and reporting but possibly evolves into a whole oil refinery of cronjobs, dashboards, bulk data copying, and much more. What generally pushes things into the data platform is (generally) that a number of things are&lt;/p&gt;</description>
    </item>
    <item>
      <title>The hacker&#39;s guide to uncertainty estimates</title>
      <link>https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html</link>
      <pubDate>Mon, 08 Oct 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html</guid>
      <description>&lt;p&gt;It started with a tweet:&lt;/p&gt;&#xA;&lt;blockquote class=&#34;twitter-tweet&#34;&gt;&lt;p lang=&#34;en&#34; dir=&#34;ltr&#34;&gt;New years resolution: every plot I make during 2018 will contain uncertainty estimates&lt;/p&gt;&amp;mdash; Erik Bernhardsson (@bernhardsson) &lt;a href=&#34;https://twitter.com/bernhardsson/status/950065836194066433?ref_src=twsrc%5Etfw&#34;&gt;January 7, 2018&lt;/a&gt;&lt;/blockquote&gt;&#xA;&lt;script async src=&#34;https://platform.twitter.com/widgets.js&#34; charset=&#34;utf-8&#34;&gt;&lt;/script&gt;&#xA;&#xA;&#xA;&lt;p&gt;Why? Because I&amp;rsquo;ve been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of widgets is going up or down, or whether widget method X is more productive than widget method Y. For almost any graph, quantifying the uncertainty seems useful, so I started trying. A few months later:&lt;/p&gt;</description>
    </item>
    <item>
      <title>I don&#39;t want to learn your garbage query language</title>
      <link>https://erikbern.com/2018/08/30/i-dont-want-to-learn-your-garbage-query-language.html</link>
      <pubDate>Thu, 30 Aug 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/08/30/i-dont-want-to-learn-your-garbage-query-language.html</guid>
      <description>&lt;p&gt;This is a bit of a rant but I really don&amp;rsquo;t like software that invents its own query language. There&amp;rsquo;s a trillion different ORMs out there. Another trillion databases with their own query language. Another trillion SaaS products where the only way to query is to learn some random query DSL they made up.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Business secrets from terrible people</title>
      <link>https://erikbern.com/2018/08/16/business-secrets-from-terrible-people.html</link>
      <pubDate>Thu, 16 Aug 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/08/16/business-secrets-from-terrible-people.html</guid>
      <description>&lt;p&gt;I get bored reading management books very easily and lately I&amp;rsquo;ve been reading about a wide range of almost arbitrary topics. One of the lenses I tend to read through is to see different management styles in different environments.&lt;/p&gt;</description>
    </item>
    <item>
      <title>New approximate nearest neighbor benchmarks</title>
      <link>https://erikbern.com/2018/06/17/new-approximate-nearest-neighbor-benchmarks.html</link>
      <pubDate>Sun, 17 Jun 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/06/17/new-approximate-nearest-neighbor-benchmarks.html</guid>
      <description>&lt;p&gt;As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I&amp;rsquo;m the author of &lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt;, a library with 3,500+ stars on Github as of today. It offers fast approximate search for nearest neighbors with the additional benefit that you can load data super fast from disk using mmap. I built it at Spotify to use for music recommendations where it&amp;rsquo;s still used to power millions (maybe billions) of music recommendations every day.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Missing the point about microservices: it&#39;s about testing and deploying independently</title>
      <link>https://erikbern.com/2018/06/04/missing-the-point-about-microservices.html</link>
      <pubDate>Mon, 04 Jun 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/06/04/missing-the-point-about-microservices.html</guid>
      <description>&lt;p&gt;Ok, so I have to first preface this whole blog post by a few things:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;I really struggle with the term &lt;em&gt;microservices&lt;/em&gt;. I can&amp;rsquo;t put my finger on exactly why. Maybe because the term is hopelessly ill-defined, maybe because it&amp;rsquo;s gotten picked up by the hype train. Whatever. But I have to stick to some type of terminology so let&amp;rsquo;s just roll with it.&lt;/li&gt;&#xA;&lt;li&gt;This blog post might be mildly controversial, but I&amp;rsquo;m throwing it out there because I&amp;rsquo;ve had this itchy feeling for so long and I can&amp;rsquo;t get rid of it. I respect it if you want to disagree vehemently, and maybe there&amp;rsquo;s something both of us can learn.&lt;/li&gt;&#xA;&lt;li&gt;I have a weird story. My first &amp;ldquo;real&amp;rdquo; company, Spotify, used a service-oriented architecture from scratch. I also spent some time at Google which used a service-oriented architecture. So basically since 2006 I&amp;rsquo;ve been continuously working in what people now call a &amp;ldquo;microservice architecture&amp;rdquo;. It didn&amp;rsquo;t even &lt;em&gt;occur&lt;/em&gt; to me that some people might want to build things as &lt;em&gt;monoliths&lt;/em&gt;. So I guess I&amp;rsquo;m coming at it from a different direction than many others. Either way, there were particular non-standard reasons why Spotify and Google had to do this that I&amp;rsquo;ll get back to later.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;Let&amp;rsquo;s start by talking about iteration speed!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Interviewing is a noisy prediction problem</title>
      <link>https://erikbern.com/2018/05/02/interviewing-is-a-noisy-prediction-problem.html</link>
      <pubDate>Wed, 02 May 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/05/02/interviewing-is-a-noisy-prediction-problem.html</guid>
      <description>&lt;p&gt;I have done roughly 2,000 interviews in my life. When I started recruiting, I had so much confidence in my ability to assess people. Let me just throw a couple of algorithm questions at a candidate and then I&amp;rsquo;ll tell you if they are good or not!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Waiting time, load factor, and queueing theory: why you need to cut your systems a bit of slack</title>
      <link>https://erikbern.com/2018/03/27/waiting-time-load-factor-and-queueing-theory.html</link>
      <pubDate>Tue, 27 Mar 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/03/27/waiting-time-load-factor-and-queueing-theory.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/queue.jpeg&#34; alt=&#34;queue&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;I&amp;rsquo;ve been reading up on operations research lately, including &lt;a href=&#34;https://en.wikipedia.org/wiki/Queueing_theory&#34;&gt;queueing theory&lt;/a&gt;. It started out as a way to understand the very complex mortgage process (I work at &lt;a href=&#34;https://better.com/&#34;&gt;a mortgage startup&lt;/a&gt;) but it&amp;rsquo;s turned into my little hammer and now I see nails everywhere.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Lessons from content marketing myself (aka blogging) for five years</title>
      <link>https://erikbern.com/2018/03/07/lessons-from-content-marketing-myself-aka-blogging-for-five-years.html</link>
      <pubDate>Wed, 07 Mar 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/03/07/lessons-from-content-marketing-myself-aka-blogging-for-five-years.html</guid>
      <description>&lt;p&gt;I started writing this blog in late 2012, partly because I felt like it would help me improve my English and my writing skills, partly because I kept having a lot of random ideas in my head and I wanted to write them down somewhere. I honestly never cared too much about finding a particular niche, I just wanted to write down stuff that I found interesting. I set up a Wordpress blog on my crappy Swedish virtual private server.&lt;/p&gt;</description>
    </item>
    <item>
      <title>New benchmarks for approximate nearest neighbors</title>
      <link>https://erikbern.com/2018/02/15/new-benchmarks-for-approximate-nearest-neighbors.html</link>
      <pubDate>Thu, 15 Feb 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/02/15/new-benchmarks-for-approximate-nearest-neighbors.html</guid>
      <description>&lt;p&gt;UPDATE(2018-06-17): There is a &lt;a href=&#34;https://erikbern.com/2018-06-17-new-approximate-nearest-neighbor-benchmarks&#34;&gt;later blog post with newer benchmarks&lt;/a&gt;!&lt;/p&gt;&#xA;&lt;p&gt;One of my super nerdy interests is approximate algorithms for nearest neighbors in high-dimensional spaces. The problem is simple. You have, say, 1M points in some high-dimensional space. Now given a query point, can you find the nearest points out of the 1M set? Doing this fast turns out to be tricky.&lt;/p&gt;</description>
    </item>
    <item>
      <title>I&#39;m looking for data engineers</title>
      <link>https://erikbern.com/2018/01/28/im-looking-for-data-engineers.html</link>
      <pubDate>Sun, 28 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/01/28/im-looking-for-data-engineers.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/binary_globe.jpeg&#34; alt=&#34;binary globe&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;I&amp;rsquo;m interrupting the regular programming for a quick announcement: we&amp;rsquo;re looking for data engineers at &lt;a href=&#34;https://better.com&#34;&gt;Better&lt;/a&gt;. You would be the first one to join and would work a lot directly with me.&lt;/p&gt;&#xA;&lt;p&gt;Some fun things you &lt;em&gt;could&lt;/em&gt; work on (these are all projects I&amp;rsquo;m working on right now):&lt;/p&gt;</description>
    </item>
    <item>
      <title>Books I consumed in 2017</title>
      <link>https://erikbern.com/2018/01/17/books-i-consumed-in-2017.html</link>
      <pubDate>Wed, 17 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/01/17/books-i-consumed-in-2017.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/library.jpeg&#34; alt=&#34;library&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;Turns out having a toddler isn&amp;rsquo;t super compatible with reading. I used to read ~100 books/year as a teenager, but it has slowly deteriorated to maybe 20-30 books, at most. And I don&amp;rsquo;t even finish all of them because life is too short! Some books are just not that interesting. So what were some of the books worth mentioning?&lt;/p&gt;</description>
    </item>
    <item>
      <title>Plotting author statistics for Git repos using Git of Theseus</title>
      <link>https://erikbern.com/2018/01/03/plotting-author-statistics-for-git-repos-using-git-of-theseus.html</link>
      <pubDate>Wed, 03 Jan 2018 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2018/01/03/plotting-author-statistics-for-git-repos-using-git-of-theseus.html</guid>
      <description>&lt;p&gt;I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to &lt;a href=&#34;https://github.com/erikbern/git-of-theseus&#34;&gt;Git of Theseus&lt;/a&gt; which is a tool (written in Python) that generates statistics about Git repositories. I&amp;rsquo;ve &lt;a href=&#34;https://erikbern.com/2016/12/05/the-half-life-of-code.html&#34;&gt;written about it previously&lt;/a&gt; on this blog. The name is a horrible pun (I&amp;rsquo;m a dad!) on &lt;a href=&#34;https://en.wikipedia.org/wiki/Ship_of_Theseus&#34;&gt;Ship of Theseus&lt;/a&gt; which is a philosophical thought experiment about what happens if you replace every single part of a boat — is it still the same boat ⁉️ 🤔&lt;/p&gt;</description>
    </item>
    <item>
      <title>Toxic meeting culture</title>
      <link>https://erikbern.com/2017/12/29/toxic-meeting-culture.html</link>
      <pubDate>Fri, 29 Dec 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/12/29/toxic-meeting-culture.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/dogs-meeting.jpg&#34; alt=&#34;dogs meeting&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;I spent six years at a company that went from 50 people to 1500 and one contributing factor leading to my departure was that I went from a &amp;ldquo;maker&amp;rdquo; to a person stuck in meetings every day. It wasn&amp;rsquo;t that I wanted to do that, but everyone else kept dragging me into meetings.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Learning from users faster using machine learning</title>
      <link>https://erikbern.com/2017/12/12/learning-from-users-faster-using-machine-learning.html</link>
      <pubDate>Tue, 12 Dec 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/12/12/learning-from-users-faster-using-machine-learning.html</guid>
      <description>&lt;p&gt;I had an interesting idea a few weeks ago, best explained through an example. Let&amp;rsquo;s say you&amp;rsquo;re running an e-commerce site (I &lt;a href=&#34;https://better.com&#34;&gt;kind of do&lt;/a&gt;) and you want to optimize the number of purchases.&lt;/p&gt;&#xA;&lt;p&gt;Let&amp;rsquo;s also say we try to learn as much as we can from users, using both A/B tests and basic slicing and dicing of the data. We are looking at how many people convert (buy our widgets) but a constant problem is that there&amp;rsquo;s just &lt;em&gt;too much uncertainty&lt;/em&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Annoy 1.10 released, with Hamming distance and Windows support</title>
      <link>https://erikbern.com/2017/11/26/annoy-1.10-released-with-hamming-distance-and-windows-support.html</link>
      <pubDate>Sun, 26 Nov 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/11/26/annoy-1.10-released-with-hamming-distance-and-windows-support.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been a bit bad at posting things with a regular cadence lately, partly because I&amp;rsquo;m trying to adjust to having a toddler, partly because the hunt for clicks has caused such a high bar for me that I feel like I have to post something Pulitzer-worthy. But things are always cooking, so let&amp;rsquo;s break this pattern with a quick notice on something I&amp;rsquo;ve been working on!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why conversion matters: a toy model</title>
      <link>https://erikbern.com/2017/10/30/why-conversion-matters-a-toy-model.html</link>
      <pubDate>Mon, 30 Oct 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/10/30/why-conversion-matters-a-toy-model.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/funnel.gif&#34; alt=&#34;funnel&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;There are often close relationships between top level business metrics. For instance, it&amp;rsquo;s well known that retention has a &lt;a href=&#34;https://25iq.com/2017/01/27/everyone-poops-and-has-customer-churn-and-a-dozen-notes/&#34;&gt;super strong impact&lt;/a&gt; on the valuation of a subscription business. Or that the % of occupied seats is super important for an airline. A fun little &lt;a href=&#34;https://en.wikipedia.org/wiki/Toy_model&#34;&gt;toy model&lt;/a&gt; that I came up with generates a curious relationship between conversion rates and revenue.&lt;/p&gt;</description>
    </item>
    <item>
      <title>On the Equifax breach and how to really prevent identity theft</title>
      <link>https://erikbern.com/2017/09/26/on-the-equifax-breach-and-how-to-really-secure-prevent-theft.html</link>
      <pubDate>Tue, 26 Sep 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/09/26/on-the-equifax-breach-and-how-to-really-secure-prevent-theft.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/bar-code-tattoo.jpg&#34; alt=&#34;bar code tattoo&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;A funny thing about being a foreigner is how you realize people take broken things for granted. I&amp;rsquo;m going to go out on a limb here claiming that &lt;em&gt;the US has a pretty dumb banking system&lt;/em&gt;. I could talk about it all day, but right now I want to focus on a very particular piece of it: &lt;em&gt;how to verify your identity online.&lt;/em&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>The number of letters in the word for each number</title>
      <link>https://erikbern.com/2017/09/06/the-number-of-letters-in-the-word-for-each-number.html</link>
      <pubDate>Wed, 06 Sep 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/09/06/the-number-of-letters-in-the-word-for-each-number.html</guid>
      <description>&lt;p&gt;Just for fun, I generated these graphs of the number of letters in the word for each number. I really spent about 10 minutes on this (ok&amp;hellip;possibly also another 40 minutes tweaking the plots):&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/num-letters-en.png&#34; alt=&#34;en&#34;&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>The software engineering rule of 3</title>
      <link>https://erikbern.com/2017/08/29/the-software-engineering-rule-of-3.html</link>
      <pubDate>Tue, 29 Aug 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/08/29/the-software-engineering-rule-of-3.html</guid>
      <description>&lt;p&gt;Here&amp;rsquo;s a &lt;del&gt;dumb&lt;/del&gt; extremely accurate rule I&amp;rsquo;m postulating* for software engineering projects: &lt;em&gt;you need at least 3 examples before you solve the right problem&lt;/em&gt;.&lt;/p&gt;&#xA;&lt;p&gt;This is what I&amp;rsquo;ve noticed:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Don&amp;rsquo;t factor out shared code between two classes. Wait until you have at least three.&lt;/li&gt;&#xA;&lt;li&gt;The first two attempts to solve a problem will fail because you misunderstood the problem. The third time it will work.&lt;/li&gt;&#xA;&lt;li&gt;Any attempt at being smart earlier will end up overfitting to coincidental patterns.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;(Note that #1 and #2 are actually pretty different implications. But let&amp;rsquo;s get back to that later.)&lt;/p&gt;</description>
    </item>
    <item>
      <title>Machine, Platform, Crowd</title>
      <link>https://erikbern.com/2017/08/19/machine-platform-crowd.html</link>
      <pubDate>Sat, 19 Aug 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/08/19/machine-platform-crowd.html</guid>
      <description>&lt;p&gt;I just bought &lt;a href=&#34;https://www.amazon.com/dp/0393254291&#34;&gt;Machine, Platform, Crowd: Harnessing Our Digital Future&lt;/a&gt; and discovered that it mentions my blog – in particular the post &lt;a href=&#34;https://erikbern.com/2016/08/05/when-machine-learning-matters.html&#34;&gt;When machine learning matters&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/machine_platform_crowd.jpeg&#34; alt=&#34;machine, platform, crowd p. 146&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;Ok, I lied a little bit. I didn&amp;rsquo;t discover it serendipitously. Someone actually emailed me saying I was mentioned, and so I ordered the book for same-day delivery. But I was seriously planning to read the book anyway – having read both &lt;a href=&#34;https://www.amazon.com/Second-Machine-Age-Prosperity-Technologies/dp/0393350649&#34;&gt;The Second Machine Age&lt;/a&gt; and &lt;a href=&#34;https://www.amazon.com/Race-Against-Machine-Accelerating-Productivity/dp/0984725113&#34;&gt;Race Against the Machine&lt;/a&gt; – they are great books &lt;em&gt;and I&amp;rsquo;m not being biased&lt;/em&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Google diversity memo, global warming, Pascal&#39;s wager, and other stuff</title>
      <link>https://erikbern.com/2017/08/14/google-diversity-memo-global-warming-pascals-wager.html</link>
      <pubDate>Mon, 14 Aug 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/08/14/google-diversity-memo-global-warming-pascals-wager.html</guid>
      <description>&lt;p&gt;There are about 765 million blog posts about the diversity &amp;ldquo;memo&amp;rdquo; that leaked out of Google a couple of weeks ago. I think the case for any biological difference is pretty weak, and it bothers me when people refer to an &amp;ldquo;interest gap&amp;rdquo; as anything other than caused by the environment. Maybe because I have a daughter, maybe because I have too many female friends who told me stories about how they were held back or discriminated against.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Fun with trigonometry: the world&#39;s most twisted coastline</title>
      <link>https://erikbern.com/2017/07/12/the-most-twisted-coastline.html</link>
      <pubDate>Wed, 12 Jul 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/07/12/the-most-twisted-coastline.html</guid>
      <description>&lt;p&gt;I just spent a few days in Italy, on the Ligurian coast. Even though we were on the west side of Italy, the Mediterranean sea was to the east, because the house was situated on a long bay. But zooming in even more, there were parts of the coast that were even more twisted – to the point where it had turned a full 360 degrees so you ended up having the sea to the west again.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Optimizing for iteration speed</title>
      <link>https://erikbern.com/2017/07/06/optimizing-for-iteration-speed.html</link>
      <pubDate>Thu, 06 Jul 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/07/06/optimizing-for-iteration-speed.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/burger_buns.jpg&#34; alt=&#34;burgers&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;I&amp;rsquo;ve written before about &lt;a href=&#34;https://erikbern.com/2016/03/02/iterate-or-die.html&#34;&gt;the importance of iterating quickly&lt;/a&gt; but I didn&amp;rsquo;t necessarily talk about some concrete things you can do. When I&amp;rsquo;ve built up the tech team at &lt;a href=&#34;https://better.com&#34;&gt;Better&lt;/a&gt;, I&amp;rsquo;ve intentionally optimized for fast iteration speed above almost everything else. What are some ways we did that?&lt;/p&gt;</description>
    </item>
    <item>
      <title>Blogroll</title>
      <link>https://erikbern.com/2017/06/09/blogroll.html</link>
      <pubDate>Fri, 09 Jun 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/06/09/blogroll.html</guid>
      <description>&lt;p&gt;Remember when everyone had a really ugly blog with a &lt;em&gt;blogroll&lt;/em&gt;? Anyway, just think the word is funny.&lt;/p&gt;&#xA;&lt;p&gt;I follow a few hundred blogs using &lt;a href=&#34;https://feedly.com&#34;&gt;Feedly&lt;/a&gt; and &lt;a href=&#34;http://reederapp.com/&#34;&gt;Reeder&lt;/a&gt; and have been reading a few hundred thousand blog posts over the last 10 years. Here&amp;rsquo;s some stuff I think everyone should follow. Not going to share a million blogs, just a few top ones. That way you don&amp;rsquo;t have to think about it, just subscribe to all of it:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Conversion rates – you are (most likely) computing them wrong</title>
      <link>https://erikbern.com/2017/05/23/conversion-rates-you-are-most-likely-computing-them-wrong.html</link>
      <pubDate>Tue, 23 May 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/05/23/conversion-rates-you-are-most-likely-computing-them-wrong.html</guid>
      <description>&lt;p&gt;How hard can it be to compute conversion rate? Take the total number of users that converted and divide it by the total number of users. &lt;em&gt;Done.&lt;/em&gt; Except&amp;hellip; it&amp;rsquo;s a lot more complicated when you have any sort of significant time lag.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The mathematical principles of management</title>
      <link>https://erikbern.com/2017/04/09/the-mathematical-principles-of-management.html</link>
      <pubDate>Sun, 09 Apr 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/04/09/the-mathematical-principles-of-management.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve read about 100 management books by now but if there&amp;rsquo;s something that always bothered me it&amp;rsquo;s the lack of first principles thinking. Basically it&amp;rsquo;s a ton of heuristics. And heuristics are great, but when you present heuristics as true objectives, it kind of clouds the underlying objectives (and you end up with weird proxy cults like the Agile movement 👹 – not that I disagree with it, I just wish they could derive it from a more systematic understanding of project management).&lt;/p&gt;</description>
    </item>
    <item>
      <title>The eigenvector of &#34;Why we moved from language X to language Y&#34;</title>
      <link>https://erikbern.com/2017/03/15/the-eigenvector-of-why-we-moved-from-language-x-to-language-y.html</link>
      <pubDate>Wed, 15 Mar 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/03/15/the-eigenvector-of-why-we-moved-from-language-x-to-language-y.html</guid>
      <description>&lt;p&gt;I was reading yet another blog post titled &amp;ldquo;Why our team moved from &amp;lt;language X&amp;gt; to &amp;lt;language Y&amp;gt;&amp;rdquo; (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate an N * N contingency table of moving from language X to language Y?&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why I went into the mortgage industry</title>
      <link>https://erikbern.com/2017/02/17/why-i-went-into-the-mortgage-industry.html</link>
      <pubDate>Fri, 17 Feb 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/02/17/why-i-went-into-the-mortgage-industry.html</guid>
      <description>&lt;p&gt;I just realized last Thursday that I have spent two full years at &lt;a href=&#34;https://better.com&#34;&gt;Better&lt;/a&gt;, incidentally on the same day as we announced a &lt;a href=&#34;https://www.wsj.com/articles/lender-better-mortgage-gets-new-kleiner-perkins-funding-valuing-firm-at-220-million-1486643386&#34;&gt;$15M round&lt;/a&gt; led by Kleiner Perkins. So it was a good point to reflect a bit and think back – what the F led me to abandon my role managing the machine learning team at Spotify? To join some random startup in the world&amp;rsquo;s most boring industry? So here&amp;rsquo;s my justification why I love being where I am:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Language pitch</title>
      <link>https://erikbern.com/2017/02/01/language-pitch.html</link>
      <pubDate>Wed, 01 Feb 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/02/01/language-pitch.html</guid>
      <description>&lt;p&gt;Here&amp;rsquo;s a fun analysis that I did of the &lt;em&gt;pitch&lt;/em&gt; (aka. frequency) of various languages. Certain languages are simply pronounced with lower or higher pitch. Whether this is a feature of the language or more a cultural thing is a good question, but there are some substantial differences between languages.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Functional programming is the libertarianism of software engineering</title>
      <link>https://erikbern.com/2017/01/10/functional-programming-is-the-libertarianism-of-sw-eng.html</link>
      <pubDate>Tue, 10 Jan 2017 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2017/01/10/functional-programming-is-the-libertarianism-of-sw-eng.html</guid>
      <description>&lt;p&gt;This is a pretty dumb post, in which I argue that functional programming has a lot of the bad parts of libertarianism and a lot of the good parts:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Both ideologies strive to eliminate [the] state. &lt;em&gt;(ok, dumb dad joke)&lt;/em&gt;&lt;/li&gt;&#xA;&lt;li&gt;Both ideologies are driven by a set of dogmatic axioms rather than a practical goal:&lt;/li&gt;&#xA;&lt;li&gt;Libertarianism wants to reduce the government because any involvement distorts free markets. I always struggled to see what the underlying objective function is (it doesn&amp;rsquo;t seem to be maximization of people&amp;rsquo;s utility). 🤔&lt;/li&gt;&#xA;&lt;li&gt;Functional programming wants to reduce side effects and make everything pure, often by enforcing onerous type systems. But why? Again I don&amp;rsquo;t see an ultimate objective here. IMO it should start from the principle that the goal of a programming language should be to &lt;em&gt;make the programmers as productive as possible.&lt;/em&gt; For instance, the little research that exists has shown that most bugs have little to do &lt;a href=&#34;https://vimeo.com/74354480&#34;&gt;with typing&lt;/a&gt; and I&amp;rsquo;d expect something similar to apply to mutable state. In fact the largest class seems to be &lt;a href=&#34;https://blog.acolyer.org/2016/10/06/simple-testing-can-prevent-most-critical-failures/&#34;&gt;poor error handling&lt;/a&gt; &lt;em&gt;(ok, typing isn&amp;rsquo;t necessarily related to FP, but in practice I find that strong typing and FP have highly overlapping fan clubs).&lt;/em&gt;&lt;/li&gt;&#xA;&lt;li&gt;Both camps invoke obscure cases in history as proof of success: libertarians (more so anarchists, I guess) often talk about Spain during the civil war, &lt;a href=&#34;https://mises.org/library/stateless-somalia-and-loving-it&#34;&gt;Somalia&lt;/a&gt;, or sometimes Singapore.
Haskell acolytes are very eager to bring up &lt;a href=&#34;https://code.facebook.com/posts/745068642270222/fighting-spam-with-haskell/&#34;&gt;Facebook&amp;rsquo;s spam filtering&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;YET – and this is the surprising part imho – both ideologies are ~90% correct (source: my opinion). Which really surprises me given that they start from an (imo) arbitrary set of axioms.&lt;/li&gt;&#xA;&lt;li&gt;Even if you are a die-hard bolshevik, you benefit from an understanding of how interventions distort markets, how incentives matter, and how entrepreneurship is the driver of progress.&lt;/li&gt;&#xA;&lt;li&gt;Even if you are coding in Visual Basic, you can level up your skills by learning FP: making functions pure when needed, avoiding state, avoiding variable reassignment, avoiding mutable data structures, writing pipelines of data transformations, and all that jazz that FP has taught us to cherish.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/mind_blown.gif&#34; alt=&#34;mind blown&#34;&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>The half-life of code &amp; the ship of Theseus</title>
      <link>https://erikbern.com/2016/12/05/the-half-life-of-code.html</link>
      <pubDate>Mon, 05 Dec 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/12/05/the-half-life-of-code.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/trireme.jpg&#34; alt=&#34;trireme&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;As a project evolves, does the new code just add on top of the old code? Or does it replace the old code slowly over time? In order to understand this, I built a &lt;a href=&#34;https://github.com/erikbern/git-of-theseus&#34;&gt;little thing&lt;/a&gt; to analyze Git projects, with help from the formidable &lt;a href=&#34;https://gitpython.readthedocs.io/en/stable/&#34;&gt;GitPython&lt;/a&gt; project. The idea is to go back in history and run a &lt;code&gt;git blame&lt;/code&gt; (making this somewhat fast was a bit nontrivial, as it turns out, but I&amp;rsquo;ll spare you the details, which involve some opportunistic caching of files, picking historical points spread out in time, using &lt;code&gt;git diff&lt;/code&gt; to invalidate changed files, etc).&lt;/p&gt;</description>
    </item>
    <item>
      <title>Are data sets the new server rooms?</title>
      <link>https://erikbern.com/2016/11/01/are-data-sets-the-new-server-rooms.html</link>
      <pubDate>Tue, 01 Nov 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/11/01/are-data-sets-the-new-server-rooms.html</guid>
      <description>&lt;p&gt;This blog post &lt;a href=&#34;https://medium.com/@josh_nussbaum/data-sets-are-the-new-server-rooms-40fdb5aed6b0&#34;&gt;Data sets are the new server rooms&lt;/a&gt; makes the point that a bunch of companies raise a ton of money to go get really proprietary awesome data as a competitive moat. Because once you have the data, you can build a better product, and no one can copy it (at least not very cheaply). Ideally you hit a virtuous cycle as well, where usage of your system, once it takes off, gives even more data, which makes the system even better, which attracts more users&amp;hellip;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Pareto efficiency</title>
      <link>https://erikbern.com/2016/10/25/pareto-efficiency.html</link>
      <pubDate>Tue, 25 Oct 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/10/25/pareto-efficiency.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://en.wikipedia.org/wiki/Pareto_efficiency&#34;&gt;Pareto efficiency&lt;/a&gt; is a useful concept I like to think about. It often comes up when you compare items on multiple dimensions. Say you want to buy a new TV. To simplify it let&amp;rsquo;s assume you only care about two factors: price and quality. We don&amp;rsquo;t know what you are willing to pay for quality – but we know that &lt;em&gt;everything else equal&lt;/em&gt;:&lt;/p&gt;</description>
    </item>
    <item>
      <title>State drift</title>
      <link>https://erikbern.com/2016/09/08/state-drift.html</link>
      <pubDate>Thu, 08 Sep 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/09/08/state-drift.html</guid>
      <description>&lt;p&gt;I generally haven&amp;rsquo;t written much about software architecture. People make heuristics into religion. But here is something I thought about: &lt;em&gt;how to build self-correction into systems&lt;/em&gt;. This has been something just vaguely sitting in my head lacking a clear conceptual definition until a whole slew of things popped up today that all had the exact same issue at their core. I&amp;rsquo;m going to refer to it as &lt;em&gt;state drift&lt;/em&gt; lacking a better term for it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>When machine learning matters</title>
      <link>https://erikbern.com/2016/08/05/when-machine-learning-matters.html</link>
      <pubDate>Fri, 05 Aug 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/08/05/when-machine-learning-matters.html</guid>
      <description>&lt;p&gt;I joined Spotify in 2008 to focus on machine learning and music recommendations. It&amp;rsquo;s easy to forget, but Spotify&amp;rsquo;s key differentiator back then was the low-latency playback. People would say that it felt like they had the music on their own hard drive. (The other key differentiator was licensing – until early 2009 Spotify basically just had all kinds of weird stuff that employees had uploaded. In 2009 after a crazy amount of negotiation the music labels agreed to try it out as an experiment. But I&amp;rsquo;m getting off topic now.)&lt;/p&gt;</description>
    </item>
    <item>
      <title>Subway waiting math</title>
      <link>https://erikbern.com/2016/07/09/waiting-time-math.html</link>
      <pubDate>Sat, 09 Jul 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/07/09/waiting-time-math.html</guid>
      <description>&lt;p&gt;Why does it suck to wait for things? In a &lt;a href=&#34;https://erikbern.com/2016/04/04/nyc-subway-math.html&#34;&gt;previous post I analyzed a NYC subway dataset&lt;/a&gt; and found that at some point, quite early, it&amp;rsquo;s worth just giving up.&lt;/p&gt;&#xA;&lt;p&gt;This isn&amp;rsquo;t a proof that the subway doesn&amp;rsquo;t run on time – in fact it might actually &lt;em&gt;prove that the subway runs really well&lt;/em&gt;. The numbers indicate that it&amp;rsquo;s not worth waiting after 10 minutes, but it&amp;rsquo;s a rare event and usually involves something extraordinary like a multi-hour delay. You should roughly give up after some point related to the normal train frequency, and 10 minutes is not a lot at all. Conversely if the trains ran hourly, it probably would have been worth waiting an hour or more. My analysis gave me a lot of respect for the job MTA is doing.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Approximate nearest news</title>
      <link>https://erikbern.com/2016/06/02/approximate-nearest-news.html</link>
      <pubDate>Thu, 02 Jun 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/06/02/approximate-nearest-news.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/tree-full-K.png&#34; alt=&#34;pic&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;As you may know, one of my (very geeky) interests is &lt;a href=&#34;https://en.wikipedia.org/wiki/Nearest_neighbor_search&#34;&gt;Approximate nearest neighbor&lt;/a&gt; methods, and I&amp;rsquo;m the author of a Python package called &lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;I&amp;rsquo;ve also built a benchmark suite called &lt;a href=&#34;https://github.com/erikbern/ann-benchmarks&#34;&gt;ann-benchmarks&lt;/a&gt; to compare different packages. Annoy was the world&amp;rsquo;s fastest package for a few months, but two things happened.&lt;/p&gt;</description>
    </item>
    <item>
      <title>What is your motivation?</title>
      <link>https://erikbern.com/2016/05/24/what-is-your-motivation.html</link>
      <pubDate>Tue, 24 May 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/05/24/what-is-your-motivation.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been trying to learn Clojure. I keep telling people I meet that I really want to learn Clojure, but still every night I can&amp;rsquo;t get myself to spend time with it. It&amp;rsquo;s unclear if I really want to learn Clojure or just want to &lt;em&gt;have learned&lt;/em&gt; Clojure?&lt;/p&gt;</description>
    </item>
    <item>
      <title>Dollar cost averaging</title>
      <link>https://erikbern.com/2016/04/26/dollar-cost-averaging.html</link>
      <pubDate>Tue, 26 Apr 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/04/26/dollar-cost-averaging.html</guid>
      <description>&lt;p&gt;(I accidentally published an unfinished draft of this post a few days ago – sorry about that).&lt;/p&gt;&#xA;&lt;p&gt;There&amp;rsquo;s a lot of sources preaching the benefits of &lt;em&gt;dollar cost averaging&lt;/em&gt;, or the practice of investing a fixed amount of money regularly. The alleged benefit is that when the price goes up, well, then your stake is worth more, but if the price goes down, then you get more shares for the same amount of money. &lt;a href=&#34;https://en.wikipedia.org/wiki/Dollar_cost_averaging&#34;&gt;According to&lt;/a&gt; Wikipedia, it &amp;ldquo;minimises downside risk&amp;rdquo;, about.com &lt;a href=&#34;http://beginnersinvest.about.com/cs/newinvestors/a/041901a.htm&#34;&gt;says&lt;/a&gt; it &amp;ldquo;drastically reduces market risk&amp;rdquo;, and an article on Nasdaq.com &lt;a href=&#34;http://www.nasdaq.com/article/why-dollar-cost-averaging-is-a-smart-investment-strategy-cm354240&#34;&gt;claims that&lt;/a&gt; it&amp;rsquo;s a &amp;ldquo;smart investment strategy&amp;rdquo;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Why organizations fail</title>
      <link>https://erikbern.com/2016/04/18/why-organizations-fail.html</link>
      <pubDate>Mon, 18 Apr 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/04/18/why-organizations-fail.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/simpsons_enron_1.gif&#34; alt=&#34;&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;One of my favorite business hobbies is to reduce some nasty decision down to its absolute core objective, decide the most basic strategy, and then add more and more modifications as you have to confront the complexity of reality (yes I have very lame hobbies thanks I know).&lt;/p&gt;</description>
    </item>
    <item>
      <title>NYC subway math</title>
      <link>https://erikbern.com/2016/04/04/nyc-subway-math.html</link>
      <pubDate>Mon, 04 Apr 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/04/04/nyc-subway-math.html</guid>
      <description>&lt;p&gt;Apparently &lt;a href=&#34;http://www.mta.info/&#34;&gt;MTA&lt;/a&gt; (the company running the NYC subway) has a &lt;a href=&#34;http://datamine.mta.info/&#34;&gt;real-time API&lt;/a&gt;. My fascination for the subway takes autistic proportions and so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here&amp;rsquo;s some relevant code for how to use the API:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Exploding offers are bullshit</title>
      <link>https://erikbern.com/2016/03/16/exploding-offers-are-bullshit.html</link>
      <pubDate>Wed, 16 Mar 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/03/16/exploding-offers-are-bullshit.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/time_bomb.gif&#34; alt=&#34;Time bomb&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;I do a lot of recruiting and have given maybe 50 offers in my career. Although many companies do, I &lt;em&gt;never&lt;/em&gt; put a deadline on any of them. Unfortunately, I&amp;rsquo;ve often ended up competing with other companies who do, and I feel really bad that this usually tricks younger developers into signing offers. On numerous occasions, I&amp;rsquo;ve gotten an email halfway through the interview process&lt;/p&gt;</description>
    </item>
    <item>
      <title>Meta-blogging</title>
      <link>https://erikbern.com/2016/03/12/meta-blogging.html</link>
      <pubDate>Sat, 12 Mar 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/03/12/meta-blogging.html</guid>
      <description>&lt;p&gt;(This is not a very relevant/useful post for regular readers – feel free to skip. I thought I would share it so people can find it on Google.)&lt;/p&gt;&#xA;&lt;p&gt;My blog blew up twice in a week earlier this year when I landed on Hacker News. The first time I was asleep so I didn&amp;rsquo;t notice that the site went down. The second time I did notice, and scrambled to reconfigure Apache &amp;amp; MySQL to handle the load.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Iterate or die</title>
      <link>https://erikbern.com/2016/03/02/iterate-or-die.html</link>
      <pubDate>Wed, 02 Mar 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/03/02/iterate-or-die.html</guid>
      <description>&lt;p&gt;Here&amp;rsquo;s a conclusion I&amp;rsquo;ve made building consumer products for many years: &lt;strong&gt;the speed at which a company innovates is limited by its iteration speed.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;I don&amp;rsquo;t even mean throughput here. I just mean the cycle time. Invoking &lt;a href=&#34;https://en.wikipedia.org/wiki/Little%27s_law&#34;&gt;Little&amp;rsquo;s law&lt;/a&gt; this is also related to the &lt;em&gt;total inventory of features not being deployed yet&lt;/em&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>My issue with GPU-accelerated deep learning</title>
      <link>https://erikbern.com/2016/02/03/my-issue-with-gpu-accelerated-deep-learning.html</link>
      <pubDate>Wed, 03 Feb 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/02/03/my-issue-with-gpu-accelerated-deep-learning.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been spending several hundred bucks renting GPU instances on AWS over the last year. The speedup from a GPU is awesome and hard to deny. GPUs have taken over the field. Maybe following the footsteps of Bitcoin mining there&amp;rsquo;s some research on &lt;a href=&#34;https://gigaom.com/2015/02/23/microsoft-is-building-fast-low-power-neural-networks-with-fpgas/&#34;&gt;using FPGA&lt;/a&gt; (I know very little about this).&lt;/p&gt;</description>
    </item>
    <item>
      <title>Some more font links</title>
      <link>https://erikbern.com/2016/01/25/some-more-font-links.html</link>
      <pubDate>Mon, 25 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/01/25/some-more-font-links.html</guid>
      <description>&lt;p&gt;My blog post about fonts generated lots of traffic – it landed on Hacker News, took down my site while I was sleeping, and then obviously vanished from HN before I woke up. But it also got retweeted by a ton of people.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Analyzing 50k fonts using deep neural networks</title>
      <link>https://erikbern.com/2016/01/21/analyzing-50k-fonts-using-deep-neural-networks.html</link>
      <pubDate>Thu, 21 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/01/21/analyzing-50k-fonts-using-deep-neural-networks.html</guid>
      <description>&lt;p&gt;For some reason I decided one night I wanted to get a bunch of fonts. A lot of them. An hour later I had a bunch of &lt;a href=&#34;http://scrapy.org/&#34;&gt;scrapy&lt;/a&gt; scripts pulling down fonts and a few days later I had more than 50k fonts on my computer.&lt;/p&gt;</description>
    </item>
    <item>
      <title>I believe in the 10x engineer, but...</title>
      <link>https://erikbern.com/2016/01/08/i-believe-in-the-10x-engineer-but.html</link>
      <pubDate>Fri, 08 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/01/08/i-believe-in-the-10x-engineer-but.html</guid>
      <description>&lt;ul&gt;&#xA;&lt;li&gt;The easiest way to be a 10x engineer is to make 10 other engineers 2x more efficient. Someone can be a 10x engineer if they do nothing for 364 days and then convince the team to change programming language to a 2x more productive language.&lt;/li&gt;&#xA;&lt;li&gt;A motivated 10x engineer in one team could be a demotivated 0.5x engineer in another team (and vice versa).&lt;/li&gt;&#xA;&lt;li&gt;An average 1x engineer could easily become a 5x engineer if surrounded by 10x engineers. Engagement and work ethic are contagious.&lt;/li&gt;&#xA;&lt;li&gt;The cynical reason why 10x engineers aren&amp;rsquo;t paid 10x more salary is that there is no way for the new employer to know. There is no “10x badge”.&lt;/li&gt;&#xA;&lt;li&gt;…but also, a 10x engineer can go to a new company and become a 1x engineer because of bad focus / bad engagement / tech stack mismatch.&lt;/li&gt;&#xA;&lt;li&gt;So unfortunately there&amp;rsquo;s less economic rationality for companies to pay 10x salaries to 10x engineers (contrary to what &lt;a href=&#34;http://www.businessinsider.com/google-policy-to-pay-unfairly-2015-4&#34;&gt;Google&lt;/a&gt; or &lt;a href=&#34;http://www.slideshare.net/reed2001/culture-1798664/98-Takes_Great_Judgment_Goal_is&#34;&gt;Netflix&lt;/a&gt; say).&lt;/li&gt;&#xA;&lt;li&gt;There&amp;rsquo;s no such thing as a 10x engineer spending time on something that never ends up delivering business value. If something doesn&amp;rsquo;t deliver business value, it&amp;rsquo;s 0x.&lt;/li&gt;&#xA;&lt;li&gt;If you build something that the average engineer &lt;em&gt;would not have been able to build, no matter how much time&lt;/em&gt;, that can make you 100x or 1000x, or ∞x. 
&lt;a href=&#34;http://slatestarcodex.com/2015/12/27/things-that-are-not-superintelligences/&#34;&gt;Quoting Scott Alexander&lt;/a&gt;: &lt;em&gt;There is no number of ordinary eight-year-olds who, when organized into a team, will become smart enough to beat a grandmaster in chess&lt;a href=&#34;http://slatestarcodex.com/2015/12/27/things-that-are-not-superintelligences/&#34;&gt;.&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;&#xA;&lt;li&gt;Most of the 10x factor is most likely explained by team and company factors (process, tech stack, etc) and applies to everyone in the team/company. Intra-team variation is thus much smaller than 10x (even controlling for the fact that companies tend to attract people of equal caliber). Nature vs nurture…&lt;/li&gt;&#xA;&lt;li&gt;I&amp;rsquo;ve never met the legendary “10x jerk”. Anecdotally the outperforming engineers are generally nice and humble.&lt;/li&gt;&#xA;&lt;li&gt;Don&amp;rsquo;t get hung up on the exact numbers here, it&amp;rsquo;s just for illustration purposes. E.g. someone introduced a &lt;a href=&#34;http://pythonsweetness.tumblr.com/post/64740079543/how-to-lose-172222-a-second-for-45-minutes&#34;&gt;bug in the trading system&lt;/a&gt; of Knight Capital that made them lose $465M in 45 minutes. Did that make them a -1,000,000x engineer? (and btw it had more to do with company culture). The numbers aren&amp;rsquo;t meant to be taken literally.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2016/01/business_meeting_3-1024x440.jpg&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Books I read in 2015</title>
      <link>https://erikbern.com/2016/01/01/books-i-read-in-2015.html</link>
      <pubDate>Fri, 01 Jan 2016 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2016/01/01/books-i-read-in-2015.html</guid>
      <description>&lt;p&gt;Early last year when I left Spotify I decided to do more reading. I was planning to read at least one book per week and in particular I wanted to brush up on management, economics, and technology. 2015 was also a year of exclusively non-fiction, which is a pretty drastic shift, since I grew up reading fiction compulsively for 20 years.&lt;/p&gt;</description>
    </item>
    <item>
      <title>More MCMC – Analyzing a small dataset with 1-5 ratings</title>
      <link>https://erikbern.com/2015/12/05/more-mcmc-analyzing-a-small-dataset-with-1-5-ratings.html</link>
      <pubDate>Sat, 05 Dec 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/12/05/more-mcmc-analyzing-a-small-dataset-with-1-5-ratings.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been obsessed with how to iterate quickly based on small scale feedback lately. One awesome website I encountered is &lt;a href=&#34;https://usabilityhub.com&#34;&gt;Usability Hub&lt;/a&gt; which lets you run 5 second tests. Users see your site for 5 seconds and you can ask them free-form questions afterwards. The nice thing is you don&amp;rsquo;t even have to build the site – just upload a static png/jpg and collect data.&lt;/p&gt;</description>
    </item>
    <item>
      <title>There is no magic trick</title>
      <link>https://erikbern.com/2015/11/28/there-is-no-magic-trick.html</link>
      <pubDate>Sat, 28 Nov 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/11/28/there-is-no-magic-trick.html</guid>
      <description>&lt;p&gt;(Warning: super speculative, feel free to ignore)&lt;/p&gt;&#xA;&lt;p&gt;As Yogi Berra said, “It&amp;rsquo;s tough to make predictions, especially about the future”. Unfortunately predicting is hard, and unsurprisingly people look for the Magic Trick™ that can resolve all the uncertainty. Whether it&amp;rsquo;s recruiting, investing, system design, finding your soulmate, or anything else, there&amp;rsquo;s always an alleged shortcut.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Installing TensorFlow on AWS</title>
      <link>https://erikbern.com/2015/11/12/installing-tensorflow-on-aws.html</link>
      <pubDate>Thu, 12 Nov 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/11/12/installing-tensorflow-on-aws.html</guid>
      <description>&lt;p&gt;Curious about Google&amp;rsquo;s newly released &lt;a href=&#34;https://tensorflow.org&#34;&gt;TensorFlow&lt;/a&gt;? I don&amp;rsquo;t have a beefy GPU machine, so I spent some time getting it to run on EC2. The &lt;a href=&#34;https://gist.github.com/erikbern/78ba519b97b440e10640&#34;&gt;steps on how to reproduce&lt;/a&gt; it are pretty brutal and I wouldn&amp;rsquo;t recommend going through it unless you want to waste five hours of your life.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Looking for smart people</title>
      <link>https://erikbern.com/2015/11/04/looking-for-smart-people.html</link>
      <pubDate>Wed, 04 Nov 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/11/04/looking-for-smart-people.html</guid>
      <description>&lt;p&gt;I haven&amp;rsquo;t mentioned what I&amp;rsquo;m currently up to. Earlier this year I left Spotify to join a small startup called &lt;a href=&#34;https://better.com/&#34;&gt;Better&lt;/a&gt;. We&amp;rsquo;re going after one of the biggest industries in the world that also turns out to be completely broken. The mortgage industry might not be the #1 industry you pictured yourself in, but it&amp;rsquo;s an enormous opportunity to fix a series of real consumer problems and join a company that I predict will be huge.&lt;/p&gt;</description>
    </item>
    <item>
      <title>MCMC for marketing data</title>
      <link>https://erikbern.com/2015/10/31/mcmc-for-marketing-data.html</link>
      <pubDate>Sat, 31 Oct 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/10/31/mcmc-for-marketing-data.html</guid>
      <description>&lt;p&gt;The other day I was looking at marketing spend broken down by channel and wanted to compute some simple uncertainty estimates. I have data like this:&lt;/p&gt;&#xA;&lt;table&gt;&#xA;  &lt;tr&gt;&#xA;    &lt;th&gt;&lt;/th&gt;&#xA;    &lt;th&gt;Total spend&lt;/th&gt;&#xA;    &lt;th&gt;Transactions&lt;/th&gt;&#xA;  &lt;/tr&gt;&#xA;  &lt;tr&gt;&#xA;    &lt;th&gt;Channel A&lt;/th&gt;&#xA;    &lt;td&gt;2292.04&lt;/td&gt;&#xA;    &lt;td&gt;9&lt;/td&gt;&#xA;  &lt;/tr&gt;&#xA;  &lt;tr&gt;&#xA;    &lt;th&gt;Channel B&lt;/th&gt;&#xA;    &lt;td&gt;1276.85&lt;/td&gt;&#xA;    &lt;td&gt;2&lt;/td&gt;&#xA;  &lt;/tr&gt;&#xA;  &lt;tr&gt;&#xA;    &lt;th&gt;Channel C&lt;/th&gt;&#xA;    &lt;td&gt;139.59&lt;/td&gt;&#xA;    &lt;td&gt;3&lt;/td&gt;&#xA;  &lt;/tr&gt;&#xA;  &lt;tr&gt;&#xA;    &lt;th&gt;Channel D&lt;/th&gt;&#xA;    &lt;td&gt;954.98&lt;/td&gt;&#xA;    &lt;td&gt;5&lt;/td&gt;&#xA;  &lt;/tr&gt;&#xA;&lt;/table&gt;&#xA;&lt;p&gt;Of course, it&amp;rsquo;s easy to compute the cost per transaction, but how do you produce uncertainty estimates? Turns out to be somewhat nontrivial. I don&amp;rsquo;t even think it&amp;rsquo;s possible to do a &lt;a href=&#34;https://en.wikipedia.org/wiki/Student%27s_t-test&#34;&gt;t-test&lt;/a&gt;, which is kind of interesting in itself.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Interview with a Data Scientist: Erik Bernhardsson</title>
      <link>https://erikbern.com/2015/10/28/interview-with-a-data-scientist-erik-bernhardsson.html</link>
      <pubDate>Wed, 28 Oct 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/10/28/interview-with-a-data-scientist-erik-bernhardsson.html</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;a href=&#34;https://peadarcoyle.wordpress.com/2015/10/03/interview-with-a-data-scientist-erik-bernhardsson/&#34;&gt;I was featured&lt;/a&gt; in Peadar Coyle&amp;rsquo;s &lt;a href=&#34;https://peadarcoyle.wordpress.com&#34;&gt;interview series&lt;/a&gt; interviewing various “data scientists” – which is kind of arguable since (a) all the other ppl in that series are much cooler than me (b) I&amp;rsquo;m not really a data scientist. Anyway, reposting the full interview:&lt;/em&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Nearest neighbors and vector models – epilogue – curse of dimensionality</title>
      <link>https://erikbern.com/2015/10/20/nearest-neighbors-and-vector-models-epilogue-curse-of-dimensionality.html</link>
      <pubDate>Tue, 20 Oct 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/10/20/nearest-neighbors-and-vector-models-epilogue-curse-of-dimensionality.html</guid>
      <description>&lt;p&gt;This is another post based on my talk at &lt;a href=&#34;http://www.meetup.com/NYC-Machine-Learning/events/225265016/&#34;&gt;NYC Machine Learning&lt;/a&gt;. The previous two parts covered most of the interesting parts, but there are still some topics left to be discussed. To go back and read the meaty stuff, check out&lt;/p&gt;</description>
    </item>
    <item>
      <title>Nearest neighbors and vector models – part 2 – algorithms and data structures</title>
      <link>https://erikbern.com/2015/10/01/nearest-neighbors-and-vector-models-part-2-how-to-search-in-high-dimensional-spaces.html</link>
      <pubDate>Thu, 01 Oct 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/10/01/nearest-neighbors-and-vector-models-part-2-how-to-search-in-high-dimensional-spaces.html</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a blog post rewritten from a presentation at &lt;a href=&#34;http://www.meetup.com/NYC-Machine-Learning/events/225265016/&#34;&gt;NYC Machine Learning&lt;/a&gt; on Sep 17. It covers a library called &lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt; that I have built that helps you do nearest neighbor queries in high dimensional spaces. In the &lt;a href=&#34;https://erikbern.com/2015/09/24/nearest-neighbor-methods-vector-models-part-1/&#34;&gt;first part&lt;/a&gt;, I went through some examples of why vector models are useful. In the second part I will be explaining the data structures and algorithms that Annoy uses to do approximate nearest neighbor queries.&lt;/em&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Nearest neighbor methods and vector models – part 1</title>
      <link>https://erikbern.com/2015/09/24/nearest-neighbor-methods-vector-models-part-1.html</link>
      <pubDate>Thu, 24 Sep 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/09/24/nearest-neighbor-methods-vector-models-part-1.html</guid>
      <description>&lt;p&gt;This is a blog post rewritten from a presentation at &lt;a href=&#34;http://www.meetup.com/NYC-Machine-Learning/events/225265016/&#34;&gt;NYC Machine Learning&lt;/a&gt; last week. It covers a library called &lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt; that I have built that helps you do (approximate) nearest neighbor queries in high dimensional spaces. I will be splitting it into several parts. This first part talks about vector models, how to measure similarity, and why nearest neighbor queries are useful.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Presentations about Spotify music recommendations</title>
      <link>https://erikbern.com/2015/09/22/presentations-about-spotify-music-recommendations.html</link>
      <pubDate>Tue, 22 Sep 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/09/22/presentations-about-spotify-music-recommendations.html</guid>
      <description>&lt;p&gt;A couple of people in my old team have been around talking about how Spotify does music recommendations and put together some quite good presentations.&lt;/p&gt;&#xA;&lt;p&gt;First one is Neville Li&amp;rsquo;s presentation about &lt;a href=&#34;http://www.slideshare.net/sinisalyh/scala-data-pipelines-spotify&#34;&gt;Scala Data Pipelines @ Spotify&lt;/a&gt;:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Antipodes</title>
      <link>https://erikbern.com/2015/09/08/antipodes.html</link>
      <pubDate>Tue, 08 Sep 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/09/08/antipodes.html</guid>
      <description>&lt;p&gt;I was playing around with D3 last night and built a silly visualization of antipodes and how our intuitive understanding of the world sometimes doesn&amp;rsquo;t make sense. Check out the &lt;a href=&#34;http://bl.ocks.org/erikbern/1ff88b70b70e10f81822&#34;&gt;visualization at bl.ocks.org&lt;/a&gt;!&lt;/p&gt;&#xA;&lt;p&gt;Basically the idea is if you fly from Beijing to Buenos Aires then you can have a layover at &lt;em&gt;any point of the Earth&amp;rsquo;s surface&lt;/em&gt; and it won&amp;rsquo;t make the trip longer.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Software Engineers and Automation</title>
      <link>https://erikbern.com/2015/08/16/software-engineers-and-automation.html</link>
      <pubDate>Sun, 16 Aug 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/08/16/software-engineers-and-automation.html</guid>
      <description>&lt;p&gt;Every once in a while when talking to smart people the topic of automation comes up. Technology has made lots of occupations redundant, so what&amp;rsquo;s next?&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2015/08/switchboard-operator.jpg&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;em&gt;Switchboard operator, a long time ago&lt;/em&gt;&lt;/p&gt;&#xA;&lt;p&gt;What about software engineers? Every year technology replaces parts of what they do. Eventually surely everything must be replaced? I just ran into another one of these arguments: &lt;a href=&#34;https://medium.com/@dtauerbach/software-engineers-will-be-obsolete-by-2060-2a214fdf9737&#34;&gt;Software Engineers will be obsolete by 2060&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>coin2dice</title>
      <link>https://erikbern.com/2015/07/24/math-problem.html</link>
      <pubDate>Fri, 24 Jul 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/07/24/math-problem.html</guid>
      <description>&lt;p&gt;Here&amp;rsquo;s a problem that I used to give to candidates. I stopped using it seriously a long time ago since I don&amp;rsquo;t believe in puzzles, but I think it&amp;rsquo;s kind of fun.&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Let&amp;rsquo;s say you have a function that simulates a random coin flip. It returns “H” or “T”. This is the &lt;em&gt;only random generator available&lt;/em&gt;. How can you write a new function that simulates a random dice roll (1…6)?&lt;/li&gt;&#xA;&lt;li&gt;Is there any method that guarantees that the second function returns in finite time?&lt;/li&gt;&#xA;&lt;li&gt;Let&amp;rsquo;s say you want to do this $$ n $$ times where $$ n \to \infty $$ . What&amp;rsquo;s the most efficient way to do it? Efficient in terms of &lt;em&gt;using the fewest coin flips&lt;/em&gt;.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;The first part is old, I think. The second and third parts are follow-up questions that I came up with.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Benchmark of Approximate Nearest Neighbor libraries</title>
      <link>https://erikbern.com/2015/07/04/benchmark-of-approximate-nearest-neighbor-libraries.html</link>
      <pubDate>Sat, 04 Jul 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/07/04/benchmark-of-approximate-nearest-neighbor-libraries.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt; is a library written by me that supports fast approximate nearest neighbor queries. Say you have a high (1-1000) dimensional space with points in it, and you want to find the nearest neighbors to some point. Annoy gives you a way to do this very quickly. It could be points on a map, but also word vectors in a latent semantic representation or latent item vectors in collaborative filtering.&lt;/p&gt;</description>
    </item>
    <item>
      <title>More Luigi alternatives</title>
      <link>https://erikbern.com/2015/07/02/more-luigi-alternatives.html</link>
      <pubDate>Thu, 02 Jul 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/07/02/more-luigi-alternatives.html</guid>
      <description>&lt;p&gt;The workflow engine battle has intensified with some more interesting entries lately! Here are a couple I encountered in the last few days. I love that at least two of them are direct references to Luigi!&lt;/p&gt;</description>
    </item>
    <item>
      <title>3D in D3</title>
      <link>https://erikbern.com/2015/06/21/3d-in-d3.html</link>
      <pubDate>Sun, 21 Jun 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/06/21/3d-in-d3.html</guid>
      <description>&lt;p&gt;I have spent some time lately with &lt;a href=&#34;http://d3js.org/&#34;&gt;D3&lt;/a&gt;. It&amp;rsquo;s a lot of fun to build interactive graphs. See for instance this &lt;a href=&#34;https://rawgit.com/bettermg/crossfader/master/demo.html#wine&#34;&gt;demo&lt;/a&gt; (will provide a longer writeup soon).&lt;/p&gt;&#xA;&lt;p&gt;D3 doesn&amp;rsquo;t have support for 3D but you can do projections into 2D pretty easily. It&amp;rsquo;s just old school computer graphics. I ended up adding an animated background to this blog based on &lt;a href=&#34;https://github.com/erikbern/d3-3d&#34;&gt;an experiment&lt;/a&gt;. The math is simple.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The hardest challenge about becoming a manager</title>
      <link>https://erikbern.com/2015/06/05/the-hardest-challenge-about-becoming-a-manager.html</link>
      <pubDate>Fri, 05 Jun 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/06/05/the-hardest-challenge-about-becoming-a-manager.html</guid>
      <description>&lt;p&gt;Note: this post is full of pseudo-psychology and highly speculative content. Like most fun stuff!&lt;/p&gt;&#xA;&lt;p&gt;I became a manager back in 2009. Being a developer is fun. You have this very tangible way to measure yourself. Did I deploy something today? How much code did I write today? Did I solve some really cool machine learning problem on paper?&lt;/p&gt;</description>
    </item>
    <item>
      <title>The lane next to you is more likely to be slower than yours</title>
      <link>https://erikbern.com/2015/05/28/the-lane-next-to-you-is-more-likely-to-be-slower-than-yours.html</link>
      <pubDate>Thu, 28 May 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/05/28/the-lane-next-to-you-is-more-likely-to-be-slower-than-yours.html</guid>
      <description>&lt;p&gt;Saw this link on Hacker News the other day: &lt;a href=&#34;http://www.citylab.com/commute/2015/05/the-highway-lane-next-to-yours-isnt-really-moving-any-faster/394079/&#34;&gt;The Highway Lane Next to Yours Isn’t Really Moving Any Faster&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;The article describes a phenomenon unique to traffic where cars spread out when they go fast and get more compact when they go slow. That&amp;rsquo;s supposedly the explanation.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Better precision and faster index building in Annoy</title>
      <link>https://erikbern.com/2015/05/26/40-better-precision-and-4x-faster-index-building-in-annoy.html</link>
      <pubDate>Tue, 26 May 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/05/26/40-better-precision-and-4x-faster-index-building-in-annoy.html</guid>
      <description>&lt;p&gt;Sometimes you have these awesome insights. A few days ago I got an &lt;a href=&#34;https://github.com/spotify/annoy/issues/64&#34;&gt;idea&lt;/a&gt; for how to improve index building in &lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;For anyone who isn&amp;rsquo;t acquainted with Annoy – it&amp;rsquo;s a C++ library with Python bindings that provides fast high-dimensional nearest neighbor search.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Annoy – now without Boost dependencies and with Python 3 Support</title>
      <link>https://erikbern.com/2015/05/03/annoy-now-without-boost-dependencies-and-with-python-3-support.html</link>
      <pubDate>Sun, 03 May 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/05/03/annoy-now-without-boost-dependencies-and-with-python-3-support.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2015/05/ann.png&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt; is a C++/Python package I built for fast approximate nearest neighbor search in high dimensional spaces. Spotify uses it a lot to find similar items. First, matrix factorization gives a low dimensional representation of each item (artist/album/track/user) so that every item is a k-dimensional vector, where k is typically 40-100. This is then loaded into an Annoy index for a number of things: fast similar items, personal music recommendations, etc.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Ping the world</title>
      <link>https://erikbern.com/2015/04/26/ping-the-world.html</link>
      <pubDate>Sun, 26 Apr 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/04/26/ping-the-world.html</guid>
      <description>&lt;p&gt;I just pinged a few million random IP addresses from my apartment in NYC. Here&amp;rsquo;s the result:&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2015/04/nyc.png&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;Some notes:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;What&amp;rsquo;s going on with Sweden? Too much torrenting?&lt;/li&gt;&#xA;&lt;li&gt;Ireland is likewise super slow, but &lt;em&gt;not&lt;/em&gt; Northern Ireland&lt;/li&gt;&#xA;&lt;li&gt;Eastern Ukraine is also super slow, maybe not surprising given current events.&lt;/li&gt;&#xA;&lt;li&gt;Toronto seems screwed too, as well as part of NH and western PA.&lt;/li&gt;&#xA;&lt;li&gt;Russia has &lt;em&gt;fast&lt;/em&gt; internet.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;The world:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Black Box Machine Learning in the Cloud</title>
      <link>https://erikbern.com/2015/04/22/black-box-machine-learning-in-the-cloud.html</link>
      <pubDate>Wed, 22 Apr 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/04/22/black-box-machine-learning-in-the-cloud.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2015/04/black-cloud-4g9h.jpg&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;There&amp;rsquo;s a bunch of companies working on machine learning as a service. Some old companies like &lt;a href=&#34;https://cloud.google.com/prediction/docs&#34;&gt;Google&lt;/a&gt;, but now also &lt;a href=&#34;http://aws.amazon.com/machine-learning/&#34;&gt;Amazon&lt;/a&gt; and &lt;a href=&#34;http://azure.microsoft.com/en-us/services/machine-learning&#34;&gt;Microsoft&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Then there&amp;rsquo;s a ton of startups: &lt;a href=&#34;http://prediction.io/&#34;&gt;PredictionIO&lt;/a&gt; ($2.7M funding), &lt;a href=&#34;https://bigml.com/&#34;&gt;BigML&lt;/a&gt; ($1.6M funding), &lt;a href=&#34;http://www.clarifai.com/&#34;&gt;Clarifai&lt;/a&gt;, etc, etc. Here&amp;rsquo;s a &lt;a href=&#34;http://www.bloomberg.com/company/content/uploads/sites/2/2014/12/machine-learning-jpeg.jpg&#34;&gt;nice map&lt;/a&gt; from Bloomberg showing some of the landscape.&lt;/p&gt;</description>
    </item>
    <item>
      <title>It&#39;s called Berkson&#39;s paradox!</title>
      <link>https://erikbern.com/2015/04/09/its-called-berksons-paradox.html</link>
      <pubDate>Thu, 09 Apr 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/04/09/its-called-berksons-paradox.html</guid>
      <description>&lt;p&gt;As noted by &lt;a href=&#34;https://twitter.com/davidandrzej/status/585940491927027712&#34;&gt;multiple&lt;/a&gt; &lt;a href=&#34;https://twitter.com/JSEllenberg/status/585959375769972736&#34;&gt;tweets&lt;/a&gt;, my previous post describes a phenomenon known as &lt;a href=&#34;http://en.wikipedia.org/wiki/Berkson%27s_paradox&#34;&gt;Berkson&amp;rsquo;s paradox&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Here&amp;rsquo;s another example: &lt;a href=&#34;http://www.slate.com/blogs/how_not_to_be_wrong/2014/06/03/berkson_s_fallacy_why_are_handsome_men_such_jerks.html&#34;&gt;Why Are Handsome Men Such Jerks?&lt;/a&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Norvig&#39;s claim that programming competitions correlate negatively with being good on the job</title>
      <link>https://erikbern.com/2015/04/07/norvigs-claim-that-programming-competitions-correlate-negatively-with-being-good-on-the-job.html</link>
      <pubDate>Tue, 07 Apr 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/04/07/norvigs-claim-that-programming-competitions-correlate-negatively-with-being-good-on-the-job.html</guid>
      <description>&lt;p&gt;I saw a bunch of tweets over the weekend about Peter Norvig &lt;a href=&#34;http://www.catonmat.net/blog/programming-competitions-work-performance/&#34;&gt;claiming there&amp;rsquo;s a negative correlation&lt;/a&gt; between being good at programming competitions and being good at the job. There were some decent &lt;a href=&#34;https://news.ycombinator.com/item?id=9324209&#34;&gt;Hacker News comments&lt;/a&gt; on it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Pinterest open sources Pinball</title>
      <link>https://erikbern.com/2015/03/14/pinterest-open-sources-pinball.html</link>
      <pubDate>Sat, 14 Mar 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/03/14/pinterest-open-sources-pinball.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2015/03/41Pz5ClQ46L._SY300_.jpg&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;Pinterest just open sourced &lt;a href=&#34;https://github.com/pinterest/pinball&#34;&gt;Pinball&lt;/a&gt;, which seems like an interesting &lt;a href=&#34;https://github.com/spotify/luigi&#34;&gt;Luigi&lt;/a&gt; alternative. There are two blog posts: &lt;a href=&#34;http://engineering.pinterest.com/post/74429563460/pinball-building-workflow-management&#34;&gt;Pinball: Building workflow management&lt;/a&gt; (from 2014) and &lt;a href=&#34;http://engineering.pinterest.com/post/113376157699/open-sourcing-pinball&#34;&gt;Open-sourcing Pinball&lt;/a&gt; (from this week). The author has a comment in the &lt;a href=&#34;https://news.ycombinator.com/item?id=9189196&#34;&gt;comments thread&lt;/a&gt; on Hacker News:&lt;/p&gt;</description>
    </item>
    <item>
      <title>The relationship between commit size and commit message size</title>
      <link>https://erikbern.com/2015/02/26/the-relationship-between-commit-size-and-commit-message-size.html</link>
      <pubDate>Thu, 26 Feb 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/02/26/the-relationship-between-commit-size-and-commit-message-size.html</guid>
      <description>&lt;img src=&#34;https://erikbern.com/assets/2015/02/Screen-Shot-2015-02-24-at-8.56.35-PM.png&#34; alt=&#34;Screen Shot 2015-02-24 at 8.56.35 PM&#34; width=&#34;585&#34; height=&#34;241&#34; class=&#34;alignnone size-full wp-image-1100&#34; /&gt;&#xA;&lt;p&gt;Wow, I guess it was more than a year ago that I tweeted this. Crazy how time flies by. Anyway, here&amp;rsquo;s my rationale:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;When I update one line of code I feel like I have to put in a long explanation about its side effects, why it&amp;rsquo;s fully backwards compatible, and why it fixes some issue #xyz.&lt;/li&gt;&#xA;&lt;li&gt;When I refactor 500 lines of code, I get too lazy to write anything sensible, so I just put “refactoring FooBarController”. Note: &lt;em&gt;don&amp;rsquo;t do this at home!&lt;/em&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;I decided to plot the relationship for &lt;a href=&#34;https://github.com/spotify/luigi&#34;&gt;Luigi&lt;/a&gt;:&lt;/p&gt;</description>
    </item>
    <item>
      <title>My favorite management failures</title>
      <link>https://erikbern.com/2015/02/22/my-favorite-management-failures.html</link>
      <pubDate>Sun, 22 Feb 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/02/22/my-favorite-management-failures.html</guid>
      <description>&lt;p&gt;For most people straight out of school, work life is a bit of a culture shock. For me it was an awesome experience, but a lot of the constraints were different and I had to learn to optimize for different things. It wasn&amp;rsquo;t necessarily the technology that I struggled with. The hardest part was how to manage my own projects and my time, as well as how to grow and make impact as an engineer. I&amp;rsquo;ve listed some of my biggest mistakes, which are also mistakes I see other (mostly junior) engineers make.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Leaving Spotify</title>
      <link>https://erikbern.com/2015/02/11/leaving-spotify.html</link>
      <pubDate>Wed, 11 Feb 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/02/11/leaving-spotify.html</guid>
      <description>&lt;p&gt;February 6 was my last day at Spotify. In total I spent more than six years at Spotify and it was an amazing experience.&lt;/p&gt;&#xA;&lt;p&gt;I joined Spotify in Stockholm in 2008, mainly because a bunch of friends from programming competitions had joined already. Their goal to change music consumption seemed ridiculous at that point, but six years later I think it&amp;rsquo;s safe to say they actually succeeded.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Scala Data Pipelines for Music Recommendations</title>
      <link>https://erikbern.com/2015/01/13/scala-data-pipelines-for-music-recommendations.html</link>
      <pubDate>Tue, 13 Jan 2015 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2015/01/13/scala-data-pipelines-for-music-recommendations.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://twitter.com/MrChrisJohnson&#34;&gt;Chris Johnson&lt;/a&gt;’s presentation from &lt;a href=&#34;http://datadaytexas.com/&#34;&gt;Data Day Texas&lt;/a&gt;:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Everything I learned about technical debt</title>
      <link>https://erikbern.com/2014/12/30/everything-i-learned-about-technical-debt.html</link>
      <pubDate>Tue, 30 Dec 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/12/30/everything-i-learned-about-technical-debt.html</guid>
      <description>&lt;p&gt;I just made it to Sweden suffering from jet lag induced insomnia, but this blog post will not cover that. Instead, I will talk a little bit about &lt;a href=&#34;http://en.wikipedia.org/wiki/Technical_debt&#34;&gt;technical debt&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;The concept of technical debt always resonated with me, partly because I always like the analogy with “real” debt. If you take the analogy really far, there are some curious implications. I always like to think of the “interest rate” of software development. Debt is really just borrowing from the future, with some interest rate. You are getting a free lunch right now, but you need to pay back 1.2 free lunches in a few months. That&amp;rsquo;s the interest rate. In a software project the equivalent could be to pick a database that will have scalability issues later, or to make all member variables of some class public. You are doing it because it makes it easier to do things &lt;em&gt;now&lt;/em&gt; but you will have to pay the cost of that later.&lt;/p&gt;</description>
    </item>
    <item>
      <title>I already found the best gifs</title>
      <link>https://erikbern.com/2014/12/28/i-already-found-the-best-gifs.html</link>
      <pubDate>Sun, 28 Dec 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/12/28/i-already-found-the-best-gifs.html</guid>
      <description>&lt;p&gt;Just search for “&lt;a href=&#34;https://www.google.com/search?q=hackers+gif&amp;amp;safe=off&amp;amp;espv=2&amp;amp;biw=1289&amp;amp;bih=706&amp;amp;source=lnms&amp;amp;tbm=isch&amp;amp;sa=X&amp;amp;ei=_FSfVJfWKYuVNvJK&amp;amp;ved=0CAgQ_AUoAQ&#34;&gt;hackers gif&lt;/a&gt;”.&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2014/12/hackers-the-plague-1.gif&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2014/12/hackers-gif-preparing-to-hack.gif&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2014/12/hackers-mathew-lillard.gif&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;There you go. Fun for your work emails for the next 500 years. From the awesome movie &lt;a href=&#34;http://www.imdb.com/title/tt0113243/&#34;&gt;Hackers&lt;/a&gt;. That movie together with &lt;a href=&#34;http://www.imdb.com/title/tt0080120/&#34;&gt;The Warriors&lt;/a&gt; convinced me that I wanted to live in NYC when I was like… 14 years old.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A brief history of Hadoop at Spotify</title>
      <link>https://erikbern.com/2014/12/20/a-brief-history-of-hadoop-at-spotify-2008-2009.html</link>
      <pubDate>Sat, 20 Dec 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/12/20/a-brief-history-of-hadoop-at-spotify-2008-2009.html</guid>
      <description>&lt;p&gt;I was talking with some data engineers at Spotify and had a moment of nostalgia.&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;2008&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;I was writing my master&amp;rsquo;s thesis at Spotify and had to run a Hadoop job to extract some data from the logs. Every time I started running the job, I kept hearing this subtle noise. I kept noticing the correlation for a few days but I was too intimidated to ask. Finally people started cursing that their machines had gotten really slow lately and I realized &lt;em&gt;we were running Hadoop on the developers&amp;rsquo; desktop machines&lt;/em&gt;. No one had told me. I think back then we had only GB&amp;rsquo;s of log data. I remember running &lt;em&gt;less&lt;/em&gt; on the log and I would recognize half the usernames because they were my friends.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Luigi Presentation @ NYC Data Science, Dec 16, 2014</title>
      <link>https://erikbern.com/2014/12/17/luigi-presentation-nyc-data-science-dec-16-2014.html</link>
      <pubDate>Wed, 17 Dec 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/12/17/luigi-presentation-nyc-data-science-dec-16-2014.html</guid>
      <description>&lt;p&gt;More Luigi presentations!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Luigi talk tomorrow</title>
      <link>https://erikbern.com/2014/12/16/luigi-talk-tomorrow.html</link>
      <pubDate>Tue, 16 Dec 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/12/16/luigi-talk-tomorrow.html</guid>
      <description>&lt;p&gt;At &lt;a href=&#34;http://www.meetup.com/NYC-Data-Science/events/218604422/&#34;&gt;NYC Data Science meetup&lt;/a&gt;! Unfortunately the space is full but the talk will be livestreamed – check out the meetup web page for a link tomorrow.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep learning for&amp;#8230; Go</title>
      <link>https://erikbern.com/2014/12/11/deep-learning-for-go.html</link>
      <pubDate>Thu, 11 Dec 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/12/11/deep-learning-for-go.html</guid>
      <description>&lt;p&gt;This is the last post about deep learning for chess/go/whatever. But &lt;a href=&#34;http://arxiv.org/abs/1412.3409&#34;&gt;this really cool paper&lt;/a&gt; by Christopher Clark and Amos Storkey was forwarded to me by &lt;a href=&#34;https://twitter.com/meickenberg&#34;&gt;Michael Eickenberg&lt;/a&gt;. It&amp;rsquo;s about using convolutional neural networks to play Go. The authors of the paper do a much better job than I would ever have done of modeling move prediction in Go and show that their model beat certain Go engines.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep learning for&amp;#8230; chess (addendum)</title>
      <link>https://erikbern.com/2014/12/08/deep-learning-for-chess-addendum.html</link>
      <pubDate>Mon, 08 Dec 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/12/08/deep-learning-for-chess-addendum.html</guid>
      <description>&lt;p&gt;My previous blog post about deep learning for chess blew up and made it to Hacker News and a couple of other places. One pretty amazing thing was that the &lt;a href=&#34;https://github.com/erikbern/deep-pink&#34;&gt;Github repo&lt;/a&gt; got 150 stars overnight. There were also lots of &lt;a href=&#34;https://news.ycombinator.com/item?id=8685840&#34;&gt;comments&lt;/a&gt; on the Hacker News post that I thought were really interesting. (See this skeptical &lt;a href=&#34;https://news.ycombinator.com/item?id=8687273&#34;&gt;comment&lt;/a&gt; for instance).&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep learning for... chess</title>
      <link>https://erikbern.com/2014/11/29/deep-learning-for-chess.html</link>
      <pubDate>Sat, 29 Nov 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/11/29/deep-learning-for-chess.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been meaning to learn &lt;a href=&#34;http://deeplearning.net/software/theano/&#34;&gt;Theano&lt;/a&gt; for a while and I&amp;rsquo;ve also wanted to build a chess AI at some point. So why not combine the two? That&amp;rsquo;s what I thought, and I ended up spending way too much time on it. I actually built most of this back in September but not until Thanksgiving did I have the time to write a blog post about it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Optimizing things: everything is a proxy for a proxy for a proxy</title>
      <link>https://erikbern.com/2014/11/22/optimizing-things-everything-is-a-proxy-for-a-proxy-for-a-proxy.html</link>
      <pubDate>Sat, 22 Nov 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/11/22/optimizing-things-everything-is-a-proxy-for-a-proxy-for-a-proxy.html</guid>
      <description>&lt;p&gt;Say you build a machine learning model, like a movie recommender system. You need to optimize for something. You have 1-5 stars as ratings so let&amp;rsquo;s optimize for mean squared error. Great.&lt;/p&gt;&#xA;&lt;p&gt;Then let&amp;rsquo;s say you build a new model. It has even lower mean squared error. You deploy it. This model turns out to give a lower mean squared error. You roll it out to users and the metrics are tanking. Crap! Ok so maybe mean squared error isn&amp;rsquo;t the right thing to optimize for.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Luigi conquering the world</title>
      <link>https://erikbern.com/2014/11/15/luigi-spreading-to-the-west-coast.html</link>
      <pubDate>Sat, 15 Nov 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/11/15/luigi-spreading-to-the-west-coast.html</guid>
      <description>&lt;p&gt;I keep forgetting to buy a costume for Halloween every year, so this year I prepared and got myself a Luigi costume a month in advance. Only to realize I was going to be out of town the whole weekend. If anyone wants a Luigi costume, let me know!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Annoying blog post</title>
      <link>https://erikbern.com/2014/11/11/annoying-blog-post.html</link>
      <pubDate>Tue, 11 Nov 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/11/11/annoying-blog-post.html</guid>
      <description>&lt;p&gt;I spent a couple of hours this weekend going through some pull requests and issues to &lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt;, which is an open source C++/Python library for &lt;a href=&#34;http://en.wikipedia.org/wiki/Nearest_neighbor_search#Approximate_nearest_neighbor&#34;&gt;Approximate Nearest Neighbor&lt;/a&gt; search.&lt;/p&gt;&#xA;&lt;p&gt;I set up Travis-CI integration and spent some time on &lt;a href=&#34;https://github.com/spotify/annoy/issues/13&#34;&gt;one of the issues&lt;/a&gt; that multiple people had reported. At the end of the day, it turns out the issue was actually caused by a bug in GCC 4.8. Some crazy compiler optimization introduced between 4.6 and 4.8 caused this loop to be removed:&lt;/p&gt;</description>
    </item>
    <item>
      <title>The Filter Bubble is Silly and you Can&#39;t Guess What Happened Next</title>
      <link>https://erikbern.com/2014/10/10/the-filter-bubble-is-silly-and-you-cant-guess-what-happened-next.html</link>
      <pubDate>Fri, 10 Oct 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/10/10/the-filter-bubble-is-silly-and-you-cant-guess-what-happened-next.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;m at &lt;a href=&#34;recsys.acm.org/recsys14/program/&#34;&gt;RecSys 2014&lt;/a&gt;, meeting a lot of people and hanging out at talks. Some of the discussions here were about the &lt;a href=&#34;http://en.wikipedia.org/wiki/Filter_bubble&#34;&gt;filter bubble&lt;/a&gt;, which prompted me to formalize my own thoughts.&lt;/p&gt;&#xA;&lt;p&gt;I firmly believe that it&amp;rsquo;s the role of a system to respect the user&amp;rsquo;s intent. Any sensible system will optimize for the user&amp;rsquo;s long-term happiness by providing info back to the user that s/he finds useful. This holds true as long as a system isn&amp;rsquo;t (a) stupid and recommends the wrong content (b) trying to push its own agenda, that may or may not be hidden.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Detecting corporate fraud using Benford&#39;s law</title>
      <link>https://erikbern.com/2014/10/07/detecting-corporate-fraud-using-benfords-law.html</link>
      <pubDate>Tue, 07 Oct 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/10/07/detecting-corporate-fraud-using-benfords-law.html</guid>
      <description>&lt;p&gt;&lt;strong&gt;Note: This is a silly application. Don&amp;rsquo;t take anything seriously.&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;http://en.wikipedia.org/wiki/Benford&#39;s_law&#34;&gt;Benford&amp;rsquo;s law&lt;/a&gt; describes a phenomenon where numbers in many data series exhibit patterns in their first digit. For instance, if you take a list of the 1,000 longest rivers of Mongolia, or the average daily calorie consumption of mammals, or the wealth distribution of German soccer players, you will on average see that these numbers start with “1” about 30% of the time. I won&amp;rsquo;t attempt to prove this, but essentially it&amp;rsquo;s a result of scale invariance. It doesn&amp;rsquo;t apply to &lt;em&gt;all&lt;/em&gt; numerical series, like IQ or shoe size, but this pattern turns out to pop up in a lot of places.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Running Theano on EC2</title>
      <link>https://erikbern.com/2014/08/19/running-theano-on-ec2.html</link>
      <pubDate>Tue, 19 Aug 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/08/19/running-theano-on-ec2.html</guid>
      <description>&lt;p&gt;Inspired by &lt;a href=&#34;http://benanne.github.io/2014/08/05/spotify-cnns.html&#34;&gt;Sander Dieleman&amp;rsquo;s internship&lt;/a&gt; at Spotify, I&amp;rsquo;ve been playing around with deep learning using &lt;a href=&#34;http://deeplearning.net/software/theano/&#34;&gt;Theano&lt;/a&gt;. Theano is this Python package that lets you define symbolic expressions (cool), does automatic differentiation (really cool), and compiles it down into bytecode to run on a CPU/GPU (super cool). It&amp;rsquo;s built by Yoshua Bengio&amp;rsquo;s deep learning team up in Montreal.&lt;/p&gt;</description>
    </item>
    <item>
      <title>In defense of false positives (why you can&#39;t fail with A/B tests)</title>
      <link>https://erikbern.com/2014/07/30/in-defense-of-false-positives-why-you-cant-fail-with-ab-tests.html</link>
      <pubDate>Wed, 30 Jul 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/07/30/in-defense-of-false-positives-why-you-cant-fail-with-ab-tests.html</guid>
      <description>&lt;p&gt;Many years ago, I used to think that A/B tests were foolproof and all you need to do is compare the metrics for the two groups. The group with the highest conversion rate wins, right?&lt;/p&gt;</description>
    </item>
    <item>
      <title>Recurrent Neural Networks for Collaborative Filtering</title>
      <link>https://erikbern.com/2014/06/28/recurrent-neural-networks-for-collaborative-filtering.html</link>
      <pubDate>Sat, 28 Jun 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/06/28/recurrent-neural-networks-for-collaborative-filtering.html</guid>
      <description>&lt;p&gt;I’ve been spending quite some time lately playing around with RNN’s for collaborative filtering. RNN’s are models that predict a &lt;em&gt;sequence&lt;/em&gt; of something. The beauty is that this something can be anything really – as long as you can design an output gate with a proper loss function, you can model essentially anything.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Where do locals go in NYC?</title>
      <link>https://erikbern.com/2014/06/17/where-do-locals-go-in-nyc.html</link>
      <pubDate>Tue, 17 Jun 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/06/17/where-do-locals-go-in-nyc.html</guid>
      <description>&lt;p&gt;One obvious thing to anyone living in NYC is how tourists cluster in certain areas. I was curious about the larger patterns around this, so I spent some time looking at data. The thing I wanted to understand is: what areas are dominated by tourists? Or conversely, what areas are dominated by locals?&lt;/p&gt;</description>
    </item>
    <item>
      <title>How to build up a data team (everything I ever learned about recruiting)</title>
      <link>https://erikbern.com/2014/06/08/how-to-build-up-a-data-team-everything-i-ever-learned-about-recruiting.html</link>
      <pubDate>Sun, 08 Jun 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/06/08/how-to-build-up-a-data-team-everything-i-ever-learned-about-recruiting.html</guid>
      <description>&lt;p&gt;During my time at Spotify, I&amp;rsquo;ve reviewed thousands of resumes and interviewed hundreds of people. Lots of them were rejected but lots of them also got offers. Finally, I&amp;rsquo;ve also had my share of offers rejected by the candidate.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The power of ensembles</title>
      <link>https://erikbern.com/2014/04/24/the-power-of-ensembles.html</link>
      <pubDate>Thu, 24 Apr 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/04/24/the-power-of-ensembles.html</guid>
      <description>&lt;p&gt;From my presentation at MLConf, one of the points I think is worth stressing again is how extremely well combining different algorithms works.&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2014/04/ensembles.png&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;In this case, we&amp;rsquo;re training machine learning algorithms on different data sets (playlists, play counts, sessions) and different objectives (least squares, max likelihood). Then we combine all the models using gradient boosted decision trees training on a smaller but higher quality data set. Finally, we validate on a third data set, in this case looking at recall for a ground truth data set of related artists.&lt;/p&gt;</description>
    </item>
    <item>
      <title>MLConf 2014</title>
      <link>https://erikbern.com/2014/04/12/mlconf-2014.html</link>
      <pubDate>Sat, 12 Apr 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/04/12/mlconf-2014.html</guid>
      <description>&lt;p&gt;Just spent a day at &lt;a href=&#34;http://mlconf.com/&#34;&gt;MLConf&lt;/a&gt; where I was talking about how we do music recommendations. There was a whole range of great speakers (actually almost 2/3 women which was pretty cool in itself).&lt;/p&gt;&#xA;&lt;p&gt;Here are my slides:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Music recommendations using cover images (part 1)</title>
      <link>https://erikbern.com/2014/04/01/music-recommendations-using-cover-images-part-1.html</link>
      <pubDate>Tue, 01 Apr 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/04/01/music-recommendations-using-cover-images-part-1.html</guid>
      <description>&lt;p&gt;Scrolling through the &lt;a href=&#34;https://play.spotify.com/discover&#34;&gt;Discover page&lt;/a&gt; on Spotify the other day it occurred to me that the album cover is in fact a fairly strong visual proxy for what kind of content you can expect from it. I started wondering if the album cover can in fact be used for recommendations. For many obvious reasons this is a kind of ridiculous idea, but still interesting enough that I just had to explore it a bit. So, I embarked on a journey to see how far I could get in a few hours.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Luigi success</title>
      <link>https://erikbern.com/2014/03/22/luigi-party.html</link>
      <pubDate>Sat, 22 Mar 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/03/22/luigi-party.html</guid>
      <description>&lt;p&gt;So &lt;a href=&#34;https://github.com/spotify/luigi&#34;&gt;Luigi&lt;/a&gt;, our open sourced workflow engine in Python, just recently passed 1,000 stars on Github, then shortly after passed &lt;a href=&#34;https://github.com/yelp/mrjob&#34;&gt;mrjob&lt;/a&gt; as (I think) the most popular Python package to do Hadoop stuff. This is exciting!&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2014/03/luigi-toy.jpg&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Welcome Echo Nest!</title>
      <link>https://erikbern.com/2014/03/22/welcome-echo-nest.html</link>
      <pubDate>Sat, 22 Mar 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/03/22/welcome-echo-nest.html</guid>
      <description>&lt;p&gt;In case you missed it, we just acquired a company called &lt;a href=&#34;http://echonest.com&#34;&gt;Echo Nest&lt;/a&gt; in Boston. These people have been obsessed with understanding music for the past 8 years since it was founded by &lt;a href=&#34;https://twitter.com/bwhitman&#34;&gt;Brian Whitman&lt;/a&gt; and &lt;a href=&#34;https://twitter.com/tjehan&#34;&gt;Tristan Jehan&lt;/a&gt; out of MIT Medialab.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Momentum strategies</title>
      <link>https://erikbern.com/2014/03/03/momentum-strategies.html</link>
      <pubDate>Mon, 03 Mar 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/03/03/momentum-strategies.html</guid>
      <description>&lt;p&gt;Haven&amp;rsquo;t posted anything in ages, so here&amp;rsquo;s a quick hack I threw together in Python on a Sunday night. Basically I wanted to know whether momentum strategies work well for international stock indexes. I spent a bit of time putting together a strategy that buys the stock index if the return during the previous n days was positive, otherwise doesn&amp;rsquo;t do anything. I ran this strategy for a basket of approximately 20 stock markets.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Ratio metrics</title>
      <link>https://erikbern.com/2014/01/23/ratio-metrics.html</link>
      <pubDate>Thu, 23 Jan 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/01/23/ratio-metrics.html</guid>
      <description>&lt;p&gt;We run a ton of A/B tests at Spotify and we look at a ton of metrics. Defining metrics is a little bit of an art form. Ideally you want to define success metrics before you run a test to avoid cherry picking metrics. You also want to define a metric that has as high a signal-to-noise ratio as possible. And of course, most importantly, your metric should ideally correlate to high level business impact as much as possible.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Benchmarking nearest neighbor libraries in Python</title>
      <link>https://erikbern.com/2014/01/12/benchmarking-nearest-neighbor-libraries-in-python.html</link>
      <pubDate>Sun, 12 Jan 2014 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2014/01/12/benchmarking-nearest-neighbor-libraries-in-python.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://twitter.com/RadimRehurek&#34;&gt;Radim Rehurek&lt;/a&gt; has put together an excellent summary of approximate nearest neighbor libraries in Python. This is exciting, because one of the libraries he&amp;rsquo;s covering, &lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;annoy&lt;/a&gt;, was built by me.&lt;/p&gt;&#xA;&lt;p&gt;After &lt;a href=&#34;http://radimrehurek.com/2013/11/performance-shootout-of-nearest-neighbours-intro/&#34;&gt;introducing the problem&lt;/a&gt;, he goes &lt;a href=&#34;http://radimrehurek.com/2013/12/performance-shootout-of-nearest-neighbours-contestants&#34;&gt;through the list of contestants&lt;/a&gt; and sticks with five remaining ones. Finally, &lt;a href=&#34;http://radimrehurek.com/2014/01/performance-shootout-of-nearest-neighbours-querying/&#34;&gt;the benchmarks&lt;/a&gt; pit annoy against &lt;a href=&#34;https://github.com/mariusmuja/flann&#34;&gt;FLANN&lt;/a&gt;. Although FLANN seems to have roughly 4x better performance, somewhat surprisingly, Radim concludes annoy is the “winner”. Yay!&lt;/p&gt;</description>
    </item>
    <item>
      <title>More recommender algorithms</title>
      <link>https://erikbern.com/2013/12/20/more-insight-into-recommender-algorithms.html</link>
      <pubDate>Fri, 20 Dec 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/12/20/more-insight-into-recommender-algorithms.html</guid>
      <description>&lt;p&gt;I wanted to share some more insight into the algorithms we use at Spotify. One matrix factorization algorithm we have used for a while assumes that we have user vectors $$ \mathbf{a}_u $$ and item vectors $$ \mathbf{b}_i $$. The next track $$ i $$ for a user is now given by the relation&lt;/p&gt;</description>
    </item>
    <item>
      <title>Microsoft&#39;s new marketing strategy: give up</title>
      <link>https://erikbern.com/2013/12/12/microsofts-new-marketing-strategy-give-up.html</link>
      <pubDate>Thu, 12 Dec 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/12/12/microsofts-new-marketing-strategy-give-up.html</guid>
      <description>&lt;p&gt;I think it&amp;rsquo;s funny how MS at some point realized they are not the cool kids and there&amp;rsquo;s no reason to appeal to that target audience. Their new marketing strategy finally admits what&amp;rsquo;s been long known: the correlation between “business casual” and using Microsoft products:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Bagging as a regularizer</title>
      <link>https://erikbern.com/2013/12/06/bagging-as-a-regularizer.html</link>
      <pubDate>Fri, 06 Dec 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/12/06/bagging-as-a-regularizer.html</guid>
      <description>&lt;p&gt;One thing I encountered today was a trick using &lt;a href=&#34;http://en.wikipedia.org/wiki/Bootstrap_aggregating&#34;&gt;bagging&lt;/a&gt; as a way to go beyond a point estimate and get an approximation for the full distribution. This can then be used to penalize predictions with larger uncertainty, which helps reduce false positives.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Model benchmarks</title>
      <link>https://erikbern.com/2013/11/02/model-benchmarks.html</link>
      <pubDate>Sat, 02 Nov 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/11/02/model-benchmarks.html</guid>
      <description>&lt;p&gt;A lot of people have asked me what models we use for recommendations at Spotify so I wanted to share some insights. Here are benchmarks for some models. Note that we don&amp;rsquo;t use all of them in production.&lt;/p&gt;</description>
    </item>
    <item>
      <title>statself.com</title>
      <link>https://erikbern.com/2013/10/18/statself-com.html</link>
      <pubDate>Fri, 18 Oct 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/10/18/statself-com.html</guid>
      <description>&lt;p&gt;Btw I just put something up online that I spent a couple of evenings on my couch putting together: it&amp;rsquo;s a website where you can track any numerical data on the web. Want to know how many &lt;a href=&#34;http://statself.com/series/3acd40e5&#34;&gt;Twitter followers&lt;/a&gt; you have? &lt;a href=&#34;http://statself.com/series/5b46b219&#34;&gt;Temperature in NYC&lt;/a&gt;? Go to &lt;a href=&#34;http://statself.com&#34;&gt;statself.com&lt;/a&gt; and start tracking it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Implicit data and collaborative filtering</title>
      <link>https://erikbern.com/2013/09/16/implicit-data-and-collaborative-filtering.html</link>
      <pubDate>Mon, 16 Sep 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/09/16/implicit-data-and-collaborative-filtering.html</guid>
      <description>&lt;p&gt;A lot of people these days know about collaborative filtering. It&amp;rsquo;s that Netflix Prize thing, right? People rate things 1-5 stars and then you have to predict missing ratings.&lt;/p&gt;&#xA;&lt;p&gt;While there&amp;rsquo;s no doubt that the Netflix Prize was successful, I think it created an illusion that all recommender systems care about explicit 1-5 ratings and RMSE as the objective. Some people even distrust me when I talk about the approach we take at Spotify.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Vote for our SXSW panel!</title>
      <link>https://erikbern.com/2013/09/04/vote-for-our-sxsw-panel.html</link>
      <pubDate>Wed, 04 Sep 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/09/04/vote-for-our-sxsw-panel.html</guid>
      <description>&lt;p&gt;If you have a few minutes, you should check out the panel proposal from &lt;a href=&#34;https://twitter.com/mrchrisjohnson&#34;&gt;Chris Johnson&lt;/a&gt; and me. Go here and vote: &lt;a href=&#34;http://panelpicker.sxsw.com/vote/24504&#34;&gt;http://panelpicker.sxsw.com/vote/24504&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Algorithmic Music Discovery at Spotify&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;Spotify crunches hundreds of billions of streams to analyze users&amp;rsquo; music taste and provide music recommendations. We will discuss how the algorithms work, how they fit in within the products, what the problems are and where we think music discovery is going. The talk will be quite technical with a focus on the concepts and methods, mainly how we use large scale machine learning, but we will also cover some aspects of music discovery from a user perspective that greatly influenced the design decisions.&lt;/p&gt;</description>
    </item>
    <item>
      <title>What&#39;s up with music recommendations?</title>
      <link>https://erikbern.com/2013/08/17/306.html</link>
      <pubDate>Sat, 17 Aug 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/08/17/306.html</guid>
      <description>&lt;p&gt;I just answered a Quora question about &lt;a href=&#34;http://www.quora.com/Machine-Learning/What-if-any-are-the-differences-in-the-algorithms-that-are-behind-recommendations-for-music-and-movies/answer/Erik-Bernhardsson?__snids__=163790174&amp;amp;__nsrc__=1&#34;&gt;what, if any, are the differences in the algorithms that are behind recommendations for music and movies&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Of course, every media type is different. For instance, there are fundamental reasons why latent factor models work really well for music and movies, as opposed to &lt;a href=&#34;http://www.scribd.com/doc/86498718/Machine-Learning-with-Large-Networks-of-People-and-Places&#34;&gt;location recommendations&lt;/a&gt; where I suspect graph based models are more powerful. &lt;a href=&#34;http://www.stanford.edu/~rezab/papers/wtf_overview.pdf&#34;&gt;People recommendations&lt;/a&gt; are another animal and I&amp;rsquo;m sure &lt;a href=&#34;http://homepages.cae.wisc.edu/~jamieson/me/BeerMapper.html&#34;&gt;beer recommendations&lt;/a&gt; have their own domain-specific quirks.&lt;/p&gt;</description>
    </item>
    <item>
      <title>3D</title>
      <link>https://erikbern.com/2013/08/12/3d.html</link>
      <pubDate>Mon, 12 Aug 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/08/12/3d.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;http://www.a1k0n.net&#34;&gt;Andy Sloane&lt;/a&gt; decided to call my 2D visualization and &lt;a href=&#34;http://www.a1k0n.net/spotify/artist-viz/&#34;&gt;raise it to 3D&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;(Looks a little weird in the iframe but check out the link). It&amp;rsquo;s based on an &lt;a href=&#34;https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation&#34;&gt;LDA&lt;/a&gt; model with 200 topics, so the artists tend to stick to clusters where each cluster is a topic. The embedding also uses &lt;a href=&#34;http://homepage.tudelft.nl/19j49/t-SNE.html&#34;&gt;t-SNE&lt;/a&gt; but in three dimensions (obviously).&lt;/p&gt;</description>
    </item>
    <item>
      <title>2D embedding of 5k artists = WIN</title>
      <link>https://erikbern.com/2013/08/11/2d-embedding-of-5k-artists-win.html</link>
      <pubDate>Sun, 11 Aug 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/08/11/2d-embedding-of-5k-artists-win.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;m at &lt;a href=&#34;http://www.kdd.org/kdd2013/&#34;&gt;KDD&lt;/a&gt; in Chicago for a few days. We have a Spotify booth tomorrow, and I wanted to put together some cool graphics to show. I&amp;rsquo;ve been thinking about doing a 2D embedding of the top artists ever since I read about &lt;a href=&#34;http://homepage.tudelft.nl/19j49/t-SNE.html&#34;&gt;t-SNE&lt;/a&gt; and other papers, so this was a perfect opportunity to spend some time on it.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Delivering Music Recommendations</title>
      <link>https://erikbern.com/2013/08/09/delivering-music-recommendations.html</link>
      <pubDate>Fri, 09 Aug 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/08/09/delivering-music-recommendations.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve turned into a lazy bastard and I&amp;rsquo;m just posting presentations on this blog, but here&amp;rsquo;s one from &lt;a href=&#34;http://rohanradio.com&#34;&gt;Rohan Singh&lt;/a&gt; at Spotify talking about the backend infrastructure of the &lt;a href=&#34;http://play.spotify.com/discover&#34;&gt;Discover&lt;/a&gt; page.&lt;/p&gt;</description>
    </item>
    <item>
      <title>ML&#43;Hadoop at NYC Predictive Analytics</title>
      <link>https://erikbern.com/2013/08/03/mlhadoop-at-nyc-predictive-analytics.html</link>
      <pubDate>Sat, 03 Aug 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/08/03/mlhadoop-at-nyc-predictive-analytics.html</guid>
      <description>&lt;p&gt;I was just at the &lt;a href=&#34;http://www.meetup.com/NYC-Predictive-Analytics/events/129778152/&#34;&gt;NYC Predictive Analytics meetup&lt;/a&gt; talking about how we build machine learning algorithms using Hadoop to power music recommendations.&lt;/p&gt;&#xA;&lt;p&gt;Great meetup, where we had two speakers, me and &lt;a href=&#34;http://www.metablake.com&#34;&gt;Blake Shaw&lt;/a&gt; from Foursquare. Blake talked about how they use machine learning at Foursquare, using Hadoop (and Luigi), and he uploaded his slides &lt;a href=&#34;https://www.dropbox.com/s/tn4f81is4p1a5ds/HadoopML.pdf&#34;&gt;here&lt;/a&gt;!&lt;/p&gt;</description>
    </item>
    <item>
      <title>HubSpot&#39;s Picture Shows how to Maintain Monocultures in the 21st Century</title>
      <link>https://erikbern.com/2013/07/28/hubspots-creepy-picture-shows-how-to-maintain-monocultures-in-the-21st-century.html</link>
      <pubDate>Sun, 28 Jul 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/07/28/hubspots-creepy-picture-shows-how-to-maintain-monocultures-in-the-21st-century.html</guid>
      <description>&lt;p&gt;I thought &lt;a href=&#34;http://www.businessinsider.com/hubspot-slidedeck-on-company-culture-2013-3&#34;&gt;this article&lt;/a&gt; about the company culture at HubSpot is kind of funny. “HubSpot&amp;rsquo;s Awesome Presentation Shows how to Create a 21st Century Culture”.&lt;/p&gt;&#xA;&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2013/07/Screen-Shot-2013-07-28-at-1.40.44-PM.png&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;Just FYI: You&amp;rsquo;re not different. You&amp;rsquo;re a bunch of white hipsters aged 25-30 dressed up in the same theme. That&amp;rsquo;s not being different.&lt;/p&gt;</description>
    </item>
    <item>
      <title>More Luigi: Presentation from OSCON</title>
      <link>https://erikbern.com/2013/07/27/more-luigi-presentation-from-oscon.html</link>
      <pubDate>Sat, 27 Jul 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/07/27/more-luigi-presentation-from-oscon.html</guid>
      <description>&lt;p&gt;I was in Portland, OR for a few days hanging out at &lt;a href=&#34;https://www.oscon.com&#34;&gt;OSCON&lt;/a&gt;. Was fun. I also talked a bit about &lt;a href=&#34;https://github.com/spotify/luigi&#34;&gt;Luigi&lt;/a&gt;:&lt;/p&gt;&#xA;&lt;p&gt;Next week I&amp;rsquo;m presenting at the &lt;a href=&#34;http://www.meetup.com/NYC-Predictive-Analytics/events/129778152/&#34;&gt;NYC Predictive Analytics meetup&lt;/a&gt; together with Blake Shaw from Foursquare. The topic is ML + Hadoop. Will be fun!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Optimizing over multinomial distributions</title>
      <link>https://erikbern.com/2013/07/24/normalizing-multinomial-distributions.html</link>
      <pubDate>Wed, 24 Jul 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/07/24/normalizing-multinomial-distributions.html</guid>
      <description>&lt;p&gt;Sometimes you have to maximize some function $$ f(w_1, w_2, \ldots, w_n) $$ where $$ w_1 + w_2 + \ldots + w_n = 1 $$ and $$ 0 \le w_i \le 1 $$. Usually, $$ f $$ is concave and differentiable, so there&amp;rsquo;s one unique global maximum and you can solve it by applying &lt;a href=&#34;http://en.wikipedia.org/wiki/Gradient_descent&#34;&gt;gradient ascent&lt;/a&gt;. The presence of the constraint makes it a little tricky, but we can solve it using the method of &lt;a href=&#34;http://en.wikipedia.org/wiki/Lagrange_multiplier&#34;&gt;Lagrange multipliers&lt;/a&gt;. In particular, since the surface $$ w_1 + w_2 + \ldots + w_n = 1 $$ has the normal $$ (1, 1, \ldots, 1) $$, the following optimization procedure works:&lt;/p&gt;</description>
    </item>
    <item>
      <title>More Luigi!</title>
      <link>https://erikbern.com/2013/06/26/more-luigi.html</link>
      <pubDate>Wed, 26 Jun 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/06/26/more-luigi.html</guid>
      <description>&lt;p&gt;Continuing in the same spirit of shameless self-promotion, here&amp;rsquo;s some recent &lt;a href=&#34;https://github.com/spotify/luigi&#34;&gt;Luigi&lt;/a&gt; press:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;http://www.reddit.com/r/Python/comments/1h1won/luigi_is_a_python_module_that_helps_you_build/&#34;&gt;Reddit thread&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://qconnewyork.com/sites/default/files/QConNY2013_UriLaserson_Python_Hadoop.pdf&#34;&gt;A Guide to Python Frameworks for Hadoop&lt;/a&gt; (slides from the &lt;a href=&#34;http://www.meetup.com/Hadoop-NYC/events/118226212/&#34;&gt;NYC Hadoop User Group&lt;/a&gt;)&lt;/li&gt;&#xA;&lt;li&gt;This presentation from the &lt;a href=&#34;http://www.meetup.com/Open-Analytics-NYC/&#34;&gt;Open Analytics NYC meetup&lt;/a&gt; about how Foursquare uses Luigi&lt;/li&gt;&#xA;&lt;/ul&gt;</description>
    </item>
    <item>
      <title>hdfs2cass</title>
      <link>https://erikbern.com/2013/06/19/hdfs2cass.html</link>
      <pubDate>Wed, 19 Jun 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/06/19/hdfs2cass.html</guid>
      <description>&lt;p&gt;Just open sourced &lt;a href=&#34;https://github.com/spotify/hdfs2cass&#34;&gt;hdfs2cass&lt;/a&gt; which is a Hadoop job (written in Java) to do efficient Cassandra bulkloading. The nice thing is that it queries Cassandra for its topology and uses that to partition the data so that each reducer can upload data directly to a Cassandra node. It also builds SSTables locally etc. Not an expert at Cassandra so I&amp;rsquo;ll stop describing those parts before I embarrass myself.&lt;/p&gt;</description>
    </item>
    <item>
      <title>NoDoc</title>
      <link>https://erikbern.com/2013/06/16/nodoc.html</link>
      <pubDate>Sun, 16 Jun 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/06/16/nodoc.html</guid>
      <description>&lt;p&gt;We had an &lt;a href=&#34;http://en.wikipedia.org/wiki/Unconference&#34;&gt;unconference&lt;/a&gt; at &lt;a href=&#34;http://spotify.com/&#34;&gt;Spotify&lt;/a&gt; last Thursday and I added a semi-trolling semi-serious topic about abolishing documentation. Or &lt;em&gt;NoDoc&lt;/em&gt;, as I&amp;rsquo;m going to call this movement. This was meant to be mostly a thought experiment, but I don&amp;rsquo;t see it as complete madness.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Wikiphilia</title>
      <link>https://erikbern.com/2013/06/02/wikiphilia.html</link>
      <pubDate>Sun, 02 Jun 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/06/02/wikiphilia.html</guid>
      <description>&lt;p&gt;I&amp;rsquo;ve been obsessed with Wikipedia for the past ten years. Occasionally I find some good articles worth sharing and that&amp;rsquo;s why I created the &lt;a href=&#34;http://twitter.com/wikiphilia&#34;&gt;wikiphilia&lt;/a&gt; Twitter handle. Just a long stream of stuff that for one reason or another may be interesting.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Spotify&#39;s Discovery page</title>
      <link>https://erikbern.com/2013/05/31/spotifys-discovery-page.html</link>
      <pubDate>Fri, 31 May 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/05/31/spotifys-discovery-page.html</guid>
      <description>&lt;p&gt;The Discovery page, the new start page in Spotify, is finally out to a fairly significant percentage of all users. Really happy since we have worked on it for the past six months. Here&amp;rsquo;s a screen shot:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Fermat&#39;s principle</title>
      <link>https://erikbern.com/2013/05/21/fermats-principle.html</link>
      <pubDate>Tue, 21 May 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/05/21/fermats-principle.html</guid>
      <description>&lt;p&gt;I was browsing around on the Internet and the physics geek in me started reading about &lt;a href=&#34;http://en.wikipedia.org/wiki/Fermat&#39;s_principle&#34;&gt;Fermat&amp;rsquo;s principle&lt;/a&gt;. And suddenly something came back to me that I&amp;rsquo;ve been trying to suppress for many years – how I never understood why there&amp;rsquo;s anything fundamental about the &lt;strong&gt;principle of least time.&lt;/strong&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Snakebite</title>
      <link>https://erikbern.com/2013/05/07/snakebite.html</link>
      <pubDate>Tue, 07 May 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/05/07/snakebite.html</guid>
      <description>&lt;p&gt;Just promoting Spotify stuff here: check out the &lt;a href=&#34;https://github.com/spotify/snakebite&#34;&gt;Snakebite&lt;/a&gt; repo on GitHub, written by Wouter de Bie. It&amp;rsquo;s a super fast tool to access HDFS over CLI/Python, by accessing the namenode directly over sockets/protobuf.&lt;/p&gt;&#xA;&lt;p&gt;Spotify&amp;rsquo;s developer blog features a &lt;a href=&#34;http://labs.spotify.com/2013/05/07/snakebite/&#34;&gt;nice blog&lt;/a&gt; post outlining what it&amp;rsquo;s useful for. I think this kicks ass and there will definitely be some kind of &lt;a href=&#34;https://github.com/spotify/luigi&#34;&gt;Luigi&lt;/a&gt; integration coming up at some point.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Stuff that bothers me: &#8220;100x faster than Hadoop&#8221;</title>
      <link>https://erikbern.com/2013/04/27/stuff-that-bothers-me-100x-faster-than-hadoop.html</link>
      <pubDate>Sat, 27 Apr 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/04/27/stuff-that-bothers-me-100x-faster-than-hadoop.html</guid>
      <description>&lt;p&gt;The simple way to get featured on big data blogs these days seems to be&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Build something that does 1 thing super well but nothing else&lt;/li&gt;&#xA;&lt;li&gt;Benchmark it against Hadoop&lt;/li&gt;&#xA;&lt;li&gt;Publish stats showing that it&amp;rsquo;s 100x faster than Hadoop&lt;/li&gt;&#xA;&lt;li&gt;$$$&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;Spark claims they&amp;rsquo;re &lt;a href=&#34;http://spark-project.org/&#34;&gt;100x faster than Hadoop&lt;/a&gt; and there are a lot of stats showing &lt;a href=&#34;http://www.hapyrus.com/blog/posts/behind-amazon-redshift-is-10x-faster-and-cheaper-than-hadoop-hive-slides&#34;&gt;Redshift is 10x faster than Hadoop&lt;/a&gt;. There&amp;rsquo;s a bunch of papers with &lt;a href=&#34;http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf&#34;&gt;similar claims&lt;/a&gt;. I spent five minutes Googling “Xx faster than Hadoop” and found a ton of &lt;a href=&#34;http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/&#34;&gt;other stats&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Presentation about Luigi</title>
      <link>https://erikbern.com/2013/04/26/presentation-about-luigi.html</link>
      <pubDate>Fri, 26 Apr 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/04/26/presentation-about-luigi.html</guid>
      <description>&lt;p&gt;I like the editing!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Being data driven</title>
      <link>https://erikbern.com/2013/04/13/being-data-driven.html</link>
      <pubDate>Sat, 13 Apr 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/04/13/being-data-driven.html</guid>
      <description>&lt;p&gt;I picked up an issue of &lt;em&gt;Foreign Affairs&lt;/em&gt; while flying back to NYC from SFO. It features &lt;a href=&#34;http://www.foreignaffairs.com/discussions/interviews/generation-kill&#34;&gt;this long interview with U.S. General Stanley McChrystal&lt;/a&gt; and I thought it was pretty interesting how striking some of the similarities are between fighting in a war and developing software.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Annoy</title>
      <link>https://erikbern.com/2013/04/12/annoy.html</link>
      <pubDate>Fri, 12 Apr 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/04/12/annoy.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://github.com/spotify/annoy&#34;&gt;Annoy&lt;/a&gt; is a simple package to find approximate nearest neighbors (ANN) that I just put on GitHub. I&amp;rsquo;m not trying to compete with existing packages, but Annoy has a couple of features that make it pretty useful. Most importantly, it uses very little memory and can put everything in a contiguous blob that you can mmap from disk. This way multiple processes can share the same index.&lt;/p&gt;</description>
    </item>
    <item>
      <title>More Luigi!</title>
      <link>https://erikbern.com/2013/03/22/more-luigi-pres.html</link>
      <pubDate>Fri, 22 Mar 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/03/22/more-luigi-pres.html</guid>
      <description>&lt;p&gt;Elias Freider just talked about Luigi at PyData 2013:&lt;/p&gt;&#xA;&lt;div style=&#34;margin-bottom: 5px;&#34;&gt;&#xA;  The presentation above is much better than one I put together a few weeks ago. In case anyone is interested I&#39;ll include it too:&#xA;&lt;/div&gt;</description>
    </item>
    <item>
      <title>ML at Twitter</title>
      <link>https://erikbern.com/2013/02/27/ml-at-twitter.html</link>
      <pubDate>Wed, 27 Feb 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/02/27/ml-at-twitter.html</guid>
      <description>&lt;p&gt;I recently came across &lt;a href=&#34;http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf&#34;&gt;this paper describing how they do ML at Twitter&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;TL;DR Their approach is pretty interesting. Everything is a &lt;a href=&#34;http://pig.apache.org/&#34;&gt;Pig&lt;/a&gt; workflow and then they do everything as &lt;a href=&#34;http://pig.apache.org/docs/r0.9.1/udf.html&#34;&gt;UDFs&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;This approach seems pretty interesting. As long as your data can be expressed as small atomic machine learning functions, I&amp;rsquo;m sure it works great. But there&amp;rsquo;s so much more than that. All the small slicing, transforming, etc. is so much easier to express in a language like Python. I&amp;rsquo;m still not really comfortable with Pig as a language to power these data flows.&lt;/p&gt;</description>
    </item>
    <item>
      <title>I&#39;m featured in Mashable</title>
      <link>https://erikbern.com/2013/02/06/im-featured-in-mashable.html</link>
      <pubDate>Wed, 06 Feb 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/02/06/im-featured-in-mashable.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;http://mashable.com/2013/02/05/10-awesome-stem-jobs/&#34;&gt;This article&lt;/a&gt; from today in Mashable describes some of the fun stuff I get to work with:&lt;/p&gt;&#xA;&lt;p&gt;&lt;em&gt;&lt;a href=&#34;http://www.linkedin.com/profile/view?id=12890189&amp;locale=en_US&amp;trk=tyah&#34; target=&#34;_blank&#34;&gt;Erik Bernhardsson&lt;/a&gt; is technical lead at Spotify, where he helped to build a music recommendation system based on large-scale machine learning algorithms, mainly matrix factorization of big matrices using &lt;a href=&#34;http://hadoop.apache.org/&#34; target=&#34;_blank&#34;&gt;Hadoop&lt;/a&gt;. He moved into this role after heading the Business Intelligence team, where he collected, aggregated and made sense of all the data at Spotify, whether that&amp;rsquo;s ad-hoc insights, A/B testing, visualization or ad optimization.&lt;/em&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>Slides from NYC Machine Learning talk</title>
      <link>https://erikbern.com/2013/01/27/slides-from-nyc-machine-learning-talk.html</link>
      <pubDate>Sun, 27 Jan 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/01/27/slides-from-nyc-machine-learning-talk.html</guid>
      <description>&lt;p&gt;Slides from the talk. Slightly edited because (a) some of the slides make little sense taken out of context and (b) SlideShare seems to have problems converting some of the stuff.&lt;/p&gt;&#xA;&lt;div style=&#34;margin-bottom: 5px;&#34;&gt;&#xA;  &lt;strong&gt; &lt;a title=&#34;Collaborative filtering at Spotify&#34; href=&#34;http://www.slideshare.net/erikbern/collaborative-filtering-at-spotify-16182818&#34; target=&#34;_blank&#34;&gt;Collaborative filtering at Spotify&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&#34;http://www.slideshare.net/erikbern&#34; target=&#34;_blank&#34;&gt;Erik Bernhardsson&lt;/a&gt;&lt;/strong&gt;&#xA;&lt;/div&gt;</description>
    </item>
    <item>
      <title>NYC Machine Learning meetup</title>
      <link>https://erikbern.com/2013/01/22/nyc-machine-learning-meetup.html</link>
      <pubDate>Tue, 22 Jan 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/01/22/nyc-machine-learning-meetup.html</guid>
      <description>&lt;p&gt;From the &lt;a href=&#34;http://www.meetup.com/NYC-Machine-Learning/&#34;&gt;NYC Machine Learning&lt;/a&gt; talk I gave last week:&lt;/p&gt;&#xA;&lt;p&gt;Haven&amp;rsquo;t looked at it yet except briefly. Unfortunately the quality isn&amp;rsquo;t the best.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Momentum and mean reversion might just be volatility bias</title>
      <link>https://erikbern.com/2013/01/13/momentum-and-mean-reversion-might-just-be-volatility-bias.html</link>
      <pubDate>Sun, 13 Jan 2013 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2013/01/13/momentum-and-mean-reversion-might-just-be-volatility-bias.html</guid>
      <description>&lt;p&gt;The Economist just published an article called &lt;a href=&#34;http://www.economist.com/news/finance-and-economics/21569397-art-picking-mutual-funds-best-worst-and-ugly&#34;&gt;The best, the worst and the ugly&lt;/a&gt;. By looking at historical performance for mutual funds, they find strong support for momentum and mean reversion. Picking the &lt;em&gt;best&lt;/em&gt; or the &lt;em&gt;worst&lt;/em&gt; fund over the previous five years gives great returns over the next five years.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Calculating cosine similarities using dimensionality reduction</title>
      <link>https://erikbern.com/2012/12/05/calculating-cosine-similarities-using-dimensionality-reduction.html</link>
      <pubDate>Wed, 05 Dec 2012 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2012/12/05/calculating-cosine-similarities-using-dimensionality-reduction.html</guid>
      <description>&lt;p&gt;This was posted on the Twitter Engineering blog a few days ago: &lt;a href=&#34;http://engineering.twitter.com/2012/11/dimension-independent-similarity.html&#34;&gt;Dimension Independent Similarity Computation (DISCO)&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;I just glanced at the paper, and there&amp;rsquo;s some cool stuff going on from a theoretical perspective. What I&amp;rsquo;m curious about is why they didn&amp;rsquo;t decide to use dimensionality reduction to solve such a big problem. The benefit of this approach is that it scales much better (linear in input data size) and produces much better results. The drawback is that it&amp;rsquo;s much harder to implement.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Tumblr&#39;s awesome project names</title>
      <link>https://erikbern.com/2012/11/18/tumblrs-awesome-project-names.html</link>
      <pubDate>Sun, 18 Nov 2012 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2012/11/18/tumblrs-awesome-project-names.html</guid>
      <description>&lt;p&gt;&lt;img src=&#34;https://erikbern.com/assets/2012/11/ad_2_13_7.jpg&#34; alt=&#34;image&#34;&gt;&lt;/p&gt;&#xA;&lt;p&gt;Not sure how I managed to miss this, but I&amp;rsquo;m watching this &lt;a href=&#34;http://www.infoq.com/presentations/Concurrency-Tumblr&#34;&gt;Tumblr presentation&lt;/a&gt; and they talk about their projects named after &lt;a href=&#34;http://en.wikipedia.org/wiki/Arrested_Development_(TV_series)&#34;&gt;Arrested Development&lt;/a&gt; topics: Gob, Parmesan, Buster, &lt;a href=&#34;https://github.com/tumblr/jetpants&#34;&gt;Jetpants&lt;/a&gt;, Oscar, George and Motherboy.&lt;/p&gt;&#xA;&lt;p&gt;Still, the best software project name is probably Apple&amp;rsquo;s &lt;a href=&#34;http://en.wikipedia.org/wiki/Apple_Inc._litigation#Libel_dispute_with_Carl_Sagan&#34;&gt;BHA&lt;/a&gt;.&lt;/p&gt;</description>
    </item>
    <item>
      <title>A neat little trick with time decay</title>
      <link>https://erikbern.com/2012/10/29/a-neat-little-trick-with-time-decay.html</link>
      <pubDate>Mon, 29 Oct 2012 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2012/10/29/a-neat-little-trick-with-time-decay.html</guid>
      <description>&lt;p&gt;Something that pops up pretty frequently is to implement time decay, especially where you have recursive chains of jobs. For instance, say you want to keep track of a popularity score. You calculate today&amp;rsquo;s output by reading yesterday&amp;rsquo;s output, discounting it by $$ \exp(-\lambda \Delta T) $$ and then adding some hit count for today. Typically you choose $$ \lambda $$ so that $$ \exp(-\lambda \Delta T) = 0.95 $$ for a day or something like that. We do this to generate popularity scores for every track at Spotify.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Luigi: complex pipelines of tasks in Python</title>
      <link>https://erikbern.com/2012/10/21/luigi-build-complex-pipelines-of-tasks.html</link>
      <pubDate>Sun, 21 Oct 2012 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/2012/10/21/luigi-build-complex-pipelines-of-tasks.html</guid>
      <description>&lt;p&gt;&lt;a href=&#34;https://github.com/spotify/luigi&#34;&gt;&lt;img src=&#34;https://erikbern.com/assets/luigi.png&#34; alt=&#34;&#34;&gt;&lt;/a&gt;&lt;/p&gt;&#xA;&lt;p&gt;I&amp;rsquo;m shamelessly promoting my first major open source project. Luigi is a Python module that helps you build complex pipelines of batch jobs, handle dependency resolution, and create visualizations to help manage multiple workflows. It also comes with &lt;a href=&#34;http://hadoop.apache.org/&#34;&gt;Hadoop&lt;/a&gt; support built in (because that&amp;rsquo;s really where its strength becomes clear).&lt;/p&gt;</description>
    </item>
    <item>
      <title>About</title>
      <link>https://erikbern.com/about.html</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/about.html</guid>
      <description></description>
    </item>
    <item>
      <title>Domains for sale</title>
      <link>https://erikbern.com/domains.html</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/domains.html</guid>
      <description>&lt;p&gt;Contact me at mail at erik bern dot com!&lt;/p&gt;</description>
    </item>
    <item>
      <title>Home</title>
      <link>https://erikbern.com/resume-with-contact-information.html</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/resume-with-contact-information.html</guid>
      <description></description>
    </item>
    <item>
      <title>Home</title>
      <link>https://erikbern.com/resume.html</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/resume.html</guid>
      <description></description>
    </item>
    <item>
      <title>Open source</title>
      <link>https://erikbern.com/open-source.html</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/open-source.html</guid>
      <description></description>
    </item>
    <item>
      <title>Top posts</title>
      <link>https://erikbern.com/top-posts.html</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://erikbern.com/top-posts.html</guid>
      <description>&lt;p&gt;These are some blog posts which have gotten a disproportionate amount of traffic (10,000+ page views):&lt;/p&gt;&#xA;&lt;h1 id=&#34;2024&#34;&gt;2024&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2024/09/27/its-hard-to-write-code-for-humans.html&#34;&gt;It&amp;rsquo;s hard to write code for computers, but it&amp;rsquo;s even harder to write code for humans&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2023&#34;&gt;2023&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2023/12/13/simple-sabotage-for-software.html&#34;&gt;Simple sabotage for software&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2022&#34;&gt;2022&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2022/10/19/we-are-still-early-with-the-cloud.html&#34;&gt;We are still early with the cloud: why software development is overdue for a change&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2021&#34;&gt;2021&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2021/11/30/storm-in-the-stratosphere-how-the-cloud-will-be-reshuffled.html&#34;&gt;Storm in the stratosphere: how the cloud will be reshuffled&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2021/07/07/the-data-team-a-short-story.html&#34;&gt;Building a data team at a mid-stage startup: a short story&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2021/04/19/software-infrastructure-2.0-a-wishlist.html&#34;&gt;Software infrastructure 2.0: a wishlist&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2020&#34;&gt;2020&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2020/01/13/how-to-hire-smarter-than-the-market-a-toy-model.html&#34;&gt;How to hire smarter than the market: a toy model&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a 
href=&#34;https://erikbern.com/2020/03/10/never-attribute-to-stupidity-that-which-is-adequately-explained-by-opportunity-cost.html&#34;&gt;Never attribute to stupidity that which is adequately explained by opportunity cost&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2019&#34;&gt;2019&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2019/04/15/why-software-projects-take-longer-than-you-think-a-statistical-model.html&#34;&gt;Why software projects take longer than you think: a statistical model&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2019/10/16/buffet-lines-are-terrible.html&#34;&gt;Buffet lines are terrible, but let&amp;rsquo;s try to improve them using computer simulations&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2019/09/26/misc-unsolicited-career-advice.html&#34;&gt;Miscellaneous unsolicited (and possibly biased) career advice&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2019/02/21/headcount-targets-feature-factories-and-when-to-hire-those-mythical-10x-people.html&#34;&gt;Headcount goals, feature factories, and when to hire those mythical 10x people&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2018&#34;&gt;2018&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2018/08/30/i-dont-want-to-learn-your-garbage-query-language.html&#34;&gt;I don&amp;rsquo;t want to learn your garbage query language&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2018/10/08/the-hackers-guide-to-uncertainty-estimates.html&#34;&gt;The hacker&amp;rsquo;s guide to uncertainty estimates&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2018/05/02/interviewing-is-a-noisy-prediction-problem.html&#34;&gt;Interviewing is a noisy prediction problem&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2017&#34;&gt;2017&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a 
href=&#34;https://erikbern.com/2017/03/15/the-eigenvector-of-why-we-moved-from-language-x-to-language-y.html&#34;&gt;The eigenvector of &amp;ldquo;Why we moved from language X to language Y&amp;rdquo;&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2017/02/01/language-pitch.html&#34;&gt;Language pitch&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2017/05/23/conversion-rates-you-are-most-likely-computing-them-wrong.html&#34;&gt;Conversion rates – you are (most likely) computing them wrong&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2016&#34;&gt;2016&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2016/01/21/analyzing-50k-fonts-using-deep-neural-networks.html&#34;&gt;Analyzing 50k fonts using deep neural networks&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2016/04/04/nyc-subway-math.html&#34;&gt;NYC subway math&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2016/12/05/the-half-life-of-code.html&#34;&gt;The half-life of code &amp;amp; the ship of Theseus&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2016/03/16/exploding-offers-are-bullshit.html&#34;&gt;Exploding offers are bullshit&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2016/01/08/i-believe-in-the-10x-engineer-but.html&#34;&gt;I believe in the 10x engineer, but&amp;hellip;&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h1 id=&#34;2014&#34;&gt;2014&lt;/h1&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;a href=&#34;https://erikbern.com/2014/11/29/deep-learning-for-chess.html&#34;&gt;Deep learning for chess&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;bonus&#34;&gt;&amp;hellip;Bonus&lt;/h2&gt;&#xA;&lt;p&gt;The post &lt;a href=&#34;https://erikbern.com/2016/08/05/when-machine-learning-matters.html&#34;&gt;When machine learning matters&lt;/a&gt; didn&amp;rsquo;t get a lot of &lt;em&gt;web traffic&lt;/em&gt;, but &lt;a 
href=&#34;https://erikbern.com/2017/08/19/machine-platform-crowd.html&#34;&gt;it was mentioned&lt;/a&gt; in the 2017 book &lt;a href=&#34;https://www.goodreads.com/book/show/38212111-machine-platform-crowd&#34;&gt;Machine, Platform, Crowd&lt;/a&gt; by Andrew McAfee and Erik Brynjolfsson.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
