Erik Bernhardsson    About

Pinterest open sources Pinball

image

Pinterest just open sourced Pinball which seems like an interesting Luigi alternative. There’s two blog posts: Pinball: Building workflow management (from 2014) and Open-sourcing Pinball (from this week). The author has a comment in the comments thread on Hacker News:

Luigi was not available in public, when Pinball starts. So not sure the pros and cons between Pinball and Luigi.

When we build pinball, we aim to build a scalable and flexible workflow manager to satisfy the the following requirements (I just name a few here).

  1. easy system upgrade – when we fix bug or adding new features, there should be no interruption for current running workflow and jobs.
  2. easy add/test workflow – end user can easily add new jobs and workflows into pinball system, without affecting other running jobs and workflows.
  3. extensibility – a workflow manager should be easy to extended. As the company and business grows, there will be a lot new requirements and features needed. And also we love your contributions as well.
  4. flexible workflow scheduling policy, easy failure handling.
  5. We provide rich UI for you to easily manage your workflows – auto retry failed job, – you can retry failed job, can skip some job, can select a subset of jobs of a workflow to run (all from UI) – you can easily access all the running history of your job, and also get the stderr, stdout logs of your jobs – you can also explore the topology of your workflow, and also support easy search.
  6. Pinball is very generic can support different kind platform, you can use different hadoop clusters,e.g., quoble cluster, emr cluster. You can write different kind of jobs, e.g., hadoop streaming, cascading, hive, pig, spark, python …

There are a lot interesting things built in Pinball, and you probably want to have a try!

Sounds pretty similar to Luigi! My initial impression is that

  • The architecture is a bit more advanced than Luigi and has some features that Luigi lacks. From what I can tell, it comes with task storage out of the box (whereas Luigi’s task history DB is still not entirely integrated), distributed execution, and a triggering mechanism. These are all areas where Luigi still needs some love
  • The workflow API seems very convoluted. I don’t really understand how the code works and there’s a lot of boiler plate.

Fun to have something to compare to. Not that I want to rationalize Luigi’s missing features, but in general I would argue that the importance of good API design is underrated compared to good architecture. I still believe the key thing for a workflow manager is to reduce boiler plate and configuration at any point. It’s slightly harder to create an easy to use API than to think hard about architecture and check all the boxes for every feature.

image

Hopefully we’ll see more of these in the future. Obviously being Luigi’s author, I think Luigi is an awesome tool. But I think it’s 10% of what it could be, and diversity in this space is great for innovation. There’s a lot of them now: Oozie, Azkaban, Drake, Pinball, etc. Some people apparently use Jenkins for workflow management. A wildcard I encountered the other day is Ketrew. I wish I knew enough OCaml to understand what’s going on!

Erik Bernhardsson

... is the CTO at Better, which is a startup changing how mortgages are done. I write a lot of code, some of which ends up being open sourced, such as Luigi and Annoy. I also co-organize NYC Machine Learning meetup. You can follow me on Twitter or see some more facts about me.