Erik Bernhardsson

More Luigi alternatives

The workflow engine battle has intensified with some more interesting entries lately! Here are a couple I encountered in the last few days. I love that at least two of them are direct references to Luigi!

**Airflow** (Blog post) (GitHub)

Airflow from Airbnb is probably the most interesting one. I’ve only glanced at it, but here are some very superficial notes:

  • Seems mostly targeted to run raw UNIX commands using a pretty simple syntax with Jinja templates
  • Tasks don’t have support for parameters but it seems like you can build tasks dynamically by just putting them in a function
  • It seems to be built around daily jobs, meaning dates and backfilling by date are foundational concepts of the package (whereas in Luigi dates are just one out of many different parameters)
  • There is a database of task history which is great
  • The visualization seems great
  • It also supports farming out jobs to other machines. This is something Luigi definitely needs
  • It also comes with built-in support for HDFS, S3, MySQL and Postgres, similar to Luigi
  • There’s a built-in triggering mechanism (also something Luigi needs)
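
To make the contrast in the notes above a bit more concrete, here is a minimal plain-Python sketch of the two styles: a task where the date is just one parameter among others, and tasks built dynamically by a function, one per day, as in a backfill. This is not Luigi's or Airflow's actual API. `Task`, `make_task`, `backfill`, the `country` parameter and the shell script name are all hypothetical and only there for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Task:
    """Parameterized style: the date is just one parameter among others."""
    name: str
    day: date
    country: str = "US"  # hypothetical extra, non-date parameter

    def command(self) -> str:
        # Build the shell command this task would run (it is never executed here).
        return (
            f"run_{self.name}.sh --date {self.day.isoformat()} "
            f"--country {self.country}"
        )


def make_task(name: str, day: date) -> Task:
    """Dynamic style: build a task for a given day by calling a function."""
    return Task(name=name, day=day)


def backfill(name: str, start: date, end: date) -> List[Task]:
    """Create one task per day in the inclusive range [start, end]."""
    n_days = (end - start).days + 1
    return [make_task(name, start + timedelta(days=i)) for i in range(n_days)]


tasks = backfill("aggregate", date(2015, 6, 1), date(2015, 6, 3))
for task in tasks:
    print(task.command())
```

In the first style the identity of a task is the full set of parameter values, so two tasks that differ only in `country` are different tasks. In the second style the date range drives everything and the function is simply called once per day.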

Screenshot of Airflow (from the blog post)

**Mario** (Blog post) (GitHub)

Mario seems to be a “Luigi in Scala”. It seems extremely simplistic, so it’s probably more conceptual at this point than meant to be full-fledged.

The choice of Scala is interesting. First of all, I think it’s fair to say that the JVM has taken over the data stack. A lot of Luigi’s core concepts are really functional in nature, and a language like Scala might be a better choice. A cool thing is that the RPC between Luigi clients and the scheduler is actually just a simple REST interface, so Luigi’s scheduler could in fact support clients written in other languages. Mario doesn’t do this, but it’s something I’ve been meaning to explore for a long time.

**Ruigi** (GitHub)

Ruigi is “Luigi in R”. It follows the same set of conventions but seems to be pretty simple in that it runs everything locally.

**Makeflow** (Web site)

Seems to be some academic project mostly for publishing papers. What’s up with not using GitHub in 2015? And also having a “download” section with tarballs!

The benefit of Makeflow seems to be support for a bunch of batch systems commonly used in HPC in academia. The dependencies are specified using their own DSL with some simple Makefile-like notation.
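
As a rough illustration of what a Makefile-like dependency notation involves, the sketch below parses `target: dependencies` rules and derives a valid execution order using the standard library's `graphlib` (Python 3.9+). The rule format and file names are a toy invented here, not Makeflow's actual DSL or parser.

```python
from graphlib import TopologicalSorter

# Toy rules in a Makefile-like "target: dependencies" form (hypothetical file names).
RULES = """
report.txt: stats.csv plot.png
stats.csv: raw.csv
plot.png: stats.csv
"""


def parse_rules(text):
    """Map each target to the set of files it depends on."""
    graph = {}
    for line in text.strip().splitlines():
        target, _, deps = line.partition(":")
        graph[target.strip()] = set(deps.split())
    return graph


def build_order(graph):
    """Return the targets in an order where dependencies come first."""
    order = TopologicalSorter(graph).static_order()
    # Source files such as raw.csv show up as nodes too; keep only real targets.
    return [node for node in order if node in graph]


graph = parse_rules(RULES)
order = build_order(graph)
print(order)
```

A real engine would attach a command to each rule and hand the ready targets to a batch system, but the dependency resolution itself is essentially this topological sort.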

Conclusions

So what’s the state of workflow engines at the moment? Allow me to say something provocative just for the sake of it: they all kind of suck. Including Luigi. There is so much trial and error involved in building a workflow engine, and every time I encounter one I just see a bunch of bad design decisions. Same when I look at Luigi. Luigi was the result of many iterations and it avoids roughly 100 pitfalls we encountered in earlier attempts, but there are still some parts of the design that could be improved.

I don’t mean to bash my own open source project here. Open sourcing Luigi has helped a lot of people build awesome stuff and I think it’s better than anything else out there.

I hope that with the explosion of workflow engines lately, we will see a convergence of ideas into a second generation of much better, much more scalable, and much easier to use ones. My dream is that someone combines every god damned workflow engine in the world and writes a new one, preferably in some JVM-based language like Scala. I sincerely have no idea what that would look like, but I would love to do that some day as a Luigi 2.0!

Me and one of my cats (who is not too happy). Long story!

Erik Bernhardsson

... is the CTO at Better, which is a startup changing how mortgages are done. I write a lot of code, some of which ends up being open sourced, such as Luigi and Annoy. I also co-organize the NYC Machine Learning meetup. You can follow me on Twitter or see some more facts about me.