Erik Bernhardsson

Plotting author statistics for Git repos using Git of Theseus

I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects, and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus, which is a tool (written in Python) that generates statistics about Git repositories. I’ve written about it previously on this blog. The name is a horrible pun (I’m a dad!) on Ship of Theseus, which is a philosophical thought experiment about what happens if you replace every single part of a boat: is it still the same boat ⁉️ 🤔

So anyway, here’s one of the plots you can generate for Kubernetes — a somewhat arbitrarily picked repository.

[Figure: Git of Theseus plot for the Kubernetes (k8s) repository]

So what’s new? I’ve updated the color scheme a bit, but also added the option to plot author statistics:

[Figure: author statistics plot for the Kubernetes (k8s) repository]

And it doesn’t stop there! Here are some other minor updates:

  • I published the whole thing to PyPI, which also means that the installation is far simpler: just run pip install git-of-theseus.
  • The pip package also installs binaries that let you run the analyses in a more straightforward way: just run git-of-theseus-analyze on the command line.
  • By default it now only analyzes files whose extensions indicate source code (by leveraging pygments).
  • You can also normalize stats using the --normalize flag. See plot below:

[Figure: normalized plot for the Git repository]
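To make the normalization idea concrete, here is a minimal sketch in plain Python. It assumes that normalizing means rescaling the stacked series so they sum to 100% at every point in time. The function name normalize_stack and the sample numbers are made up for illustration, and this is not the actual code inside Git of Theseus.

```python
def normalize_stack(series):
    """Rescale stacked series so they sum to 100 at each time point.

    `series` maps a label (e.g. an author) to a list of values, one per
    time point. All lists are assumed to have the same length.
    """
    if not series:
        return {}
    n_points = len(next(iter(series.values())))
    # Total across all labels at each time point.
    totals = [sum(values[i] for values in series.values()) for i in range(n_points)]
    return {
        label: [100.0 * v / t if t else 0.0 for v, t in zip(values, totals)]
        for label, values in series.items()
    }


# Hypothetical line counts per author at three points in time.
lines_by_author = {
    "alice": [30, 50, 20],
    "bob": [10, 50, 60],
}
normalized = normalize_stack(lines_by_author)
```

With these made-up numbers, alice goes from 75% to 25% of the code while bob goes the other way, which is the kind of shift a normalized plot makes easy to spot even when the total size of the repo changes a lot.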

That’s it! As I mentioned, I’ve got more where this came from. Some future blog posts will cover:

  • ann-benchmarks, which is a tool to benchmark approximate nearest neighbor methods. Very niche, but very useful within its niche. I just spent a lot of time precomputing datasets and Dockerizing all algorithms.
  • convoys, a new tool I built to model and plot time-lagged conversion. Fun stuff with Gamma and Weibull distributions.
  • champy, which is a half-implemented wrapper that lets you formulate and solve linear programming, mixed integer programming, and constraint programming problems in a much nicer way (IMO) than any other library I’ve encountered. Don’t hold your breath for this one: it’s pretty far from being production-grade.

EDIT (2018-01-16): added a few more notes

Erik Bernhardsson

... is the CTO at Better, which is a startup changing how mortgages are done. I write a lot of code, some of which ends up being open sourced, such as Luigi and Annoy. I also co-organize the NYC Machine Learning meetup. You can follow me on Twitter or see some more facts about me.