Plotting author statistics for Git repos using Git of Theseus

I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects, and I have a couple of blog posts in the pipeline about the various updates. First up, I made a number of fixes to Git of Theseus, a tool (written in Python) that generates statistics about Git repositories. I've written about it previously on this blog. The name is a horrible pun (I'm a dad!) on the Ship of Theseus, a philosophical thought experiment about what happens if you replace every single part of a boat: is it still the same boat ⁉️ 🤔

So anyway, here's one of the plots you can generate, this one for Kubernetes (a somewhat arbitrarily picked repository):

[Plot: Kubernetes repository]

So what's new? I've updated the color scheme a bit, but also added the option to plot author statistics:

[Plot: Kubernetes repository, author statistics]
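Conceptually, an author plot attributes each surviving line of code to whoever last touched it. Below is a minimal sketch of that attribution for a single file at HEAD, assuming you derive it from `git blame --line-porcelain`. This is an illustration, not necessarily how Git of Theseus computes it (the tool tracks stats over the whole history, not just one snapshot), and `count_lines_by_author` and `blame_file` are hypothetical names.

```python
import subprocess
from collections import Counter


def count_lines_by_author(porcelain: str) -> Counter:
    """Count lines per author from `git blame --line-porcelain` output.

    With --line-porcelain, every blamed line gets a full header block that
    includes an "author <name>" line, so counting those headers counts lines.
    """
    counts: Counter = Counter()
    for line in porcelain.splitlines():
        # File content lines are prefixed with a tab, so they never match here;
        # "author-mail", "author-time", etc. don't match because of the space.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


def blame_file(repo: str, path: str) -> Counter:
    """Hypothetical helper: blame one file at HEAD and tally lines per author."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", "HEAD", "--", path],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_lines_by_author(out)
```

Summing these counters over every source file in the repo, and repeating for a series of commits, gives the kind of per-author time series that a stack plot needs.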

And it doesn't stop there! Here are some other minor updates:

  • I published the whole thing to PyPI, which also makes installation far simpler: just run pip install git-of-theseus.
  • The pip package also installs command-line scripts that let you run the analyses more directly: just run git-of-theseus-analyze on the command line.
  • By default, it now only analyzes files whose extensions indicate source code (using Pygments).
  • You can also normalize stats using the --normalize flag. See the plot below:

[Plot: Git repository, normalized]
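Normalizing a stack plot usually means rescaling each time step so the layers sum to 100%, which shows each cohort's or author's *share* of the code instead of absolute line counts. Here is a minimal sketch of that rescaling, assuming the data is a mapping from layer name to a list of values over time; `normalize` is a hypothetical function, not the tool's own code.

```python
from typing import Dict, List


def normalize(series: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rescale stacked series so that, at each time step, the layers sum to 1."""
    keys = list(series)
    n = len(series[keys[0]]) if keys else 0
    # Total height of the stack at each time step.
    totals = [sum(series[k][i] for k in keys) for i in range(n)]
    # Divide each layer by the total; an empty time step stays at 0.
    return {
        k: [v / t if t else 0.0 for v, t in zip(series[k], totals)]
        for k in keys
    }
```

For example, `normalize({"a": [1, 3], "b": [1, 1]})` returns `{"a": [0.5, 0.75], "b": [0.5, 0.25]}`.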
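As for the default source-code filter mentioned above, one way to lean on Pygments is to collect every filename glob that some Pygments lexer claims and keep only files matching one of them. The sketch below assumes that approach; it is an illustration rather than the tool's exact logic, and the `exclude` parameter (dropping globs like `*.txt` that aren't really code) is a hypothetical knob.

```python
import fnmatch
import os
from typing import Iterable, Set

import pygments.lexers


def source_code_patterns(exclude: Iterable[str] = ("*.txt",)) -> Set[str]:
    """Collect the filename globs (e.g. '*.py', '*.go') claimed by Pygments lexers.

    `exclude` is a hypothetical way to drop globs that aren't really source code.
    """
    patterns: Set[str] = set()
    # get_all_lexers() yields (name, aliases, filename_globs, mimetypes) tuples.
    for _name, _aliases, globs, _mimetypes in pygments.lexers.get_all_lexers():
        patterns.update(globs)
    return patterns - set(exclude)


def is_source_file(path: str, patterns: Iterable[str]) -> bool:
    """True if the file's basename matches any of the given globs."""
    name = os.path.basename(path)
    return any(fnmatch.fnmatch(name, p) for p in patterns)
```

Computing the pattern set once and reusing it for every file keeps the check cheap, since iterating over all lexers is the slow part.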

That's it! As I mentioned, I've got more where this came from. Some future blog posts will cover:

  • ann-benchmarks, a tool to benchmark approximate nearest neighbor methods. Very niche, but very useful within its niche. I just spent a lot of time precomputing datasets and Dockerizing all the algorithms.
  • convoys, a new tool I built to model and plot time-lagged conversion. Fun stuff with Gamma and Weibull distributions.
  • champy, a half-finished wrapper that lets you formulate and solve linear programming, mixed integer programming, and constraint programming problems in a much nicer way (IMO) than any other library I've encountered. Don't hold your breath for this one: it's pretty far from being production-grade.

EDIT(2018-01-16): added a few more notes
