Git and hg


John Goerzen recently posted about Git, Mercurial and Bzr that I found interesting, especially since I used to be in the hg camp, but have been gradually using git more and more, even to the point making minor improvements to the git documentation and writing the git mergetool, since being able to automatically fire up a graphical merge tool was one of the features which I missed from hg. So while I haven’t yet converted the primary SCM repository for e2fsprogs to use git (yet), I’ve reached an opposite concolusion from John, and yet, I can’t really argue with his observations.

The main reason why I’ve come out in favor of git is that I see its potential as being greater than hg, and so while it definitely has some ease-of-use and documentation shortcomings, in the long run I think it has “more legs” than hg, and with the release of git 1.5.0, it became clear that the git community was willing to work on these particular shortcomings, which is why I started working on it and making plans to migrate e2fsprogs to use git. The main reasons why I think git is more powerful is that not just that it supports lightweight branches inside a single repository (although I really do like that a lot since it makes it a lot easier to run multiple experiments in parallel and switch back and forth between them), but also because git is much more Unix-like; there are lots of tools that enable scripting for either new git extensions or for ad-hoc shell pipelines that you can’t really easily do in hg without breaking into Python.

But git definitely does have shortcomings, and John is on the mark with most of them. Git’s documentation is very poor. Part of the problem stems for tutorials that were written to work with older versions of git (don’t try using anything older than git 1.5 if you are a git newbie; git 1.4.x is far more user-hostile) or were written for use with systems built on top of git 1.4.x, such as Cogito. (I don’t recommend Cogito, since git 1.5 is significantly more usable, and I’ve always found Cogito to be more confusing that just using stack git.) The tutorials that are distributed with git 1.5 are much better, but they clear do need more work.

It is true that the many of the man pages in git are lacking, and John’s criticism of the fact that many man pages do not list all of the options that they take, but rather refer to other man pages is on point and accurate. It has gotten better (take a look at the git-diff man page from the git 1.4.x days, and laugh or cry, depending on your point of view) but there is still much work to be done. In practice, the better way to use the git man pages is to skip past the options section, and take a look at the Examples section. This shows a number of ways that a particular command might be used, just as a Unix master might say, “Grasshoper, see how you can use awk to do all of these amazing things.”

I do take issue with John’s assertion that git’s philosophy has been to make life easy for the central maintainer, and not to pay much attention to the needs of individual contributors. That may have been Linus’s development priorities but with Junio having taking over maintenance, there have been a lot of improvements in git 1.5.0 to make life easier for people who are tracking remote repositories and making changes, in particular the git remote command.

The most interesting observation which John made was the non-intuitive semantics of git-format-patches, and I did find it interesting that one commenter posted a solution which made perfect sense given other git commands, and yet didn’t work. Obviously the person who posted the comment didn’t bother to try it first, probably because in most other places git is actually pretty consistent; but git-format-patch is one of the places where most painfully is not. I view this as a bug that should be fixed, and fortunately I think it can be fixed without breaking the git-format-patches is currently being used. (The problem is that it doesn’t use the standard notation so it is more convenient in the common case of how it is normally used, where the end point is not specified and is always the the tip of the current branch. I believe we can avoid the UI surprise by making git-format-patch use the standard revision range parsing if parameter contains the “..” or “…” operators. I’ll have to try to prepare a patch which does this and see whether it gets accepted.)

So basically git does have short-comings, yes, but people will come out in different places about which tools is best for them, and that’s OK. Actually, I think the ultimate solution for this problem is to build a bidrectoinal hg/git gateway. There are tools that will export from hg to git, and vice versa, and they are actually pretty sophisticated. I don’t think it should be that painful to create a tool that does incremental exports in both directions, maintaining state so that the right thing happens when a commit gets made on the git side, and gets exported into hg, or vice versa. Ultimately I think that’s the best solution, since that way people can use whatever tool they want, and still contribute and development as first class citizens. This is the main reason why I’ve held off on converting e2fsprogs to git (although I have made some private test repositories which I’ll use to take advantage of git’s superior annotation and query/log utilities); I don’t want to make a git repository of e2fsprogs public until I’m sure that a bidrectional gateway tool won’t require me to make any changes that affect the commit-id’s, since that would invalidate any work that people have done that was based on a clone of the coverted git repository.

I have a rough design for how to do the bidirectional gateway, but the issue is finding time to implement it. Anyone with free time looking for a project? If so, contact me. I probably should have written this up as a potential Google Summer of Code project, but it’s too late for this year. Oh, well.