Reflections on a complaint from a frustrated git user

Last week, Scott James Remnant posted a series of “Git Sucks” entries on his blog, starting with this one here, with follow-up entries here and here.  His problem?  To quote Scott, “I want to put a branch I have somewhere so somebody else can get it.  That’s the whole point of distributed revision-control, collaboration.”  He thought this was a “mind-numbingly trivial” operation, and was frustrated when it wasn’t a one-line command in git.

Part of the problem here is that for most git workflows, most people don’t actually use “git push”.   That’s why it’s not covered in the git tutorial (this was a point of frustration for Scott).   In fact, in most large projects, the number of people who need to use the “scm push” command is a very small percentage of the developer population, just as very few developers have commit privileges and are allowed to use the “svn commit” command in a project using Subversion.  When you have a centralized repository, only the privileged few will be given commit privileges, for obvious security and quality control reasons.

Ah, but in a distributed SCM world, things are more democratic: anyone can have their own repository, and so everyone can type the commands “git commit” or “bzr commit”.   While this is true, the number of people who need to be able to publish their own branch is small.  After all, the overhead in setting up your own server just so people can “pull” changes from you is quite large; and if you are just getting started, and only need to submit one or two patches, or even a large series of patches, e-mail is a far more convenient route.  This was especially true in the early days of git’s development, before web sites such as repo.or.cz, github, and gitorious made it much easier for people to publish their own git repository.   Even for a large series of changes, tools such as “git format-patch” and “git send-email” are very convenient for sending a patch series, and on the receiving side, the maintainer can use “git am” to apply a patch series sent via e-mail.
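As a rough sketch of that e-mail workflow, the following shell session produces a patch series with “git format-patch” and applies it with “git am”.  The repository names, file contents, and identities (upstream, contributor, Maintainer, Alice) are hypothetical, and the mailing step is replaced by a local directory so that the example is self-contained; in practice “git send-email” would deliver the patches to the maintainer’s mailbox.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Maintainer's repository with one initial commit (all names are made up).
git init -q upstream
cd upstream
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "hello" > README
git add README
git commit -q -m "Initial commit"
cd ..

# The contributor clones it and makes two commits on top.
git clone -q upstream contributor
cd contributor
git config user.name "Alice"
git config user.email "alice@example.com"
echo "fix one" >> README
git commit -q -a -m "First fix"
echo "fix two" >> README
git commit -q -a -m "Second fix"

# Write one patch file per commit that is not yet in origin.
git format-patch -q -o "$work/outbox" origin
cd ..

# The maintainer applies the whole series, commit logs included.
cd upstream
git am -q "$work"/outbox/*.patch

# Show author, committer, and subject for each commit.
git log --format='%an | %cn | %s'
```

In the final log, the two applied commits should list Alice as the author and Maintainer as the committer, since “git am” records the patch submitter separately from the person who applied the patch.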

It turns out that from a maintainer’s point of view, reviewing patches via e-mail is often much more convenient.  Especially for developers who are just starting out with submitting patches to a project, it’s rare that a patch is of sufficiently high quality that it can be applied directly into the repository without needing fixups of one kind or another.   The patch might not have the right coding style compared to the surrounding code, or it might be fundamentally buggy because the patch submitter didn’t understand the code completely.   Indeed, more often than not, when someone submits a patch to me, it is useful for indicating the location of the bug more than anything else, and I often have to completely rewrite the patch before it enters the e2fsprogs mainline repository.    Given that, publishing a patch that will require modification in a public repository where it is ready to be pulled just doesn’t make sense for many entry-level patch submitters.   E-mail is in fact less work, and more appropriate for review purposes.

It is only when a mid-level to senior developer is trusted to create high quality patches that do not need review that publishing their branch in a pull-ready form really makes sense.   And that is fairly rare, which is why it is not covered in most entry-level git documentation and tutorials.   Unfortunately, many people expect to see the command “scm push” in a distributed SCM, and since “git pull” is a commonly used command for beginning git users, they expect that they should use “git push” as well, not realizing that in a distributed SCM, “push” and “pull” are not symmetric operations.   Therefore, while most git users won’t need to use “git push”, git tutorials and other web pages which are attempting to introduce git to new users probably do need to do a better job explaining why most beginning participants in a project probably don’t need their own publicly accessible repository that other people can pull from, and to which they can push changes for publication.

There is one exception to this, of course, and this is a developer who wants to get started using git for a new project which he or she is starting and is the author/maintainer, or someone who is interested in converting their project to git.   And this is where bzr has an advantage over git, in that bzr is primarily funded by Canonical, which has a strong interest in pushing an on-line web service, Launchpad.   This makes it easier for bzr to have relatively simple recipes for sharing a bzr repository, since the user doesn’t need to have access to a server with a public IP address, or need to set up a web or bzr server; they can simply take advantage of Launchpad.

Of course, there are web sites which make it easy for people to publish their git repositories; earlier, I had mentioned repo.or.cz, github, and gitorious.   Currently, the git documentation and tutorials don’t mention them since they aren’t formally affiliated with the git project (although they are used by many git users and developers and the maintainers of these sites have contributed a large amount of code and documentation to git).   This should change, I think.   Scott’s frustrations which kicked off his “git sucks” complaints would have been solved if the Git tutorial recommended that the easiest way for someone to publicly publish their repository is via one of these public web sites (although people who want to set up their own server are certainly free to do so).

Most of these public repositories probably won’t have much reason to exist, but they don’t do much harm, and who knows?  While most of the repositories published at github and gitorious will be like the hundreds of thousands of abandoned projects on Sourceforge, one or two of the new projects which someone starts experimenting on at github or gitorious could turn out to be the next Ruby on Rails or Python or Linux.  And hopefully, they will allow more developers to be able to experiment with publishing commits on their own repositories, and lessen the frustrations of people like Scott who thought they needed their own repositories; whether or not a public repository is the best way for them to do what they need to do, at least this way they won’t get as frustrated about git.  :-)

27 thoughts on “Reflections on a complaint from a frustrated git user”

  1. This mirrors a lot of my own thinking after reading Scott’s posts as well, although it also got me wondering if it makes sense to write a git-publish to make it easier to put a tree somewhere that you have ssh access to. Haven’t had the time to bang out a simple proof of concept yet, but it at least *seems* sensible at first blush.

  2. An idea which I posted as a comment to a post which Scott made today as a follow up to his “git sucks” posts:

    Ideally the git gui should have an easy way of highlighting a series of commits, and then all the user would have to do is click on the “send e-mail” button, with the default patch submission address stored in a .git/config, perhaps with a top-level dot file where that information could be stored on a per-project basis.

    I’m not a gui hacker, but maybe someone who is will be inspired to try to whip something up….

  3. Hi!

    I won’t accept patches for Drizzle by email :)

    While I use BZR, and not git, the systems are similar in concept. My work flow is to have users commit and push to their own branches. I will then “merge” these branches which allows me to take all of their patches as one big commit (though I ask for these to be kept to a reviewable size). During the merge I can either reject the patch or make changes to the patch while I review.

    I don’t bother rejecting patches when the changes are minor.

    And why don’t I take patches from email?

    1) Anything committed on Launchpad is trackable. I can go back to the account that submitted the code (much easier long term to pull a copyright infringement this way).

    2) For me workflow through email is a pain. I have to save out the file, pull it in… I would just rather do a bzr merge lp:~someone/tree.

    Cheers,
    -Brian

  4. I use github as a backup for my local git repository. One simple ‘git push’ and I have an off-site backup.

    My public repository is on dreamhost.com. Now that they finally installed git it’s not too hard to maintain. You need to use ssh to push and http to pull. They don’t run a git server.

  5. Thanks, that is a nice explanation why git sucks so much in this regard. Now we prospective git users can only hope someone will finally fix git instead of explaining again and again the reasons for its shortcomings.

  6. @3: Brian,

    The main reason why I like e-mail is that I can ask people to make changes to the patches and then have a record of what I asked for; also, it means that other people on the mailing list get to participate in the review and get to see what sort of design choices are preferred. For a big project, IMHO, the group review process is essential.

    As far as workflow through e-mail, if the patch series is perfect, sucking things in is really easy; all I have to do is save the patch(es) in an e-mail folder, and then point “git am” at the folder. It will apply all of the patches, with commit logs, and it even keeps track of who originally submitted the patch (which is stored in a different place from the person who created the git commit — i.e., who ran “git am”). I use the Linux-style “Signed-off-by: ” convention to track copyright authorship issues, and that works well enough for me.

  7. Good apologia, but at least two missing points:

    1. Another common reason to want to use git push is if you’re making a fork of a project, for whatever reason. It need not be a long-term fork, or a hostile fork. Think about distributions wanting to publish their slightly modified source tree, or someone who is developing something like ext4 before it’s ready for mainline but still wants to share the work in a form that the world can easily access. People in these situations often are not trusted lieutenants who have a strong knowledge of git.

    2. One of the typical newbie ways to set up a public repository involves trying to clone an empty repository. (ie, make remote bare repo first, clone it to laptop, hack, push) An act that git denies for no particularly good reason. Happily, this looks like it might actually be fixed, as patches are floating around (in email ;) http://osdir.com/ml/git/2009-01/msg00452.html

    With that said, it’s a very good point that wanting to set up a git repo too early is a common newbie mistake. I know, because I made it. :)

  8. If git workflow is more pull based, then the main developer needs something to pull from. Say I want to contribute to a project led by developer A. I pull from the git repository of A and make my own changes. A would want to pull those changes back in, but to do that I need to make a public repository.

    Pushing to a public repository and creating a public repository are operations that are as common as pulling from them for the most desirable people in our community – people that want to contribute back. Even the smallest obstacle to contributing back to a project can be the difference between a successful project and a mediocre project.

  9. @7: Joey,

    Both are good points. The number of people who need to publish changes for others to use (i.e., because they are packaging the project for a distro, et al.) is still fairly small, but it’s true, there are other reasons to publish a repo, and other people who need to do so, besides the original maintainer and senior lieutenants.

    And it never occurred to me that it would make sense to create an empty repository on a remote server, clone it to the laptop, hack, and push, as it’s much easier to just create the repo locally and then publish it. I do agree, though, that git should allow that method to work — although the git documentation and tutorials should explain a better way of doing things, both in the case where no publicly accessible repo is really required by the user, and the case where the user really does need the public, pullable repository.

  10. The idea that push and pull are not symmetric is just a symptom of bad UI: Mercurial gets this right, as there push/pull are symmetric — and both get used quite a lot. I think hg’s pull is symmetric to git’s fetch, though. Mercurial has a fetch extension to mirror git’s pull.

  11. For what it is worth, Bazaar lets you record an author for a commit independent of the committer by passing the --author option to “bzr commit”.

    I use this when committing changes to my projects that have been submitted as patches. That said, I probably do more branch merges than manually applied patches.

  12. @10: Dirkjan,

    Yes, Mercurial’s push and pull are symmetric, but it’s a tradeoff. Now if you don’t remember to use the command “hg update” after doing an “hg pull”, the wrong thing happens. Yes, “hg pull” reminds the user not to forget to run “hg update” after “hg pull”, but that’s rather inconvenient because now the user needs to type two commands instead of one for a very common operation.

    The asymmetry that I was referring to is slightly different, though, which is that while “git pull” is very common, the number of people who need “git push” is much smaller, for the reasons I mentioned; and this is true for hg as well. Yes, hg does have as an advantage that it can more easily “instantly” set up an hg server by using the command “hg serve”, so that people can more easily pull from some user’s random laptop. But it is still the case that in general, it’s rare that most hg users would need someone to be able to pull from their laptop. Yet, most hg users need to use “hg pull”, and that’s a rather fundamental asymmetry based on most OSS projects having a small core team, a somewhat larger set of trusted developers, and lots of people following the project that test the bleeding edge code and provide bug reports, but who are much less likely to submit patches.
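The one-command versus two-command tradeoff discussed above has a counterpart inside git itself: “git fetch” followed by “git merge” is roughly what “git pull” does in a single step. A minimal sketch, with hypothetical repository and file names:

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Upstream repository with one commit (names are made up).
git init -q upstream
cd upstream
git config user.name "Maintainer"
git config user.email "maintainer@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Version 1"
cd ..

git clone -q upstream downstream

# Upstream moves ahead by one commit.
cd upstream
echo "v2" > file.txt
git commit -q -a -m "Version 2"
cd ../downstream

# Step 1: fetch only downloads the new commit; the checked-out file
# still has the old contents.
git fetch -q origin
cat file.txt

# Step 2: merging the fetched branch updates the working tree.
# "git pull" would have performed both steps in one command.
git merge -q --ff-only origin/HEAD
cat file.txt
```

The first “cat” should still show the old contents and the second the new ones, which is the same gap that “hg update” closes after “hg pull”.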

  13. This is why I like repo.or.cz’s idea of the “mob” branch that anyone can push a patch to. As long as it’s kept relatively clean it seems like a great idea.

  14. > The idea that push and pull are not symmetric is just a symptom of bad UI: Mercurial gets this right, as there push/pull are symmetric — and both get used quite a lot. I think hg’s pull is symmetric to git’s fetch, though. Mercurial has a fetch extension to mirror git’s pull.

    Could it be that you guys are just confused because Git doesn’t follow the conventions followed by other revision control systems? That it’s not because Git sucks, but it’s because you’re just not used to working that way?

    I mean personally I always abhorred things like CVS or subversion and whatnot. Never liked working with any of those and never really understood it. But Git came along and it was just so much easier to understand and made so much more sense. Sure there were usability issues with the command line stuff and other shortcomings.

    But really git was the first system that I used that I could easily track changes I made to home directories on multiple disconnected computers and manage my documents at work and that sort of thing. I could never figure out how to do that sort of thing with any other system I tried.

    Plus I like the ability that I can just use my files and folders how I am used to them.. moving stuff around, copying files, shoving files into other files, deleting things and renaming things, but Git is able to track all that. I like not having to give a crap about that sort of thing.

    I don’t really understand the difficulty that you guys are running into here. It just makes a hell of a lot more sense that each developer has their own little private sandbox and you send each other changes rather than having some central thing where people can just upload whatever without any real coordination.

  15. Thanks for apologizing for git without providing a patch to make “push” easier for your “average developers”.

  16. Jeremy: I gathered that git supported tracking author separate from committer from Ted’s comment. It is nice to know that both Bazaar and git have the same syntax for making use of the feature.

  17. Huh? Sounds like a circular argument?

    In the post:

    “Unfortunately, many people expect to see the command “scm push” in a distributed SCM, and since “git pull” is a commonly used command for beginning git users, they expect that they should use “git push” as well — not realizing that in a distributed SCM, “push” and “pull” are not symmetric operations.”

    So you’re saying here, push and pull are asymmetric => push is rarely used.

    But then in your last reply, you say

    “The asymmetry that I was referring to is slightly different, though, which is that while “git pull” is very common, the number of people who need “git push” is much smaller, for the reasons I mentioned;”

    so push is rare => push and pull are asymmetric. This sounds like a circular argument?

    Just wanted to point this out, since I’m a beginner to git (although have used bzr a lot), and I would be more interested in understanding what this “asymmetricity” means here.

  18. @18: meeble,

    So the fundamental asymmetry is caused by the fact that most users are pulling from a trusted source, so they don’t need to do any kind of review as part of a “git pull”. However, most downstream users’ repositories are not trusted, so it is highly unlikely that anyone — either another downstream user, or the upstream maintainer, will simply pull from their repository. In other words, the asymmetry is caused by the difference in trust relationships, and the fact that there are many more untrusted repositories than there are trusted repositories — and fundamentally it’s not all that interesting to push an untrusted repository to a remote server for the purpose of publishing it.

    The problem, of course, is everyone is certain their repository is worthy of being trusted. So it’s not politically acceptable or popular to say, “we’re just not into your commits”.

    What about Brian’s workflow of pulling someone else’s (untrusted) commit, reviewing it, and then fixing it up as part of the merge operation? Quite aside from the fact that most DSCMs don’t have a convenient way of reviewing a remote branch and sending comments back, if your workflow involves your fixing up the commit as a part of the merge, the problem is the broken commit stays in your repository, and can potentially cause problems when you try to do a bisection search to find a completely unrelated bug.

  19. @9 tytso: The advantage of git init --bare on the server and git clone on the laptop would be that it allows setting up the repos with just two commands, and without resorting to any non-git commands (rsync, scp -r, etc). Also you don’t have to worry about manually setting up remotes, which avoids a certain bit of learning curve.

    (It’s also very symmetric with running git svn on the server and git clone on the laptop.)

    Note that if you set up a new repository on github, it necessarily creates the repo on the server first. If/when git is able to clone empty repos, the github instructions for this could be much simplified. The current instructions are about 12 lines long. IMHO they should be 2 lines long.

    A hypothetical client-side git publish command could have the same benefits, but I don’t see how such a command could handle the github case, where the server creates the remote repo and there is no way to scp into it.
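Current git versions do allow cloning an empty repository, so the two-command setup described in this comment can be sketched as follows. A local directory stands in for the server here; the paths and the identity are hypothetical, and with a real server the clone source would be an ssh address rather than a local path.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "Server" side: create an empty bare repository (a local path stands in
# for a directory on the remote machine).
git init -q --bare server.git

# Laptop side: clone the empty repository; the "origin" remote is set up
# automatically. The warning about cloning an empty repository is hidden.
git clone -q server.git laptop 2>/dev/null
cd laptop
git config user.name "Example User"
git config user.email "user@example.com"

# Hack, commit, and publish the current branch.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git push -q origin HEAD

# The commit is now visible in the server-side repository.
git --git-dir="$work/server.git" log --all --format='%s'
```

Pushing “HEAD” rather than a named branch keeps the sketch independent of whatever default branch name the local git installation uses.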

  20. I sometimes create small repositories on my laptop, and want to share them with the world. I need to mirror them on a server, since my laptop is (a) behind NAT most of the time, and (b) offline or suspended the rest of the time. I would prefer using my own web server (with ssh and apache on it) so I don’t have to figure out how to sign up to some website, figure out its terms & conditions, then trust that it won’t go offline and leave me stranded.
    I want to push from my server rather than ssh to the server and pull from the laptop, since the laptop’s IP address is not constant (I could be in a different network somewhere, perhaps even behind a NAT box I don’t control).

    I can mirror darcs and bzr repositories by doing a push operation over SSH to a new public_html/ directory that’s served over Apache, and then keep that mirror up-to-date by doing repeated pushes. Accessing that repository over plain HTTP may be slow (especially with bzr), but it’ll work, and it won’t matter too much on small toy projects.

    I wish git also let me do this with a single command.

    (Yes, I should try to write a patch for git instead of whining here.)

  21. @22: Marius,

    Well, you can do it as a one-liner:

    % git clone --bare . /tmp/myproject.git; (cd /tmp/myproject.git; git update-server-info; chmod +x hooks/post-update); scp -r /tmp/myproject.git www.example.com:~/public_html/myproject.git

    Although, granted it’s a rather long one-liner. Turning this into a shell script you drop in /usr/libexec/git/git-http-publish so that “git http-publish” will work is left as an exercise to the reader.
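One possible shape for that exercise is sketched below. The function name, the throwaway demonstration repository, and the host in the usage note are all assumptions; the sketch only automates the local steps of the one-liner above (bare clone, “git update-server-info”, enabling the post-update hook) and leaves the final copy to the caller.

```shell
set -e

# prepare_http_publish SRC DEST
# Create a bare copy of the repository at SRC in DEST, ready to be served
# over plain HTTP.
prepare_http_publish() {
    src=$1
    dest=$2
    git clone -q --bare "$src" "$dest"
    (
        cd "$dest"
        # Generate info/refs and objects/info/packs for HTTP clients.
        git update-server-info
        # Keep that information fresh after every later push. Newer git
        # versions ship the hook as post-update.sample; if no hook is
        # present at all, write an equivalent one.
        mkdir -p hooks
        if [ ! -f hooks/post-update ]; then
            if [ -f hooks/post-update.sample ]; then
                mv hooks/post-update.sample hooks/post-update
            else
                printf '#!/bin/sh\nexec git update-server-info\n' > hooks/post-update
            fi
        fi
        chmod +x hooks/post-update
    )
}

# Demonstration on a throwaway repository (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/myproject"
git -C "$work/myproject" -c user.name="Example User" \
    -c user.email="user@example.com" \
    commit -q --allow-empty -m "Initial commit"
prepare_http_publish "$work/myproject" "$work/myproject.git"
ls "$work/myproject.git/info/refs"
```

The resulting directory could then be copied with something like scp -r "$work/myproject.git" www.example.com:~/public_html/, mirroring the last step of the one-liner. Since git runs any executable named git-something that it finds on the PATH as a subcommand, wrapping the function in a script called git-http-publish would make “git http-publish” work.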

  22. A good explanation, but the statements indicating that push is rare or that push/pull is asymmetric show a bias to a particular kind of workflow and usage pattern typical of large open source software. However, git is also used in other environments where those kinds of assumptions are no longer true. For example, I have worked in several organizations where every member (dozen plus people) had commit privileges into the svn repository and there was no hierarchy of individuals committing other people’s patches in. There *are* valid usage patterns of git where ‘push’ can legitimately be a common operation and yet a vast number of ‘how to use git’ and ‘how to setup your git workflow’ tutorials out there still don’t seem to cover the ‘largish group of peers, all with commit privileges/responsibility working on a common source base’ workflow very well. That’s certainly not a “bug” with git, but it can help explain why git occasionally gets slammed when a user with a workflow that doesn’t parallel the linux kernel’s workflow tries to become an early adopter in an environment.

  23. @24: William,

    Agreed, you make a good point. Git would be applicable to more use cases if it were easier to set things up for two use cases: (a) a personal public repository from which other people can pull changes, and (b) a shared repository to which a group of people can collectively push changes that others can download. I do wonder though, how much of this is necessary given the presence of services like github and gitorious. Yes, the third party service could go away at any time, but unlike a central SVN server, every developer has a full copy of the repository on their laptop or development machine. So in the worst case, the project might need to promulgate a new URL if their git hosting service “went away”, but that ought to be an acceptable risk for most projects.

    So I see this as something which is about 20% technology (having a single command instead of four commands, including the scp, does make life easier), and 80% documentation — and that’s something I’ve long acknowledged to be a weakness with the git project. It’s been getting better, and so things like Scott’s complaints are in the long run good for git, but there are still a lot of things that git could do better in terms of documentation and tutorials.

  24. I’m new to git, but I was able without additional help/troubles to deploy my own git repository. Just used the Ubuntu community git guide.

    I also use bzr, but when it comes to host your college stupid projects in a scm, I’m not sure Launchpad is the perfect solution, and here it comes git which is really cool.

    Cheers.

  25. @nate:

    I hate cvs like you. You learned git first and I learned mercurial first.
    For me the difference between git and mercurial is like the difference between cvs and subversion.
    Learning git really sucks.

    But don’t get me wrong, these two tools are really useful. I’m just sad that git is this hard to learn compared to mercurial.
