A year of using Git: the good, the bad, and the ugly

I have been working with Git for about a year now, and I think I am ready to sum up my experience. Before Git, I used SVN, and before that Perforce, TFS, and some other products, but I will use SVN as a prototypical “non-Git” system.

Git buys you flexibility and performance for the price of greatly increased complexity of workflow.

SVN will work better for you if

  1. All your developers work under the same management, AND
  2. All developers can get relatively fast access to a central server.

Conversely, you should prefer Git if

  • There are multiple independent groups of developers that can contribute to the project, AND/OR
  • It is difficult to provide all developers with fast access to a central server

Here is my take on the good, the bad, and the ugly sides of Git.

GOOD: great performance, full local source control that works even when disconnected from the server, ability to move work quickly between servers without losing history.

BAD: complexity of workflow, more entities to keep track of, lots of new confusing terms, “one repo is one project” policy, limited/outdated information about the remote, misleading messages. Commit identifiers are hashes rather than sequential integers, which complicates build/version number generation.

UGLY: you can lose work by deleting branches or tags.

More Details

Great performance

Performance is one of Git's design goals. It is achieved partly through clever programming, and partly through caching information locally and accessing remote servers only when specifically instructed. Naturally, this means that local data may become outdated.

Full local source control

Full local source control is convenient for implementing complex tasks in stages. I finish a stage, commit work locally, then continue to the next stage. If things go terribly wrong, I can always roll back to the last known good stage without losing much work. Unlike SVN, I don’t have to commit to the server (or even be connected to the server) and worry about showing half-baked work to others. Same for local experiments: I can quickly create a local branch, do some crazy stuff there, and push it to the server only if something good came out of it. Furthermore, I can do it on my laptop while disconnected from the central server.

Ability to move work quickly

The ability to move work quickly between repositories is built into Git. It is possible to move an entire history line from, say, Visual Studio Team Services to GitHub using standard Git commands. This is not easily doable with SVN or Perforce.
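
Moving a repository with its full history usually comes down to a mirror clone followed by a mirror push. The sketch below is only an illustration: the two hosting services are simulated by hypothetical local bare repositories named old-host.git and new-host.git, and the author identity is a throwaway assumption. With real services you would substitute their URLs.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1

# Build a small history: two commits and a tag
git init -q src
git -C src symbolic-ref HEAD refs/heads/master
git -C src commit -q --allow-empty -m "first"
git -C src commit -q --allow-empty -m "second"
git -C src tag v1.0

# Hypothetical servers: the old host holds the history, the new host is empty
git clone -q --bare src old-host.git
git init -q --bare new-host.git

# The move itself: copy every ref from the old host, then replicate all of them
git clone -q --mirror old-host.git transfer.git
git -C transfer.git push -q --mirror ../new-host.git

# Compare what ended up on the new host
old_head=$(git --git-dir=old-host.git rev-parse master)
new_head=$(git --git-dir=new-host.git rev-parse master)
new_count=$(git --git-dir=new-host.git rev-list --count master)
new_tags=$(git --git-dir=new-host.git tag)
echo "old: $old_head new: $new_head commits: $new_count tags: $new_tags"
```

Because commit hashes are derived from content and parents, the same hashes on both hosts mean the history arrived intact.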

Complexity of Workflow

Simply put, with Git you need more steps to move your work to the server, even in simple cases. With traditional source control systems you record the changes using the “commit” command. With Git you also need to “push” your work to origin, which is easy to forget. This is not a theoretical concern: every developer on my team, myself included, has at some point forgotten to push code to origin. You commit locally, you get distracted, and voilà: your colleague comes to you with “I pulled the latest, but I can’t see your changes!”.
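
One habit that helps is asking Git which local commits its upstream branch does not have yet. A minimal sketch, assuming a hypothetical local bare repository central.git standing in for origin and a throwaway identity:

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1

# Hypothetical central server with one commit, plus a developer clone
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed central.git
git clone -q central.git dev
cd dev || exit 1

# Commit locally... and get distracted before pushing
git commit -q --allow-empty -m "my change"

# Commits on this branch that its upstream (origin/master) lacks
git log --oneline '@{u}..HEAD'
unpushed=$(git rev-list --count '@{u}..HEAD')

# The first line of the short status also reports an "ahead" count
git status -sb | head -n 1

# Publish; afterwards nothing is left unpushed
git push -q
remaining=$(git rev-list --count '@{u}..HEAD')
echo "unpushed before: $unpushed, after push: $remaining"
```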

Unlike SVN, Git requires you to explicitly specify which modified files you want to check in. Fortunately, this is a problem only in strictly command-line environments. Most tools automate the task of finding modified files and adding them to “the index” before check-in.
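
At the command line, the extra staging step looks roughly like this. The file names are hypothetical. `git add -A` stages modifications, new files, and deletions, whereas `git commit -a` would pick up only changes to files Git already tracks.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q repo && cd repo || exit 1
git symbolic-ref HEAD refs/heads/master
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "add tracked.txt"

# Modify a tracked file and create a new one
echo "v2" >> tracked.txt
echo "hello" > new.txt

# Nothing is staged yet, so Git refuses to commit
git commit -q -m "forgot to stage" || echo "commit refused: nothing staged"

# Put every change into the index, then commit
git add -A
git commit -q -m "edit tracked.txt, add new.txt"

# Files touched by the last commit
committed=$(git diff-tree --no-commit-id --name-only -r HEAD | tr '\n' ' ')
echo "committed: $committed"
```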

If you use branches and tags heavily, the situation is even worse. Each branch and tag must be created twice, in the local world and in the remote world, or it will be visible only to you. It is not easy to check which branches were pushed to origin, and tags are not pushed to origin by default.
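
Publishing a branch and a tag takes explicit commands. A sketch under demo assumptions: central.git is a hypothetical local stand-in for origin, and feature and v0.1 are made-up names.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed central.git
git clone -q central.git dev
cd dev || exit 1

# A new branch and tag: so far they exist only in this clone
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
git tag v0.1

# Push the branch; -u also records origin/feature as its upstream
git push -q -u origin feature
# A plain "git push" does not send tags, so push this one by name
# ("git push --tags" would send all of them)
git push -q origin v0.1

# "git branch -vv" shows each branch's upstream, if any, in brackets
git branch -vv

# Ask the server directly what it has now
remote_branch=$(git ls-remote --heads origin feature | cut -f2)
remote_tag=$(git ls-remote --tags origin v0.1 | cut -f2)
upstream=$(git rev-parse --abbrev-ref 'feature@{upstream}')
echo "$remote_branch $remote_tag $upstream"
```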

Certain operations that are trivial in SVN become an issue with Git. For example, checking what other people have done on the remote repository cannot be done directly, unless you are using something like GitHub. To get this information, you need to fetch from the remote and issue funny commands. You also need to know which branches to fetch.
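
In practice this usually means a fetch followed by a log over a commit range. In this sketch, alice and bob are hypothetical clones of a local central.git. Alice inspects what Bob pushed without touching her own branch.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed central.git
git clone -q central.git alice
git clone -q central.git bob

# Bob publishes a change
git -C bob commit -q --allow-empty -m "bob: fix the build"
git -C bob push -q

# Alice refreshes her local view of origin and lists remote branches
cd alice || exit 1
git fetch -q origin
git branch -r

# Commits on origin/master that are not yet in Alice's current branch
git log --oneline HEAD..origin/master
incoming=$(git rev-list --count HEAD..origin/master)
```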

More entities to keep track of

In a centralized version control system like SVN, as a developer you need to keep track of two data repositories:

  • The SVN server
  • Files in your local working copy

In Git, in the simplest case you need to worry about FOUR data repositories:

  • Remote Git repository
  • Files in your working copy
  • Local Git repository, which is not the same as the working copy
  • Local view of the remote repository; it may not match the actual state of the remote repository

Not only are there more things to worry about, but some of these new things are also largely invisible, yet very important.

The local repository lives in the “.git” directory and contains the history of all local branches and tags, as well as remote references. The working copy corresponds to the contents of the currently selected branch plus any uncommitted changes. When committing, it is very important to remember which branch is currently active. The active branch is not clearly visible, and quite often one starts working on one branch while meaning to work on another. The logic of what happens to modified files when you switch branches is complex. There is a “git stash” command to save your current work without making a real commit, but it is yet another complication of the workflow.
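
Two commands address these problems: asking Git which branch is active, and shelving unfinished edits with git stash before switching. A sketch with a hypothetical notes.txt and branch experiment:

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q repo && cd repo || exit 1
git symbolic-ref HEAD refs/heads/master
echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "add notes"
git branch experiment

# Which branch will the next commit land on?
# (Git 2.22 and later also offer "git branch --show-current")
active=$(git rev-parse --abbrev-ref HEAD)
echo "active branch: $active"

# Start editing, then realize the work belongs on another branch
echo "line 2" >> notes.txt
git stash push -q -m "half-done notes"   # save edits, restore a clean working copy
clean=$(git status --porcelain)

git checkout -q experiment
git stash pop -q                          # reapply the edits on this branch
now_on=$(git rev-parse --abbrev-ref HEAD)
pending=$(git status --porcelain)
echo "now on $now_on with pending changes: $pending"
```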

The “local view of the remote repository” is even more elusive. It is the local cache of what we know about the remote repository and its history. This cache may or may not correspond to the actual status of the remote repository, sometimes leading to confusion.
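
The gap between the cache and the real remote can be made visible. `git rev-parse origin/master` reads only the local view, while `git ls-remote` asks the server. A sketch with hypothetical clones of a local central.git:

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed central.git
git clone -q central.git alice
git clone -q central.git bob

# Someone else updates the server
git -C bob commit -q --allow-empty -m "bob's change"
git -C bob push -q

cd alice || exit 1
cached=$(git rev-parse origin/master)                        # local view, no network
actual=$(git ls-remote origin refs/heads/master | cut -f1)   # queries the remote
echo "cached: $cached"
echo "actual: $actual"

# A fetch brings the local view back in sync with the server
git fetch -q origin
refreshed=$(git rev-parse origin/master)
```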

It also does not help that when switching to Git, people try to embrace more complicated branching models than they used before. This is not really Git’s fault, but developers are now hit with the inherent complexity of the Git workflow multiplied by the complexity of the branching model, and productivity suffers.

Lots of new confusing terms

Introduction of new entities leads to new concepts like “fast-forward”, “tracking branch”, “remote reference”, and new operations like “push”, “stash”, “rebase”, “fetch”. “Fetch” is probably the most mysterious thing in Git. It updates the (invisible) local view of the remote repository, and therefore has no tangible effect on either the server or the working copy. This makes it difficult to grasp what it does, even for seasoned developers.

Some operations have inherently confusing names. E.g. “git clone” does not really create a complete clone of the source repository. A complete clone would be created by “git clone --mirror”. Last time I checked, in the English language clones are supposed to be identical copies, while things in a mirror may differ in material ways, e.g. right and left are switched.
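
The difference is easy to observe on a repository with two branches. In this sketch (repository and branch names are hypothetical), a regular clone gets a local master only and keeps feature as a remote-tracking ref. The mirror clone, which is bare, reproduces the source's refs one for one.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial"
git -C seed branch feature
git clone -q --bare seed central.git

git clone -q central.git plain
git clone -q --mirror central.git mirror.git

# List every ref each copy holds
echo "regular clone:"; git -C plain for-each-ref --format='%(refname)'
echo "mirror clone:";  git -C mirror.git for-each-ref --format='%(refname)'

# Is there a local branch named "feature" in each copy?
plain_feature=$(git -C plain branch --list feature)
mirror_feature=$(git -C mirror.git branch --list feature)
```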

Other operations have names similar to their SVN counterparts, but do something different. E.g. one would expect “git checkout” to download files from the remote server like “svn checkout”, but in reality it switches to another local branch. “git merge” merges the entire branch history, while “svn merge” merges only the given revisions. “svn merge” corresponds to “git cherry-pick”, while “git merge” is closer to “svn merge --reintegrate”.
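
The cherry-pick/merge distinction can be sketched on a hypothetical feature branch with two commits. The picked branch receives only the chosen commit, while the merged branch receives the whole branch history. The comparison with svn merge -c is an approximation, not an exact equivalence.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q repo && cd repo || exit 1
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt && git commit -q -m "base"

# A feature branch with two separate commits
git checkout -q -b feature
echo one > one.txt && git add one.txt && git commit -q -m "feature: one"
first=$(git rev-parse HEAD)
echo two > two.txt && git add two.txt && git commit -q -m "feature: two"

# cherry-pick: apply only the chosen commit (roughly "svn merge -c REV")
git checkout -q -b picked master
git cherry-pick "$first" > /dev/null
picked_files=$(git ls-tree --name-only HEAD | tr '\n' ' ')

# merge: bring in the whole branch history (roughly "svn merge --reintegrate")
git checkout -q -b merged master
git merge -q --no-edit feature
merged_files=$(git ls-tree --name-only HEAD | tr '\n' ' ')
echo "picked: $picked_files"
echo "merged: $merged_files"
```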

The very concepts of “branch” and “commit” are different in Git, and the differences are subtle. In SVN a branch is largely equivalent to a folder, with some copy-on-write semantics to minimize space. SVN commits potentially modify the entire repository and do not belong to any particular branch. The list of SVN commits is strictly linear. In Git commits form a directed acyclic graph. A branch is a pointer into the graph of commits, and a commit is always done on a specific branch.
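
That a branch is just a pointer can be checked directly. In this sketch, with a hypothetical branch experiment, creating the branch copies nothing, and a new commit moves only the checked-out branch, linking back to its parent in the graph.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q repo && cd repo || exit 1
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"

# Creating a branch copies nothing: it is a new name for an existing commit
git branch experiment
m=$(git rev-parse master)
e=$(git rev-parse experiment)

# Committing advances only the branch that is checked out
git checkout -q experiment
git commit -q --allow-empty -m "second"
ahead=$(git rev-list --count master..experiment)
e_parent=$(git rev-parse 'experiment^')   # parent of the new commit

# Draw the commit graph with all branch pointers
git log --oneline --graph --all
```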

“One repo is one project” policy

Unlike SVN/Perforce/TFS/etc., in Git you cannot clone, tag, or branch just a part of the repository. You can only branch or tag the whole thing. This means that if you have multiple projects, you are forced to keep multiple Git repositories. This is good for performance, but makes things hard to find. Someone must keep track of all those repositories, and that is beyond the scope of Git proper. This limitation makes transitioning large SVN projects difficult. Besides, projects sometimes do require common parts. This led to the introduction of yet another concept: the submodule, with its own set of operations and command-line switches.
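
A sketch of the submodule mechanism, assuming hypothetical local repositories lib and app. The `-c protocol.file.allow=always` setting is needed only because this demo clones from a local path, which recent Git versions block for submodules by default. The parent repository records just one commit of lib (a tree entry with mode 160000), not lib's files.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
work=$(pwd)

# Hypothetical shared library repository
git init -q lib
git -C lib symbolic-ref HEAD refs/heads/master
echo "shared code" > lib/util.txt
git -C lib add util.txt
git -C lib commit -q -m "lib: initial"

# Hypothetical application repository
git init -q app && cd app || exit 1
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "app: initial"

# Add lib as a submodule (this also stages .gitmodules), then commit
git -c protocol.file.allow=always submodule add "$work/lib" lib > /dev/null
git commit -q -m "add lib submodule"

# The tree entry pins one lib commit: "160000 commit <hash><TAB>lib"
git ls-tree HEAD lib
mode=$(git ls-tree HEAD lib | cut -d' ' -f1)
pinned=$(git ls-tree HEAD lib | cut -d' ' -f3 | cut -f1)
lib_head=$(git -C "$work/lib" rev-parse HEAD)
```

People who later clone app also need `git clone --recurse-submodules` or `git submodule update --init` to get lib's files.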

Limited/outdated information about the remote, misleading messages

Git will read the status of the remote server only when specifically instructed. This leads to at least two issues.

Firstly, “git log” shows only your local work, plus the work done by other people before the last pull. There is no easy way to see what is currently going on in the remote repository, unless this functionality is provided by non-Git tools, such as GitHub.

Secondly, lack of up-to-date information on the remote repository may lead to incorrect messages. Suppose you clone a repository that contains branch master. Then you create a local branch experiment, switch to it, work on it for a while, and then switch back to master by issuing this command:

git checkout master

The output most likely would be

Switched to branch 'master'
Your branch is up-to-date with 'origin/master'

The last sentence may be a lie. Git may say that you are “up-to-date” even if there have been recent changes to the remote repository and your local branch is actually behind. Git will not know you have fallen behind until it is explicitly told to fetch the information from the remote machine. This is highly misleading. Of course, this was a conscious design choice: not fetching remote state every time gives Git its great performance.
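
The stale report can be reproduced, and corrected with a fetch. A sketch with hypothetical clones alice and bob of a local central.git; `git status` uses the same ahead/behind bookkeeping as the checkout message above.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed central.git
git clone -q central.git alice
git clone -q central.git bob

# Bob pushes; Alice does not know yet
git -C bob commit -q --allow-empty -m "bob's change"
git -C bob push -q
cd alice || exit 1

# Compared against the stale copy of origin/master: reports no "behind"
git status -sb | head -n 1
behind_before=$(git rev-list --count 'HEAD..@{u}')

# After a fetch the same question gets the true answer
git fetch -q
git status -sb | head -n 1
behind_after=$(git rev-list --count 'HEAD..@{u}')

# Catch up without creating a merge commit
git merge -q --ff-only '@{u}'
behind_final=$(git rev-list --count 'HEAD..@{u}')
```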

You can lose work!

In SVN, once something is committed to the server, it is pretty much secure, barring server failures. Not so in Git. Firstly, if you forget to push your stuff to the remote, it is easy to lose it: local folders get overwritten, deleted, etc. However, even once pushed, your commits are not entirely safe. Deleting Git branches may make some commits unreachable, i.e. they no longer belong to the history of any branch. Such commits will be quickly deleted, which may lead to loss of work.
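
How a branch deletion strands commits can be sketched locally with a hypothetical branch experiment. `git branch -d` refuses to delete an unmerged branch, but `-D` deletes it anyway, leaving the commit on no branch. The object still exists afterwards; how long it survives depends on reflog expiry and garbage-collection settings.

```shell
# Throwaway identity and scratch directory (demo-only assumptions)
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)" || exit 1
git init -q repo && cd repo || exit 1
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial"

git checkout -q -b experiment
git commit -q --allow-empty -m "unmerged work"
work_commit=$(git rev-parse HEAD)
git checkout -q master

# -d refuses: the branch has commits not merged into HEAD
git branch -d experiment || echo "refused: experiment is not fully merged"
# -D deletes it anyway
git branch -D experiment

# No branch contains the commit any more: it has become unreachable
containing=$(git branch --contains "$work_commit")
# The object itself is still in the repository, for now
kind=$(git cat-file -t "$work_commit")
```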

Conclusion

Git is a great tool for the use case it was designed for: distributed development of large open source projects. However, this comes with significant complexity and a bunch of new concepts and operations. This means that Git may not be suitable for all projects, and centralized source control may do a better job in some cases. So, don’t become a victim of the hype, and choose your source control wisely.

See also

10 things I hate about Git by Steve Bennett.
A short history of Git
A successful Git branching model

9 Comments


  1. Hello, Ivan!

    This is a really informative and insightful post, thank you! It’s great to see a balanced summary of Git (DVCS) vs. SVN (CVCS) features and a description of the problems both systems have. There are practically no posts/articles on the Internet nowadays that try to compare Git with X in an unbiased way. (Or there are mostly no technically-correct comparisons). Therefore, I made my own attempt to resolve this and released a page “SVN vs. Git: Myths and Facts” that should help others find out facts about Git and SVN: http://svnvsgit.com/. You can also follow me on Twitter @svnvsgit 🙂

    I have a few comments, though.

    > Performance is one of Git's design goals.

    IMO, being distributed is the main Git design concept. All other pros and cons are consequences of the fact that every developer has a copy of the whole repository and revision history on his computer.

    > Unlike SVN, I don’t have to commit to the server (or even be connected to the server) and worry about showing half-baked work to others. Same for local experiments: I can quickly create a local branch, do some crazy stuff there, and push it to the server only if something good came out of it.

    With Subversion you could create a private shelf or branch in the SVN repository and do the crazy stuff there. It is even possible to make your private shelves/branches invisible to other developers! I guess that you could disable email notifications for such private branches and configure access rules so that no one except yourself could see the crazy stuff you do. Unfortunately, SVN still does not support local shelves or branches, so you must be connected to the server.

    Thank you! 😀


  2. >You can lose work!

    >In SVN, once something is committed to the server, it is pretty much secure, barring server failures. Not so in Git. Firstly, if you forget to push your stuff to the remote, it is easy to lose it: local folders get overwritten, deleted, etc. However, even once pushed, your commits are not entirely safe. Deleting Git branches may make some commits unreachable, i.e. they no longer belong to the history of any branch. Such commits will be quickly deleted, which may lead to loss of work.

    I just want to clarify this point. Even if a branch gets deleted, the commit hash itself is still available (and thus the code changes you made in that commit hash and all of its parents are still available. And, you can easily attach a branch back to this commit with `git checkout -b `). It’s not always easy to find old commit hashes, but you can do it by looking in the `git reflog` https://git-scm.com/docs/git-reflog.

    You won’t lose this work unless these commits go unreferenced by any branch for more than 30 days (although this is configurable). After that time, they get garbage collected (`git gc` https://git-scm.com/docs/git-gc).


    1. I recall being unable to recover a commit fairly quickly after I had made it unreachable, even though I knew the hash. However, it was a long time ago, and I may have been doing something wrong. I will double-check in a few days.


      1. A quick test is to do:

        # Create "commit 1"
        git commit --allow-empty -m "commit 1"
        # Get commit hash
        COMM1=$(git rev-parse HEAD)

        # Show that we have "commit 1"
        git log -n 2

        # Amend previous commit
        git commit --allow-empty --amend -m "commit 2"

        # Show that "commit 1" was replaced by "commit 2"
        git log -n 2

        # Show that "commit 1" is still accessible
        git show "$COMM1"

        # We can still check it out directly, but you'll get a detached HEAD
        git checkout "$COMM1"

        # Attach a branch to the commit
        git checkout -b branch-name "$COMM1"

        One thing to keep in mind is that if you “git push” this branch as it is now, only “commit 2” will get pushed to the remote server (so “commit 1” will never have been pushed to the remote server), so if you, or anyone else, were to reclone this repository, Git would not clone “commit 1”. They would only be able to reach “commit 2”.


  3. Thank you for sharing your experiences with Git. You made a very good point at the end of your post: “…don’t become a victim of the hype, and choose your source control wisely”. Indeed.

    I agree with you when you said that Git suits some projects better than SVN. This does not mean that Git is a better option for every project. That is a good thing to keep in mind, and people should invest some time in researching different version control systems before making any decisions.

    Therefore, the readers of this post might also benefit from reading this comparison between Git and SVN: https://deveo.com/git-vs-svn/. It sheds light on, e.g. the intuitiveness of these systems and how they support large files.


  4. I can confirm your (bad) experiences made with git.

    But I missed the main ugly point of git for me: The combination of files in your working copy and local Git repository creates some kind of magical “state” defining what kind of operations you are allowed to do. And it is easy to get your working copy to a “state” where you can’t do the things you want, for example change the branch. This can happen by some kind of broken merge, rebase, whatever or just by modifying a file. With SVN if something went wrong you delete/move away the files making troubles and make a svn update. With Git you start googling how to resolve the situation.

    Additionally, with SVN it is no problem at all to have multiple checkouts of the same project, e.g. with several branches, while with Git this is much more annoying because of the “refresh from remote repository” problem.

    And all these problems just to do “offline” commits (I think 98% of the work is done while having a reliable connection to the repository)?

    Git hosting services like GitHub or GitLab are cool, but Git is IMHO much too complex and unintuitive for daily use. The best solution for me would be some kind of translator software like SubGit, but in the other direction: I could use the SVN client to update, commit, create tags and branches, and merge, and the translator would carry this over to the Git repository.


    1. Bernhard, if you don’t mind losing your edits, you can always do

      git reset --hard

      This usually takes care of the files in bad state. A less drastic alternative is

      git add -A .
      git stash

      It does more or less the same thing, but keeps your changes in the stash. If push comes to shove, you can always create another clone of the central repo, but I have rarely had to do it in the last couple of years.

      You are right about working with multiple clones of the same central repo: it can get confusing really quickly, so you have to keep tabs on what you are doing there. This is especially true if both clones point to the same branch. This situation is better avoided: you work on one branch in one clone, and on another branch in another clone.

      I agree with you about the extra complexity coming from the decentralized model. However, when you do need it, it is priceless. It is pretty much impossible to move code in SVN from repo A to repo B and keep all the history, while in git it is a piece of cake.

      I don’t really have a lot of friction working with Git lately, but explaining it to people who have never worked with it is a pain. The typical period of whining and cursing is 2-3 months; then they get used to it, at least for the daily workflow tasks.


    2. BTW, your web site, https://blog.bmaehr.com/, does not open in Chrome. The StartCom root CA is no longer trusted, their certificate was revoked, and in this case Chrome does not allow you to proceed to the site. You should get a certificate from a different root CA; StartCom is useless now.


  5. It is true that SVN is “simpler” than Git. But we recently moved to Git for one main reason: merges.
    A year ago we started doing a lot of refactoring, but for now (it is on the SVN roadmap) you can’t track a rename in SVN, and merge and tree conflicts became our nightmare…
