I have been working with Git for about a year now, and I think I am ready to summarize my experience. Before Git I used SVN, and before that Perforce, TFS, and some other products, but I will use SVN as the prototypical “non-Git” system.
Git buys you flexibility and performance at the price of a greatly increased complexity of workflow.
SVN will work better for you if
- All your developers work under single management, AND
- All developers can get relatively fast access to a central server.
Conversely, you should prefer Git if
- There are multiple independent groups of developers that can contribute to the project, AND/OR
- It is difficult to provide all developers with fast access to a central server
Here is my take on the good, the bad, and the ugly sides of Git.
GOOD: performance, full local source control, ability to move work quickly between repositories.
BAD: complexity of workflow, more entities to keep track of, lots of new confusing terms, “one repo is one project” policy, limited/outdated information about the remote, misleading messages. Commit labels are not integers, which complicates build/version number generation.
UGLY: you can lose work by deleting branches or tags.
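On the build number point: one common workaround is to count the commits reachable from the current commit and combine that count with the abbreviated hash. The sketch below assumes that a per-branch commit count is acceptable as a build number; the "1.0.<count>+<hash>" format is purely hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)                     # throwaway repository for the demo
git init -q "$repo"
cd "$repo"
for i in 1 2 3; do
    git commit -q --allow-empty -m "commit $i"
done

build=$(git rev-list --count HEAD)    # number of commits reachable from HEAD
short=$(git rev-parse --short HEAD)   # abbreviated commit hash
version="1.0.$build+$short"           # hypothetical version format
echo "$version"
```

Unlike an SVN revision, the count is only meaningful within one line of history: two different branches can easily produce the same number.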
Performance is one of Git’s design goals. It is achieved partly through clever programming, and partly by caching information locally and accessing remote servers only when specifically instructed. Obviously, this means that local data may become outdated.
Full local source control is convenient for implementing complex tasks in stages. I finish a stage, commit work locally, then continue to the next stage. If things go terribly wrong, I can always roll back to the last known good stage without losing much work. Unlike SVN, I don’t have to commit to the server (or even be connected to the server) and worry about showing half-baked work to others. Same for local experiments: I can quickly create a local branch, do some crazy stuff there, and push it to the server only if something good came out of it. Furthermore, I can do it on my laptop while disconnected from the central server.
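A minimal sketch of the staged workflow described above, run in a throwaway repository. The file name and its contents are made up for illustration.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)                     # no server involved at any point
git init -q "$repo"
cd "$repo"

echo "stage 1" > work.txt
git add work.txt
git commit -q -m "Stage 1 works"      # local checkpoint, visible to nobody else

echo "broken experiment" > work.txt   # things go terribly wrong
git reset -q --hard HEAD              # throw away uncommitted changes

content=$(cat work.txt)
echo "$content"
```

Note that “git reset --hard” discards uncommitted changes for good, so it only makes sense when the last commit really is the known good stage.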
Ability to move work quickly between repositories is built into Git. It is possible to move an entire line of history from, say, Visual Studio Team Services to GitHub using standard Git commands. This is something not easily doable with SVN or Perforce.
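As a rough illustration, such a move can be done with a mirror clone followed by a mirror push. In this sketch two local directories stand in for the old and the new hosting service; with real services you would use their repository URLs instead.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/old"                        # stands in for the old host
git -C "$work/old" commit -q --allow-empty -m "first"
git -C "$work/old" commit -q --allow-empty -m "second"
git -C "$work/old" tag v1.0

git init -q --bare "$work/new.git"             # stands in for the new host

# Copy every branch and tag out of the old host, then push them all to the new one.
git clone -q --mirror "$work/old" "$work/transfer.git"
git -C "$work/transfer.git" push -q --mirror "$work/new.git"

old_tip=$(git -C "$work/old" rev-parse v1.0)
new_tip=$(git -C "$work/new.git" rev-parse v1.0)
new_count=$(git -C "$work/new.git" rev-list --count v1.0)
echo "$old_tip -> $new_tip ($new_count commits)"
```

The commit hashes are identical on both sides, so the history arrives intact rather than being replayed.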
Simply put, with Git you need more steps to move your work to the server, even in simple cases. With traditional source control systems you record the changes using the “commit” command. With Git you also need to “push” your work to origin, which is easy to forget. This is not a theoretical concern: every developer on my team, myself included, has had cases when they forgot to push the code to origin. You commit locally, you get distracted, and voilà: your colleague comes to you with “I pulled the latest, but I can’t see your changes!”.
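One way to catch a forgotten push is to count the commits that exist on the local branch but not on its remote counterpart. The sketch below builds a small sandbox (a bare “origin” plus one clone, all in a temporary directory) and assumes the branch is called master.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master   # fix the branch name
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/origin.git"       # plays the server
git clone -q "$work/origin.git" "$work/me"
cd "$work/me"

git commit -q --allow-empty -m "my change"                # committed, not pushed
before=$(git rev-list --count origin/master..master)      # commits origin lacks
git push -q origin master
after=$(git rev-list --count origin/master..master)
echo "unpushed before: $before, after: $after"
```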
Unlike SVN, Git requires you to explicitly specify which modified files you want to check in. Fortunately, this is a problem only for strict command line environments. Most tools automate the task of finding modified files and adding them to “the index” before check in.
If you use branches and tags heavily, the situation is even worse. Each branch and tag must be created twice: in the local world and in the remote world, or they will be visible only to you. It is not easy to check which branches were pushed to origin, and tags are not pushed to origin by default.
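The following sketch shows the double bookkeeping for a branch and a tag, using “git ls-remote” to ask the server what it actually has. The sandbox layout and the names “feature” and “v1.0” are made up for the example.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/origin.git"       # plays the server
git clone -q "$work/origin.git" "$work/me"
cd "$work/me"

git checkout -q -b feature                 # exists only locally
git commit -q --allow-empty -m "feature work"
git tag v1.0                               # exists only locally

before=$(git ls-remote origin refs/heads/feature refs/tags/v1.0)
git push -q -u origin feature              # publish the branch
git push -q origin v1.0                    # tags need their own push
after=$(git ls-remote origin refs/heads/feature refs/tags/v1.0)
echo "$after"
```

Before the two pushes the server knows nothing about either name; afterwards it lists both.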
Certain operations that are trivial in SVN become an issue with Git. E.g. checking what other people have done on the remote repository cannot be done directly, unless you are using something like GitHub. To get this information, you need to fetch the remote and issue funny commands. You also need to know what branches to fetch.
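For reference, the “funny commands” look roughly like this. The sketch sets up a server and two clones in a temporary directory; “alice” is a hypothetical colleague, and the branch is assumed to be master.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/origin.git"
git clone -q "$work/origin.git" "$work/alice"
git clone -q "$work/origin.git" "$work/me"

# A colleague pushes something while we are not looking.
git -C "$work/alice" commit -q --allow-empty -m "Alice's change"
git -C "$work/alice" push -q origin master

cd "$work/me"
git fetch -q origin                                     # refresh the local view
incoming=$(git log --format=%s master..origin/master)   # commits we do not have
echo "$incoming"
```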
In a centralized version control system like SVN, as a developer you need to keep track of two data repositories:
- The SVN server
- Files in your local working copy
In Git, in the simplest case you need to worry about FOUR data repositories:
- Remote Git repository
- Files in your working copy
- Local Git repository, which is not the same as the working copy
- Local view of the remote repository; it may not coincide with actual state of the remote repository
Not only are there more things to worry about, but some of these new things are also largely invisible, and nevertheless very important.
The local repository is in the “.git” directory, and contains the history of all local branches and tags, as well as remote references. The working copy corresponds to the contents of the currently selected branch plus any uncommitted changes. When committing, it is very important to remember which branch is currently active. The active branch is not clearly visible, and quite often one starts working in one branch when one meant to work in another. The logic of what happens to updated files when you switch branches is complex. There is a “git stash” command to save your current work without making a real commit, but it is another complication of the workflow.
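A small sketch of both points: asking Git which branch is active, and using the stash to park unfinished work while visiting another branch. The file and branch names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "initial"

git checkout -q -b experiment
current=$(git rev-parse --abbrev-ref HEAD)   # name of the active branch
echo "active branch: $current"

echo "half-done" >> notes.txt    # uncommitted work in progress
git stash -q                     # park it; the working copy is clean again
git checkout -q master           # ...do something else here...
git checkout -q experiment
git stash pop -q                 # bring the parked changes back

restored=$(tail -n 1 notes.txt)
echo "$restored"
```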
The “local view of the remote repository” is even more elusive. It is the local cache of what we know about the remote repository and its history. This cache may or may not correspond to the actual status of the remote repository, sometimes leading to confusion.
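To see the gap between the cache and reality without changing anything locally, you can compare the cached reference with what “git ls-remote” reports straight from the server. This is only a sketch in a temporary sandbox; “alice” is a hypothetical colleague and the branch is assumed to be master.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/origin.git"
git clone -q "$work/origin.git" "$work/alice"
git clone -q "$work/origin.git" "$work/me"

git -C "$work/alice" commit -q --allow-empty -m "Alice's change"
git -C "$work/alice" push -q origin master

cd "$work/me"
cached=$(git rev-parse origin/master)                        # local view, no network
actual=$(git ls-remote origin refs/heads/master | cut -f1)   # asks the server
echo "cached: $cached"
echo "actual: $actual"
```

The two hashes differ until the next fetch, which is exactly the confusion described above.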
It also does not help that when switching to Git people try to embrace more complicated branching models than they used before. This is not really Git’s fault, but the developers are now hit with the inherent complexity of the Git workflow multiplied by the complexity of the branching model, and productivity suffers.
Introduction of new entities leads to new concepts like “fast-forward”, “tracking branch”, “remote reference”, and new operations like “push”, “stash”, “rebase”, “fetch”. “Fetch” is probably the most mysterious thing in Git. It updates the (invisible) local view of the remote repository, and therefore has no tangible effect on either the server or the working copy. As a result, it may be difficult to grasp what it does, even for seasoned developers.
Some operations have inherently confusing names. E.g. “git clone” does not really create a complete clone of the source repository. A complete clone would be created by “git clone --mirror”. Last time I checked, in the English language clones are supposed to be identical copies, while things in the mirror may be different in material ways, e.g. right and left are switched.
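One visible difference, sketched below: a plain clone creates a local branch only for the branch that was checked out in the source and files the others under remote references, while a mirror clone copies the references as they are. The repository and branch names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" symbolic-ref HEAD refs/heads/master
git -C "$work/src" commit -q --allow-empty -m "initial"
git -C "$work/src" branch feature              # second branch in the source

git clone -q "$work/src" "$work/plain"
git clone -q --mirror "$work/src" "$work/mirror.git"

# Count the local branches (refs/heads/*) in each copy.
plain_heads=$(git -C "$work/plain" for-each-ref --format='%(refname)' refs/heads | wc -l | tr -d ' ')
mirror_heads=$(git -C "$work/mirror.git" for-each-ref --format='%(refname)' refs/heads | wc -l | tr -d ' ')
echo "plain clone: $plain_heads local branch(es), mirror: $mirror_heads"
```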
Other operations have names similar to their SVN counterparts, but do something different. E.g. one would expect “git checkout” to download files from a remote server like “svn checkout”, but in reality it switches to another local branch. “git merge” merges the entire branch history, while “svn merge” merges only given revisions. “svn merge” corresponds to “git cherry-pick”, while “git merge” is closer to “svn merge --reintegrate”.
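The difference between picking one change and merging a whole branch can be sketched as follows. The branch and file names are invented for the example.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "base"

git checkout -q -b feature                 # switches branches, downloads nothing
echo a > a.txt; git add a.txt; git commit -q -m "Add a"
first=$(git rev-parse HEAD)
echo b > b.txt; git add b.txt; git commit -q -m "Add b"

git checkout -q -b picked master           # new branch starting at master
git cherry-pick "$first" > /dev/null       # copy only the "Add a" change
picked_has_b=$([ -e b.txt ] && echo yes || echo no)

git checkout -q master
git merge -q --no-edit feature             # bring in the whole branch history
merged_has_b=$([ -e b.txt ] && echo yes || echo no)

echo "b.txt after cherry-pick: $picked_has_b, after merge: $merged_has_b"
```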
The very concepts of “branch” and “commit” are different in Git, and these differences are subtle. In SVN a branch is largely equivalent to a folder, with some copy-on-write semantics to minimize space. SVN commits potentially modify the entire repository and do not belong to any particular branch. The list of SVN commits is strictly linear. In Git commits form a directed acyclic graph. A branch is a pointer into the graph of commits, and a commit is always done on a specific branch.
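The “pointer” nature of a Git branch is easy to observe: creating a branch only adds a second name for an existing commit, and a new commit moves only the branch that is currently checked out. A minimal sketch, with made-up branch names:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"

git branch other                           # a second name for the same commit
first=$(git rev-parse master)
other=$(git rev-parse other)

git commit -q --allow-empty -m "second"    # only master (checked out) moves
second=$(git rev-parse master)
parent=$(git rev-parse master^)            # the new commit points at its parent

echo "other:  $other"
echo "master: $second (parent $parent)"
```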
Unlike SVN/Perforce/TFS/etc., in Git you cannot clone, tag or branch just a part of the repository. You can only branch or tag the whole thing. This means that if you have multiple projects, you are forced to keep multiple Git repositories. This is good for performance, but makes things hard to find. Someone must keep track of all those repositories, and that job is left outside the scope of Git proper. This limitation makes transitioning large SVN projects difficult. Besides, projects sometimes do require common parts. This led to the introduction of another new concept: the submodule, with its own set of operations and command line switches.
Git will read the status of the remote server only when specifically instructed. This leads to at least two issues.
Firstly, “git log” shows only your local work, plus the work done by other people before the last pull. There is no easy way to see what is currently going on in the remote repository, unless this functionality is provided by non-Git tools, such as GitHub.
Secondly, lack of up-to-date information on the remote repository may lead to incorrect messages. Suppose you clone a repository that contains branch master. Then you create a local branch experiment, switch to it, work on it for a while, and then switch back to master by issuing this command:
git checkout master
The output most likely would be
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'
The last sentence may be a lie. Git may say that you are “up-to-date” even if there have been recent changes to the remote repository and your local branch is actually behind. Git will not know you have fallen behind until it is explicitly told to fetch the information from the remote machine. This is highly misleading. Of course, this was a conscious design choice: not fetching remote state every time gives Git its great performance.
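The effect can be reproduced in a sandbox. In the sketch below a hypothetical colleague “alice” pushes to the server; the local status stays optimistic until an explicit fetch. The exact wording of the status line may vary between Git versions, so the script relies on commit counts instead.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/seed"
git -C "$work/seed" symbolic-ref HEAD refs/heads/master
git -C "$work/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$work/seed" "$work/origin.git"
git clone -q "$work/origin.git" "$work/alice"
git clone -q "$work/origin.git" "$work/me"

git -C "$work/alice" commit -q --allow-empty -m "Alice's change"
git -C "$work/alice" push -q origin master       # the server has moved on

cd "$work/me"
git status -sb                                   # still claims nothing is missing
stale=$(git rev-list --count master..origin/master)
git fetch -q origin                              # now Git actually asks
git status -sb                                   # reports that master is behind
fresh=$(git rev-list --count master..origin/master)
echo "behind before fetch: $stale, after fetch: $fresh"
```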
In SVN, once something is committed to the server, it is pretty much secure, barring server failures. Not so in Git. Firstly, if you forget to push your stuff to the remote, it is easy to lose it: local folders get overwritten, deleted, etc. However, even once pushed, your commits are not entirely safe. Deleting Git branches may make some commits unreachable, i.e. they no longer belong to the history of any branch. Such commits are eventually removed by garbage collection, which may lead to loss of work.
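As long as garbage collection has not pruned them yet, unreachable commits can still be rescued if you know their hash (in a local repository the reflog usually remembers it). A sketch of losing and rescuing a commit, with made-up branch names:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "base"

git checkout -q -b experiment
git commit -q --allow-empty -m "valuable work"
lost=$(git rev-parse HEAD)                  # remember the hash for the demo

git checkout -q master
git branch -q -D experiment                 # force-delete the unmerged branch
holders=$(git branch --contains "$lost")    # empty: no branch has the commit
kind=$(git cat-file -t "$lost")             # the object itself still exists

git branch rescued "$lost"                  # point a new branch at it again
echo "object type: $kind, rescued at: $(git rev-parse rescued)"
```

On a hosting server you typically have no reflog access, so whether and for how long such commits survive depends on the server's own policies.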
Git is a great tool for the use case it was designed for: distributed development of large open source projects. However, supporting that use case brings significant complexity and a bunch of new concepts and operations. This means that Git may not be suitable for all projects, and centralized source control may do a better job in some cases. So, don’t become a victim of the hype, and choose your source control wisely.