Sunday, May 5, 2013

Incremental merge vs. direct merge vs. rebase

  Direct merge Rebase Incremental merge
Basic commands
git checkout master
git merge branch
<resolve big conflict>
git commit
git checkout branch
git rebase master
while not done:
    <resolve smaller conflict>
    git rebase --continue
git checkout master
git-imerge start [...] branch
while not done:
    <resolve pairwise conflict>
    git commit [...]
    git-imerge continue
git-imerge simplify [...]

(This uses the git-imerge tool [1].)

Size of conflicts All of the changes in master have to be combined with all of the changes in branch in one go. Various changes are mixed up together, even if they were originally committed separately, which can make the conflict large or even intractable. One step of a rebase requires one change from branch to be resolved with all of the changes from master. This is easier than doing a direct merge in one step, but still mixes up unrelated changes. At any step, only one change from master has to be combined with one change from branch. The commit messages and deltas from each of the two changes are available to help you understand the two changes.
Effort The least amount of effort, if there are only a few simple conflicts. Effort increases highly nonlinearly in the size of the conflict. A bit more effort, if conflicts are simple and isolated. Conflicts tend to be smaller than for direct merge, but more numerous. If there are sequential overlapping conflicts, you might need to resolve similar conflicts multiple times. More processing power required at start of merge. Conflicts are resolved as lots of little conflicts, so the effort scales closer to linearly with the number of pairwise conflicts [2]. Benefits decrease if original changes were not committed in small, self-contained steps.
Interruptibility After you start a direct merge, there is no way to record partial results. You are forced to either complete the whole merge and then commit it, or abandon the whole merge using git merge --abort. Progress on a direct merge is all-or-nothing. Once a rebase is started, it is awkward to interrupt it [3]. It is true that intermediate results are committed, but because the rebase discards the connection between original commits and rebased commits, you have to do the bookkeeping yourself. Each of the pairwise merges is resolved and committed separately. After each pairwise merge is committed, you can switch branches, do something else for a while, then return later to the incremental merge. Progress on an incremental merge can be locked in in small increments.
Testability, fixability The results of a direct merge cannot be tested until the whole merge is done. If you mess up one part of the merge, there is no easy way to revert only that part. The individual rebased commits can be tested. But if there is a failure, it is nontrivial to figure out which of the changes on master interacted badly with the rebased commit. Each pairwise merge can be tested individually. If one fails testing, it can be discarded and redone without having to discard earlier pairwise merges. If a failure is discovered after the incremental merge is done, it is possible to use git bisect to find out exactly which of the pairwise merges was faulty.
Collaboration Since an in-progress direct merge is not savable, the only way to get help from a colleague on a direct merge is to call the colleague to your computer and have him/her work in your working copy. It is strongly recommended that a branch that has been published not be rebased. Thus rebasing is usually applied only to local work done by a single developer. It is awkward to share the work of rebasing a branch [3]. Each pairwise merge commit is a normal commit that can be exchanged with other colleagues via push/fetch/etc. Each pairwise conflict can be resolved by the person most familiar with that part of the code. In fact, if you plan to retain the whole incremental merge in your repository history, then no history rewriting is necessary at all, and there is no problem merging a published branch, or publishing the intermediate merge commits as you go.
Recommendations In a merge-based workflow, use direct merge for simple merges. If a difficult conflict arises, switch to incremental merge and then simplify the results into a simple merge. In a rebase-based workflow with unpublished local changes, use rebase to update your work onto upstream changes if you only expect trivial conflicts. If you expect (or, midway through the rebase, encounter) nontrivial conflicts, switch to incremental merge and then simplify the results into a rebase.

Use incremental merge when:

  • a simple merge or rebase is desired but the conflicts are nontrivial
  • you would like to rebase work that has already been published (simplify results to rebase-with-history)
  • you want to retain a permanent record of how conflicts were resolved
  • you want to collaborate on a merge or rebase

Notes:

[1]git-imerge is a new tool that I am working on to automate incremental merging. It is available as open-source from its public Git repository.
[2]If development was truly chaotic, and especially if it involved changes that were later reverted, then an incremental merge might be much more effort than a simple merge because it requires you to resolve conflicts between changes that are obsolete.
[3](1, 2)

To pause and/or share the work of rebasing, add a reference to the last successful rebased commit using git branch rebase-wip before aborting the rebase with git rebase --abort. The result will look something like:

o - 0 - 1  - 2  - 3  - 4  - 5  - 6  - 7  - 8  - 9  - 10  - 11    ← master
    |                                                       |
    A                                                       A'
    |                                                       |
    B                                                       B'
    |                                                       |
    C                                                       C'
    |                                                       |
    D                                                       D'
    |
    E                                                       ↑
    |                                                   rebase-wip
    F
    |
    G
    |
    H
    |
    I

    ↑
  branch

Such an interrupted rebase can be continued using a command like git rebase --onto rebase-wip D branch, where D is the SHA1 of the last commit that was successfully rebased in the first attempt. The work can be shared by publishing tips branch and rebase-wip, but care should be taken when doing so to prevent other people from basing their own work on the pre-rebased branch branch.

No comments: