You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@asterixdb.apache.org by "Chris \"Ceej\" Hillery" <ce...@couchbase.com> on 2015/09/18 10:04:33 UTC

Merging vs. squashing

I'm pulling this conversation into a separate thread from the "Commit
message proposal".

There are certainly long-running and contentious arguments about whether
commits to master should be merges or squashes/cherry-picks. That ranks
right up there with vi vs. emacs. Before going too far into that, however,
you should be aware of some additional quirks/limitations that Gerrit
brings into the story.

First, we have Gerrit configured to always cherry-pick changes when
submitting them. We do this for a couple of reasons, but the main one is
that Gerrit gives you a little additional value in this case: It introduces
Reviewed-on: and Reviewed-by: headers into the commit, which provide you
with a link back to the Gerrit review. For instance,
https://github.com/apache/incubator-asterixdb-hyracks/commit/d885779b13c4fd9bd551d306a460df2894fb18cb
- see that there is a hyperlink there back to Gerrit. This is not possible
if you use any merge strategy other than cherry-pick, because only
cherry-picking creates a *new* commit object that Gerrit can decorate.

Moving on - In Gerrit, it is not possible to review a "branch merge", at
least not in the way you would hope. A review in Gerrit is always of a
*single commit*. If you make a number of commits on your local working
branch and then push that branch to Gerrit's refs/for/master, it will
create one review for each commit. This is occasionally desirable for
larger changes that can reasonably be broken up, but as a default working
method it is usually more trouble than it's worth - especially because
Gerrit won't prevent you from Submitting a subset of the changes or merging
them out of order. This is the main reason that I implemented git-gerrit to
do a squash before uploading a change to Gerrit.

What if you make changes on a branch, then merge them to your local master
and then propose just the merge commit to Gerrit? That's just one commit
(the merge commit) and therefore one review, right? Well, no - that would
only work if all of your branch commits were already known to Gerrit, say
because they were submitted (one at a time) to a non-master branch.
Otherwise, you'll get N+1 separate reviews on Gerrit, and the merge commit
review won't even make sense anymore.

It *is* possible to do this "right" with Gerrit if you want to get merge
commits onto the master branch. You need to create a secondary branch,
"unstable" or "testing" or similar, and propose your individual changes
there. We could even create multiple of those branches for individual
feature work if we wanted. Then once that branch is in a state that you
want to merge to master, you can propose a single merge commit change to
Gerrit. The biggest downside to this is that when reviewing a merge commit,
Gerrit does NOT show you any information about the changes being introduced
into the target branch. It only shows you any diffs which were introduced
during conflict resolution of the merge. Also, merge commits cannot be
cherry-picked, so you lose the Reviewed-on: header for the merge itself.
These aren't necessarily fatal limitations - we have a couple teams in
Couchbase using exactly this methodology with success.

Anyway. The upshot is, handling git history is a mess, and Gerrit makes it
messier. The "always squash branches, always cherry-pick changes" approach
that we currently use for AsterixDB seems to me to be the least available
evil, and it at least results in a clean if abbreviated commit history on
master. It definitely has its downsides as well, though - Jianfeng's
experience is one such problem, and it doesn't have a clean solution.

Ceej
aka Chris Hillery

Re: Merging vs. squashing

Posted by Ted Dunning <te...@gmail.com>.
Thanks a ton!

That was very educational.



On Fri, Sep 25, 2015 at 2:22 AM, Chris Hillery <ch...@hillery.land>
wrote:

> So, I've done some more experimentation with Gerrit's various merge
> strategies.
>
> First a reminder that a Gerrit review is always of a single commit, not a
> branch. This massively limits our ability to share branches before we even
> consider the Gerrit merge strategy, because it forces us to squash changes
> down to single commits. If we truly want to "solve" that problem, we'll
> need a much more extensive change to working method than just the Gerrit
> merge strategy. Also, any method that solved that problem and still had
> meaningful code reviews would involve reviewers needing to individually
> review every commit from the developer's branch. In all honesty, it might
> be easier to use a completely different review tool than Gerrit at that
> point.
>
> Anyway, at the risk of drowing this conversation completely... there are
> five merge strategies available in Gerrit; here are what I see as their
> pros, cons, and gotchas.
>
> 1. *Cherry-pick*. This is what we use today: the commit under review is
> cherry-picked onto master, resulting in a new commit.
>
> Pros:
>   - Introduces the "Reviewed-on" header in the commit on master, which
> allows linking back to the review page.
>   - Ensure linear master commit history, since there are no merge commits.
> Cons:
>   - Cherry-picks are always new commits (although it seems that rebase and
> merge can handle that OK - the real damage has already been done by the
> squashing before it even gets to Gerrit)
>
> 2. *Always merge*. This means that the commit under review will be put into
> the repository, and then a merge commit will be created (by Gerrit itself)
> with the current master tip as one parent and the new commit as another.
> Basically the same as using "git merge --no-ff" locally.
>
> Pros:
>   - Original commit maintains SHA, etc.
>   - Original commit does not necessarily have to be rebased to master tip
> (although if it isn't there can be merge conflicts)
> Cons:
>   - Master history can be a bit messy (although "git log --first-parent"
> works reasonably well)
>   - The commit message Gerrit creates is always:  ' *Merge "first line of
> commit message"* ' which means the master commit history loses a ton of
> information.
>
> 3. *Fast-forward only*. This means that the commit under review must have
> the current master tip as its parent. When the commit is submitted, Gerrit
> will simply fast-forward the master branch to the new commit. Basically the
> same as using "git merge --ff-only" locally.
>
> Pros:
>   - Original commit maintains SHA, etc.
>   - Master log history is kept very clear.
>   - No possibility of merge conflicts at Gerrit time.
> Cons:
>   - Since you must rebase to master before submitting, you potentially lose
> the ability to share your working branch with others more often than is
> strictly necessary.
>   - Also since you must rebase to master before submitting, you will
> potentially have more submissions that need to be rebased and reproposed
> than is strictly necessary.
>
> 4. *Merge if necessary*. This is basically the same as the default
> behaviour of "git merge": fast-forward if possible (ie, the commit under
> review's parent is the current master tip), otherwise create a merge
> commit.
>
> Pros:
>   - Good compromise between "keeping local branch commits shareable" and
> "don't force unnecessary rebases" - possibly the least developer-intrusive
> option.
> Cons:
>   - Master history is inconsistent, with a mix of "real" commits and merge
> commits.
>   - Merge commits still have the extremely abbreviated log message, which
> is IMHO even worse here since some commits will have full messages and some
> won't.
>
> 5. *Rebase if necessary*. This will fast-forward master if the commit under
> review has the current master tip as its parent. If not, Gerrit itself will
> attempt to rebase the commit onto master, and then fast-forward it onto
> master. This may result in conflicts at Submit time.
>
> Pros:
>   - Less developer interaction required than "fast-forward only".
>   - Master log history is kept very clear.
> Cons:
>   - Commits on master will sometimes be new commits, basically cherry-picks
> (without the advantages of cherry-pick)
>
>
> Anyway, I'm not sure any of this information is useful. There isn't a merge
> strategy which is clearly better or worse than any other, and the problem
> of wanting more flexibility in development branches is only tangentially
> related to that anyway. But, here's the information in case anyone wants to
> think about it any more.
>
> I'm going to do a little bit more exploration with "git gerrit" to see
> whether I could change anything there that would make a difference.
>
> Ceej
> aka Chris Hillery
>
> On Thu, Sep 24, 2015 at 5:52 PM, Chris Hillery <ch...@hillery.land>
> wrote:
>
> > On Thu, Sep 24, 2015 at 11:05 AM, Jianfeng Jia <ji...@gmail.com>
> > wrote:
> >
> >> If we can somehow enforce the rebase before merge, then I think it won’t
> >> have the merge “tree” problem.
> >>
> >
> > Actually the merge-tree problem was introduced in a situation where a
> part
> > of a branch was cherry-picked to master. It would also happen if part of
> a
> > branch were cherry-picked over to another branch (like you wanted to do
> > with Taewoo's branch) and then both were eventually merged to master. In
> a
> > normal scenario this wouldn't happen, with or without rebasing the
> feature
> > branch before merge.
> >
> > As for whether "rebase before merge" should be enforced, I'm in favor of
> > it personally. I'm even in favor of doing local interactive rebases with
> > selective squashing and so forth to "clean up" local branch history
> before
> > merging - although as always, this is destructively rewriting history and
> > therefore should not be done if the branch was shared with anyone else.
> >
> > Ceej
> > aka Chris Hillery
> >
>

Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
So, I've done some more experimentation with Gerrit's various merge
strategies.

First a reminder that a Gerrit review is always of a single commit, not a
branch. This massively limits our ability to share branches before we even
consider the Gerrit merge strategy, because it forces us to squash changes
down to single commits. If we truly want to "solve" that problem, we'll
need a much more extensive change to working method than just the Gerrit
merge strategy. Also, any method that solved that problem and still had
meaningful code reviews would involve reviewers needing to individually
review every commit from the developer's branch. In all honesty, it might
be easier to use a completely different review tool than Gerrit at that
point.

Anyway, at the risk of drowing this conversation completely... there are
five merge strategies available in Gerrit; here are what I see as their
pros, cons, and gotchas.

1. *Cherry-pick*. This is what we use today: the commit under review is
cherry-picked onto master, resulting in a new commit.

Pros:
  - Introduces the "Reviewed-on" header in the commit on master, which
allows linking back to the review page.
  - Ensure linear master commit history, since there are no merge commits.
Cons:
  - Cherry-picks are always new commits (although it seems that rebase and
merge can handle that OK - the real damage has already been done by the
squashing before it even gets to Gerrit)

2. *Always merge*. This means that the commit under review will be put into
the repository, and then a merge commit will be created (by Gerrit itself)
with the current master tip as one parent and the new commit as another.
Basically the same as using "git merge --no-ff" locally.

Pros:
  - Original commit maintains SHA, etc.
  - Original commit does not necessarily have to be rebased to master tip
(although if it isn't there can be merge conflicts)
Cons:
  - Master history can be a bit messy (although "git log --first-parent"
works reasonably well)
  - The commit message Gerrit creates is always:  ' *Merge "first line of
commit message"* ' which means the master commit history loses a ton of
information.

3. *Fast-forward only*. This means that the commit under review must have
the current master tip as its parent. When the commit is submitted, Gerrit
will simply fast-forward the master branch to the new commit. Basically the
same as using "git merge --ff-only" locally.

Pros:
  - Original commit maintains SHA, etc.
  - Master log history is kept very clear.
  - No possibility of merge conflicts at Gerrit time.
Cons:
  - Since you must rebase to master before submitting, you potentially lose
the ability to share your working branch with others more often than is
strictly necessary.
  - Also since you must rebase to master before submitting, you will
potentially have more submissions that need to be rebased and reproposed
than is strictly necessary.

4. *Merge if necessary*. This is basically the same as the default
behaviour of "git merge": fast-forward if possible (ie, the commit under
review's parent is the current master tip), otherwise create a merge commit.

Pros:
  - Good compromise between "keeping local branch commits shareable" and
"don't force unnecessary rebases" - possibly the least developer-intrusive
option.
Cons:
  - Master history is inconsistent, with a mix of "real" commits and merge
commits.
  - Merge commits still have the extremely abbreviated log message, which
is IMHO even worse here since some commits will have full messages and some
won't.

5. *Rebase if necessary*. This will fast-forward master if the commit under
review has the current master tip as its parent. If not, Gerrit itself will
attempt to rebase the commit onto master, and then fast-forward it onto
master. This may result in conflicts at Submit time.

Pros:
  - Less developer interaction required than "fast-forward only".
  - Master log history is kept very clear.
Cons:
  - Commits on master will sometimes be new commits, basically cherry-picks
(without the advantages of cherry-pick)


Anyway, I'm not sure any of this information is useful. There isn't a merge
strategy which is clearly better or worse than any other, and the problem
of wanting more flexibility in development branches is only tangentially
related to that anyway. But, here's the information in case anyone wants to
think about it any more.

I'm going to do a little bit more exploration with "git gerrit" to see
whether I could change anything there that would make a difference.

Ceej
aka Chris Hillery

On Thu, Sep 24, 2015 at 5:52 PM, Chris Hillery <ch...@hillery.land>
wrote:

> On Thu, Sep 24, 2015 at 11:05 AM, Jianfeng Jia <ji...@gmail.com>
> wrote:
>
>> If we can somehow enforce the rebase before merge, then I think it won’t
>> have the merge “tree” problem.
>>
>
> Actually the merge-tree problem was introduced in a situation where a part
> of a branch was cherry-picked to master. It would also happen if part of a
> branch were cherry-picked over to another branch (like you wanted to do
> with Taewoo's branch) and then both were eventually merged to master. In a
> normal scenario this wouldn't happen, with or without rebasing the feature
> branch before merge.
>
> As for whether "rebase before merge" should be enforced, I'm in favor of
> it personally. I'm even in favor of doing local interactive rebases with
> selective squashing and so forth to "clean up" local branch history before
> merging - although as always, this is destructively rewriting history and
> therefore should not be done if the branch was shared with anyone else.
>
> Ceej
> aka Chris Hillery
>

Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
On Thu, Sep 24, 2015 at 11:05 AM, Jianfeng Jia <ji...@gmail.com>
wrote:

> If we can somehow enforce the rebase before merge, then I think it won’t
> have the merge “tree” problem.
>

Actually the merge-tree problem was introduced in a situation where a part
of a branch was cherry-picked to master. It would also happen if part of a
branch were cherry-picked over to another branch (like you wanted to do
with Taewoo's branch) and then both were eventually merged to master. In a
normal scenario this wouldn't happen, with or without rebasing the feature
branch before merge.

As for whether "rebase before merge" should be enforced, I'm in favor of it
personally. I'm even in favor of doing local interactive rebases with
selective squashing and so forth to "clean up" local branch history before
merging - although as always, this is destructively rewriting history and
therefore should not be done if the branch was shared with anyone else.

Ceej
aka Chris Hillery

Re: Merging vs. squashing

Posted by Jianfeng Jia <ji...@gmail.com>.
If we can somehow enforce the rebase before merge, then I think it won’t have the merge “tree” problem.
> git checkout -b work master
> # commit a change on branch 'work'
> git checkout master
> git cherry-pick work
+ git checkout work
+ git rebase master
+ git checkout master
> git merge work


> On Sep 24, 2015, at 12:59 AM, Chris Hillery <ch...@hillery.land> wrote:
> 
> On Thu, Sep 24, 2015 at 12:32 AM, Jianfeng Jia <jianfeng.jia@gmail.com <ma...@gmail.com>> wrote:
> I haven’t tried yet, but here is a stackoverflow answer[1] showed that git can
> magically skip the “same” cherry-picked commit even for the rebase case. 
> Then rebase on there own branches should be fine.
> 
> [1] http://stackoverflow.com/questions/14509878/why-does-a-rebase-after-a-cherry-pick-not-apply-the-same-commit-twice <http://stackoverflow.com/questions/14509878/why-does-a-rebase-after-a-cherry-pick-not-apply-the-same-commit-twice>
> 
> Yep, I've seen this work, and I had assumed it worked for this reason. The rebase case is actually easier in a sense because git is replaying commits one at a time. But I'm glad to see this is actually a documented and intentional behaviour.
> 
> The merge case is more interesting, but it seems like it should work in most cases too:
> 
> http://stackoverflow.com/questions/14489056/does-git-cherry-pick-save-any-metadata?rq=1 <http://stackoverflow.com/questions/14489056/does-git-cherry-pick-save-any-metadata?rq=1>
> 
> I just tried the most trivial version of this:
> 
> git checkout -b work master
> # commit a change on branch 'work'
> git checkout master
> git cherry-pick work
> git merge work
> 
> That is - cherry-pick a change from a branch (the tip of the branch in this case) onto master, and then merge the branch onto master. Git handled this without conflicts, which to be honest surprised me a little bit. There are two downsides, however, both of which are caused by the fact that the cherry-picked commit is not the same commit:
> 
> - "git merge" was unable to just fast-forward master; it did introduce a merge commit onto master, which doesn't really serve any purpose
> - perhaps more annoyingly, "git log" on master now shows the change twice:
> 
> % git log --oneline -4
> 3b24414 Merge branch 'work'
> 6cbf7e3 This is a text change
> 23bfa31 This is a text change
> cc4356e CBD-1635: some initial work to run tests once a day
> 
> The reason for this is clear enough if you use git log --graph:
> 
> % git log --oneline --graph -4
> *   3b24414 Merge branch 'work'
> |\  
> | * 23bfa31 This is a text change
> * | 6cbf7e3 This is a text change
> |/  
> * cc4356e CBD-1635: some initial work to run tests once a day
> 
> This leads me to my main concern / objection to the merge approach: it's extremely easy for this to lead to truly squirrelly log histories on master. That is the main (and perhaps only) clear benefit to the current squash-and-cherry-pick approach we have - the log history on master is guaranteed to be linear, self-contained, and easily understandable. You pay a price for that orderliness, however, in terms of branching flexibility.
> 
> Ceej
> aka Chris Hillery



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
On Thu, Sep 24, 2015 at 12:32 AM, Jianfeng Jia <ji...@gmail.com>
wrote:

> I haven’t tried yet, but here is a stackoverflow answer[1] showed that git
> can
> magically skip the “same” cherry-picked commit even for the rebase case.
> Then rebase on there own branches should be fine.
>
> [1]
> http://stackoverflow.com/questions/14509878/why-does-a-rebase-after-a-cherry-pick-not-apply-the-same-commit-twice
>

Yep, I've seen this work, and I had assumed it worked for this reason. The
rebase case is actually easier in a sense because git is replaying commits
one at a time. But I'm glad to see this is actually a documented and
intentional behaviour.

The merge case is more interesting, but it seems like it should work in
most cases too:

http://stackoverflow.com/questions/14489056/does-git-cherry-pick-save-any-metadata?rq=1

I just tried the most trivial version of this:

git checkout -b work master
# commit a change on branch 'work'
git checkout master
git cherry-pick work
git merge work

That is - cherry-pick a change from a branch (the tip of the branch in this
case) onto master, and then merge the branch onto master. Git handled this
without conflicts, which to be honest surprised me a little bit. There are
two downsides, however, both of which are caused by the fact that the
cherry-picked commit is not the same commit:

- "git merge" was unable to just fast-forward master; it did introduce a
merge commit onto master, which doesn't really serve any purpose
- perhaps more annoyingly, "git log" on master now shows the change twice:

% git log --oneline -4
3b24414 Merge branch 'work'
6cbf7e3 This is a text change
23bfa31 This is a text change
cc4356e CBD-1635: some initial work to run tests once a day

The reason for this is clear enough if you use git log --graph:

% git log --oneline --graph -4
*   3b24414 Merge branch 'work'
|\
| * 23bfa31 This is a text change
* | 6cbf7e3 This is a text change
|/
* cc4356e CBD-1635: some initial work to run tests once a day

This leads me to my main concern / objection to the merge approach: it's
extremely easy for this to lead to truly squirrelly log histories on
master. That is the main (and perhaps only) clear benefit to the current
squash-and-cherry-pick approach we have - the log history on master is
guaranteed to be linear, self-contained, and easily understandable. You pay
a price for that orderliness, however, in terms of branching flexibility.

Ceej
aka Chris Hillery

Re: Merging vs. squashing

Posted by Jianfeng Jia <ji...@gmail.com>.
I haven’t tried yet, but here is a stackoverflow answer[1] showed that git can
magically skip the “same” cherry-picked commit even for the rebase case. 
Then rebase on there own branches should be fine.

[1] http://stackoverflow.com/questions/14509878/why-does-a-rebase-after-a-cherry-pick-not-apply-the-same-commit-twice <http://stackoverflow.com/questions/14509878/why-does-a-rebase-after-a-cherry-pick-not-apply-the-same-commit-twice>

> On Sep 24, 2015, at 12:01 AM, Chris Hillery <ch...@hillery.land> wrote:
> 
> On Wed, Sep 23, 2015 at 11:38 PM, Chen Li <ch...@gmail.com> wrote:
> 
>> Just want to echo what Ted said: if the same commit exists in two
>> different branches, then there is no issue when we merge these two
>> branches into the master.  In fact, if branch A has all the commits of
>> branch B, after we merge branch A into the master, git will
>> automatically think branch B is already merged into the master.
> 
> 
> This is true, and one of the key benefits to the merge scenario. However, I
> feel it's important to point out again that a cherry-picked commit is *NOT*
> "the same commit" as far as git is concerned, so if you cherry-pick from A
> to B then A and B do not have the same commits. It may be that git merge
> and/or rebase can detect cherry-picks and handle them better than
> completely exploding. But if you think of them as the "same commit", you're
> giving yourself a bad understanding of what's actually going on, and that
> will eventually bite you.
> 
> (I know you didn't specifically mention cherry-picking, Chen, but in the
> context of Jianfeng's and Ted's earlier messages I think it's still a point
> that bears repeating.)
> 
> Ceej
> aka Chris Hillery



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
On Wed, Sep 23, 2015 at 11:38 PM, Chen Li <ch...@gmail.com> wrote:

> Just want to echo what Ted said: if the same commit exists in two
> different branches, then there is no issue when we merge these two
> branches into the master.  In fact, if branch A has all the commits of
> branch B, after we merge branch A into the master, git will
> automatically think branch B is already merged into the master.


This is true, and one of the key benefits to the merge scenario. However, I
feel it's important to point out again that a cherry-picked commit is *NOT*
"the same commit" as far as git is concerned, so if you cherry-pick from A
to B then A and B do not have the same commits. It may be that git merge
and/or rebase can detect cherry-picks and handle them better than
completely exploding. But if you think of them as the "same commit", you're
giving yourself a bad understanding of what's actually going on, and that
will eventually bite you.

(I know you didn't specifically mention cherry-picking, Chen, but in the
context of Jianfeng's and Ted's earlier messages I think it's still a point
that bears repeating.)

Ceej
aka Chris Hillery

Re: Merging vs. squashing

Posted by Chen Li <ch...@gmail.com>.
Just want to echo what Ted said: if the same commit exists in two
different branches, then there is no issue when we merge these two
branches into the master.  In fact, if branch A has all the commits of
branch B, after we merge branch A into the master, git will
automatically think branch B is already merged into the master.

Chen

Re: Merging vs. squashing

Posted by abdullah alamoudi <ba...@gmail.com>.
Here is my less than 2c,
I am extremely happy with the current setup. I, however, have not had a
branch that is dependent on another branch before. The only thing that
bothers me for now is the fact that Hyracks is separated from AsterixDB but
I am also okay with that.

Cheers,
Abdullah.

Amoudi, Abdullah.

On Thu, Sep 24, 2015 at 1:38 AM, Chris Hillery <ch...@hillery.land>
wrote:

> P.S. I'd love it if anyone else with opinions or experience would chime in
> here... I'm pretty sure I don't have all the answers, so I don't want to
> seem like I'm trying to dictate the discussion.
>
> Ceej
> aka Chris Hillery
>
> On Wed, Sep 23, 2015 at 3:38 PM, Chris Hillery <ch...@hillery.land>
> wrote:
>
> > On Wed, Sep 23, 2015 at 10:40 AM, Jianfeng Jia <ji...@gmail.com>
> > wrote:
> >
> >> I hit another similar scenario that squash may make things harder.
> >>
> >> Now I’m working the UTF8 encoding task. Some part of work has been done
> >> in Taewoo’s branch. But his branch is a bigger change that won’t get
> into
> >> master
> >> soon. I’d like to cherry-pick several commits from his branch and then
> >> continue
> >> on my task. Then both of us won’t hit the merging conflict in future.
> >>
> >
> > That is, I believe, already not true. Cherry-pick and squashed merge have
> > basically the same effect - they create a new commit with no lineage to
> the
> > original. If you cherry-pick from his branch, then even if you merged
> yours
> > to the trunk (rather than squashing), he'd still get conflicts the next
> > time he updated.
> >
> > I will admit I'm not 100% sure I'm correct about this. I've seen some
> > evidence that git can handle a rebase when the two branches each have a
> > *single* commit which happens to contain precisely the same diffs, as
> would
> > happen with a cherry-pick that didn't require any conflict resolution of
> > its own. I'm not confident this always works, and I've never experimented
> > to see if it works on a merge rather than a rebase. I wouldn't want to
> make
> > any sweeping process decisions until at least we were sure we understood
> > what works and what doesn't.
> >
> > If we did merges all the time instead of squashes and cherry-picks, then
> > you would be able to share some of Taewoo's work if you could *merge* it
> to
> > your branch. But as you might guess, merging a couple of changes from the
> > middle of a foreign branch is quite challenging at best.
> >
> > Ceej
> > aka Chris Hillery
> >
>

Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
On Wed, Sep 23, 2015 at 11:54 PM, Chris Hillery <ch...@hillery.land>
wrote:

> 1. Don't rewrite history (which includes squashing or amending commits and
> force pushes) on a branch that has been shared to anyone else.
>

Should have mentioned: rebasing is also rewriting history. So if you share
a branch with someone else, you can't rebase yourself against master
anymore either.

Ceej
aka Chris Hillery

Re: Merging vs. squashing

Posted by Till Westmann <ti...@apache.org>.
I’m not sure I believe in eyeball grouping. Trying to mentally merge 2 
(or more) overlapping commits of non-trivial size is not an easy thing 
to do (at least for me :) ).

That being said, it’s still a good idea to have the JIRA tickets in 
the commit message (and we might want to enforce that) and I also think 
that there are JIRAs that con be addressed by a number of self-contained 
and more or less independent commits. And if that’s possible I 
actually think that that’s the better way. But I’d prefer not to 
have overlapping or strongly dependent changes in different commits in 
master.

My 2c,
Till

On 24 Sep 2015, at 0:42, Jianfeng Jia wrote:

> To ensure 2, IMO it’s better to enforce that every commit start with 
> a jira ticket number. Then at least we can run an eyeball groupby on 
> the commit history.
>
>> On Sep 23, 2015, at 11:54 PM, Chris Hillery <ch...@hillery.land> 
>> wrote:
>>
>> On Wed, Sep 23, 2015 at 6:29 PM, Ted Dunning <ted.dunning@gmail.com 
>> <ma...@gmail.com>> wrote:
>>
>> Cherry picking (in my experience) works very much as desired. The 
>> fact that a commit exists on two branches doesn't seem to cause any 
>> trouble at all at merge time.
>>
>> That's good to hear. I have definitely had bad experiences when 
>> dealing with cherry-picked commits, but I have not been able to pin 
>> down a specific case where they don't do what you might hope. I'm not 
>> sure how the git magic works, but it would certainly be compelling if 
>> it did work.
>>
>> Rebasing interactively is another case.  I routinely use this to make 
>> my local history more sensible. Within reason, it allows me to squash 
>> and re-order my own commits so that there appears to be more order in 
>> the historical record than was in my head at the time I did the work.
>>
>> Yes, this is a good strategy that we would definitely want to 
>> encourage if we were to stop doing the full-squash at commit time. 
>> The golden rules IMHO are:
>>
>> 1. Don't rewrite history (which includes squashing or amending 
>> commits and force pushes) on a branch that has been shared to anyone 
>> else.
>>
>> 2. Ensure that the commits which are ultimately pushed to master are 
>> sensible, self-contained, well-documented, etc.
>>
>> There's a small amount of tension between those two rules, but it can 
>> be handled.
>>
>> Ceej
>> aka Chris Hillery
>
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine

Re: Merging vs. squashing

Posted by Jianfeng Jia <ji...@gmail.com>.
To ensure 2, IMO it’s better to enforce that every commit start with a jira ticket number. Then at least we can run an eyeball groupby on the commit history.

> On Sep 23, 2015, at 11:54 PM, Chris Hillery <ch...@hillery.land> wrote:
> 
> On Wed, Sep 23, 2015 at 6:29 PM, Ted Dunning <ted.dunning@gmail.com <ma...@gmail.com>> wrote:
> 
> Cherry picking (in my experience) works very much as desired. The fact that a commit exists on two branches doesn't seem to cause any trouble at all at merge time.
> 
> That's good to hear. I have definitely had bad experiences when dealing with cherry-picked commits, but I have not been able to pin down a specific case where they don't do what you might hope. I'm not sure how the git magic works, but it would certainly be compelling if it did work.
>  
> Rebasing interactively is another case.  I routinely use this to make my local history more sensible. Within reason, it allows me to squash and re-order my own commits so that there appears to be more order in the historical record than was in my head at the time I did the work.
> 
> Yes, this is a good strategy that we would definitely want to encourage if we were to stop doing the full-squash at commit time. The golden rules IMHO are:
> 
> 1. Don't rewrite history (which includes squashing or amending commits and force pushes) on a branch that has been shared to anyone else.
> 
> 2. Ensure that the commits which are ultimately pushed to master are sensible, self-contained, well-documented, etc.
> 
> There's a small amount of tension between those two rules, but it can be handled.
> 
> Ceej
> aka Chris Hillery



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
On Wed, Sep 23, 2015 at 6:29 PM, Ted Dunning <te...@gmail.com> wrote:

>
> Cherry picking (in my experience) works very much as desired. The fact
> that a commit exists on two branches doesn't seem to cause any trouble at
> all at merge time.
>

That's good to hear. I have definitely had bad experiences when dealing
with cherry-picked commits, but I have not been able to pin down a specific
case where they don't do what you might hope. I'm not sure how the git
magic works, but it would certainly be compelling if it did work.


> Rebasing interactively is another case.  I routinely use this to make my
> local history more sensible. Within reason, it allows me to squash and
> re-order my own commits so that there appears to be more order in the
> historical record than was in my head at the time I did the work.
>

Yes, this is a good strategy that we would definitely want to encourage if
we were to stop doing the full-squash at commit time. The golden rules IMHO
are:

1. Don't rewrite history (which includes squashing or amending commits and
force pushes) on a branch that has been shared to anyone else.

2. Ensure that the commits which are ultimately pushed to master are
sensible, self-contained, well-documented, etc.

There's a small amount of tension between those two rules, but it can be
handled.

Ceej
aka Chris Hillery

Re: Merging vs. squashing

Posted by Ted Dunning <te...@gmail.com>.
My experience or knowledge is hardly comprehensive, but merging branches
has most of the benefit of squash commits while retaining history.

Cherry picking (in my experience) works very much as desired. The fact that
a commit exists on two branches doesn't seem to cause any trouble at all at
merge time.  The flexibility is fairly awesome.  I would imagine that there
are counter examples where subsequent changes make the merge complex and I
have no idea what the limits are. All I can say is that in the few years I
have been working with git, I have been astounded at how well the complex
fringes work.

Rebasing interactively is another case.  I routinely use this to make my
local history more sensible. Within reason, it allows me to squash and
re-order my own commits so that there appears to be more order in the
historical record than was in my head at the time I did the work.






On Wed, Sep 23, 2015 at 6:01 PM, Till Westmann <ti...@apache.org> wrote:

> I’m certainly not an expert on git as I never took the time (or needed) to
> dive deeply into it.
>
> Reading this thread it seems to me that we have a number of reasonable
> wishes what we would like to do (e.g. squash commits to get a readable
> history for the master branch, cherry-pick between branches to avoid
> duplicate work, …) but it seems that there’s no obvious way to achieve all
> of them.
>
> If that’s indeed the case, we’d have to decide which wish is more
> important and I think that different people will have different opinions on
> this. Since I’m a big fan of readable history (and seeing the result of a
> code review in the master history as opposed to a number of intermediate
> steps), I’m pretty happy with the current state of the world.
>
> Given the problem of having a subset of code that could benefit 2 branches
> I would try to separate it out, review and merge it to master, and to merge
> master back into the 2 branches. However, I see that this could be a lot of
> work and there could be reasons why is is not feasible.
>
> Just my 2c (to have more people chiming in :) )
> Till
>
> On 23 Sep 2015, at 15:38, Chris Hillery wrote:
>
> P.S. I'd love it if anyone else with opinions or experience would chime in
>> here... I'm pretty sure I don't have all the answers, so I don't want to
>> seem like I'm trying to dictate the discussion.
>>
>> Ceej
>> aka Chris Hillery
>>
>> On Wed, Sep 23, 2015 at 3:38 PM, Chris Hillery <ch...@hillery.land>
>> wrote:
>>
>> On Wed, Sep 23, 2015 at 10:40 AM, Jianfeng Jia <ji...@gmail.com>
>>> wrote:
>>>
>>> I hit another similar scenario that squash may make things harder.
>>>>
>>>> Now I’m working the UTF8 encoding task. Some part of work has been done
>>>> in Taewoo’s branch. But his branch is a bigger change that won’t get
>>>> into
>>>> master
>>>> soon. I’d like to cherry-pick several commits from his branch and then
>>>> continue
>>>> on my task. Then both of us won’t hit the merging conflict in future.
>>>>
>>>>
>>> That is, I believe, already not true. Cherry-pick and squashed merge have
>>> basically the same effect - they create a new commit with no lineage to
>>> the
>>> original. If you cherry-pick from his branch, then even if you merged
>>> yours
>>> to the trunk (rather than squashing), he'd still get conflicts the next
>>> time he updated.
>>>
>>> I will admit I'm not 100% sure I'm correct about this. I've seen some
>>> evidence that git can handle a rebase when the two branches each have a
>>> *single* commit which happens to contain precisely the same diffs, as
>>> would
>>> happen with a cherry-pick that didn't require any conflict resolution of
>>> its own. I'm not confident this always works, and I've never experimented
>>> to see if it works on a merge rather than a rebase. I wouldn't want to
>>> make
>>> any sweeping process decisions until at least we were sure we understood
>>> what works and what doesn't.
>>>
>>> If we did merges all the time instead of squashes and cherry-picks, then
>>> you would be able to share some of Taewoo's work if you could *merge* it
>>> to
>>> your branch. But as you might guess, merging a couple of changes from the
>>> middle of a foreign branch is quite challenging at best.
>>>
>>> Ceej
>>> aka Chris Hillery
>>>
>>>

Re: Merging vs. squashing

Posted by Till Westmann <ti...@apache.org>.
I’m certainly not an expert on git as I never took the time (or 
needed) to dive deeply into it.

Reading this thread it seems to me that we have a number of reasonable 
wishes what we would like to do (e.g. squash commits to get a readable 
history for the master branch, cherry-pick between branches to avoid 
duplicate work, …) but it seems that there’s no obvious way to 
achieve all of them.

If that’s indeed the case, we’d have to decide which wish is more 
important and I think that different people will have different opinions 
on this. Since I’m a big fan of readable history (and seeing the 
result of a code review in the master history as opposed to a number of 
intermediate steps), I’m pretty happy with the current state of the 
world.

Given the problem of having a subset of code that could benefit 2 
branches I would try to separate it out, review and merge it to master, 
and to merge master back into the 2 branches. However, I see that this 
could be a lot of work and there could be reasons why is is not 
feasible.

Just my 2c (to have more people chiming in :) )
Till

On 23 Sep 2015, at 15:38, Chris Hillery wrote:

> P.S. I'd love it if anyone else with opinions or experience would 
> chime in
> here... I'm pretty sure I don't have all the answers, so I don't want 
> to
> seem like I'm trying to dictate the discussion.
>
> Ceej
> aka Chris Hillery
>
> On Wed, Sep 23, 2015 at 3:38 PM, Chris Hillery <ch...@hillery.land>
> wrote:
>
>> On Wed, Sep 23, 2015 at 10:40 AM, Jianfeng Jia 
>> <ji...@gmail.com>
>> wrote:
>>
>>> I hit another similar scenario that squash may make things harder.
>>>
>>> Now I’m working the UTF8 encoding task. Some part of work has been 
>>> done
>>> in Taewoo’s branch. But his branch is a bigger change that won’t 
>>> get into
>>> master
>>> soon. I’d like to cherry-pick several commits from his branch and 
>>> then
>>> continue
>>> on my task. Then both of us won’t hit the merging conflict in 
>>> future.
>>>
>>
>> That is, I believe, already not true. Cherry-pick and squashed merge 
>> have
>> basically the same effect - they create a new commit with no lineage 
>> to the
>> original. If you cherry-pick from his branch, then even if you merged 
>> yours
>> to the trunk (rather than squashing), he'd still get conflicts the 
>> next
>> time he updated.
>>
>> I will admit I'm not 100% sure I'm correct about this. I've seen some
>> evidence that git can handle a rebase when the two branches each have 
>> a
>> *single* commit which happens to contain precisely the same diffs, as 
>> would
>> happen with a cherry-pick that didn't require any conflict resolution 
>> of
>> its own. I'm not confident this always works, and I've never 
>> experimented
>> to see if it works on a merge rather than a rebase. I wouldn't want 
>> to make
>> any sweeping process decisions until at least we were sure we 
>> understood
>> what works and what doesn't.
>>
>> If we did merges all the time instead of squashes and cherry-picks, 
>> then
>> you would be able to share some of Taewoo's work if you could *merge* 
>> it to
>> your branch. But as you might guess, merging a couple of changes from 
>> the
>> middle of a foreign branch is quite challenging at best.
>>
>> Ceej
>> aka Chris Hillery
>>

Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
P.S. I'd love it if anyone else with opinions or experience would chime in
here... I'm pretty sure I don't have all the answers, so I don't want to
seem like I'm trying to dictate the discussion.

Ceej
aka Chris Hillery

On Wed, Sep 23, 2015 at 3:38 PM, Chris Hillery <ch...@hillery.land>
wrote:

> On Wed, Sep 23, 2015 at 10:40 AM, Jianfeng Jia <ji...@gmail.com>
> wrote:
>
>> I hit another similar scenario that squash may make things harder.
>>
>> Now I’m working the UTF8 encoding task. Some part of work has been done
>> in Taewoo’s branch. But his branch is a bigger change that won’t get into
>> master
>> soon. I’d like to cherry-pick several commits from his branch and then
>> continue
>> on my task. Then both of us won’t hit the merging conflict in future.
>>
>
> That is, I believe, already not true. Cherry-pick and squashed merge have
> basically the same effect - they create a new commit with no lineage to the
> original. If you cherry-pick from his branch, then even if you merged yours
> to the trunk (rather than squashing), he'd still get conflicts the next
> time he updated.
>
> I will admit I'm not 100% sure I'm correct about this. I've seen some
> evidence that git can handle a rebase when the two branches each have a
> *single* commit which happens to contain precisely the same diffs, as would
> happen with a cherry-pick that didn't require any conflict resolution of
> its own. I'm not confident this always works, and I've never experimented
> to see if it works on a merge rather than a rebase. I wouldn't want to make
> any sweeping process decisions until at least we were sure we understood
> what works and what doesn't.
>
> If we did merges all the time instead of squashes and cherry-picks, then
> you would be able to share some of Taewoo's work if you could *merge* it to
> your branch. But as you might guess, merging a couple of changes from the
> middle of a foreign branch is quite challenging at best.
>
> Ceej
> aka Chris Hillery
>

Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
On Wed, Sep 23, 2015 at 10:40 AM, Jianfeng Jia <ji...@gmail.com>
wrote:

> I hit another similar scenario that squash may make things harder.
>
> Now I’m working the UTF8 encoding task. Some part of work has been done
> in Taewoo’s branch. But his branch is a bigger change that won’t get into
> master
> soon. I’d like to cherry-pick several commits from his branch and then
> continue
> on my task. Then both of us won’t hit the merging conflict in future.
>

That is, I believe, already not true. Cherry-pick and squashed merge have
basically the same effect - they create a new commit with no lineage to the
original. If you cherry-pick from his branch, then even if you merged yours
to the trunk (rather than squashing), he'd still get conflicts the next
time he updated.

I will admit I'm not 100% sure I'm correct about this. I've seen some
evidence that git can handle a rebase when the two branches each have a
*single* commit which happens to contain precisely the same diffs, as would
happen with a cherry-pick that didn't require any conflict resolution of
its own. I'm not confident this always works, and I've never experimented
to see if it works on a merge rather than a rebase. I wouldn't want to make
any sweeping process decisions until at least we were sure we understood
what works and what doesn't.

If we did merges all the time instead of squashes and cherry-picks, then
you would be able to share some of Taewoo's work if you could *merge* it to
your branch. But as you might guess, merging a couple of changes from the
middle of a foreign branch is quite challenging at best.

Ceej
aka Chris Hillery

Re: Merging vs. squashing

Posted by Jianfeng Jia <ji...@gmail.com>.
I hit another similar scenario that squash may make things harder.

Now I’m working the UTF8 encoding task. Some part of work has been done
in Taewoo’s branch. But his branch is a bigger change that won’t get into master
soon. I’d like to cherry-pick several commits from his branch and then continue
on my task. Then both of us won’t hit the merging conflict in future. 

Any suggestion for this kind of problem under current git workflow?

> On Sep 21, 2015, at 1:29 AM, Chris Hillery <ch...@hillery.land> wrote:
> 
> On Fri, Sep 18, 2015 at 8:47 AM, Chen Li <ch...@gmail.com> wrote:
> 
>> I assume the conclusion is: we keep our current git practice.  Right?
>> 
> 
> I didn't mean to assert this. I just wanted to share the history and the
> Gerrit limitations. There's no question that things are the way they are
> today due in part to my personal prejudices, but mine is definitely not the
> only voice that matters.
> 
> 
>> Based on this practice, is there any easy way for people like Jianfeng
>> to make their merge into their branch simpler?
> 
> 
> I'm afraid I don't think there is. If you are working on multiple feature
> branches simultaneously, that shouldn't be directly impacted by this. But
> if you are trying to maintain several *dependent* branches locally (ie, you
> create branch "foo" from your local master, and then create branch "bar"
> from "foo") and then merge them independently to Gerrit, things are going
> to go badly sideways for you.
> 
> The only way I can think that this working method would be functional would
> be if we merged all feature branches to actual branches on Gerrit, and used
> only merge commits when merging onto master. To be honest I'm not sure I
> could easily enumerate the pros and cons of this approach.
> 
> 
>> I think Young-Seok is
>> doing experiencing something similar to merge the master into his
>> one-year-behind geo branch.
> 
> 
> For what it's worth, I don't think any different merging strategy would
> have made much difference in that case. When you fall that far behind
> master, it's very likely going to be a big job to catch up.
> 
> Ceej
> aka Chris Hillery



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine


Re: Merging vs. squashing

Posted by Chris Hillery <ch...@hillery.land>.
On Fri, Sep 18, 2015 at 8:47 AM, Chen Li <ch...@gmail.com> wrote:

> I assume the conclusion is: we keep our current git practice.  Right?
>

I didn't mean to assert this. I just wanted to share the history and the
Gerrit limitations. There's no question that things are the way they are
today due in part to my personal prejudices, but mine is definitely not the
only voice that matters.


> Based on this practice, is there any easy way for people like Jianfeng
> to make their merge into their branch simpler?


I'm afraid I don't think there is. If you are working on multiple feature
branches simultaneously, that shouldn't be directly impacted by this. But
if you are trying to maintain several *dependent* branches locally (ie, you
create branch "foo" from your local master, and then create branch "bar"
from "foo") and then merge them independently to Gerrit, things are going
to go badly sideways for you.

The only way I can think that this working method would be functional would
be if we merged all feature branches to actual branches on Gerrit, and used
only merge commits when merging onto master. To be honest I'm not sure I
could easily enumerate the pros and cons of this approach.


> I think Young-Seok is
> doing experiencing something similar to merge the master into his
> one-year-behind geo branch.


For what it's worth, I don't think any different merging strategy would
have made much difference in that case. When you fall that far behind
master, it's very likely going to be a big job to catch up.

Ceej
aka Chris Hillery

Re: Merging vs. squashing

Posted by Chen Li <ch...@gmail.com>.
I assume the conclusion is: we keep our current git practice.  Right?

Based on this practice, is there any easy way for people like Jianfeng
to make their merge into their branch simpler?  I think Young-Seok is
doing experiencing something similar to merge the master into his
one-year-behind geo branch.

Chen

On Fri, Sep 18, 2015 at 1:04 AM, Chris "Ceej" Hillery
<ce...@couchbase.com> wrote:
> I'm pulling this conversation into a separate thread from the "Commit
> message proposal".
>
> There are certainly long-running and contentious arguments about whether
> commits to master should be merges or squashes/cherry-picks. That ranks
> right up there with vi vs. emacs. Before going too far into that, however,
> you should be aware of some additional quirks/limitations that Gerrit
> brings into the story.
>
> First, we have Gerrit configured to always cherry-pick changes when
> submitting them. We do this for a couple of reasons, but the main one is
> that Gerrit gives you a little additional value in this case: It introduces
> Reviewed-on: and Reviewed-by: headers into the commit, which provide you
> with a link back to the Gerrit review. For instance,
> https://github.com/apache/incubator-asterixdb-hyracks/commit/d885779b13c4fd9bd551d306a460df2894fb18cb
> - see that there is a hyperlink there back to Gerrit. This is not possible
> if you use any merge strategy other than cherry-pick, because only
> cherry-picking creates a *new* commit object that Gerrit can decorate.
>
> Moving on - In Gerrit, it is not possible to review a "branch merge", at
> least not in the way you would hope. A review in Gerrit is always of a
> *single commit*. If you make a number of commits on your local working
> branch and then push that branch to Gerrit's refs/for/master, it will
> create one review for each commit. This is occasionally desirable for
> larger changes that can reasonably be broken up, but as a default working
> method it is usually more trouble than it's worth - especially because
> Gerrit won't prevent you from Submitting a subset of the changes or merging
> them out of order. This is the main reason that I implemented git-gerrit to
> do a squash before uploading a change to Gerrit.
>
> What if you make changes on a branch, then merge them to your local master
> and then propose just the merge commit to Gerrit? That's just one commit
> (the merge commit) and therefore one review, right? Well, no - that would
> only work if all of your branch commits were already known to Gerrit, say
> because they were submitted (one at a time) to a non-master branch.
> Otherwise, you'll get N+1 separate reviews on Gerrit, and the merge commit
> review won't even make sense anymore.
>
> It *is* possible to do this "right" with Gerrit if you want to get merge
> commits onto the master branch. You need to create a secondary branch,
> "unstable" or "testing" or similar, and propose your individual changes
> there. We could even create multiple of those branches for individual
> feature work if we wanted. Then once that branch is in a state that you
> want to merge to master, you can propose a single merge commit change to
> Gerrit. The biggest downside to this is that when reviewing a merge commit,
> Gerrit does NOT show you any information about the changes being introduced
> into the target branch. It only shows you any diffs which were introduced
> during conflict resolution of the merge. Also, merge commits cannot be
> cherry-picked, so you lose the Reviewed-on: header for the merge itself.
> These aren't necessarily fatal limitations - we have a couple teams in
> Couchbase using exactly this methodology with success.
>
> Anyway. The upshot is, handling git history is a mess, and Gerrit makes it
> messier. The "always squash branches, always cherry-pick changes" approach
> that we currently use for AsterixDB seems to me to be the least available
> evil, and it at least results in a clean if abbreviated commit history on
> master. It definitely has its downsides as well, though - Jianfeng's
> experience is one such problem, and it doesn't have a clean solution.
>
> Ceej
> aka Chris Hillery