You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by Zoltan Ivanfi <zi...@cloudera.com.INVALID> on 2018/10/18 15:46:29 UTC

Merge strategy for feature branches

Dear Parquet Developers,

On yesterday's Parquet Sync we discussed merging vs. rebasing vs.
squashing again. (Please note that I use the word "merge" both in the
sense of incorporating a change or feature branch in the main branch
as well as one specific way of doing so: using merge commits. This
unfortunate overloading of "merge" comes from github's terminology.)

Ryan presented a new argument against merging that we haven't
considered before: reverting behaves strangely with merge commits.
Combined with the chaotic way in which git and github present merges
in their log/history views, this makes merging much less compelling
for handling feature branches.

We considered rebasing as well, but thinking further about it after
the sync we came to the conclusion that it is not feasible for feature
branch development due to several reasons:

- Resolving conflicts with rebasing is a lot more difficult if there
are multiple conflicting commits and/or if the conflicting upstream
commit would have required substantial changes in the feature branch
from the beginning.
- Rebasing leads to rewriting history, which goes against the every
change is reviewed on the feature branch approach.
- Rebasing the feature branch leads to loss of review comments on
existing and merged pull requests, since those are associated with
specific git hashes.
- The feature branch can only be rebased by a committer.

Based on these considerations, we deemed both merging and rebasing
unfeasible for being used as a feature branch merging strategy. As a
compromise, we settled on Ryan's original suggestion: squashing. This
is also the regular way of merging individual pull requests that are
not developed on feature branches. This practice can easily be adapted
to feature branches as follows:

- The feature branch is handled similar to the master branch:
developers open pull requests, committers review, approve and merge
them by squashing each pull request into a single commit on the
feature branch.
- The feature branch developers may occasionally merge the master
branch into the feature branch. (This needs some extra manual work
from the committer, since merges are disabled on the github UI on
purpose.)
- When the feature branch is ready, its developers open a pull request
for merging it into master. Committers should merge such a pull
request by squashing it but mentioning the feature branch name and
keeping the list of commits in the commit message at the same time.
This allows interested parties to trace back the history of affected
code lines by consulting the feature branch history and/or pull
request reviews.

You can see an example of such a commit (the first one and as such the
only one so far) here:
https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e

Please let us know any concerns or suggestions you may have to improve
this practice even further.

Thanks,

Zoltan

Re: Merge strategy for feature branches

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

On Fri, Oct 19, 2018 at 1:46 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> I find that it’s easier to rebase because it requires fixing individual commits that are smaller.

Our personal experience with this feature branch was that the
codebases really diverged and conflict resolution with rebasing was
not feasible. Some of the upstream changes required modifications that
we could add as a new commit with merging, but would have had to
introduce retroactively from the very first commit with rebasing.

>> Rebasing leads to rewriting history, which goes against the every change is
>> reviewed on the feature branch approach.
>
> Again, reviews should validate tests that can be used here. The squashing
> approach that we’ve used for a long time relies on the same principle.

Squashing changes history as well indeed, but this only affects the
number and message of commits. The code itself does not change. With a
rebase, code also changes and in such a way that we have no way of
differentiating what has already been reviewed and what is new. I
don't think that passing unit tests are a good enough substitute for
reviewing the conflict resolution, especially when it involves
extensive changes, as it required in our case.

> Rebasing the feature branch leads to loss of review comments on existing
> and merged pull requests, since those are associated with specific git
> hashes.
>
> Comments should be attached to PRs, not hashes.

Github associates each review comment with a specific line in a
specific commit. When someone force pushes a rebase onto their branch
that serves as the basis of the pull request, the earlier comments are
left without context as the code changes they were attached to are no
longer available.

Br,

Zoltan

Re: Merge strategy for feature branches

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Squashing sounds reasonable to me, although I don’t really mind the
rebasing problems. Individual commits should be small and well isolated, so
I don’t think it’s going to be a big problem in practice. I think you have
overstated many of the issues:

Resolving conflicts with rebasing is a lot more difficult if there are
multiple conflicting commits and/or if the conflicting upstream commit
would have required substantial changes in the feature branch from the
beginning.

This is a bad situation in general, but I find that it’s easier to rebase
because it requires fixing individual commits that are smaller. And because
those commits were already reviewed, they should have test cases to verify
the rebase changes.

Rebasing leads to rewriting history, which goes against the every change is
reviewed on the feature branch approach.

Again, reviews should validate tests that can be used here. The squashing
approach that we’ve used for a long time relies on the same principle.

Rebasing the feature branch leads to loss of review comments on existing
and merged pull requests, since those are associated with specific git
hashes.

Comments should be attached to PRs, not hashes.

The feature branch can only be rebased by a committer.

Anyone can create a rebased branch, it’s just committers that can update
the feature branch in the Apache repository. I think that’s reasonable.

On Thu, Oct 18, 2018 at 8:47 AM Zoltan Ivanfi <zi...@cloudera.com.invalid>
wrote:

> Dear Parquet Developers,
>
> On yesterday's Parquet Sync we discussed merging vs. rebasing vs.
> squashing again. (Please note that I use the word "merge" both in the
> sense of incorporating a change or feature branch in the main branch
> as well as one specific way of doing so: using merge commits. This
> unfortunate overloading of "merge" comes from github's terminology.)
>
> Ryan presented a new argument against merging that we haven't
> considered before: reverting behaves strangely with merge commits.
> Combined with the chaotic way in which git and github present merges
> in their log/history views, this makes merging much less compelling
> for handling feature branches.
>
> We considered rebasing as well, but thinking further about it after
> the sync we came to the conclusion that it is not feasible for feature
> branch development due to several reasons:
>
> - Resolving conflicts with rebasing is a lot more difficult if there
> are multiple conflicting commits and/or if the conflicting upstream
> commit would have required substantial changes in the feature branch
> from the beginning.
> - Rebasing leads to rewriting history, which goes against the every
> change is reviewed on the feature branch approach.
> - Rebasing the feature branch leads to loss of review comments on
> existing and merged pull requests, since those are associated with
> specific git hashes.
> - The feature branch can only be rebased by a committer.
>
> Based on these considerations, we deemed both merging and rebasing
> unfeasible for being used as a feature branch merging strategy. As a
> compromise, we settled on Ryan's original suggestion: squashing. This
> is also the regular way of merging individual pull requests that are
> not developed on feature branches. This practice can easily be adapted
> to feature branches as follows:
>
> - The feature branch is handled similar to the master branch:
> developers open pull requests, committers review, approve and merge
> them by squashing each pull request into a single commit on the
> feature branch.
> - The feature branch developers may occasionally merge the master
> branch into the feature branch. (This needs some extra manual work
> from the committer, since merges are disabled on the github UI on
> purpose.)
> - When the feature branch is ready, its developers open a pull request
> for merging it into master. Committers should merge such a pull
> request by squashing it but mentioning the feature branch name and
> keeping the list of commits in the commit message at the same time.
> This allows interested parties to trace back the history of affected
> code lines by consulting the feature branch history and/or pull
> request reviews.
>
> You can see an example of such a commit (the first one and as such the
> only one so far) here:
>
> https://github.com/apache/parquet-mr/commit/e7db9e20f52c925a207ea62d6dda6dc4e870294e
>
> Please let us know any concerns or suggestions you may have to improve
> this practice even further.
>
> Thanks,
>
> Zoltan
>


-- 
Ryan Blue
Software Engineer
Netflix