You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Andy Singleton <an...@assembla.com> on 2011/07/18 00:03:42 UTC

Fixing merge - Subtree, Cyclic, and Tree Change cases


To start the discussion, I will refer to this blog article by Mark Phippard:

http://blogs.collab.net/subversion/2008/07/subversion-merg/

I found the article to be a good overview of the issues. I think that we 
need help from Mark. On the other hand, I have seen that Mark sometimes 
makes discouraging comments. My work is apparently "hand wavey" and 
"proprietary". I'm used to this treatment because I have 25 developers 
who work for me who often think that I am full of crap. However, it 
might have a discouraging effect on other contributors. For example, you 
can see in this great ticket thread - 
http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states "I 
do not think it is possible in this design....I think we need to accept 
the limitations of the current design and work towards doing the best we 
can within that design" Apparently that was enough to kill progress. I 
think we should keep a more open mind going forward.

I'm going to make some claims that some problems have "straightforward" 
solutions. That doesn't mean they are simple solutions. Handling all of 
the merge cases is going to be hard. However, they are straightforward 
in the sense that we can discuss the strategy at the high level used in 
Mark's article.

Let's consider three issues: Subtree merginfo, cyclic merge, and tree 
change operations

SUBTREE MERGINFO

Mark notes that reintegrate does not work if you have subtree merginfo. 
The subtrees potentially make the top-level mergeinfo inaccurate. So, 
basically everyone that has looked at merge problems in the past four 
years, including Mark, has tried to get rid of subtree merginfo. It's 
amazing that Subversion still tries to support this feature. It can't be 
supported in NewMerge.

In the following sections, we will also see that the merginfo data is 
too sparse, and we need to replace it with something bigger and more 
extensible.

CYCLIC MERGE

The case where we merge back and forth between a development or 
deployment branch, and trunk, is the base case for merge. It should be 
supported. Subversion only supports it with special instructions. This 
is the "cyclic merge" problem.

It seems that we have two basic ways to do a merge. We can grab all of 
the changes that we are trying to merge in one big diff between the 
branch we are merging from and the branch we are merging into - the 
reintegrate merge as described in Mark's article. Or, we can 
sequentially apply or "replay" each of the changes that we want to merge 
into our working copy - the "recursive" strategy that is the default for 
git.

It seems to me that the "one big diff" and the replay strategy are 
closely related. When you are replaying, you grab all of the changes in 
any sequence of revisions that doesn't include a merge as one big diff. 
So, the current "one big diff" strategy is a special case of the replay 
strategy that applies when there are no intermediate merges from other 
branches or cherrypicks.

But wait! According to this article, we can't use the replay strategy 
because we are missing part of the replay. We lose information that was 
used to resolve a merge when composing merge commits. If we had that 
information, we could replay individual merges, and handle a higher 
percentage of the cyclic merge cases.

This problem seems to have a straightforward solution. When we commit 
the merge, we can stuff the changeset that represents the difference 
between the merge, and the commit, into the merge_history. We just need 
an extensible merge_history format to hold it.

It's totally not clear to me why you need to say "reintegrate" when you 
merge to trunk, and why you need to update the branch after you do a 
reintegrate merge from it. The computer should be able to remember the 
history of merges and it should be obvious which things have been merged 
and which revisions have been committed on both branches. The only 
reason that I can think if is that that the mergeinfo is so sparse that 
the computer doesn't remember enough about the merge history. Would a 
bigger and more extensible data format give us a straightforward way to 
solve that problem?

TREE CHANGE

We can identify tree changes by pattern matching. This is the same 
tactic that git uses, without any other tree change tracking. We can 
identify when this match is successful because the match is applied, 
examined by the merger, and then the merge is committed. In this case we 
could write the tree map into the merge_history so that we can map 
changes bi-directionally during future merges without guessing again. 
This is another case of saving information that we need to replay a merge.

I think we could get a similar effect by generating a move operation 
(normal copy & delete form) as part of the merge. I think that this 
mapping would need to be done by updates as well as by explicit merges.


EXPERTISE
Who on this list knows enough about the core algorithm used in merge to 
critique these suggestions and point to places in the code or documentation?

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Folker Schamel <sc...@spinor.com>.

> I do not think that the reverse merge strategy is the same as doing
> multiple forward merges, and I do not think it covers enough cases. For
> example, In your original example, you proposed merging in to branch
> Feature, where
> Feature = F+(R+S+RSFix)
> You proposed "unmerging" (or even reverting) to get F, then merging in
> all of the potentially redundant changes.
>
> Let's consider the case where we make a new change F2, so
> Feature = F+(R+S+RSFix)+F2
>
> Now, F2 is an edit on one of files or code blocks that is added or
> changed in R or S. This is a high-probability case, because people are
> most likely to change new code. In this case, there is no way to reverse
> merge and still keep F2. However, we can forward merge T and RSTFix, and
> still keep F2.

We don't have to keep F2. For example, in your case we can also
reverse merge (R+S+RSfix)+F2, then merge whole trunk, then merge F2
again.

(Indeed, in general, it seems that your sample is a good reason
for not using a minimal reverse merge as I had in mind originally,
but instead to reverse merge always the whole continues block backwards
as far as needed, including F2, merge the other line, and then merge
changes we had merged out too much like F2 back in again.)

For sure there are many variants of how to un-merge and merge which
revisions in which order, giving the same result if there are no
conflicts, but possibly differing significantly regarding the
probability of creating conflicts.
But my main point is:
Clean merging is possible using un-merges.
And, by choosing a good merge order, I still think with the same 
non-conflict-quality as when storing some fixes separately.

The reason why I'm not really convinced with this fix storing is:
What exactly is RSTFix? How is it stored? Can the user display it
somehow? How can we forward merge just T and then RSTFix, if merging
T already causes a conflict, which was the reason for RSTFix?
I guess you aleady thought about all this and found solutions.
It just sounds complex to me, introducing a new elementary concept of
fixes which must be calculated, stored, managed, and applied.
Basically implementing a new logic of changes and merging
in parallel to the existing merging logic in Subversion.
And I think this is not necessary.

>
> On 7/19/2011 3:47 PM, Folker Schamel wrote:
>>> On 7/18/2011 4:37 PM, Folker Schamel wrote:
>>>> Hi Andy,
>>>>
>>>> two thoughts about cyclic merges:
>>>>
>>>> 1. Merging should not skip cyclic merges (like this old
>>>> svnmerge tool), but must subtract (reverse-merge) the original
>>>> change first, and then add (merge) the cyclic merge, in order
>>>> to not loose adaptions of changes.
>>> I made a different proposal to solve the same problem. Following your
>>> example, let's say we are merging
>>> Trunk = (R+S+RSFix) + (T + RSTfix)
>>> --- where RSTFix is changes to resolve a merge conflict
>>> into Feature = (R+S+RSFix) + F
>>>
>>> In your proposal, you "unmerge" (R+S+RSFix) to get F. Then, having
>>> separated the stuff that is duplicate from the stuff that is new, you
>>> can do the "one big diff" style merge from Trunk.
>>>
>>> In my proposal, we save RSTfix in our expandable merge_history file, and
>>> then we can in many cases apply T and RSTFix separately, without any
>>> duplicates.
>>>
>>> Do you think that might be easier?
>>
>> At the end also your proposal requires a reverse-merge to calculate
>> RSTfix. So the difference is basically whether to calculate RSTfix
>> on the fly implicitly when needed, or in advance and store it.
>> Which one is easier and/or faster - good question.
>>
>> The idea behind the on-the-fly reverse-merge approach is
>> a) to operate purely on existing revisions (no need to store changes
>> like RSFix separately), and
>> b) (at least in theory) a simple merge algorithm, which basically
>> just says: "Merge everything over, but reverse-merge existing old
>> changesets before", solving this RSTfix adaption issue on the fly
>> automatically implicitly in a robust way, without having to deal
>> with adaptions like RSTfix explicitly (at least in theory).
>> See http://svn.haxx.se/dev/archive-2007-12/0137.shtml
>> (Note that this algorithm assumes "correct" merge info,
>> not the current subversion merge info.)
>>
>> Cheers,
>> Folker
>>
>>>> For example, suppose you have two branches A and B.
>>>> c100 is a change in A.
>>>> c101 is a change in B.
>>>> B merges c1 into B (maybe with or without conflict),
>>>> but has to adapt this change to get it compatible with c101,
>>>> resulting into c102.
>>>> Now, A merges all changes from B to A.
>>>> Then just merging c101 would loose the adaptions made in c102.
>>>> So the correct behavior is to subtract c100 and then add c101 and c102.
>>>> Note that if the changesets are not overlapping, the order
>>>> of the reverse-merges and merges does not matter.
>>>> But if the changesets are overlapping, then the correct of
>>>> reverse-merges and merges can matter.
>>>>
>>>> 2. Supporting cyclic merges correctly requires that merge-info
>>>> only records the direct merge info without carrying over
>>>> existing merge info.
>>>>
>>>> See for example
>>>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=948427
>>>>
>>>>
>>> I completely agree that "svn newmerge" should be able to handle the case
>>> that you posted in that message.
>>>
>>> I also agree that some of the problems in this merge case come from the
>>> inadequate data in merginfo. As you point out, we can read mergeinfo
>>> carefully, and we don't even know the common ancestor of two branches
>>> being merged. If you know the common ancestor - which is often not that
>>> far back in this workflow - you can ignore everything before that point.
>>> Why isn't there a record dropped in with every merge that says "We
>>> merged X (server + branch + revision) and (all of this other merge
>>> history from there) at time Y"? This would get dragged into the next
>>> branch it gets merged with. You could read back through the merge
>>> tree/graph to find common ancestors. We could be saving those records in
>>> a new and expanded merge_history.
>>>
>>>>
>>>> Cheers,
>>>> Folker
>>>>
>>>>> To start the discussion, I will refer to this blog article by Mark
>>>>> Phippard:
>>>>>
>>>>> http://blogs.collab.net/subversion/2008/07/subversion-merg/
>>>>>
>>>>> I found the article to be a good overview of the issues.I think
>>>>> that we
>>>>> need help from Mark.On the other hand, I have seen that Mark sometimes
>>>>> makes discouraging comments. My work is apparently “hand wavey” and
>>>>> “proprietary”.I’m used to this treatment because I have 25 developers
>>>>> who work for me who often think that I am full of crap.However, it
>>>>> might
>>>>> have a discouraging effect on other contributors.For example, you can
>>>>> see in this great ticket thread -
>>>>> http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he
>>>>> states "I
>>>>> do not think it is possible in this design....I think we need to
>>>>> accept
>>>>> the limitations of the current design and work towards doing the
>>>>> best we
>>>>> can within that design” Apparently that was enough to kill progress.I
>>>>> think we should keep a more open mind going forward.
>>>>>
>>>>> I’m going to make some claims that some problems have
>>>>> “straightforward”
>>>>> solutions.That doesn’t mean they are simple solutions.Handling all of
>>>>> the merge cases is going to be hard.However, they are
>>>>> straightforward in
>>>>> the sense that we can discuss the strategy at the high level used in
>>>>> Mark’s article.
>>>>>
>>>>> Let’s consider three issues:Subtree merginfo, cyclic merge, and tree
>>>>> change operations
>>>>>
>>>>> SUBTREE MERGINFO
>>>>>
>>>>> Mark notes that reintegrate does not work if you have subtree
>>>>> merginfo.
>>>>> The subtrees potentially make the top-level mergeinfo inaccurate.So,
>>>>> basically everyone that has looked at merge problems in the past four
>>>>> years, including Mark, has tried to get rid of subtree merginfo.It’s
>>>>> amazing that Subversion still tries to support this feature.It
>>>>> can’t be
>>>>> supported in NewMerge.
>>>>>
>>>>> In the following sections, we will also see that the merginfo data is
>>>>> too sparse, and we need to replace it with something bigger and more
>>>>> extensible.
>>>>>
>>>>> CYCLIC MERGE
>>>>>
>>>>> The case where we merge back and forth between a development or
>>>>> deployment branch, and trunk, is the base case for merge.It should be
>>>>> supported.Subversion only supports it with special
>>>>> instructions.This is
>>>>> the “cyclic merge” problem.
>>>>>
>>>>> It seems that we have two basic ways to do a merge.We can grab all of
>>>>> the changes that we are trying to merge in one big diff between the
>>>>> branch we are merging from and the branch we are merging into - the
>>>>> reintegrate merge as described in Mark’s article.Or, we can
>>>>> sequentially
>>>>> apply or “replay” each of the changes that we want to merge into our
>>>>> working copy - the “recursive” strategy that is the default for git.
>>>>>
>>>>> It seems to me that the “one big diff” and the replay strategy are
>>>>> closely related.When you are replaying, you grab all of the changes in
>>>>> any sequence of revisions that doesn’t include a merge as one big
>>>>> diff.So, the current “one big diff” strategy is a special case of the
>>>>> replay strategy that applies when there are no intermediate merges
>>>>> from
>>>>> other branches or cherrypicks.
>>>>>
>>>>> But wait!According to this article, we can’t use the replay strategy
>>>>> because we are missing part of the replay.We lose information that was
>>>>> used to resolve a merge when composing merge commits.If we had that
>>>>> information, we could replay individual merges, and handle a higher
>>>>> percentage of the cyclic merge cases.
>>>>>
>>>>> This problem seems to have a straightforward solution.When we
>>>>> commit the
>>>>> merge, we can stuff the changeset that represents the difference
>>>>> between
>>>>> the merge, and the commit, into the merge_history.We just need an
>>>>> extensible merge_history format to hold it.
>>>>>
>>>>> It’s totally not clear to me why you need to say “reintegrate” when
>>>>> you
>>>>> merge to trunk, and why you need to update the branch after you do a
>>>>> reintegrate merge from it.The computer should be able to remember the
>>>>> history of merges and it should be obvious which things have been
>>>>> merged
>>>>> and which revisions have been committed on both branches.The only
>>>>> reason
>>>>> that I can think if is that that the mergeinfo is so sparse that the
>>>>> computer doesn’t remember enough about the merge history.Would a
>>>>> bigger
>>>>> and more extensible data format give us a straightforward way to solve
>>>>> that problem?
>>>>>
>>>>> TREE CHANGE
>>>>>
>>>>> We can identify tree changes by pattern matching.This is the same
>>>>> tactic
>>>>> that git uses, without any other tree change tracking.We can identify
>>>>> when this match is successful because the match is applied,
>>>>> examined by
>>>>> the merger, and then the merge is committed.In this case we could
>>>>> write
>>>>> thetree map into the merge_history so thatwe can map changes
>>>>> bi-directionally during future merges without guessing again.This is
>>>>> another case of saving information that we need to replay a merge.
>>>>>
>>>>> I think we could get a similar effect by generating a move operation
>>>>> (normal copy & delete form) as part of the merge.I think that this
>>>>> mapping would need to be done by updates as well as by explicit
>>>>> merges.
>>>>>
>>>>>
>>>>> EXPERTISE
>>>>> Who on this list knows enough about the core algorithm used in
>>>>> merge to
>>>>> critique these suggestions and point to places in the code or
>>>>> documentation?
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Andy Singleton <an...@assembla.com>.

  I do not think that the reverse merge strategy is the same as doing 
multiple forward merges, and I do not think it covers enough cases.  For 
example, In your original example, you proposed merging in to branch 
Feature, where
Feature = F+(R+S+RSFix)
You proposed "unmerging" (or even reverting) to get F, then merging in 
all of the potentially redundant changes.

Let's consider the case where we make a new change F2, so
Feature = F+(R+S+RSFix)+F2

Now, F2 is an edit on one of files or code blocks that is added or 
changed in R or S.  This is a high-probability case, because people are 
most likely to change new code.  In this case, there is no way to 
reverse merge and still keep F2.  However, we can forward merge T and 
RSTFix, and still keep F2.

On 7/19/2011 3:47 PM, Folker Schamel wrote:
>> On 7/18/2011 4:37 PM, Folker Schamel wrote:
>>> Hi Andy,
>>>
>>> two thoughts about cyclic merges:
>>>
>>> 1. Merging should not skip cyclic merges (like this old
>>> svnmerge tool), but must subtract (reverse-merge) the original
>>> change first, and then add (merge) the cyclic merge, in order
>>> to not loose adaptions of changes.
>> I made a different proposal to solve the same problem. Following your
>> example, let's say we are merging
>> Trunk = (R+S+RSFix) + (T + RSTfix)
>> --- where RSTFix is changes to resolve a merge conflict
>> into Feature = (R+S+RSFix) + F
>>
>> In your proposal, you "unmerge" (R+S+RSFix) to get F. Then, having
>> separated the stuff that is duplicate from the stuff that is new, you
>> can do the "one big diff" style merge from Trunk.
>>
>> In my proposal, we save RSTfix in our expandable merge_history file, and
>> then we can in many cases apply T and RSTFix separately, without any
>> duplicates.
>>
>> Do you think that might be easier?
>
> At the end also your proposal requires a reverse-merge to calculate
> RSTfix. So the difference is basically whether to calculate RSTfix
> on the fly implicitly when needed, or in advance and store it.
> Which one is easier and/or faster - good question.
>
> The idea behind the on-the-fly reverse-merge approach is
> a) to operate purely on existing revisions (no need to store changes
> like RSFix separately), and
> b) (at least in theory) a simple merge algorithm, which basically
> just says: "Merge everything over, but reverse-merge existing old
> changesets before", solving this RSTfix adaption issue on the fly
> automatically implicitly in a robust way, without having to deal
> with adaptions like RSTfix explicitly (at least in theory).
> See http://svn.haxx.se/dev/archive-2007-12/0137.shtml
> (Note that this algorithm assumes "correct" merge info,
> not the current subversion merge info.)
>
> Cheers,
> Folker
>
>>> For example, suppose you have two branches A and B.
>>> c100 is a change in A.
>>> c101 is a change in B.
>>> B merges c1 into B (maybe with or without conflict),
>>> but has to adapt this change to get it compatible with c101,
>>> resulting into c102.
>>> Now, A merges all changes from B to A.
>>> Then just merging c101 would loose the adaptions made in c102.
>>> So the correct behavior is to subtract c100 and then add c101 and c102.
>>> Note that if the changesets are not overlapping, the order
>>> of the reverse-merges and merges does not matter.
>>> But if the changesets are overlapping, then the correct of
>>> reverse-merges and merges can matter.
>>>
>>> 2. Supporting cyclic merges correctly requires that merge-info
>>> only records the direct merge info without carrying over
>>> existing merge info.
>>>
>>> See for example
>>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=948427 
>>>
>>>
>> I completely agree that "svn newmerge" should be able to handle the case
>> that you posted in that message.
>>
>> I also agree that some of the problems in this merge case come from the
>> inadequate data in merginfo. As you point out, we can read mergeinfo
>> carefully, and we don't even know the common ancestor of two branches
>> being merged. If you know the common ancestor - which is often not that
>> far back in this workflow - you can ignore everything before that point.
>> Why isn't there a record dropped in with every merge that says "We
>> merged X (server + branch + revision) and (all of this other merge
>> history from there) at time Y"? This would get dragged into the next
>> branch it gets merged with. You could read back through the merge
>> tree/graph to find common ancestors. We could be saving those records in
>> a new and expanded merge_history.
>>
>>>
>>> Cheers,
>>> Folker
>>>
>>>> To start the discussion, I will refer to this blog article by Mark
>>>> Phippard:
>>>>
>>>> http://blogs.collab.net/subversion/2008/07/subversion-merg/
>>>>
>>>> I found the article to be a good overview of the issues.I think 
>>>> that we
>>>> need help from Mark.On the other hand, I have seen that Mark sometimes
>>>> makes discouraging comments. My work is apparently “hand wavey” and
>>>> “proprietary”.I’m used to this treatment because I have 25 developers
>>>> who work for me who often think that I am full of crap.However, it 
>>>> might
>>>> have a discouraging effect on other contributors.For example, you can
>>>> see in this great ticket thread -
>>>> http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he 
>>>> states "I
>>>> do not think it is possible in this design....I think we need to 
>>>> accept
>>>> the limitations of the current design and work towards doing the 
>>>> best we
>>>> can within that design” Apparently that was enough to kill progress.I
>>>> think we should keep a more open mind going forward.
>>>>
>>>> I’m going to make some claims that some problems have 
>>>> “straightforward”
>>>> solutions.That doesn’t mean they are simple solutions.Handling all of
>>>> the merge cases is going to be hard.However, they are 
>>>> straightforward in
>>>> the sense that we can discuss the strategy at the high level used in
>>>> Mark’s article.
>>>>
>>>> Let’s consider three issues:Subtree merginfo, cyclic merge, and tree
>>>> change operations
>>>>
>>>> SUBTREE MERGINFO
>>>>
>>>> Mark notes that reintegrate does not work if you have subtree 
>>>> merginfo.
>>>> The subtrees potentially make the top-level mergeinfo inaccurate.So,
>>>> basically everyone that has looked at merge problems in the past four
>>>> years, including Mark, has tried to get rid of subtree merginfo.It’s
>>>> amazing that Subversion still tries to support this feature.It 
>>>> can’t be
>>>> supported in NewMerge.
>>>>
>>>> In the following sections, we will also see that the merginfo data is
>>>> too sparse, and we need to replace it with something bigger and more
>>>> extensible.
>>>>
>>>> CYCLIC MERGE
>>>>
>>>> The case where we merge back and forth between a development or
>>>> deployment branch, and trunk, is the base case for merge.It should be
>>>> supported.Subversion only supports it with special 
>>>> instructions.This is
>>>> the “cyclic merge” problem.
>>>>
>>>> It seems that we have two basic ways to do a merge.We can grab all of
>>>> the changes that we are trying to merge in one big diff between the
>>>> branch we are merging from and the branch we are merging into - the
>>>> reintegrate merge as described in Mark’s article.Or, we can 
>>>> sequentially
>>>> apply or “replay” each of the changes that we want to merge into our
>>>> working copy - the “recursive” strategy that is the default for git.
>>>>
>>>> It seems to me that the “one big diff” and the replay strategy are
>>>> closely related.When you are replaying, you grab all of the changes in
>>>> any sequence of revisions that doesn’t include a merge as one big
>>>> diff.So, the current “one big diff” strategy is a special case of the
>>>> replay strategy that applies when there are no intermediate merges 
>>>> from
>>>> other branches or cherrypicks.
>>>>
>>>> But wait!According to this article, we can’t use the replay strategy
>>>> because we are missing part of the replay.We lose information that was
>>>> used to resolve a merge when composing merge commits.If we had that
>>>> information, we could replay individual merges, and handle a higher
>>>> percentage of the cyclic merge cases.
>>>>
>>>> This problem seems to have a straightforward solution.When we 
>>>> commit the
>>>> merge, we can stuff the changeset that represents the difference 
>>>> between
>>>> the merge, and the commit, into the merge_history.We just need an
>>>> extensible merge_history format to hold it.
>>>>
>>>> It’s totally not clear to me why you need to say “reintegrate” when 
>>>> you
>>>> merge to trunk, and why you need to update the branch after you do a
>>>> reintegrate merge from it.The computer should be able to remember the
>>>> history of merges and it should be obvious which things have been 
>>>> merged
>>>> and which revisions have been committed on both branches.The only 
>>>> reason
>>>> that I can think if is that that the mergeinfo is so sparse that the
>>>> computer doesn’t remember enough about the merge history.Would a 
>>>> bigger
>>>> and more extensible data format give us a straightforward way to solve
>>>> that problem?
>>>>
>>>> TREE CHANGE
>>>>
>>>> We can identify tree changes by pattern matching.This is the same 
>>>> tactic
>>>> that git uses, without any other tree change tracking.We can identify
>>>> when this match is successful because the match is applied, 
>>>> examined by
>>>> the merger, and then the merge is committed.In this case we could 
>>>> write
>>>> thetree map into the merge_history so thatwe can map changes
>>>> bi-directionally during future merges without guessing again.This is
>>>> another case of saving information that we need to replay a merge.
>>>>
>>>> I think we could get a similar effect by generating a move operation
>>>> (normal copy & delete form) as part of the merge.I think that this
>>>> mapping would need to be done by updates as well as by explicit 
>>>> merges.
>>>>
>>>>
>>>> EXPERTISE
>>>> Who on this list knows enough about the core algorithm used in 
>>>> merge to
>>>> critique these suggestions and point to places in the code or
>>>> documentation?
>>>>
>>>>
>>>
>>
>>
>


-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Folker Schamel <sc...@spinor.com>.

> On 7/18/2011 4:37 PM, Folker Schamel wrote:
>> Hi Andy,
>>
>> two thoughts about cyclic merges:
>>
>> 1. Merging should not skip cyclic merges (like this old
>> svnmerge tool), but must subtract (reverse-merge) the original
>> change first, and then add (merge) the cyclic merge, in order
>> to not loose adaptions of changes.
> I made a different proposal to solve the same problem. Following your
> example, let's say we are merging
> Trunk = (R+S+RSFix) + (T + RSTfix)
> --- where RSTFix is changes to resolve a merge conflict
> into Feature = (R+S+RSFix) + F
>
> In your proposal, you "unmerge" (R+S+RSFix) to get F. Then, having
> separated the stuff that is duplicate from the stuff that is new, you
> can do the "one big diff" style merge from Trunk.
>
> In my proposal, we save RSTfix in our expandable merge_history file, and
> then we can in many cases apply T and RSTFix separately, without any
> duplicates.
>
> Do you think that might be easier?

At the end also your proposal requires a reverse-merge to calculate
RSTfix. So the difference is basically whether to calculate RSTfix
on the fly implicitly when needed, or in advance and store it.
Which one is easier and/or faster - good question.

The idea behind the on-the-fly reverse-merge approach is
a) to operate purely on existing revisions (no need to store changes
like RSFix separately), and
b) (at least in theory) a simple merge algorithm, which basically
just says: "Merge everything over, but reverse-merge existing old
changesets before", solving this RSTfix adaption issue on the fly
automatically implicitly in a robust way, without having to deal
with adaptions like RSTfix explicitly (at least in theory).
See http://svn.haxx.se/dev/archive-2007-12/0137.shtml
(Note that this algorithm assumes "correct" merge info,
not the current subversion merge info.)

Cheers,
Folker

>> For example, suppose you have two branches A and B.
>> c100 is a change in A.
>> c101 is a change in B.
>> B merges c1 into B (maybe with or without conflict),
>> but has to adapt this change to get it compatible with c101,
>> resulting into c102.
>> Now, A merges all changes from B to A.
>> Then just merging c101 would loose the adaptions made in c102.
>> So the correct behavior is to subtract c100 and then add c101 and c102.
>> Note that if the changesets are not overlapping, the order
>> of the reverse-merges and merges does not matter.
>> But if the changesets are overlapping, then the correct of
>> reverse-merges and merges can matter.
>>
>> 2. Supporting cyclic merges correctly requires that merge-info
>> only records the direct merge info without carrying over
>> existing merge info.
>>
>> See for example
>> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=948427
>>
> I completely agree that "svn newmerge" should be able to handle the case
> that you posted in that message.
>
> I also agree that some of the problems in this merge case come from the
> inadequate data in merginfo. As you point out, we can read mergeinfo
> carefully, and we don't even know the common ancestor of two branches
> being merged. If you know the common ancestor - which is often not that
> far back in this workflow - you can ignore everything before that point.
> Why isn't there a record dropped in with every merge that says "We
> merged X (server + branch + revision) and (all of this other merge
> history from there) at time Y"? This would get dragged into the next
> branch it gets merged with. You could read back through the merge
> tree/graph to find common ancestors. We could be saving those records in
> a new and expanded merge_history.
>
>>
>> Cheers,
>> Folker
>>
>>> To start the discussion, I will refer to this blog article by Mark
>>> Phippard:
>>>
>>> http://blogs.collab.net/subversion/2008/07/subversion-merg/
>>>
>>> I found the article to be a good overview of the issues.I think that we
>>> need help from Mark.On the other hand, I have seen that Mark sometimes
>>> makes discouraging comments. My work is apparently “hand wavey” and
>>> “proprietary”.I’m used to this treatment because I have 25 developers
>>> who work for me who often think that I am full of crap.However, it might
>>> have a discouraging effect on other contributors.For example, you can
>>> see in this great ticket thread -
>>> http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states "I
>>> do not think it is possible in this design....I think we need to accept
>>> the limitations of the current design and work towards doing the best we
>>> can within that design” Apparently that was enough to kill progress.I
>>> think we should keep a more open mind going forward.
>>>
>>> I’m going to make some claims that some problems have “straightforward”
>>> solutions.That doesn’t mean they are simple solutions.Handling all of
>>> the merge cases is going to be hard.However, they are straightforward in
>>> the sense that we can discuss the strategy at the high level used in
>>> Mark’s article.
>>>
>>> Let’s consider three issues:Subtree merginfo, cyclic merge, and tree
>>> change operations
>>>
>>> SUBTREE MERGINFO
>>>
>>> Mark notes that reintegrate does not work if you have subtree merginfo.
>>> The subtrees potentially make the top-level mergeinfo inaccurate.So,
>>> basically everyone that has looked at merge problems in the past four
>>> years, including Mark, has tried to get rid of subtree merginfo.It’s
>>> amazing that Subversion still tries to support this feature.It can’t be
>>> supported in NewMerge.
>>>
>>> In the following sections, we will also see that the merginfo data is
>>> too sparse, and we need to replace it with something bigger and more
>>> extensible.
>>>
>>> CYCLIC MERGE
>>>
>>> The case where we merge back and forth between a development or
>>> deployment branch, and trunk, is the base case for merge.It should be
>>> supported.Subversion only supports it with special instructions.This is
>>> the “cyclic merge” problem.
>>>
>>> It seems that we have two basic ways to do a merge.We can grab all of
>>> the changes that we are trying to merge in one big diff between the
>>> branch we are merging from and the branch we are merging into - the
>>> reintegrate merge as described in Mark’s article.Or, we can sequentially
>>> apply or “replay” each of the changes that we want to merge into our
>>> working copy - the “recursive” strategy that is the default for git.
>>>
>>> It seems to me that the “one big diff” and the replay strategy are
>>> closely related.When you are replaying, you grab all of the changes in
>>> any sequence of revisions that doesn’t include a merge as one big
>>> diff.So, the current “one big diff” strategy is a special case of the
>>> replay strategy that applies when there are no intermediate merges from
>>> other branches or cherrypicks.
>>>
>>> But wait!According to this article, we can’t use the replay strategy
>>> because we are missing part of the replay.We lose information that was
>>> used to resolve a merge when composing merge commits.If we had that
>>> information, we could replay individual merges, and handle a higher
>>> percentage of the cyclic merge cases.
>>>
>>> This problem seems to have a straightforward solution.When we commit the
>>> merge, we can stuff the changeset that represents the difference between
>>> the merge, and the commit, into the merge_history.We just need an
>>> extensible merge_history format to hold it.
>>>
>>> It’s totally not clear to me why you need to say “reintegrate” when you
>>> merge to trunk, and why you need to update the branch after you do a
>>> reintegrate merge from it.The computer should be able to remember the
>>> history of merges and it should be obvious which things have been merged
>>> and which revisions have been committed on both branches.The only reason
>>> that I can think if is that that the mergeinfo is so sparse that the
>>> computer doesn’t remember enough about the merge history.Would a bigger
>>> and more extensible data format give us a straightforward way to solve
>>> that problem?
>>>
>>> TREE CHANGE
>>>
>>> We can identify tree changes by pattern matching.This is the same tactic
>>> that git uses, without any other tree change tracking.We can identify
>>> when this match is successful because the match is applied, examined by
>>> the merger, and then the merge is committed.In this case we could write
>>> thetree map into the merge_history so thatwe can map changes
>>> bi-directionally during future merges without guessing again.This is
>>> another case of saving information that we need to replay a merge.
>>>
>>> I think we could get a similar effect by generating a move operation
>>> (normal copy & delete form) as part of the merge.I think that this
>>> mapping would need to be done by updates as well as by explicit merges.
>>>
>>>
>>> EXPERTISE
>>> Who on this list knows enough about the core algorithm used in merge to
>>> critique these suggestions and point to places in the code or
>>> documentation?
>>>
>>>
>>
>
>

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Andy Singleton <an...@assembla.com>.

  On 7/18/2011 4:37 PM, Folker Schamel wrote:
> Hi Andy,
>
> two thoughts about cyclic merges:
>
> 1. Merging should not skip cyclic merges (like this old
> svnmerge tool), but must subtract (reverse-merge) the original
> change first, and then add (merge) the cyclic merge, in order
> to not loose adaptions of changes.
I made a different proposal to solve the same problem.  Following your 
example, let's say we are merging
Trunk = (R+S+RSFix) + (T + RSTfix)
--- where RSTFix is changes to resolve a merge conflict
into Feature = (R+S+RSFix) + F

In  your proposal, you "unmerge" (R+S+RSFix) to get F.  Then, having 
separated  the stuff that is duplicate from the stuff that is new, you 
can do the "one big diff" style merge from Trunk.

In my proposal, we save RSTfix in our expandable merge_history file, and 
then we can in many cases apply T and RSTFix separately, without any 
duplicates.

Do you think that might be easier?
>
> For example, suppose you have two branches A and B.
> c100 is a change in A.
> c101 is a change in B.
> B merges c1 into B (maybe with or without conflict),
> but has to adapt this change to get it compatible with c101,
> resulting into c102.
> Now, A merges all changes from B to A.
> Then just merging c101 would loose the adaptions made in c102.
> So the correct behavior is to subtract c100 and then add c101 and c102.
> Note that if the changesets are not overlapping, the order
> of the reverse-merges and merges does not matter.
> But if the changesets are overlapping, then the correct of
> reverse-merges and merges can matter.
>
> 2. Supporting cyclic merges correctly requires that merge-info
> only records the direct merge info without carrying over
> existing merge info.
>
> See for example
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=948427 
>
I completely agree that "svn newmerge" should be able to handle the case 
that you posted in that message.

I also agree that some of the problems in this merge case come from the 
inadequate data in merginfo.  As you point out, we can read mergeinfo 
carefully, and we don't even know the common ancestor of two branches 
being merged.  If you know the common ancestor - which is often not that 
far back in this workflow - you can ignore everything before that 
point.  Why isn't there a record dropped in with every merge that says 
"We merged X (server + branch + revision) and (all of this other merge 
history from there) at time Y"?  This would get dragged into the next 
branch it gets merged with.  You could read back through the merge 
tree/graph to find common ancestors.  We could be saving those records 
in a new and expanded merge_history.

>
> Cheers,
> Folker
>
>> To start the discussion, I will refer to this blog article by Mark 
>> Phippard:
>>
>> http://blogs.collab.net/subversion/2008/07/subversion-merg/
>>
>> I found the article to be a good overview of the issues.I think that we
>> need help from Mark.On the other hand, I have seen that Mark sometimes
>> makes discouraging comments. My work is apparently “hand wavey” and
>> “proprietary”.I’m used to this treatment because I have 25 developers
>> who work for me who often think that I am full of crap.However, it might
>> have a discouraging effect on other contributors.For example, you can
>> see in this great ticket thread -
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states "I
>> do not think it is possible in this design....I think we need to accept
>> the limitations of the current design and work towards doing the best we
>> can within that design” Apparently that was enough to kill progress.I
>> think we should keep a more open mind going forward.
>>
>> I’m going to make some claims that some problems have “straightforward”
>> solutions.That doesn’t mean they are simple solutions.Handling all of
>> the merge cases is going to be hard.However, they are straightforward in
>> the sense that we can discuss the strategy at the high level used in
>> Mark’s article.
>>
>> Let’s consider three issues:Subtree merginfo, cyclic merge, and tree
>> change operations
>>
>> SUBTREE MERGINFO
>>
>> Mark notes that reintegrate does not work if you have subtree merginfo.
>> The subtrees potentially make the top-level mergeinfo inaccurate.So,
>> basically everyone that has looked at merge problems in the past four
>> years, including Mark, has tried to get rid of subtree merginfo.It’s
>> amazing that Subversion still tries to support this feature.It can’t be
>> supported in NewMerge.
>>
>> In the following sections, we will also see that the merginfo data is
>> too sparse, and we need to replace it with something bigger and more
>> extensible.
>>
>> CYCLIC MERGE
>>
>> The case where we merge back and forth between a development or
>> deployment branch, and trunk, is the base case for merge.It should be
>> supported.Subversion only supports it with special instructions.This is
>> the “cyclic merge” problem.
>>
>> It seems that we have two basic ways to do a merge.We can grab all of
>> the changes that we are trying to merge in one big diff between the
>> branch we are merging from and the branch we are merging into - the
>> reintegrate merge as described in Mark’s article.Or, we can sequentially
>> apply or “replay” each of the changes that we want to merge into our
>> working copy - the “recursive” strategy that is the default for git.
>>
>> It seems to me that the “one big diff” and the replay strategy are
>> closely related.When you are replaying, you grab all of the changes in
>> any sequence of revisions that doesn’t include a merge as one big
>> diff.So, the current “one big diff” strategy is a special case of the
>> replay strategy that applies when there are no intermediate merges from
>> other branches or cherrypicks.
>>
>> But wait!According to this article, we can’t use the replay strategy
>> because we are missing part of the replay.We lose information that was
>> used to resolve a merge when composing merge commits.If we had that
>> information, we could replay individual merges, and handle a higher
>> percentage of the cyclic merge cases.
>>
>> This problem seems to have a straightforward solution.When we commit the
>> merge, we can stuff the changeset that represents the difference between
>> the merge, and the commit, into the merge_history.We just need an
>> extensible merge_history format to hold it.
>>
>> It’s totally not clear to me why you need to say “reintegrate” when you
>> merge to trunk, and why you need to update the branch after you do a
>> reintegrate merge from it.The computer should be able to remember the
>> history of merges and it should be obvious which things have been merged
>> and which revisions have been committed on both branches.The only reason
>> that I can think if is that that the mergeinfo is so sparse that the
>> computer doesn’t remember enough about the merge history.Would a bigger
>> and more extensible data format give us a straightforward way to solve
>> that problem?
>>
>> TREE CHANGE
>>
>> We can identify tree changes by pattern matching.This is the same tactic
>> that git uses, without any other tree change tracking.We can identify
>> when this match is successful because the match is applied, examined by
>> the merger, and then the merge is committed.In this case we could write
>> thetree map into the merge_history so thatwe can map changes
>> bi-directionally during future merges without guessing again.This is
>> another case of saving information that we need to replay a merge.
>>
>> I think we could get a similar effect by generating a move operation
>> (normal copy & delete form) as part of the merge.I think that this
>> mapping would need to be done by updates as well as by explicit merges.
>>
>>
>> EXPERTISE
>> Who on this list knows enough about the core algorithm used in merge to
>> critique these suggestions and point to places in the code or 
>> documentation?
>>
>>
>


-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Folker Schamel <sc...@spinor.com>.

Hi Andy,

two thoughts about cyclic merges:

1. Merging should not skip cyclic merges (like this old
svnmerge tool), but must subtract (reverse-merge) the original
change first, and then add (merge) the cyclic merge, in order
to not loose adaptions of changes.

For example, suppose you have two branches A and B.
c100 is a change in A.
c101 is a change in B.
B merges c1 into B (maybe with or without conflict),
but has to adapt this change to get it compatible with c101,
resulting into c102.
Now, A merges all changes from B to A.
Then just merging c101 would loose the adaptions made in c102.
So the correct behavior is to subtract c100 and then add c101 and c102.
Note that if the changesets are not overlapping, the order
of the reverse-merges and merges does not matter.
But if the changesets are overlapping, then the correct of
reverse-merges and merges can matter.

2. Supporting cyclic merges correctly requires that merge-info
only records the direct merge info without carrying over
existing merge info.

See for example
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=948427

Cheers,
Folker

> To start the discussion, I will refer to this blog article by Mark Phippard:
>
> http://blogs.collab.net/subversion/2008/07/subversion-merg/
>
> I found the article to be a good overview of the issues.I think that we
> need help from Mark.On the other hand, I have seen that Mark sometimes
> makes discouraging comments. My work is apparently “hand wavey” and
> “proprietary”.I’m used to this treatment because I have 25 developers
> who work for me who often think that I am full of crap.However, it might
> have a discouraging effect on other contributors.For example, you can
> see in this great ticket thread -
> http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states "I
> do not think it is possible in this design....I think we need to accept
> the limitations of the current design and work towards doing the best we
> can within that design” Apparently that was enough to kill progress.I
> think we should keep a more open mind going forward.
>
> I’m going to make some claims that some problems have “straightforward”
> solutions.That doesn’t mean they are simple solutions.Handling all of
> the merge cases is going to be hard.However, they are straightforward in
> the sense that we can discuss the strategy at the high level used in
> Mark’s article.
>
> Let’s consider three issues:Subtree merginfo, cyclic merge, and tree
> change operations
>
> SUBTREE MERGINFO
>
> Mark notes that reintegrate does not work if you have subtree merginfo.
> The subtrees potentially make the top-level mergeinfo inaccurate.So,
> basically everyone that has looked at merge problems in the past four
> years, including Mark, has tried to get rid of subtree merginfo.It’s
> amazing that Subversion still tries to support this feature.It can’t be
> supported in NewMerge.
>
> In the following sections, we will also see that the merginfo data is
> too sparse, and we need to replace it with something bigger and more
> extensible.
>
> CYCLIC MERGE
>
> The case where we merge back and forth between a development or
> deployment branch, and trunk, is the base case for merge.It should be
> supported.Subversion only supports it with special instructions.This is
> the “cyclic merge” problem.
>
> It seems that we have two basic ways to do a merge.We can grab all of
> the changes that we are trying to merge in one big diff between the
> branch we are merging from and the branch we are merging into - the
> reintegrate merge as described in Mark’s article.Or, we can sequentially
> apply or “replay” each of the changes that we want to merge into our
> working copy - the “recursive” strategy that is the default for git.
>
> It seems to me that the “one big diff” and the replay strategy are
> closely related.When you are replaying, you grab all of the changes in
> any sequence of revisions that doesn’t include a merge as one big
> diff.So, the current “one big diff” strategy is a special case of the
> replay strategy that applies when there are no intermediate merges from
> other branches or cherrypicks.
>
> But wait!According to this article, we can’t use the replay strategy
> because we are missing part of the replay.We lose information that was
> used to resolve a merge when composing merge commits.If we had that
> information, we could replay individual merges, and handle a higher
> percentage of the cyclic merge cases.
>
> This problem seems to have a straightforward solution.When we commit the
> merge, we can stuff the changeset that represents the difference between
> the merge, and the commit, into the merge_history.We just need an
> extensible merge_history format to hold it.
>
> It’s totally not clear to me why you need to say “reintegrate” when you
> merge to trunk, and why you need to update the branch after you do a
> reintegrate merge from it.The computer should be able to remember the
> history of merges and it should be obvious which things have been merged
> and which revisions have been committed on both branches.The only reason
> that I can think if is that that the mergeinfo is so sparse that the
> computer doesn’t remember enough about the merge history.Would a bigger
> and more extensible data format give us a straightforward way to solve
> that problem?
>
> TREE CHANGE
>
> We can identify tree changes by pattern matching.This is the same tactic
> that git uses, without any other tree change tracking.We can identify
> when this match is successful because the match is applied, examined by
> the merger, and then the merge is committed.In this case we could write
> thetree map into the merge_history so thatwe can map changes
> bi-directionally during future merges without guessing again.This is
> another case of saving information that we need to replay a merge.
>
> I think we could get a similar effect by generating a move operation
> (normal copy & delete form) as part of the merge.I think that this
> mapping would need to be done by updates as well as by explicit merges.
>
>
> EXPERTISE
> Who on this list knows enough about the core algorithm used in merge to
> critique these suggestions and point to places in the code or documentation?
>
>

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Peter Samuelson <pe...@p12n.org>.

> On Sun, Jul 17, 2011 at 6:03 PM, Andy Singleton <an...@assembla.com> wrote:
> > TREE CHANGE
> >
> > We can identify tree changes by pattern matching.  This is the same tactic
> > that git uses, without any other tree change tracking.

[Paul Burba]
> Could you provide a bit more explanation of what you mean here?  For
> those of us not familiar with the inner workings of git.

I'm not terribly familiar with git internals, but I think what he's
getting at is that git does not track tree changes, but instead,
_deduces_ tree changes from changesets.  If, within a single commit,
one file disappears and another file appears with similar content, git
interprets, after the fact, as a move.

There are several ramifications.  One, I'm not sure how well this works
if the delete and the add are in separate commits - i.e., a file is
resurrected from history.  Maybe that is tracked in some other way.

Two, git attempts to reconstruct _content_ moves, not just tree moves.
That is, if a significant chunk of a file (say, a function) disappears
and reappears in another file, this is tracked as well, even if it's
not the whole file.  So, a file rename is really just a special case of
content moving between files.

Note that the complexity of this approach (deducing content moves after
the fact, rather than tracking tree changes) scales by the size of the
commit, not the size of the tree.  So the assumption is that even in
large trees, any given commit is relatively limited in scope.

-- 
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Stefan Sperling <st...@elego.de>.

On Wed, Jul 20, 2011 at 03:27:02PM +0100, Julian Foad wrote:
> Stefan Sperling wrote:
> > It does not generate conflicts because of the way REV1 is chosen,
> > and because of the way PATH1 and PATH2 are chosen.
> 
> In fact, this must imply that you *can* get conflicts if any of the
> changes on trunk after rX conflicts with the "feature".  But the
> recommended work flow would be to do the last "catch-up" (which defines
> X) just before the re-integrate, so the opportunity for such conflicts
> would be small.

Yes, precisely. Thanks for correcting this point.

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Peter Samuelson <pe...@p12n.org>.

[Julian Foad]
> In fact, this must imply that you *can* get conflicts if any of the
> changes on trunk after rX conflicts with the "feature".  But the
> recommended work flow would be to do the last "catch-up" (which
> defines X) just before the re-integrate, so the opportunity for such
> conflicts would be small.

Indeed.  I don't usually bother with a "final catch-up merge".  I
just do the reintegrate with wherever the branch happens to be.
The conflicts will be pretty much the same, and it's one less step.

The only valid argument for a 'final catch-up' that I can think of is
if the final --reintegrate merge is to be performed by somebody else,
or has some sort of formal QA process around it, such that it is
advantageous to resolve the conflicts ahead of time, in a separate
step, to keep the final step simple and deterministic.  But typically,
I reintegrate my own branches, so I prefer to skip the extra step and
the extra rev.
-- 
Peter Samuelson | org-tld!p12n!peter | http://p12n.org/

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Julian Foad <ju...@wandisco.com>.

Stefan Sperling wrote:
> On Wed, Jul 20, 2011 at 01:13:09PM +0100, Julian Foad wrote:
> > There are (broadly speaking) two ways we could perform a "re-integrate"
> > merge.  The "general" way is to do it by using a sufficiently clever
> > general-purpose merge algorithm to merge all the changes that are unique
> > to the branch, onto the trunk.  The "special" way is to recognize that,
> > if the branch is up to date with trunk, then the desired state of the
> > trunk is exactly the current state of the branch, so all we need to do
> > is make the trunk look like the branch, which is theoretically a trivial
> > operation.
[...]
> I think the above description simplifies things a bit too much
> for the purpose of this discussion.
[...]
> A re-integrate does not simply swap out the trunk with the branch.
> It runs a two-URL merge with parameters constructed in such a
> way that the delta being merged contains exactly the changes
> committed for the feature, and no unrelated changes.
[...]     
>                  feature  +-------------------------------R
>                          /                               . \
>                         /                  ..............   \
>                        /                  .                  v
>          trunk ------+--------------------L------------------o
>                     rW                   rX
> 
>      In the diagram above, L marks the left side of the merge (trunk@X),
>      and R marks the right side of the merge (feature@HEAD). The difference
>      between the left and right side is merged into the target.

Ah... I didn't realize it is that flexible.  Thanks for correcting me.
That's very useful for anyone trying to re-integrate onto a "busy"
trunk.

> > Note that a practical benefit of the work flow using the "special"
> > re-integrate is it ensures the reintegration will not generate any merge
> > conflicts - neither physical, nor semantic (since we assume the branch
> > has been reviewed and tested).
> 
> It does not generate conflicts because of the way REV1 is chosen,
> and because of the way PATH1 and PATH2 are chosen.

In fact, this must imply that you *can* get conflicts if any of the
changes on trunk after rX conflicts with the "feature".  But the
recommended work flow would be to do the last "catch-up" (which defines
X) just before the re-integrate, so the opportunity for such conflicts
would be small.

- Julian

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Stefan Sperling <st...@elego.de>.

On Wed, Jul 20, 2011 at 01:13:09PM +0100, Julian Foad wrote:
> There are (broadly speaking) two ways we could perform a "re-integrate"
> merge.  The "general" way is to do it by using a sufficiently clever
> general-purpose merge algorithm to merge all the changes that are unique
> to the branch, onto the trunk.  The "special" way is to recognize that,
> if the branch is up to date with trunk, then the desired state of the
> trunk is exactly the current state of the branch, so all we need to do
> is make the trunk look like the branch, which is theoretically a trivial
> operation.
> 
> Both ways achieve the basic desired result when re-integrating a feature
> branch to trunk, and their differences, while significant, are
> relatively minor in context of the whole field of merging.  The
> Subversion project implemented the "special" way, and I would guess that
> was mainly because the merge tracking was not clever enough to make the
> "general" way work, while recognizing that it's often helpful to have a
> simple concept on which to base a well-defined work flow so the loss of
> generality was not a big problem in many cases.

I think the above description simplifies things a bit too much
for the purpose of this discussion.

To avoid potential misunderstandings I'd like to provide a more
detailed explanation of how reintegrate works.
Andy also said earlier he didn't understand why --reintegrate is
necessary and I hope I can answer this question.

A re-integrate does not simply swap out the trunk with the branch.
It runs a two-URL merge with parameters constructed in such a
way that the delta being merged contains exactly the changes
committed for the feature, and no unrelated changes.

Quoting 'svn help merge' output from Subversion 1.7:

     A feature has been developed on a branch called "feature".
     The feature branch started as a copy of trunk@W. Work on the
     feature has completed and it should be merged back into the trunk.

     The feature branch was last synced with its immediate ancestor,
     the trunk, in revision X. So the difference between trunk@X and
     feature@HEAD contains the complete set of changes that implement
     the feature, and no other changes. These changes are applied to
     the trunk.

                 feature  +-------------------------------R
                         /                               . \
                        /                  ..............   \
                       /                  .                  v
         trunk ------+--------------------L------------------o
                    rW                   rX

     In the diagram above, L marks the left side of the merge (trunk@X),
     and R marks the right side of the merge (feature@HEAD). The difference
     between the left and right side is merged into the target.

Essentially, all merges in Subversion are 2-URL merges.
svn merge generates some delta and applies that to a working copy.
There is no other way, right now, that a merge could work.

A two-URL merge needs the following parameters:

  PATH1, REV1 (left side of the delta)
  PATH2, REV2 (right side of the delta)
  TARGET (working copy of some PATH3, up-to-date enough for commit)

The reintegrate merge automatically fills in all required parameters 
expect PATH2, and we get:

  PATH1 = trunk, REV1 = rX
  PATH2 = feature, REV2 = HEAD
  TARGET = working copy of trunk (generally speaking @HEAD)

A sync merge can fill in the all parameters as well, except PATH2.
However, it needs to do so in a different way. With a sync merge
PATH1 and PATH2 are the same so the delta between trunk@REV1 and
trunk@REV2 is merged into a working copy of the feature branch
In case of reintegrate PATH1 and PATH2 are not the same!

So svn merge needs the special --reintegrate option because just
'svn merge ^/branch' would be ambiguous.

> Note that a practical benefit of the work flow using the "special"
> re-integrate is it ensures the reintegration will not generate any merge
> conflicts - neither physical, nor semantic (since we assume the branch
> has been reviewed and tested).

It does not generate conflicts because of the way REV1 is chosen,
and because of the way PATH1 and PATH2 are chosen.

In some cases, you can actually get a clean merge by reintegrating
a branch like this: svn merge ^/feature
This results in the delta between feature@W and feature@HEAD
being applied to a trunk working copy. This merge can be conflict-free
but is not guaranteed to be conflict-free.

> The "special" kind of re-integrate that Subversion performs has a
> logical pre-condition: that the branch must be up to date with the
> "trunk".  In concrete terms, the required check is that the total set of
> mergeinfo on the branch must indicate that all changes from the trunk
> are included in full.

No. Using the above diagram as a reference, it only requires that
there are no gaps in the branch's mergeinfo for merges from trunk
between rW and rX. It does not require that changes between rX and
HEAD have already been merged from the trunk to the branch.
It just requires that a continous chunk of history has been merged,
i.e. no cherry-picking from trunk to the branch between rW and rX,
you need everything in that range.

> The implementation of this check has been overly simple and pessimistic
> in the past.  It is possible for the same merge history to be recorded
> in different ways.  In some cases where mergeinfo is recorded on a
> subtree, the very same merge history could equally well be recorded in a
> single svn:mergeinfo on the root of the branch.  The original
> "re-integrate" merge did not tolerate mergeinfo properties on subtrees
> at all, but that limitation was soon relaxed and I believe it has since
> been improved to logical completion so that now mergeinfo is allowed on
> subtrees as long as the total merge history described is logically
> correct.

Again, the question is about gaps in the subtree mergeinfo,
not about whether everything has already been merged.

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Stefan Sperling <st...@elego.de>.

On Wed, Jul 20, 2011 at 09:41:48AM -0400, Andy Singleton wrote:
> The non-special merge case is also common even in projects that use
> Subversion.  For example, consider the case where we have a release
> branch and a development trunk.  I see this in almost every serious
> subversion project.  We move some bug fixes to the release branch.
> So, it only has selected changes from the development trunk.  We we
> also make bug fixes on the release branch, or on branches  that
> developers make to fix and test the release version.  We should be
> able to merge those into the development trunk.

I am not sure what you want to fix for this use case.
This works today. You can merge bugfixes into either direction.
Those merges are all cherry-picking merges.
There is no intention to ever reintegrate the release branch into
trunk, nor to reintegrate the trunk into the release branch.
So there are no problems with cycles.

As an aside, note that Subversion can track such cherry-picking
merges, while git and mercurial cannot. So Subversion already
has an advantage in this area.

To get cycles you need to do stupid things like fixing a bug in
a feature branch, cherry-picking this bug fix to trunk, and then
have the same feature branch sync to trunk so the bugfix bounces
back to it. This anti-pattern doesn't work well in git either,
you will always be able to create a scenario where the tool cannot
resolve a conflict automatically.
If the bug fix was made on trunk the feature branch would simply
soak it up in the next sync merge, without conflict.
The key thing to realise is that the location where changes enter
your branching/merging pattern does matter.

I would say that most cyclic merge problems happen because people
do not plan their branching/merging upfront, but just commit whatever
changes they like anywhere they want, and then expect their tool
to sort out the mess they've created. They don't realise that
separate branches have separate purposes for a reason.
"Fixing" merge for such users is really, really, hard, in any
version control system.

Consider this problem:

Your merge policy says you want to commit all your bugfixes to the
main branch before they get backported to the release.
You can do this in Subversion, just get working copy of the release
branch and run something like 'svn merge -cN ^/trunk' to merge
the fix committed in revision N.

Now, enter git or mercurial. You cannot do it this way.
Your policy does not fit the tool. You are required to commit your
bugfixes to the release branch first because every merge you run
always merges the entire branch. So if you run 'hg merge default'
in your release branch, you are merging lots of destabilising changes
into the release branch that are supposed to go into the next release.

Effectively, you only have what Subversion calls "reintegrate".
You can cherry-pick, but this won't be tracked as a merge because
the parent/child relationship between commits that git and mercurial
use for merge-tracking cannot represent this case.
So instead, you commit your bugfixes to all your active release
branches (maybe transfer changes from one release branch to another
via automated cherry-picks betwen the release branches), and then
get a working copy of your main branch, and run something like
'hg merge release-1.x' to merge all the bugfixes that have accumulated
in your release-1.x branch into the main branch. You do this merge
just once, possibly from the most recent release to minimize conflicts.

This is a good example of how the merge models used by a tool
influence the merge policy. You cannot expect any given tool
to support an arbitrary merge policy out of the box. It needs to
be planned around the capabilities of the tool.

> Stefan, thank you for explaining reintegrate merge.  It makes more
> sense to me now.  I also see how with the current 2-URL merge,
> "merge <branch>" can be ambiguous.  If merge were redefined to mean
> "grab all changes", and that were calculated internally, then you
> would not specify two URL's, and the command "merge <from branch>"
> would not be ambiguous.

So I suppose your effort is focused on trying to replace the 2-URL
merge model with something better.
I would suggest to create a list of specific use cases that are
not well supported in the 2-URL merge model Subversion uses, and
then discuss each one separately. Without specific use cases to
talk about we won't be getting anywhere in this discussion.

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Andy Singleton <an...@assembla.com>.

  Julian and Stefan, thank you for this distinction between the 
"special" merge case, where we can grab all of the changes between two 
revisions, and the "general" merge case. That really clarifies the 
discussion.

The point I am making is that Subversion needs to support more of the 
"general" merge cases.   They are supported in other SCM systems, and 
developers use them and need them.

The non-special merge case is also common even in projects that use 
Subversion.  For example, consider the case where we have a release 
branch and a development trunk.  I see this in almost every serious 
subversion project.  We move some bug fixes to the release branch.  So, 
it only has selected changes from the development trunk.  We we also 
make bug fixes on the release branch, or on branches  that developers 
make to fix and test the release version.  We should be able to merge 
those into the development trunk.

Stefan, thank you for explaining reintegrate merge.  It makes more sense 
to me now.  I also see how with the current 2-URL merge, "merge 
<branch>" can be ambiguous.  If merge were redefined to mean "grab all 
changes", and that were calculated internally, then you would not 
specify two URL's, and the command "merge <from branch>" would not be 
ambiguous.

On 7/20/2011 8:13 AM, Julian Foad wrote:
> Hi Andy.
>
> Let me talk a bit about 're-integrate' merges, which are a special case
> of 'cyclic' merges.
>
> Stepping back, just to make sure we all understand the concept and use
> the same terminlogy, we can define the high-level concept of a
> "re-integrate" merge in terms of the work flow of a "feature branch".  A
> feature branch is made from what we'll call "trunk" and we develop a
> feature on the branch, keeping the branch periodically synchronized to
> trunk by doing a "catch-up" merge to get all the latest changes from
> trunk.  When the feature is ready, a "re-integrate" merge is what we do
> to merge it (the feature, or equivalently the branch that includes the
> feature) back into trunk.
>
> There are (broadly speaking) two ways we could perform a "re-integrate"
> merge.  The "general" way is to do it by using a sufficiently clever
> general-purpose merge algorithm to merge all the changes that are unique
> to the branch, onto the trunk.  The "special" way is to recognize that,
> if the branch is up to date with trunk, then the desired state of the
> trunk is exactly the current state of the branch, so all we need to do
> is make the trunk look like the branch, which is theoretically a trivial
> operation.
>
> Both ways achieve the basic desired result when re-integrating a feature
> branch to trunk, and their differences, while significant, are
> relatively minor in context of the whole field of merging.  The
> Subversion project implemented the "special" way, and I would guess that
> was mainly because the merge tracking was not clever enough to make the
> "general" way work, while recognizing that it's often helpful to have a
> simple concept on which to base a well-defined work flow so the loss of
> generality was not a big problem in many cases.
>
> Note that a practical benefit of the work flow using the "special"
> re-integrate is it ensures the reintegration will not generate any merge
> conflicts - neither physical, nor semantic (since we assume the branch
> has been reviewed and tested).
>
>
> On Mon, 2011-07-18, Paul Burba wrote:
>> On Sun, Jul 17, 2011, Andy Singleton wrote:
> [...]
>>>   Mark notes that reintegrate does not work if you have subtree merginfo.
>>>   The subtrees potentially make the top-level  mergeinfo inaccurate.
> The "special" kind of re-integrate that Subversion performs has a
> logical pre-condition: that the branch must be up to date with the
> "trunk".  In concrete terms, the required check is that the total set of
> mergeinfo on the branch must indicate that all changes from the trunk
> are included in full.
>
> The implementation of this check has been overly simple and pessimistic
> in the past.  It is possible for the same merge history to be recorded
> in different ways.  In some cases where mergeinfo is recorded on a
> subtree, the very same merge history could equally well be recorded in a
> single svn:mergeinfo on the root of the branch.  The original
> "re-integrate" merge did not tolerate mergeinfo properties on subtrees
> at all, but that limitation was soon relaxed and I believe it has since
> been improved to logical completion so that now mergeinfo is allowed on
> subtrees as long as the total merge history described is logically
> correct.
>
> So I'm not sure if we still have design deficiencies in coping with
> subtree mergeinfo or not.
>
> The question of whether and how to implement a general-purpose merge
> that can cope with arbitrary re-integrates and other cyclic merges is a
> more complex issue that I have only begun to read up on.  None of the
> above is meant to suggest that such an approach should not be attempted.
>
> - Julian
>
>
> [1] Here, "state" means the directory structure and file content, but
> not the history.
>
>


-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Julian Foad <ju...@wandisco.com>.

Hi Andy.

Let me talk a bit about 're-integrate' merges, which are a special case
of 'cyclic' merges.

Stepping back, just to make sure we all understand the concept and use
the same terminlogy, we can define the high-level concept of a
"re-integrate" merge in terms of the work flow of a "feature branch".  A
feature branch is made from what we'll call "trunk" and we develop a
feature on the branch, keeping the branch periodically synchronized to
trunk by doing a "catch-up" merge to get all the latest changes from
trunk.  When the feature is ready, a "re-integrate" merge is what we do
to merge it (the feature, or equivalently the branch that includes the
feature) back into trunk.

There are (broadly speaking) two ways we could perform a "re-integrate"
merge.  The "general" way is to do it by using a sufficiently clever
general-purpose merge algorithm to merge all the changes that are unique
to the branch, onto the trunk.  The "special" way is to recognize that,
if the branch is up to date with trunk, then the desired state of the
trunk is exactly the current state of the branch, so all we need to do
is make the trunk look like the branch, which is theoretically a trivial
operation.

Both ways achieve the basic desired result when re-integrating a feature
branch to trunk, and their differences, while significant, are
relatively minor in context of the whole field of merging.  The
Subversion project implemented the "special" way, and I would guess that
was mainly because the merge tracking was not clever enough to make the
"general" way work, while recognizing that it's often helpful to have a
simple concept on which to base a well-defined work flow so the loss of
generality was not a big problem in many cases.

Note that a practical benefit of the work flow using the "special"
re-integrate is it ensures the reintegration will not generate any merge
conflicts - neither physical, nor semantic (since we assume the branch
has been reviewed and tested).

On Mon, 2011-07-18, Paul Burba wrote:
> On Sun, Jul 17, 2011, Andy Singleton wrote:
[...]
> >  Mark notes that reintegrate does not work if you have subtree merginfo.
> >  The subtrees potentially make the top-level  mergeinfo inaccurate.

The "special" kind of re-integrate that Subversion performs has a
logical pre-condition: that the branch must be up to date with the
"trunk".  In concrete terms, the required check is that the total set of
mergeinfo on the branch must indicate that all changes from the trunk
are included in full.

The implementation of this check has been overly simple and pessimistic
in the past.  It is possible for the same merge history to be recorded
in different ways.  In some cases where mergeinfo is recorded on a
subtree, the very same merge history could equally well be recorded in a
single svn:mergeinfo on the root of the branch.  The original
"re-integrate" merge did not tolerate mergeinfo properties on subtrees
at all, but that limitation was soon relaxed and I believe it has since
been improved to logical completion so that now mergeinfo is allowed on
subtrees as long as the total merge history described is logically
correct.

So I'm not sure if we still have design deficiencies in coping with
subtree mergeinfo or not.

The question of whether and how to implement a general-purpose merge
that can cope with arbitrary re-integrates and other cyclic merges is a
more complex issue that I have only begun to read up on.  None of the
above is meant to suggest that such an approach should not be attempted.

- Julian

[1] Here, "state" means the directory structure and file content, but
not the history.

Re: Fixing merge - Subtree, Cyclic, and Tree Change cases

Posted by Paul Burba <pt...@gmail.com>.

On Sun, Jul 17, 2011 at 6:03 PM, Andy Singleton <an...@assembla.com> wrote:
> To start the discussion, I will refer to this blog article by Mark Phippard:
>
> http://blogs.collab.net/subversion/2008/07/subversion-merg/
>
> I found the article to be a good overview of the issues.  I think that we
> need help from Mark.   On the other hand, I have seen that Mark sometimes
> makes discouraging comments.   My work is apparently “hand wavey” and
> “proprietary”.   I’m used to this treatment because I have 25 developers who
> work for me who often think that I am full of crap.  However, it might have
> a discouraging effect on other contributors.  For example, you can see in
> this great ticket thread -
> http://subversion.tigris.org/issues/show_bug.cgi?id=2897 - he states "I do
> not think it is possible in this design....I think we need to accept the
> limitations of the current design and work towards doing the best we can
> within that design”  Apparently that was enough to kill progress.   I think
> we should keep a more open mind going forward.
>
> I’m going to make some claims that some problems have “straightforward”
> solutions.  That doesn’t mean they are simple solutions.  Handling all of
> the merge cases is going to be hard.  However, they are straightforward in
> the sense that we can discuss the strategy at the high level used in Mark’s
> article.
>
> Let’s consider three issues:  Subtree merginfo, cyclic merge, and tree
> change operations
>
> SUBTREE MERGINFO
>
>  Mark notes that reintegrate does not work if you have subtree merginfo.
>  The subtrees potentially make the top-level  mergeinfo inaccurate.

Hi Andy,

The blog you reference is three years old.  A lot has changed since
then.  You might want to take a look at
https://svn.apache.org/repos/asf/subversion/branches/1.7.x/CHANGES for
the items related to merging.

> So,
> basically everyone that has looked at merge problems in the past four years,
> including Mark, has tried to get rid of subtree merginfo.

On that note: As far back as 1.5.5 (12/2008) reintegrate merges
tolerated subtree mergeinfo if all previous sync merges occurred at
the root of the branch.  The forthcoming 1.7 release supports
reintegrate merges even if some of the sync merges were done at the
subtree level, as long as the branch is effectively synced -- see this
issue for more info:
http://subversion.tigris.org/issues/show_bug.cgi?id=3577

This is why one of my first requests on your earlier thread was for
the *specific* use-cases where the current merge-tracking
implementation doesn't work (and preferably some test patches for
these use cases).

> It’s amazing that
> Subversion still tries to support this feature.  It can’t be supported in
> NewMerge.

To clarify, when you say "This feature" you mean what?

  A) The ability to do subtree merges at all

  B) Recording mergeinfo describing subtree merges (i.e. subtree
merges are still allowed, but not tracked).

  C) Other

> In the following sections, we will also see that the merginfo data is too
> sparse, and we need to replace it with something bigger and more extensible.

Sidebar: You might want to use another term besides "sparse" when
talking about mergeinfo granularity.  We typically use "sparse" to
refer to the depth of a working copy (e.g. 'svn co %URL% wc --depth
immediates wc-path) or the operative depth of an operation (e.g. 'svn
propget svn:mergeinfo %URL% --depth empty').  Sparse merges and sparse
merge targets can both result in subtree mergeinfo, so there is the
potential for some confusion for anybody following this casually.

> CYCLIC MERGE
>
> The case where we merge back and forth between a development or deployment
> branch, and trunk, is the base case for merge.  It should be supported.
> Subversion only supports it with special instructions.  This is the “cyclic
> merge” problem.
>
> It seems that we have two basic ways to do a merge.  We can grab all of the
> changes that we are trying to merge in one big diff between the branch we
> are merging from and the branch we are merging into - the reintegrate merge
> as described in Mark’s article.  Or, we can sequentially apply or “replay”
> each of the changes that we want to merge into our working copy - the
> “recursive” strategy that is the default for git.

If each of these sequential "replays" is a separate editor drive (see
the svn_delta_editor_t in
https://svn.apache.org/repos/asf/subversion/trunk/subversion/include/svn_diff.h)
you might need to consider the impact of performance for merges over a
WAN.  If a single editor drive turns into multiple editor drives it
will certainly be slower.  How much slower obviously depends on a lot
of factors, but it is something to keep in mind since this isn't git
and we don't have the entire repository history locally available.

> It seems to me that the “one big diff” and the replay strategy are closely
> related.  When you are replaying, you grab all of the changes in any
> sequence of revisions that doesn’t include a merge as one big diff.  So, the
> current “one big diff” strategy is a special case of the replay strategy
> that applies when there are no intermediate merges from other branches or
> cherrypicks.
>
> But wait!  According to this article, we can’t use the replay strategy
> because we are missing part of the replay.  We lose information that was
> used to resolve a merge when composing merge commits.  If we had that
> information, we could replay individual merges, and handle a higher
> percentage of the cyclic merge cases.
>
> This problem seems to have a straightforward solution.  When we commit the
> merge, we can stuff the changeset that represents the difference between the
> merge, and the commit, into the merge_history.  We just need an extensible
> merge_history format to hold it.
>
> It’s totally not clear to me why you need to say “reintegrate” when you
> merge to trunk,

You don't need to; reintegrate is simply a shortcut for a two-URL
merge.  The blog you reference explains this.

> and why you need to update the branch after you do a
> reintegrate merge from it.  The computer should be able to remember the
> history of merges and it should be obvious which things have been merged and
> which revisions have been committed on both branches.  The only reason that
> I can think if is that that the mergeinfo is so sparse that the computer
> doesn’t remember enough about the merge history.  Would a bigger and more
> extensible data format give us a straightforward way to solve that problem?

I think you are asking, at a very high level, "The svn:mergeinfo
property tracks merges down to the revision level.  Would a
finer-grained tracking solution allow us to solve the remaining cyclic
merge problems and do away with the need for the reintegrate option
and the workflow that goes with it?"

If that is a fair representation of your core question then I can say,
at an equally high level, "yes, it probably can".

> TREE CHANGE
>
> We can identify tree changes by pattern matching.  This is the same tactic
> that git uses, without any other tree change tracking.

Could you provide a bit more explanation of what you mean here?  For
those of us not familiar with the inner workings of git.

> We can identify when
> this match is successful because the match is applied, examined by the
> merger, and then the merge is committed.
>
> In this case we could write the
> tree map into the merge_history so that  we can map changes bi-directionally
> during future merges without guessing again.   This is another case of
> saving information that we need to replay a merge.
>
> I think we could get a similar effect by generating a move operation (normal
> copy & delete form) as part of the merge.  I think that this mapping would
> need to be done by updates as well as by explicit merges.
>
> EXPERTISE
> Who on this list knows enough about the core algorithm used in merge to
> critique these suggestions and point to places in the code or documentation?

If you haven't already read this I'd start here:
http://subversion.apache.org/docs/community-guide/

See svn_client_merge_peg4, svn_client_merge4, and
svn_client_merge_reintegrate in
https://svn.apache.org/repos/asf/subversion/trunk/subversion/include/svn_client.h

As I mentioned above, see the svn_delta_editor_t in
https://svn.apache.org/repos/asf/subversion/trunk/subversion/include/svn_diff.h

See all of https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_client/merge.c.
 Though much of this is dedicated to the current merge tracking
implementation so you can probably skip a lot of it since you want to
be rid or it :-)

You might want to try some merges with the --ignore-ancestry option.
This disables merge tracking and performs the merge in what amounts to
a pre-1.5 manner and will allow you to get your head around the
essentials of how Subversion does a merge without worrying about
mergeinfo stuffs.

Paul