You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2012/05/10 13:50:48 UTC

Re: Two-phase merge

On 29 April, Stefan Fuhrmann wrote:
> Julian Foad wrote:
>> Stefan Fuhrmann wrote:
>>> [...] The proposal I make here is not
>>> an alternative to e.g. "Symmetric Merge" but rather a
>>> refined merge strategy that would also benefit the current
>>> merge.
[...]
>>> In end, we will always merge individual file or property
>>> changes but we plan the merge on tree-level. For many
>>> file nodes, this will unnecessary intermediate states
>>> where the merge got fragmented due to a change to
>>> some *other* file.
>> 
>> Yes.  That is likely to be a significant cause of extra conflict in cases
>> where the user has sub-tree mergeinfo or has done some cherry-pick merges
>> and is now doing a "merge everything else" merge.
> 
> It also happens in the "back and forth" case:


If this problem does arise in basic back-and-forth merging then I would agree it's much more important than I thought, 
but I don't get it.  If "it" 
refers to fragmenting the merge, then no, each merge in a back-and-forth case doesn't get fragmented.

> if the changes in the source are interspersed
> with merges from the target, i.e. the merges
> from A to B will be excluded from the changes
> to be merged from B to A.

Do you mean like this?

            A2      A3      A4
  A o-------o-------o-------o

   /         \             /

  o           \           /

   \           \         /

  B o---o-------o-------o

        B2      B3      B4   


The merge we're considering is the one shown from B4 to A4.  Changes B2 and B4 are changes in the source, interspersed with B3 which is a merge from the target, that is the result of a merge from A to B.

A2 will be chosen as the base for a 3-way merge, and the two arms of this merge will be (A2:B4) as the source side and (A2:A4) as the target side.  The source side, (A2:B4), is equivalent to the combination of B2 and B4.  It can be a bit tricky to understand why it is equivalent; if so, see <>.  The point is that changes B2 and B4 don't get split up and merged one after the other in this scenario.



>>>  This increases the probability of
>>>  merge conflicts and their likelihood to show up again
>>>  after each conflict resolution.
>> (That is: after each phase of the N revision-range phases that the merge 
>> got broken into.)
>> 
>>>  Moreover, it masks cases
>>>  where Symmetric Merge might select an optimized
>>>  merge plan (see wiki for various examples).
>>  I understand, but I don't know of any examples on the Wiki yet.
> 
> E.g. in a merge from A to B, we can simply take the merge
> result of the last B to A merge if there were no later changes
> on either side and the B -> A merge did not cherry-pick
> (with respect to the node in question).

>>>  I propose to use the following two-phase merge workflow:
>>> 
>>>  (1) Planning phase
>>>  - read / evaluate the merge info and change history
>>>     to find the list of revisions to merge
>>>  - break that down for each node changed on the source side
>>>  - optionally handle renaming etc.
>>>  - mark tree conflicts (not sure when to signal them to the
>>>     user and when to resolve them; high-level fragmentation
>>>     might become necessary)
>>> 
>>>  (2) Execution phase
>>>  - merge changes on a per-node basis, i.e. all changes of
>>>     that node get merged before the next node
>>>  - fragmentation may still be necessary for *that file*
>>>  - given the "merge plan" for a node, alternative merge
>>>     strategies may be chosen
>> 
>> OK, the significant change here is not so much the two-phase (although that 
>> has merit in itself) but rather turning the processing order inside-out from 
>> (conceptually)
>> 
>>     REV_RANGES = [contiguous-revision-ranges needed for TREE]
>>     for REV_RANGE in REV_RANGES:
>>       3-way merge REV_RANGE into the whole TREE
>> 
>> to (conceptually)
>> 
>>     for each NODE in the TREE:
>>       MERGES = [find the best sequence of 3-way merges to merge all needed 
>> changes into NODE]
>>       for MERGE in MERGES:
>>         3-way-merge MERGE into NODE
> 
> Correctly. With the tiny addition that all rename
> and tree conflict handling is done in the planning
> phase. That makes it repeatable and potentially
> reduces the number of conflicts (at most one per
> node instead of one per node and merge step).

I get how it reduces conflicts.  What do you mean by 'repeatable'?

> The second phase is also reduced to purely text-
> based processing because all the "structure"
> issues have already been resolved. But that might
> be more of a mental aid instead of a true reduction
> in complexity (we still might need to handle missing
> targets on disk etc.).


>> This is a good idea in itself.  It certainly brings advantages as
>> mentioned above.  I'm not sure what, if any, disadvantages it might have.
> 
> Memory consumption could be proportional to
> the number of changes and renames or such.
> Should not be a real-world problem as long as
> you don't merge more than a few 100k changes.
> 
> A more severe issue might be latency because
> the planning phase needs to gather / evaluate
> data from large sections of history. But that one
> can be improved ;)

Right.

>> I don't know that the scenarios this helps with are our main concern at
>> this point.  It would be good to keep this approach in mind when writing
>> or  designing any new merge code.  In terms of development effort I think
>> it's  more valuable to concentrate first on the advantages to to-and-fro
>> merging (most  commonly without subtree merges and without cherry-picks)
>> that the  'symmetric merge' brings.
> 
> I think it can become helpful as soon as we track renames because
> we need to remap on a per-node basis.

To track a renamed node, we need to be able to recognize and act on the information that each node in a merge will be found at three paths that may all be different: its path in the merge-base (aka source-left) tree, its path in the source (aka source-right) tree, and its path in the target tree.  Is that what you mean by "remap on a per-node basis"?  I think that's something we will need to do anyway, but I don't see that it makes much difference whether we do this once at the planning stage (for a two-phase merge), or do it separately for each sub-merge (for the split-into-revision-ranges method).

- Julian