Posted to commits@subversion.apache.org by Apache Wiki <wi...@apache.org> on 2011/10/06 18:55:21 UTC

[Subversion Wiki] Update of "MergeTrackingIdeas" by JulianFoad

The "MergeTrackingIdeas" page has been changed by JulianFoad:
http://wiki.apache.org/subversion/MergeTrackingIdeas

Comment:
(Manually) import doc from Google docs

New page:
''Design notes, plans and ideas by JulianFoad''

----
= Merge Tracking Ideas =
Ways in which we could improve on the 1.6 mergeinfo and merge tracking scheme. Experimental thoughts, not fully thought out.

== Logical Change Tracking ==
This idea came out of a [[http://colabti.org/irclogger/irclogger_log/svn-dev?date=2011-09-06|discussion on IRC]].

The idea is to define a more powerful form of merge tracking, as an upgrade from the current (1.6) merge tracking. The additional power is in tracking “logical changes” as they are merged from one branch to another to another in arbitrary ways, including back-and-forth and circular patterns, and still being able to say what changes we “need” to merge from branch B to branch C.

This is, I think, going some way towards what people sometimes call ''changeset-based merging''.

Advantages:

* Support more flexible branching topologies. It doesn’t matter whether changes have already been merged to the target branch directly from this source branch or via any other chain of branches.
* Enable the “reintegrate” purpose to be served by the same automatic merge algorithm as is used for catch-up.

The emphasis is on being able to ''detect'' and describe what logical changes are needed, without necessarily being able to perform the merge automatically in all cases. The current thinking is that Subversion would be able to perform the merge when a candidate revision on the source branch is a merge of logical changes of which either ''all'' or ''none'' are needed on the target. But if this candidate revision is a merge and only ''some'' of the logical changes in it are needed, then Subversion would, we suppose, stop and print a helpful description of the situation. Even that level of capability is a big improvement on 1.6, and helpful diagnostics for unhandled situations are exactly the sort of assistance a user should be entitled to expect from a VCS.

=== Principles ===
* We track ''logical changes''.
* User-facing goals are defined in terms of merging the right set of ''logical changes''.
* Each commit is either a ''logical change'' or a ''merge'' of ''logical changes''.

We introduce the concept of a ''logical change'' as the fundamental unit of change that is tracked. A ''logical change'' starts life as a committed change that is not part of a merge. When that tree-content change is merged to another branch (adapted if necessary to accommodate any physical and/or semantic differences between the branches), the resulting commit is ''not'' a new ''logical change'' but rather is a ''merge''. A ''merge'' is defined as a committed change that includes a mergeinfo change that brings in one or more ''logical changes''. A ''logical change'' has a unique identifier (let’s say the branch and revision in which it was originally committed) and is always identified by that same identifier, no matter what branches it has been merged through or whether it has been merged together with other ''logical changes'' in a single ''merge''.

We must be able to identify the ''logical changes'' in the system. To identify the ''logical changes'' in existing 1.6 mergeinfo, we will classify each commit as a ''logical change'' if it is a change without any mergeinfo change, or else as a ''merge'' if it includes a mergeinfo change. If it’s a ''merge'', then it brings in some pre-existing logical changes and/or merges. By scanning recursively into mergeinfo history, we can identify all the original logical changes brought in by a merge.
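
The recursive scan can be sketched as follows. This is a minimal illustration, not Subversion's implementation: the `Commit` record, the `(branch, rev)` identifiers and the `history` dictionary are hypothetical stand-ins for real mergeinfo, and each commit's `merged` set is assumed to list only the commits newly recorded by that commit's mergeinfo change.

```python
from dataclasses import dataclass


# Hypothetical model: a commit is identified by (branch, revision).
@dataclass(frozen=True)
class Commit:
    branch: str
    rev: int
    # Commits newly recorded as merged by this commit's mergeinfo change.
    # Empty means the commit made no mergeinfo change.
    merged: frozenset = frozenset()


def is_merge(commit):
    """A commit is a merge if it includes a mergeinfo change."""
    return bool(commit.merged)


def logical_changes(commit_id, history):
    """Return the original logical changes that a commit is, or brings in."""
    found, seen, pending = set(), set(), [commit_id]
    while pending:
        cid = pending.pop()
        if cid in seen:          # guard against revisiting shared history
            continue
        seen.add(cid)
        commit = history[cid]
        if is_merge(commit):
            pending.extend(commit.merged)   # scan into the merge's sources
        else:
            found.add(cid)                  # an original logical change
    return found


history = {
    ("A", 10): Commit("A", 10),
    ("A", 11): Commit("A", 11),
    ("B", 13): Commit("B", 13, frozenset({("A", 10), ("A", 11)})),
    ("C", 20): Commit("C", 20, frozenset({("B", 13)})),
}
print(sorted(logical_changes(("C", 20), history)))
```

In this made-up history, C:20 merges B:13, which itself merged A:10 and A:11, so the scan reports A:10 and A:11 as the logical changes brought in by C:20.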

The user-facing goals of merging are defined in terms of getting the right set of ''logical changes'' onto the target branch. This is in contrast to the 1.6 scheme which is defined in terms of getting the complete set of ''commits on the source branch'' onto the target branch. The difference is that we will select a commit in the source branch only if it is a ''logical change'' or if it is a ''merge'' that brings in ''logical changes'' that we don’t have; and not if it is a ''merge'' that brings in ''logical changes'' that we do already have.

Each ''candidate revision'' to merge from the source branch is merged iff it is, or is a merge that brings in, ''logical changes'' that we don’t yet have on the target.

* If the candidate revision is a ''logical change'', then we merge it iff we don’t have that ''logical change'' on the target branch (as determined by the target branch’s mergeinfo).
* If the candidate revision is a ''merge'', then we merge it if all the ''logical changes'' it brings in are ones we don’t already have, or we skip it if all the ''logical changes'' it brings in are ones we do already have.
* If the candidate revision is a ''merge'' that brings in some new ''logical changes'' that we don’t have and some that we do have, then Subversion can at least bail out, telling the user which changes are present and which are to be merged. At the moment we don’t anticipate being able to untangle the relevant parts of the physical edit from the source branch, nor to fetch the required ''logical changes'' from their origin or from some other branch.
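
The three cases above reduce to a set comparison. The following is a sketch under stated assumptions: logical changes are represented as plain strings such as `"A:10"`, and `Decision` and `decide` are hypothetical names, not part of any Subversion API.

```python
from enum import Enum


class Decision(Enum):
    MERGE = "merge"
    SKIP = "skip"
    BAIL_OUT = "bail out"


def decide(candidate_changes, target_changes):
    """Classify one candidate revision.

    candidate_changes: logical changes the candidate is, or brings in.
    target_changes: logical changes already recorded on the target.
    Returns (decision, needed, already_present).
    """
    needed = candidate_changes - target_changes
    present = candidate_changes & target_changes
    if not needed:
        decision = Decision.SKIP        # nothing new (or a no-op revision)
    elif not present:
        decision = Decision.MERGE       # everything it brings in is new
    else:
        decision = Decision.BAIL_OUT    # mixed: report both sets and stop
    return decision, needed, present


target = {"A:10", "A:11"}
for candidate in ({"B:14"}, {"A:10", "A:11"}, {"A:11", "A:12"}):
    decision, needed, present = decide(candidate, target)
    print(sorted(candidate), decision.value, sorted(needed), sorted(present))
```

Returning the `needed` and `present` sets alongside the decision gives the bail-out case what it needs to tell the user which changes are present and which are still to be merged.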

The ''reintegrate'' purpose is to bring all ''logical changes'' from the source branch that are not already in the target branch. That is exactly the same as for a ''catch-up'' merge, and so the same algorithm can be used. Users might still want to specify the “--reintegrate” option because of the additional checks that it performs before merging, but that would be optional and for the user’s benefit not for the system’s benefit. A plain automatic merge would still work in that direction even if the old reintegration constraints are not met.

=== Migration from 1.6 ===
See above about recursive scanning of 1.6 mergeinfo.

Retro-fitting the principle of ''logical changes'' onto an existing 1.6 merge history would seem to be a good fit, as it is already common mantra and practice to separate new logical changes from merges. The consequences if this principle has occasionally not been followed in the past would seem to be predictable and relatively straightforward to recover from. Where the history has been altered by record-only merges or direct editing or removal of mergeinfo, however, this method of classifying old commits may be untenable or need augmenting with user input. (Investigate?)

=== Other issues to explore / define ===
Reverse merges: we first need to define the basic semantics before we can contemplate supporting them.

Subtree merges: what are the semantics?

Merging into a mixed-rev WC: what special considerations apply? Does it help to remember that a WC being merged into is commonly acting as a proto-revision?

Mergeinfo storage: semantics and format. Per branch? Whole history in one place?

=== Rules (differences from 1.6) ===
1. '''###? '''The ''Feature Branch'' relationship can be applied with cycles in the graph of relationships.
* A ⇒ B, B ⇒ A
* A ⇒ B, B ⇒ C, C ⇒ A

1. '''###? '''The ''Feature Branch'' relationship can be applied with multiple paths in the graph of relationships.
* A ⇒ B, B ⇒ C, A ⇒ C
* A ⇒ B, A ⇒ C, B ⇒ D, C ⇒ D

=== Requirements ===
* Editable merge history. (This scheme increases reliance on the correctness of mergeinfo, and especially of mergeinfo changes, which are fragile in the current 1.6 scheme.)
* Quick(ish) traversal of mergeinfo history. This suggests a new storage model in which all the historic mergeinfo (of a given branch?) is in one place.

=== A Worked Example ===
== Multiple Commits in One Logical Change ==
We might want to allow multiple revisions to be recorded as being components of the same ''merge''. This would increase the power of merge tracking in a functional sense, but it is “advanced” functionality and would require user awareness and tool support to make use of it.

For example: merge A:10 to B, committing the result initially as B:13, then do some more conflict resolution in B:14 and B:16. Arrange somehow (by user input, for example) for B:14 and B:16 also to be recorded as part of the “same” merge: branch B revs 13, 14, 16 jointly comprise the merge of A:10. In a subsequent merge from B to C, assuming A:10 is already on C, that would prevent B:13, B:14 and B:16 from being merged to C.
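
One way to picture the grouping, as a rough sketch only: the functions `changes_by_rev` and `eligible_revs`, the string identifiers, and the extra original changes B:12 and B:15 are all invented for illustration, and the grouping record is assumed to come from user input.

```python
def changes_by_rev(original_revs, merge_groups, branch):
    """Map each source-branch revision to the logical changes it represents."""
    # An original (non-merge) commit is its own logical change.
    table = {rev: {f"{branch}:{rev}"} for rev in original_revs}
    # Every revision in a group represents the same merged changes.
    for revs, changes in merge_groups:
        for rev in revs:
            table[rev] = set(changes)
    return table


def eligible_revs(table, target_changes):
    """Revisions that bring in at least one change the target lacks."""
    return sorted(rev for rev, changes in table.items()
                  if changes - target_changes)


# Branch B: r12 and r15 are original changes; r13, r14 and r16 jointly
# comprise the merge of A:10.
table = changes_by_rev([12, 15], [({13, 14, 16}, {"A:10"})], "B")
print(eligible_revs(table, {"A:10"}))   # A:10 is already on C
print(eligible_revs(table, set()))      # A:10 is not yet on C
```

When A:10 is already on C, only the original changes r12 and r15 remain eligible; the three grouped revisions are skipped together. The sketch does not address the open question of a group whose changes are only partly present on the target.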

It may be worth designing in this ability, as at first sight it sounds like something that could be left unused and unimplemented initially and then implemented later. Until the merge algorithm pays attention to it ''and'' somebody populates it, those follow-ups B:14 and B:16 will simply be merged to C and will conflict (physically and/or semantically) just as happens today.

. ### What are the semantics exactly? Does it matter whether A:10 is an original change or a merge? What gets complex when the merge has multiple source changes?

== Distinguish Operative and No-op Source Revs ==
At present the revs we record are ones that have been “considered” from the source branch, regardless of whether they contained an original change or a merge or nothing at all.

. ### My overall impression is this is not a useful avenue, but here’s the thought anyway.

The aim of a ''catch-up'' merge is to reach a state in which a single continuous revision range (including all operative and no-op revs) is recorded as having been merged from the immediate ''source branch''. If there are gaps but all the gaps are no-op, the merge algorithm searches those gaps and finds that there is nothing to do, and then (potentially, and in practice actually) fills in those gaps in the recorded mergeinfo.
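
The gap-filling behaviour described above can be sketched like this. The representation is an assumption made for illustration: merged ranges are sorted, non-overlapping, inclusive `(start, end)` tuples, and `operative_revs` is a hypothetical precomputed set of revisions that actually changed the source branch.

```python
def fill_noop_gaps(merged_ranges, operative_revs):
    """Join adjacent merged ranges when the gap between them is all no-op."""
    if not merged_ranges:
        return []
    result = [merged_ranges[0]]
    for start, end in merged_ranges[1:]:
        prev_start, prev_end = result[-1]
        gap = range(prev_end + 1, start)
        if any(rev in operative_revs for rev in gap):
            result.append((start, end))      # gap holds real work: keep it
        else:
            result[-1] = (prev_start, end)   # nothing to do: fill the gap
    return result


recorded = [(5, 7), (10, 12), (15, 20)]
print(fill_noop_gaps(recorded, operative_revs={13}))
print(fill_noop_gaps(recorded, operative_revs=set()))
```

With r13 operative, only the r8-r9 gap is filled and r13-r14 stays open; with no operative revisions in any gap, the record collapses to a single continuous range.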

If we were to distinguish between operative and no-op revs, that would help in displaying mergeinfo in a more user-friendly way. ### Specifics?

This information is already discoverable; it is just not fast to obtain.

This distinction would come “for free” if we start recording ''logical changes'' rather than physical changes. But then the question, “Are there any eligible changes to merge?” might be harder to answer.