Posted to commits@subversion.apache.org by Apache Wiki <wi...@apache.org> on 2011/10/06 18:55:21 UTC

[Subversion Wiki] Update of "MergeTrackingIdeas" by JulianFoad

The "MergeTrackingIdeas" page has been changed by JulianFoad:
http://wiki.apache.org/subversion/MergeTrackingIdeas

Comment:
(Manually) import doc from Google docs

New page:
''Design notes, plans and ideas by JulianFoad''

----
= Merge Tracking Ideas =
Ways in which we could improve on the 1.6 mergeinfo and merge tracking scheme.  Experimental thoughts, not fully thought out.

== Logical Change Tracking ==
This idea came out of a [[http://colabti.org/irclogger/irclogger_log/svn-dev?date=2011-09-06|discussion on IRC]].

The idea is to define a more powerful form of merge tracking, as an upgrade from the current (1.6) merge tracking.  The additional power is in tracking “logical changes” as they are merged from one branch to another to another in arbitrary ways, including back-and-forth and circular patterns, and still being able to say what changes we “need” to merge from branch B to branch C.

This is, I think, going some way towards what people sometimes call ''changeset-based merging''.

Advantages:

 * Support more flexible branching topologies.  It doesn’t matter whether changes have already been merged to the target branch directly from this source branch or via any other chain of branches.
 * Enable the “reintegrate” purpose to be served by the same automatic merge algorithm as is used for catch-up.

The emphasis is on being able to ''detect'' and describe what logical changes are needed, without necessarily being able to perform the merge automatically in all cases.  The current state of thinking is that Subversion would be able to perform the merge when a candidate revision on the source branch is a merge of logical changes of which either ''all'' or ''none'' are needed on the target.  But if this candidate revision is a merge and only ''some'' of the logical changes in it are needed, then Subversion would, we suppose, stop and print a helpful description of the situation.  Even that level of capability is a big improvement on 1.6, and helpful diagnostics for unhandled situations are entirely the sort of assistance a user should be entitled to expect from a VCS.

=== Principles ===
 * We track ''logical changes''.
 * User-facing goals are defined in terms of merging the right set of ''logical changes''.
 * Each commit is either a ''logical change'' or a ''merge'' of ''logical changes''.

We introduce the concept of a ''logical change'' as the fundamental unit of change that is tracked.  A ''logical change'' starts life as a committed change that is not part of a merge.  When that tree-content change is merged to another branch (adapted if necessary to accommodate any physical and/or semantic differences between the branches), the resulting commit is ''not'' a new ''logical change'' but rather is a ''merge''.  A ''merge'' is defined as a committed change that includes a mergeinfo change that brings in one or more ''logical changes''.  A ''logical change'' has a unique identifier (let’s say the branch and revision in which it was originally committed) and is always identified by that same identifier, no matter what branches it has been merged through or whether it has been merged together with other ''logical changes'' in a single ''merge''.

We must be able to identify the ''logical changes'' in the system.  To identify the ''logical changes'' in existing 1.6 mergeinfo, we will classify each commit as a ''logical change'' if it is a change without any mergeinfo change, or else as a ''merge'' if it includes a mergeinfo change.  If it is a ''merge'', then it brings in some pre-existing logical changes and/or merges.  By scanning recursively into mergeinfo history, we can identify all the original logical changes brought in by a merge.
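The classification and the recursive scan might be sketched as follows.  This is only an illustration under stated assumptions: the `Commit` record, its `merged_in` field and the `history` mapping are hypothetical stand-ins for whatever Subversion would really read from mergeinfo history, and a change is identified by a (branch, revision) pair as suggested above.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Set, Tuple

ChangeId = Tuple[str, int]  # hypothetical identifier: (branch, revision)


@dataclass(frozen=True)
class Commit:
    ident: ChangeId
    # Changes that this commit's mergeinfo change newly records as merged.
    # Empty means the commit made no mergeinfo change.
    merged_in: FrozenSet[ChangeId] = frozenset()


def is_merge(commit: Commit) -> bool:
    """A commit is a merge iff it includes a mergeinfo change."""
    return bool(commit.merged_in)


def logical_changes(ident: ChangeId,
                    history: Dict[ChangeId, Commit],
                    _seen: Optional[Set[ChangeId]] = None) -> Set[ChangeId]:
    """Return the original logical changes carried by the commit `ident`."""
    seen = set() if _seen is None else _seen
    if ident in seen:
        # Already expanded via another path (or a cycle); nothing new to add.
        return set()
    seen.add(ident)
    commit = history[ident]
    if not is_merge(commit):
        return {ident}  # an original logical change stands for itself
    found: Set[ChangeId] = set()
    for source in commit.merged_in:
        found |= logical_changes(source, history, seen)
    return found


# Made-up history: B:13 merges A:10 and A:11; C:20 merges B:13.
history = {c.ident: c for c in [
    Commit(("A", 10)),
    Commit(("A", 11)),
    Commit(("B", 13), frozenset({("A", 10), ("A", 11)})),
    Commit(("C", 20), frozenset({("B", 13)})),
]}

print(sorted(logical_changes(("C", 20), history)))
```

The shared `seen` set is there so that cyclic or multi-path merge histories terminate; whether real mergeinfo history can be walked this simply is one of the open questions.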

The user-facing goals of merging are defined in terms of getting the right set of ''logical changes'' onto the target branch.  This is in contrast to the 1.6 scheme which is defined in terms of getting the complete set of ''commits on the source branch'' onto the target branch.  The difference is that we will select a commit in the source branch only if it is a ''logical change'' or if it is a ''merge'' that brings in ''logical changes'' that we don’t have; and not if it is a ''merge'' that brings in ''logical changes'' that we do already have.

Each ''candidate revision'' to merge from the source branch is merged iff it is, or is a merge that brings in, ''logical changes'' that we don’t yet have on the target.

 * If the candidate revision is a ''logical change'', then we merge it iff we don’t have that ''logical change'' on the target branch (as determined by the target branch’s mergeinfo).
 * If the candidate revision is a ''merge'', then we merge it if all the ''logical changes'' it brings in are ones we don’t already have, or we skip it if all the ''logical changes'' it brings in are ones we do already have.
 * If the candidate revision is a ''merge'' that brings in some new ''logical changes'' that we don’t have and some that we do have, then Subversion can at least bail out, telling the user which changes are present and which are to be merged.  At the moment we don’t anticipate being able to untangle the relevant parts of the physical edit from the source branch, nor to fetch the required ''logical changes'' from their origin or from some other branch.
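The three cases above can be sketched as a single decision function.  The names `Action` and `decide` are hypothetical, and the function assumes the set of ''logical changes'' carried by the candidate has already been worked out (for a candidate that is itself a ''logical change'', that set contains just the candidate).

```python
from enum import Enum


class Action(Enum):
    MERGE = "merge"
    SKIP = "skip"
    BAIL_OUT = "bail out"


def decide(brought_in, on_target):
    """Decide what to do with one candidate revision.

    brought_in: set of logical changes the candidate carries.
    on_target:  set of logical changes already present on the target.
    Returns (action, present, missing) so that a caller can report
    which changes are already present and which still need merging.
    """
    missing = brought_in - on_target
    present = brought_in & on_target
    if not missing:
        action = Action.SKIP       # everything is already on the target
    elif not present:
        action = Action.MERGE      # everything is new to the target
    else:
        action = Action.BAIL_OUT   # mixed: stop and describe the situation
    return action, present, missing


# Made-up candidates on the source branch; the target already has A:10.
on_target = {("A", 10)}
for candidate, brought_in in [
    ("B:13", {("A", 10)}),             # a merge of A:10 only
    ("B:15", {("B", 15)}),             # an original logical change
    ("B:17", {("A", 10), ("A", 12)}),  # a merge of one old and one new change
]:
    action, present, missing = decide(brought_in, on_target)
    print(candidate, action.value, sorted(present), sorted(missing))
```

Returning the `present` and `missing` sets alongside the action is a design choice made here so that the bail-out case has what it needs for a helpful message.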

The ''reintegrate'' purpose is to bring all ''logical changes'' from the source branch that are not already in the target branch.  That is exactly the same as for a ''catch-up'' merge, and so the same algorithm can be used.  Users might still want to specify the “--reintegrate” option because of the additional checks that it performs before merging, but that would be optional and for the user’s benefit not for the system’s benefit.  A plain automatic merge would still work in that direction even if the old reintegration constraints are not met.

=== Migration from 1.6 ===
See above about recursive scanning of 1.6 mergeinfo.

Retro-fitting the principle of ''logical changes'' onto an existing 1.6 merge history would seem to be a good fit, as it is already common mantra and practice to separate new logical changes from merges.  The consequences if this principle has occasionally not been followed in the past would seem to be predictable and relatively straightforward to recover from.  Where the history has been altered by record-only merges or direct editing or removal of mergeinfo, however, this method of classifying old commits may be untenable or need augmenting with user input.  (Investigate?)

=== Other issues to explore / define ===
Reverse merges: we first need to define the basic semantics before we can contemplate supporting them.

Subtree merges: what are the semantics?

Merging into a mixed-rev WC: what special considerations apply? Does it help to remember that a WC being merged into is commonly acting as a proto-revision?

Mergeinfo (MI) storage: semantics and format. Per branch? Whole history in one place?

=== Rules (differences from 1.6) ===
 1. '''###? '''The ''Feature Branch'' relationship can be applied with cycles in the graph of relationships.
  * A ⇒ B, B ⇒ A
  * A ⇒ B, B ⇒ C, C ⇒ A

 1. '''###? '''The ''Feature Branch'' relationship can be applied with multiple paths in the graph of relationships.
  * A ⇒ B, B ⇒ C, A ⇒ C
  * A ⇒ B, A ⇒ C, B ⇒ D, C ⇒ D

=== Requirements ===
 * Editable merge history.  (Because this scheme increases reliance on the correctness of mergeinfo, and especially of mergeinfo changes, which is fragile in the current 1.6 scheme.)
 * Quick(ish) traversal of mergeinfo history.  This suggests a new storage model in which all the historic mergeinfo (of a given branch?) is in one place.

=== A Worked Example ===
== Multiple Commits in One Logical Change ==
We might want to allow multiple revisions to be recorded as being components of the same ''merge''.  This would increase the power of merge tracking in a functional sense, but it is “advanced” functionality and would require user awareness and tool support to make use of it.

Merge A:10 to B, committing the result initially as B:13, then doing some more conflict resolution in B:14 and B:16.  Arrange somehow (by user input, for example) for B:14 and B:16 also to be recorded as part of the “same” merge: branch B revs 13, 14, 16 jointly comprise the merge of A:10.  In a subsequent merge from B to C, assuming A:10 is already on C, that would prevent B:13, B:14 and B:16 from being merged to C.
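As a rough sketch of that example, suppose the grouping is recorded as a mapping from each revision on B to the ''logical changes'' it helps to merge.  The `merge_components` mapping and the `eligible` function are hypothetical; nothing like them exists in 1.6, and B:15 is added here as an unrelated original change for contrast.

```python
def eligible(candidates, merge_components, on_target):
    """Return the candidate revisions that still need merging to the target.

    A revision with no entry in merge_components is treated as an original
    logical change that stands for itself.
    """
    result = []
    for rev in candidates:
        carried = merge_components.get(rev, {rev})
        if not carried <= on_target:  # something it carries is missing
            result.append(rev)
    return result


candidates = [("B", 13), ("B", 14), ("B", 15), ("B", 16)]
on_c = {("A", 10)}  # A:10 is already on branch C

# With the follow-ups recorded as part of the same merge of A:10.
grouped = {
    ("B", 13): {("A", 10)},
    ("B", 14): {("A", 10)},
    ("B", 16): {("A", 10)},
}
print(eligible(candidates, grouped, on_c))

# With only the initial merge commit recorded, as happens today.
ungrouped = {("B", 13): {("A", 10)}}
print(eligible(candidates, ungrouped, on_c))
```

With the grouping recorded, only B:15 remains eligible; without it, B:14 and B:16 are offered for merging as well.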

It may be worth designing in the ability, as it (at first sight) sounds like something that could be unused and unimplemented at first and then implemented later.  Until the merge algorithm pays attention to it ''and'' somebody populates it, those follow-ups B:14 and B:16 will simply be merged to C and will conflict (physically and/or semantically) just as happens today.

 . ### What are the semantics exactly?  Does it matter whether A:10 is an original change or a merge?  What gets complex when the merge has multiple source changes?

== Distinguish Operative and No-op Source Revs ==
At present the revs we record are ones that have been “considered” from the source branch, regardless of whether they contained an original change or a merge or nothing at all.

 . ### My overall impression is this is not a useful avenue, but here’s the thought anyway.

The aim of a ''catch-up'' merge is to reach a state in which a single continuous revision range (including all operative and no-op revs) is recorded as having been merged from the immediate ''source branch''.  If there are gaps but all the gaps are no-op, the merge algorithm searches those gaps and finds that there is nothing to do, and then (potentially, and in practice actually) fills in those gaps in the recorded mergeinfo.
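A toy model of that gap-filling behaviour, using plain sets of revision numbers.  The function `plan_catch_up` and its parameters are hypothetical, and the sketch assumes we can ask which source revisions are operative.

```python
def plan_catch_up(recorded, operative, first, last):
    """Plan a catch-up merge of source revisions first..last.

    recorded:  revisions already recorded as merged from the source branch.
    operative: source revisions that actually change something.
    Returns (to_merge, new_recorded): the operative revisions that still
    need merging, and the mergeinfo afterwards with every gap filled in.
    """
    wanted = set(range(first, last + 1))
    gaps = sorted(wanted - recorded)
    to_merge = [rev for rev in gaps if rev in operative]
    return to_merge, recorded | wanted


# Revs 7 and 8 are unrecorded but no-op: nothing to merge, gaps filled in.
print(plan_catch_up({5, 6, 9, 10}, {5, 6, 9, 10}, 5, 10))

# Rev 8 is unrecorded and operative: it is the only rev to merge.
print(plan_catch_up({5, 6, 9, 10}, {5, 8, 10}, 5, 10))
```

In both cases the recorded result is the continuous range 5-10; the difference lies only in whether any work had to be done to get there.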

If we were to distinguish between operative and no-op revs, that would help in displaying mergeinfo in a more user-friendly way.  ### Specifics?

This info is already discoverable, it’s just not fast.

This distinction would come “for free” if we start recording ''logical changes'' rather than physical changes.  But then the question, “Are there any eligible changes to merge?” might be harder to answer.