You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Hudson <gh...@MIT.EDU> on 2002/12/22 00:58:01 UTC

Repeated merge strategies

Okay, with some help from Bill Tutt, I think have deciphered the
nature of the conflict between Branko and Nathanael Nerode on repeated
merge strategies.  We don't necessarily have to solve this issue
before 1.0, but I thought it might be nice to describe what's going in
as concise a matter as I can manage.

Here are the two general schemes:

The Most Recent Ancestor approach (MRA)  [advocated by Nathanael]
---------------------------------------

In this scheme, you record an optional set of merge sources in each
node-revision.  When asked to do a merge with only one source (that
is, just "svn merge URL", with no second argument), you compute the
most recent ancestor and do a three-way merge between the common
ancestor, the given URL, and the WC.

To compute the most recent ancestor, you chain off the immediate
predecessors of each node-revision.  The immediate predecessors are
the direct predecessor (the most recent node-revision within the node)
and the merge sources.  I guess the most efficient algorithm is an
interleaved breadth-first search.

The Ancestry Set approach (AS)           [advocated by Branko]
------------------------------

This is nicely documented in
http://svn.collab.net/repos/svn/trunk/doc/programmer/design/model.texi
under "Merging and Ancestry", but I'll give a capsule summary.

In this scheme, you record the full ancestry set for each
node-revision--that is, the set of all changes which are accounted for
in that node-revision.  (How you store this ancestry set is
unimportant; the point is, you need a reasonably efficient way of
determining it when asked.)  If you are asked to "svn merge URL", you
apply the changes present in URL's ancestry but absent in WC's
ancestry.  Note that this is not a single three-way merge; you may
have to apply a large number of disjoint changes to the WC.

Comparisons and arguments
-------------------------

AS allows you to merge changes from a branch out of order, without
doing any bookkeeping.  MRA requires you to merge changes from a
branch in order.

We'll need to consider what other version control systems do.  I'm
told AS is similar to how Bitkeeper and Arch work, but I don't know
how true that is.  If everyone does AS and we do MRA, we'll stand out
as needlessly different.

Branko argued that MRA would not reasonably handle the case of a
commit which reverts a previous change.  I guess he was assuming that
in the future Subversion will have some mechanism for recording that
fact; right now a commit which reverts a previous change looks exactly
the same as an original delta that happens to restore the previous
state.  To me the situation seems reversed; MRA (with no augmentation
for recording reversions) would attempt to apply a reversion delta,
while AS (augmented to record reversions by removing the reverted
delta from the ancestry set) would seem to never apply a reversion.
(For example, if I create a branch at rev 200, and on the branch I
revert rev 100, and then I merge that branch into the trunk, the AS
merge algorithm would seem to do nothing, since the trunk's ancestor
set is a superset of the branch's ancestor set.)

MRA may be simpler to implement, since it results in a single
three-way diff.

MRA may be easier for users to understand, even though AS is probably
simpler to a mathematician.  I have no experience with either system,
so I'm not sure.

MRA may break down faster if the merging topology becomes
non-hierarchical, but I'm having trouble imagining a reasonable use
case for a non-hierarchical merging topology.

An Open Question (applying to both schemes)
----------------

If a user asks to merge a directory, should we apply MRA or AS to each
subdirectory and file to determine what ancestor(s) to use?  Or should
we apply MRA or AS just once, to the directory itself?  The latter
approach seems simpler and more efficient, but will break down quickly
if the user wants to merge in subdirectories of a branch in advance of
merging in the whole thing.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repeated merge strategies

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Greg Hudson <gh...@MIT.EDU> writes:
> Okay, with some help from Bill Tutt, I think have deciphered the
> nature of the conflict between Branko and Nathanael Nerode on repeated
> merge strategies.  We don't necessarily have to solve this issue
> before 1.0, but I thought it might be nice to describe what's going in
> as concise a matter as I can manage.

Thank you!  This email was a service to humanity :-).

Do you want to store this somewhere in notes/ so we don't lose track
of it?

Off the top of my head, I can't think of an ancestry-tracking
mechanism (MRA, AS, or otherwise) simple enough to go into 1.0.  But
maybe your email will inspire a proposal from someone...

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repeated merge strategies

Posted by Nathanael Nerode <ne...@twcny.rr.com>.
Greg Hudson wrote:
> Okay, with some help from Bill Tutt, I think have deciphered the
> nature of the conflict between Branko and Nathanael Nerode on repeated
> merge strategies.  We don't necessarily have to solve this issue
> before 1.0, but I thought it might be nice to describe what's going in
> as concise a matter as I can manage.
Thank you for your calm, clear, voice of reason, and sorry about being 
irritable previously. :-)

> 
> Here are the two general schemes:
> 
> The Most Recent Ancestor approach (MRA)  [advocated by Nathanael]
> ---------------------------------------
> 
> In this scheme, you record an optional set of merge sources in each
> node-revision.  When asked to do a merge with only one source (that
> is, just "svn merge URL", with no second argument), you compute the
> most recent ancestor and do a three-way merge between the common
> ancestor, the given URL, and the WC.
> 
> To compute the most recent ancestor, you chain off the immediate
> predecessors of each node-revision.  The immediate predecessors are
> the direct predecessor (the most recent node-revision within the node)
> and the merge sources.  I guess the most efficient algorithm is an
> interleaved breadth-first search.
> 
> The Ancestry Set approach (AS)           [advocated by Branko]
> ------------------------------
> 
> This is nicely documented in
> http://svn.collab.net/repos/svn/trunk/doc/programmer/design/model.texi
> under "Merging and Ancestry", but I'll give a capsule summary.
> 
> In this scheme, you record the full ancestry set for each
> node-revision--that is, the set of all changes which are accounted for
> in that node-revision.  (How you store this ancestry set is
> unimportant; the point is, you need a reasonably efficient way of
This is the part which I hypothesized would require serious 
architectural work on Subversion.

> determining it when asked.)  If you are asked to "svn merge URL", you
> apply the changes present in URL's ancestry but absent in WC's
> ancestry.  Note that this is not a single three-way merge; you may
> have to apply a large number of disjoint changes to the WC.
> 
> Comparisons and arguments
> -------------------------
> 
> AS allows you to merge changes from a branch out of order, without
> doing any bookkeeping.  MRA requires you to merge changes from a
> branch in order.
> 
> We'll need to consider what other version control systems do.  I'm
> told AS is similar to how Bitkeeper and Arch work, but I don't know
> how true that is.  If everyone does AS and we do MRA, we'll stand out
> as needlessly different.
This is, of course, a very good argument for the AS method.

> Branko argued that MRA would not reasonably handle the case of a
> commit which reverts a previous change.  I guess he was assuming that
> in the future Subversion will have some mechanism for recording that
> fact; right now a commit which reverts a previous change looks exactly
> the same as an original delta that happens to restore the previous
> state.  To me the situation seems reversed; MRA (with no augmentation
> for recording reversions) would attempt to apply a reversion delta,
> while AS (augmented to record reversions by removing the reverted
> delta from the ancestry set) would seem to never apply a reversion.
> (For example, if I create a branch at rev 200, and on the branch I
> revert rev 100, and then I merge that branch into the trunk, the AS
> merge algorithm would seem to do nothing, since the trunk's ancestor
> set is a superset of the branch's ancestor set.)
> 
> MRA may be simpler to implement, since it results in a single
> three-way diff.
This was my primary argument in favor of it.

> MRA may be easier for users to understand, even though AS is probably
> simpler to a mathematician.  I have no experience with either system,
> so I'm not sure.
> 
> MRA may break down faster if the merging topology becomes
> non-hierarchical, but I'm having trouble imagining a reasonable use
Definitely.
> case for a non-hierarchical merging topology.
I'm sure there is one, but I seriously doubt it's at all common.

I want to add one comment: the schemes are not actually exclusive.  If 
one is implemented as the default for 'svn merge', it is quite feasible 
to implement the other as 'svn alternate-merge' at some later point, or 
to relegate the original one to 'alternate-merge' and make the new one 
the default 'merge'.

> An Open Question (applying to both schemes)
> ----------------
> 
> If a user asks to merge a directory, should we apply MRA or AS to each
> subdirectory and file to determine what ancestor(s) to use?  Or should
> we apply MRA or AS just once, to the directory itself?  The latter
> approach seems simpler and more efficient, but will break down quickly
> if the user wants to merge in subdirectories of a branch in advance of
> merging in the whole thing.

The latter approach is what I was thinking of.  With MRA I thought there 
might be gotchas in the first approach, with MRAs turning out to be from 
different revisions for different files... though the more I think about 
it, the more it seems like that actually doesn't cause any problems.

Well, whatever y'all decide, thanks for listening. :-)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repeated merge strategies

Posted by Greg Hudson <gh...@MIT.EDU>.
Zack wrote:
> I suspect that the second most common repeat-merge situation does
> require ancestry sets -- you've got a development and a release
> branch, a few individual patches have been pushed from dev to rel,
> and now you want to merge the entire dev branch to release.

Wait a minute.  Wouldn't this operation be better handled by creating a
new release branch off the mainline?  Or, barring that, dumping the
contents of the mainline onto the release branch instead of doing a
merge?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repeated merge strategies

Posted by Nuutti Kotivuori <na...@iki.fi>.
Zack Weinberg wrote:
> The GCC team has documented this procedure and uses it for all
> branch development.  I suspect we're going to see 1.0 Subversion
> as a step backward in this regard if 'svn merge' can't do this at
> all, even with the equivalent kludges (cp-ing every merge point
> into the tags directory).  I don't know if 'svn merge' has the
> equivalent of cvs update -jTAG1 -jTAG2.

As already pointed out, 'svn merge' has the equivalent of 'cvs update
-jTAG1 -jTAG2'. But also, tracking these things is even easier. There
is no need to make a tag for each merge point, only to note the
revision at which it was made. This can be stored in the repository,
in log messages or externally, whatever. The only thing people will
need to know is that "What is the last revision that has already been
merged?" to perform the merge.

-- Naked


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repeated merge strategies

Posted by Branko Čibej <br...@xbc.nu>.
Zack Weinberg wrote:

>I *think* it's plausible to do merge arrows, and in fact ancestor
>sets, entirely with revision properties.  For MRA functionality, all
>you need is a 'svn:mergedfrom' prop on the merged revision that points
>to the indirect ancestor.  AS is a bit trickier; you put a list of
>(merge start, merge end, add/subtract) tuples as properties on the
>merged revision, and walk up the revision graph building the set.
>  
>

I'd be scared to use properties for this, for two reasons. One is
implementation-specific -- the way we store properties makes them
unsuitable for large amounts of data. The other has to do with
semantics: If you want to stay sane while managing a plethora of
extraneous surplussage of branches, you have to be able to do complex
queries in the merge history. Props simply weren't designed for that.

Come to think of it, that's an implementation-specific reason, too...

-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: Repeated merge strategies

Posted by Bill Tutt <ra...@lyra.org>.
> From: Bill Tutt [mailto:rassilon@lyra.org] 
> I think there's still a question in my mind if semi-automated 
> AS driven merges are worth doing. They seem fraught with 
> silent conflict potential. Esp. given the fact that we don't 
> version sub-file chunks of text specific to a programming 
> language. (i.e. something like Stellation or Intentional 
> Programming where the versioned unit is smaller than one
> file)
> 

When I say silent conflict potential I refer to scenarios such as this
comment from the design doc
(http://svn.collab.net/repos/svn/trunk/doc/programmer/design/model.texi)
:

@c Heh, what about this:
@c
@c    B:3 adds line 3, with the text "foo".
@c    B:5 deletes line 3.
@c    B:7 adds line 3, with the text "foo".
@c    B:9 deletes line 3.
@c
@c The user first merges B:5 and B:9 into A.  If A had that line, it
@c goes away now, nothing more.
@c 
@c Next, user merges B:3 and B:7 into A.  The second merge must
@c conflict.
@c


Bill


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: Repeated merge strategies

Posted by Bill Tutt <ra...@lyra.org>.
> From: Greg Hudson [mailto:ghudson@MIT.EDU] 
> On Sun, 2002-12-22 at 01:43, Zack Weinberg wrote:
> > Given all these things, I think it's plausible to talk 
> about doing MRA 
> > real soon (1.1 at the latest, 1.0 would be nice) and AS later, as a 
> > generalization.
> 
> I hope doing MRA sooner doesn't make it harder to do AS 
> later, if that's the eventual plan.
> 
> (Terminology note: I'm going to start calling it MRCA, for 
> "Most Recent Common Ancestor", instead of MRA, starting with 
> the file I'm committing to "notes".  Sorry for the confusion, 
> but I think it's clearer in the long run.)

Doing MRCA sooner only prevents us from auto-detecting removals from an
AS. We can easily build AS adds for each node revision given MRCA
information.

Even given AS information actually supporting AS driven
semi-automated-merging (as opposed to AS driven queries) is a
challenging problem on it's own. (as the old design doc notes)

Folks who worry about branch merging will still find AS driven query
mechanisms a useful feature for branch maintanance sanity even if we
don't easily support doing semi-automated AS driven merges.

I think there's still a question in my mind if semi-automated AS driven
merges are worth doing. They seem fraught with silent conflict
potential. Esp. given the fact that we don't version sub-file chunks of
text specific to a programming language. (i.e. something like Stellation
or Intentional Programming where the versioned unit is smaller than one
file)

> 
> > I *think* it's plausible to do merge arrows, and in fact ancestor 
> > sets, entirely with revision properties.  For MRA 
> functionality, all 
> > you need is a 'svn:mergedfrom' prop on the merged revision 
> that points 
> > to the indirect ancestor.  AS is a bit trickier; you put a list of 
> > (merge start, merge end, add/subtract) tuples as properties on the 
> > merged revision, and walk up the revision graph building the set.
> 
> You're only considering the simple merge-and-commit cases.
> 
>   svn co URL1 wc
>   (cd wc/subdir1 && svn merge URL2 URL3)
>   (cd wc/subdir1 && svn merge URL4 URL5)
>   (cd wc/subdir2 && svn merge URL6 URL7)
>   svn ci
> 

Yeah, Zack's "svn:mergedfrom" property clearly needs to be able to store
multiple merge sources.

Bill


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repeated merge strategies

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2002-12-22 at 01:43, Zack Weinberg wrote:
>    The GCC team has documented this procedure and uses it for all
>    branch development.  I suspect we're going to see 1.0 Subversion as
>    a step backward in this regard if 'svn merge' can't do this at all,
>    even with the equivalent kludges (cp-ing every merge point into the
>    tags directory).  I don't know if 'svn merge' has the equivalent of
>    cvs update -jTAG1 -jTAG2.

You can do it with the equivalent kludges.  svn merge only knows how to
do the equivalent of "-j TAG1 -j TAG2", as it turns out.  (In CVS, if
you only specify one "-j REV" option, CVS will apply pred(REV):REV to
the working copy, if I understand correctly.  We don't have that.)

> Given all these things, I think it's plausible to talk about doing MRA
> real soon (1.1 at the latest, 1.0 would be nice) and AS later, as a
> generalization.

I hope doing MRA sooner doesn't make it harder to do AS later, if that's
the eventual plan.

(Terminology note: I'm going to start calling it MRCA, for "Most Recent
Common Ancestor", instead of MRA, starting with the file I'm committing
to "notes".  Sorry for the confusion, but I think it's clearer in the
long run.)

> I *think* it's plausible to do merge arrows, and in fact ancestor
> sets, entirely with revision properties.  For MRA functionality, all
> you need is a 'svn:mergedfrom' prop on the merged revision that points
> to the indirect ancestor.  AS is a bit trickier; you put a list of
> (merge start, merge end, add/subtract) tuples as properties on the
> merged revision, and walk up the revision graph building the set.

You're only considering the simple merge-and-commit cases.

  svn co URL1 wc
  (cd wc/subdir1 && svn merge URL2 URL3)
  (cd wc/subdir1 && svn merge URL4 URL5)
  (cd wc/subdir2 && svn merge URL6 URL7)
  svn ci


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repeated merge strategies

Posted by Nathanael Nerode <ne...@twcny.rr.com>.
Zack Weinberg wrote:
> Some things I'd like to point out:
> 
> 1) Most-recent-ancestor is a special case of ancestry-set.  When the
>    ancestry set of a revision includes all revisions on some
>    continuous path from the revision-tree root, the MRA algorithm can
>    be used.  If there's a discontinuity, you need the general AS merge
>    algorithm.
> 
> 2) Most-recent-ancestor covers what I think is the most common
>    repeat-merge situation: work has been done on a branch, but regular
>    main->branch merges have occurred, and now one wants to merge the
>    branch back to the mainline.  
> 
>    I suspect that the second most common repeat-merge situation does
>    require ancestry sets -- you've got a development and a release
>    branch, a few individual patches have been pushed from dev to rel,
>    and now you want to merge the entire dev branch to release.
Basically yes.  Without ancestry sets you get potential false conflicts 
in the merge, due specifically to the patches which have been pushed to 
rel.  A good 3-way merge actually eliminates many of these false 
conflicts, but ancestry sets eliminate them all.

> 3) It's possible to do most-recent-ancestor with CVS, through careful
>    use of labels: you tag the branchpoint, you tag the branch you're
>    merging from right before each merge, you tag the branch you're
>    merging to right after each merge, and you always give cvs update
>    two -j options.  It's a pain but it works.
> 
>    The GCC team has documented this procedure and uses it for all
>    branch development.  I suspect we're going to see 1.0 Subversion as
>    a step backward in this regard if 'svn merge' can't do this at all,
>    even with the equivalent kludges (cp-ing every merge point into the
>    tags directory).  I don't know if 'svn merge' has the equivalent of
>    cvs update -jTAG1 -jTAG2
> 
> Given all these things, I think it's plausible to talk about doing MRA
> real soon (1.1 at the latest, 1.0 would be nice) and AS later, as a
> generalization.  And even if nothing uses the information, I would
> like to see 'merge arrows' (borrowing terminology from ClearCase) in
> 1.0, so that it's there for later versions to use.  Going back and
> adding them to a repository after the fact is usually infeasible.
*Right*.
> I *think* it's plausible to do merge arrows, and in fact ancestor
> sets, entirely with revision properties.  For MRA functionality, all
> you need is a 'svn:mergedfrom' prop on the merged revision that points
This is what I felt we could get into 1.0 without too much work. If the 
merge tracking information is being put in by 'merge', we can at any 
rate handle repeated merging (by hand) a lot better than CVS.

> to the indirect ancestor.  AS is a bit trickier; you put a list of
> (merge start, merge end, add/subtract) tuples as properties on the
> merged revision, and walk up the revision graph building the set.
> 
> zw



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Repeated merge strategies

Posted by Zack Weinberg <za...@codesourcery.com>.
Some things I'd like to point out:

1) Most-recent-ancestor is a special case of ancestry-set.  When the
   ancestry set of a revision includes all revisions on some
   continuous path from the revision-tree root, the MRA algorithm can
   be used.  If there's a discontinuity, you need the general AS merge
   algorithm.

2) Most-recent-ancestor covers what I think is the most common
   repeat-merge situation: work has been done on a branch, but regular
   main->branch merges have occurred, and now one wants to merge the
   branch back to the mainline.  

   I suspect that the second most common repeat-merge situation does
   require ancestry sets -- you've got a development and a release
   branch, a few individual patches have been pushed from dev to rel,
   and now you want to merge the entire dev branch to release.

3) It's possible to do most-recent-ancestor with CVS, through careful
   use of labels: you tag the branchpoint, you tag the branch you're
   merging from right before each merge, you tag the branch you're
   merging to right after each merge, and you always give cvs update
   two -j options.  It's a pain but it works.

   The GCC team has documented this procedure and uses it for all
   branch development.  I suspect we're going to see 1.0 Subversion as
   a step backward in this regard if 'svn merge' can't do this at all,
   even with the equivalent kludges (cp-ing every merge point into the
   tags directory).  I don't know if 'svn merge' has the equivalent of
   cvs update -jTAG1 -jTAG2.

Given all these things, I think it's plausible to talk about doing MRA
real soon (1.1 at the latest, 1.0 would be nice) and AS later, as a
generalization.  And even if nothing uses the information, I would
like to see 'merge arrows' (borrowing terminology from ClearCase) in
1.0, so that it's there for later versions to use.  Going back and
adding them to a repository after the fact is usually infeasible.

I *think* it's plausible to do merge arrows, and in fact ancestor
sets, entirely with revision properties.  For MRA functionality, all
you need is a 'svn:mergedfrom' prop on the merged revision that points
to the indirect ancestor.  AS is a bit trickier; you put a list of
(merge start, merge end, add/subtract) tuples as properties on the
merged revision, and walk up the revision graph building the set.

zw

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org