You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2012/01/09 12:14:18 UTC

Implicit keep-alive after reintegrate merge

Merging would be simpler if the user didn't have to think about doing the "keep-alive dance" after a reintegrate [1].

I'm trying to teach the sync merge to detect when a reintegrate has been done, so that we could avoid the need for r45 in the following typical sequence:

| Rev  Branch Commit

|      A   B
|--------------------

| r10  X

|      | \

| r20  |   X  "New branch B, a copy of A."

|      |   |

| r30  |   X  "Modify B."

|      | / |

| r40  X   |  "Reintegrate from B into A."

|      | \ |

| r45* |   X  "Record-only merge to keep B alive."
|      |   |
| r50  X   |  "More work in A."

|      | \ |
| r60  |   X  "Sync from A into B."

A patch in progress is attached.  It makes 'merge' detect when one of the revision ranges to be merged would include changes that have 
already been merged in the opposite direction.  At present it stops the merge, giving a friendly error message.  I would like to make it skip the particular revision(s) and continue to merge all the other revisions in the range, but first we need to get the detection right.


Using the above example, if we don't do the r45 record-only merge, then during the last sync merge (which will be committed in r60), the normal merge tracking code first determines that the revision range to be merged is r11-r50 [2].  Then the new code detects that this incoming change brings in new mergeinfo about merges *from* the current target branch B.  Therefore the new code will stop and complain.

That's all right for simple cases but it's not good enough.

We can say for sure that when we reintegrated B to A (in A:40), that will have added new mergeinfo on A describing merges from B.  However, if change A:40 had instead been a different merge into A, let's say from C, it is still possible that merge might have brought along some new mergeinfo describing merges from B, because of the way mergeinfo is propagated from branch to branch.  Therefore, if we find that change A:40 adds new mergeinfo about merges from B, we cannot simply say that A:40 describes a reintegrate merge from B.  We need to look more closely.  That's what I'm currently working on.
Comments? 

- Julian


[1] See the end of <http://wiki.apache.org/subversion/KeepingReintegratedBranchAlive>.
[2] Assume HEAD is r50 at this time.

Re: Implicit keep-alive after reintegrate merge

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Jan 30, 2012 at 10:01:23AM -0500, Mark Mielke wrote:
> No merge is perfect. The situation is either complex, or it is not
> complex - and moving resolution to the private branch is a matter of
> process - not a matter of algorithm.

Where did you get the idea that conflict resolution must happen
in the private branch?

You can run --reintegrate at any time. The only requirement[*] is that,
if you merged from the parent branch, you merged a revision range which
doesn't contain any gaps. But this merged range may be empty if you wish.
If you get conflicts during that merge you just resolve them as
with any other merge.

In svnbook terms, you don't need to sync to trunk before reintegration.
There is no such requirement in the tool.

[*] Well, you're also not supposed to have local changes in the WC
    and it must not be a mixed-rev WC. But those requirements don't
    matter for this discussion.

> The first thing the tool can do to be genuinely useful in this
> situation, is to accept some of the responsibility of detecting
> whether or not the race is one of these "diff3 is not idempotent"
> situations, and providing automatic handling.

And this is exactly what --reintegrate does for you.
It automatically eliminates spurious conflicts. It doesn't do so by
dynamically detecting such conflicts (and how on earth would you do
that???) but it avoids input which is unnecessary and can cause such
conflicts. So the end result is the same.

> If the case has been
> hit, then --reintegrate could be used as a form of "special error
> checking" where it does the same as "merge", except in the case that
> the merge has a true conflict with any particular element of the
> change set (as opposed to a potential conflict with the end result),
> where the results of diff3 would need to be "trusted", then it could
> bail and provide the user with the information required to resolve
> the conflict locally before submission.
>
> The second thing the tool can do to be genuinely useful in this
> situation, is to allow for this check to be overridden. If I didn't
> trust diff3 - I wouldn't use merges at all. Sometimes a source
> management tool just needs to help me resolve conflicts. Especially
> with merge tracking and intelligent designer workflows, many cases
> of so called "conflicts" touch unrelated lines of code, and it *is*
> safe to complete the merge, even to the integration stream. I should
> have the ability to choose to do this, rather than race for
> submission with 100 other users.

If you don't like what --reintegrate does you're free to use 2-URL
merges instead to merge whatever changes you like and run them all
through diff3.

In fact, --reintegrate is just there for convenience and it is
entirely optional. We usually do recommend it because it is quite easy
to mess up 2-URL merges if you don't know what you're doing, and because
most people are happy with less conflicts from diff3 if they can be
avoided without any risk.

> The worst thing the tool can do is to declare that "diff3 is
> idempotent therefore it should be disabled" during --reintegrate.
> Yuck. This is a partial solution and at least as I understand it -
> it is even dangerous.

Except... diff3 is not being disabled :)
It's just not fed changes which have already been applied.

> What happens if I use --reintegrate in a
> situation that actually does require merge resolution? Will every
> situation be blocked? Or will it take --reintegrate as a license to
> overwrite results, trusting that I can do all the necessary conflict
> checking myself? I have seen nothing so far that allows me to
> conclude that architecturally, Subversion requires the --reintegrate
> behaviour. It's a short cut in providing a complete branch merging
> solution for users of the system. Somebody started work on the
> canvas, and then drafted in the last corner rather than finish it.

Well, I suppose it just seemed strange to you because you didn't
understand the purpose and mechanics of --reintegrate. And you somehow
concluded that using it must be mandatory. But it isn't.

The complete and general merge solution already existed before
--reintegrate was added, in form of the 2-URL syntax.

Granted, it is affected by the same well-known shortcomings of
Subversion's merge operation (e.g. merging through moves doesn't
work yet). But apart from that I don't see how you could want something
more general because it's already as general as possible.

Re: Implicit keep-alive after reintegrate merge

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Jan 30, 2012 at 11:16 PM, Stefan Sperling <st...@elego.de> wrote:
> On Mon, Jan 30, 2012 at 10:37:51PM +0100, Johan Corveleyn wrote:
>> On Mon, Jan 30, 2012 at 10:19 PM, Stefan Sperling <st...@elego.de> wrote:
>> > But you cannot use the last-synced revision as left anchor either:
>> >  svn co b
>> >  svn merge b@r4 a@r6 b
>> > Because applying this delta reverts b@r5 (this change appears reversed
>> > in the diff between b@r4 and a@r6 since it isn't present on branch a).
>>
>> No, I don't think it does. The change b@r5 doesn't appear in this
>> diff, neither forward nor reversed. Say b@r5 adds a line in file b/X,
>> I see no reason this change (forward or reversed) would be part of the
>> difference between b@r4 and a@r6.
>
> Sorry, I made an error while transferring my experiment into an example.
>
> The problem happens if a non-merge commit (b@r4) happens prior to
> the first merge commit (b@r5), like this:
>
>                 +b@r2---b@r4---b@r5--b@r6---b@r8--
>       (branch) /               ^             ^ (merge 2)
>               /                | (merge 1)   |
>         --- a@r1---a@r3--------+-------a@r7--+-------
>
> b@r4 appears reversed in 'svn diff b@5 a@7' -- not good
> a@r3 appears in 'svn diff b@2 a@7' -- not good either, applied twice
> But 'svn merge a@r5 a@r7' works fine.

Ah yes, I see. Thanks for clarifying.

However, I'm still not convinced :-). I'm starting to think more about
the symmetry with the standard sync-reintegrate cycle. In one of your
previous mails in this thread you described reintegrate like this:

On Mon, Jan 30, 2012 at 2:51 PM, Stefan Sperling <st...@elego.de> wrote:
> On Mon, Jan 30, 2012 at 02:23:46PM +0100, Stefan Sperling wrote:
>> What we use during --reintegrate is (3.2 b).
>
> And here I'm catching myself spreading misinformation again.
>
> There is another tweak we use during reintegrate.
> Consider the graph again (fixed version):
>
>      (3)
>                 +-b@r2--+ b@r4--b@r5-b@r6 ----
>       (branch) /        ^             | (merge 2)
>               /         | (merge 1)   v
>         --- a@r1 ------a@r3-----------+- a@r7 ----
>
> If we used:
>
>  (3.2 b) svn co a@r6 wc; svn merge b@r4 b@r7 a
>
> we'd miss b@r2 during the merge. It won't be applied to branch a.
> But we want it in the changeset.
>
> So what really happens is:
>
>  (3.2 b) svn co a@r6 wc; svn merge a@r4 b@r6 a
>
> Note that a is compared to b, rather than b against its former self.
> This delta includes b@r2 because this change isn't yet on branch a.

So the left argument of the 2-URL merge is
target@last-rev-target-was-brought-in-sync (and the right is
source@HEAD). That makes sense.

If we translate this to our situation, i.e. the other way around, then
'svn merge b@2 a@7 b' would be the one. Because b@2 is the last time b
was still synced with a. But there is the problem of change a@r3 then
being applied twice. However, isn't this the same as the "multiple
reintegrate" problem, i.e. "implicit keep-alive after reintegrate"?

Your example is effectively reintegrating the same "branch" twice.
Which means the same problem applies. And maybe the solution is: we
should be able to skip the already "reintegrated" stuff, i.e. a@r3.
(I'm not sure anymore what the state-of-the-art is concerning the
implicit keep-alive stuff, but maybe it's that 'svn diff b@2 a@7'
needs to be adjusted by subtracting a@3 from it, because that's
already been applied)

So I'm guessing that if we can solve the "implicit keepalive after
reintegrate", i.e. let reintegrate skip the already integrated stuff,
we would no longer need --reintegrate, because everything can now be
done with a reintegrate merge.

-- 
Johan

Re: Implicit keep-alive after reintegrate merge

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Jan 30, 2012 at 10:37:51PM +0100, Johan Corveleyn wrote:
> On Mon, Jan 30, 2012 at 10:19 PM, Stefan Sperling <st...@elego.de> wrote:
> > But you cannot use the last-synced revision as left anchor either:
> >  svn co b
> >  svn merge b@r4 a@r6 b
> > Because applying this delta reverts b@r5 (this change appears reversed
> > in the diff between b@r4 and a@r6 since it isn't present on branch a).
> 
> No, I don't think it does. The change b@r5 doesn't appear in this
> diff, neither forward nor reversed. Say b@r5 adds a line in file b/X,
> I see no reason this change (forward or reversed) would be part of the
> difference between b@r4 and a@r6.

Sorry, I made an error while transferring my experiment into an example.

The problem happens if a non-merge commit (b@r4) happens prior to
the first merge commit (b@r5), like this:

                 +b@r2---b@r4---b@r5--b@r6---b@r8--
       (branch) /               ^             ^ (merge 2)
               /                | (merge 1)   |
         --- a@r1---a@r3--------+-------a@r7--+-------

b@r4 appears reversed in 'svn diff b@5 a@7' -- not good
a@r3 appears in 'svn diff b@2 a@7' -- not good either, applied twice
But 'svn merge a@r5 a@r7' works fine.

Full transcript below.

+ rm -rf merge-test
+ mkdir -p merge-test
+ mkdir -p merge-test/a
+ echo alpha
+ > merge-test/a/alpha 
+ echo beta
+ > merge-test/a/beta 
+ svnadmin create /tmp/merge-test/repos
+ svn import merge-test/a file:////tmp/merge-test/repos/a -m importing project tree
Adding         merge-test/a/alpha
Adding         merge-test/a/beta

Committed revision 1.
+ svn copy file:////tmp/merge-test/repos/a file:////tmp/merge-test/repos/b -m creating b

Committed revision 2.
+ rm -rf merge-test/a
+ svn checkout file:////tmp/merge-test/repos/a merge-test/a
A    merge-test/a/alpha
A    merge-test/a/beta
Checked out revision 2.
+ svn checkout file:////tmp/merge-test/repos/b merge-test/b
A    merge-test/b/alpha
A    merge-test/b/beta
Checked out revision 2.
+ echo foo
+ >> merge-test/a/alpha 
+ svn commit merge-test/a -mm
Sending        merge-test/a/alpha
Transmitting file data .
Committed revision 3.
+ echo bar
+ >> merge-test/b/beta 
+ svn commit merge-test/b -mm
Sending        merge-test/b/beta
Transmitting file data .
Committed revision 4.
+ svn up merge-test/b
Updating 'merge-test/b':
At revision 4.
+ svn merge file:////tmp/merge-test/repos/b@2 file:////tmp/merge-test/repos/a@3 merge-test/b
--- Merging differences between repository URLs into 'merge-test/b':
U    merge-test/b/alpha
--- Recording mergeinfo for merge between repository URLs into 'merge-test/b':
 U   merge-test/b
+ svn commit merge-test/b -mm
Sending        merge-test/b
Sending        merge-test/b/alpha
Transmitting file data .
Committed revision 5.
+ echo bar2
+ >> merge-test/b/beta 
+ svn commit merge-test/b -mm
Sending        merge-test/b/beta
Transmitting file data .
Committed revision 6.
+ echo foo2
+ >> merge-test/a/alpha 
+ svn commit merge-test/a -mm
Sending        merge-test/a/alpha
Transmitting file data .
Committed revision 7.
+ svn diff file:////tmp/merge-test/repos/b@2 file:////tmp/merge-test/repos/a@7
Index: alpha
===================================================================
--- alpha	(.../b)	(revision 2)
+++ alpha	(.../a)	(revision 7)
@@ -1 +1,3 @@
 alpha
+foo
+foo2
+ svn up merge-test/b
Updating 'merge-test/b':
At revision 7.
+ svn merge --accept=postpone file:////tmp/merge-test/repos/b@2 file:////tmp/merge-test/repos/a@7 merge-test/b
--- Merging differences between repository URLs into 'merge-test/b':
C    merge-test/b/alpha
--- Recording mergeinfo for merge between repository URLs into 'merge-test/b':
 U   merge-test/b
Summary of conflicts:
  Text conflicts: 1
+ svn diff merge-test/b
Index: merge-test/b
===================================================================
--- merge-test/b	(revision 7)
+++ merge-test/b	(working copy)

Property changes on: merge-test/b
___________________________________________________________________
Modified: svn:mergeinfo
   Merged /a:r4-7
Index: merge-test/b/alpha
===================================================================
--- merge-test/b/alpha	(revision 7)
+++ merge-test/b/alpha	(working copy)
@@ -1,2 +1,7 @@
 alpha
+<<<<<<< .working
 foo
+=======
+foo
+foo2
+>>>>>>> .merge-right.r7
+ svn revert -R merge-test/b
Reverted 'merge-test/b'
Reverted 'merge-test/b/alpha'
+ svn diff file:////tmp/merge-test/repos/b@5 file:////tmp/merge-test/repos/a@7
Index: alpha
===================================================================
--- alpha	(.../b)	(revision 5)
+++ alpha	(.../a)	(revision 7)
@@ -1,2 +1,3 @@
 alpha
 foo
+foo2
Index: beta
===================================================================
--- beta	(.../b)	(revision 5)
+++ beta	(.../a)	(revision 7)
@@ -1,2 +1 @@
 beta
-bar
Index: .
===================================================================
--- .	(.../b)	(revision 5)
+++ .	(.../a)	(revision 7)

Property changes on: .
___________________________________________________________________
Deleted: svn:mergeinfo
   Reverse-merged /a:r2-3
+ svn merge --accept=postpone file:////tmp/merge-test/repos/b@5 file:////tmp/merge-test/repos/a@7 merge-test/b
--- Merging differences between repository URLs into 'merge-test/b':
U    merge-test/b/alpha
C    merge-test/b/beta
 U   merge-test/b
--- Recording mergeinfo for merge between repository URLs into 'merge-test/b':
 G   merge-test/b
Summary of conflicts:
  Text conflicts: 1
+ svn diff merge-test/b
Index: merge-test/b
===================================================================
--- merge-test/b	(revision 7)
+++ merge-test/b	(working copy)

Property changes on: merge-test/b
___________________________________________________________________
Modified: svn:mergeinfo
   Merged /a:r4-7
Index: merge-test/b/alpha
===================================================================
--- merge-test/b/alpha	(revision 7)
+++ merge-test/b/alpha	(working copy)
@@ -1,2 +1,3 @@
 alpha
 foo
+foo2
Index: merge-test/b/beta
===================================================================
--- merge-test/b/beta	(revision 7)
+++ merge-test/b/beta	(working copy)
@@ -1,3 +1,6 @@
 beta
+<<<<<<< .working
 bar
 bar2
+=======
+>>>>>>> .merge-right.r7
+ svn revert -R merge-test/b
Reverted 'merge-test/b'
Reverted 'merge-test/b/alpha'
Reverted 'merge-test/b/beta'
+ svn merge file:////tmp/merge-test/repos/a@5 file:////tmp/merge-test/repos/a@7 merge-test/b
--- Merging r6 through r7 into 'merge-test/b':
U    merge-test/b/alpha
--- Recording mergeinfo for merge of r6 through r7 into 'merge-test/b':
 U   merge-test/b
+ svn diff merge-test/b
Index: merge-test/b
===================================================================
--- merge-test/b	(revision 7)
+++ merge-test/b	(working copy)

Property changes on: merge-test/b
___________________________________________________________________
Modified: svn:mergeinfo
   Merged /a:r6-7
Index: merge-test/b/alpha
===================================================================
--- merge-test/b/alpha	(revision 7)
+++ merge-test/b/alpha	(working copy)
@@ -1,2 +1,3 @@
 alpha
 foo
+foo2

Re: Implicit keep-alive after reintegrate merge

Posted by Branko Čibej <br...@apache.org>.
On 31.01.2012 12:08, Julian Foad wrote:
> (I'm assuming that your b@r2 in this example is a simple copy with no diff,
>  otherwise the first part of 'merge 3' should use a@r1 instead of b@r2 for its base.)
>
> But no, merge 3 should be even simpler than that.  We don't have to go and fetch the old (b@r2:b@r4) and (b@r5:b@r6) diffs again and merge them into (a), because that's already been done recently and stored as (b@r8).  So all we have to do is locate the youngest common ancestor of (b@r8) and (a@head), which is (a@r7), and do a single 3-way merge:
>
>   * merge 3: (mine=a@head, old=a@r7, yours=b@head)
>     diff3 a@r7 a@r7 b@r8 | patch a
>
> (In this case this merge is trivial because mine==old, but if there had been a further change on (a), say in (a@r8), then it would be non-trivial.)

Yeah, you're right, I forgot that b@r8 already has all the info. I got
muddled by my own example. :)
Since there are no cherry-picks in a->b merges, there would of course be
none in b->a merges, either.

[...]

> Locating the youngest common ancestor is a step that needs further 
> explanation; it's not trivially obvious.  We mean to follow the "merge 
> arrows" that are shown on these graphs, as well as the basic lines of 
> history.  Each of these "merge arrows" means that all changes in the 
> source branch have been merged into the target branch at that point.  When modify this algorithm to cope with cherry-picking, that is
>  to skip revisions that have already been cherry-picked, which we will need to do, then we will also have another kind of merge arrow on our graphs, a kind that doesn't mean all source changes have been merged to the target.  The algorithm will get more complex.  I won't talk about that here.)

Yes, it's complex. I'm almost sure that the current mergeinfo format can
represent cherry-picks correctly, though, so at least you won't have to
extend the format.

Cherry-picking is the part that really makes this hairy. But it might be
worth looking at the poor, defunct svnmerge.py, which /could/ properly
handle cherry-picks, at least most of the time. :)

Big cheer for grabbing this dragon by the horns, Julian.

-- Brane

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
I agree that a symmetric merge algorithm would suffice for both 'sync' and 'reintegrate' merges.


I'm starting to write this up here: <http://wiki.apache.org/subversion/SvnMergeTheory>.

Subversion's current merge implementation makes assumptions in both the 'sync' and 'reintegrate' cases.  (The part of the 'algorithm' I'm talking about is the part that looks at the mergeinfo and determines which three-way merge to perform next.)  Specifically, I think the main assumptions are that 'sync' assumes the base for the 3-way merge is going to be on the source branch, while 'reintegrate' assumes the base is going to be on the target branch.


A slightly more general merge algorithm should be able to pick a base on either the source or the target branch depending on what was merged before, and so satisfy both of those cases.  In order to do so, it needs to look more deeply at the mergeinfo that the courrent implementation does. 


Branko Čibej wrote:
> [...] given the slightly more complex example:
> 
>                  +b@r2--b@r4-+-b@r5---b@r6---+--b@r8-----
>        (branch) /            ^                ^ (merge 2) \
>                /              | (merge 1)      |            \ (merge 3)
>          --- a@r1---a@r3-----+---------a@r7--+-------------+-a@r9 ---
> 
> the results should be, effectively:
> 
>   * merge 1:
>     diff3 b@r4 a@r1 a@r3 | patch b
>   * merge 2:
>     diff3 b@r6 a@r3 a@r7 | patch b
>   * merge 3:
>     diff3 a@r7 b@r2 b@r4 | patch a
>     diff3 a@r7 b@r5 b@r6 | patch a

(I'm assuming that your b@r2 in this example is a simple copy with no diff,
 otherwise the first part of 'merge 3' should use a@r1 instead of b@r2 for its base.)

But no, merge 3 should be even simpler than that.  We don't have to go and fetch the old (b@r2:b@r4) and (b@r5:b@r6) diffs again and merge them into (a), because that's already been done recently and stored as (b@r8).  So all we have to do is locate the youngest common ancestor of (b@r8) and (a@head), which is (a@r7), and do a single 3-way merge:

  * merge 3: (mine=a@head, old=a@r7, yours=b@head)
    diff3 a@r7 a@r7 b@r8 | patch a

(In this case this merge is trivial because mine==old, but if there had been a further change on (a), say in (a@r8), then it would be non-trivial.)

> Merge 3 is a cherry-pick merge, of course. But whatever you do, you
> always pick your common ancestor so that it's the most recent merge
> point from the revision you're merging, and the "myfile" is always 
> the
> most recent version on the branch you're merging to (or, effectively, in
> the target WC).
> 
> You'll be cherry-picking as long as both branches are being actively
> modified, but you always have to do the check.

I don't follow all that about cherry-picking.

> The merge algorithm is symmetric.

That, I do agree with.

> SVN's mergeinfo has all the data that are
> required to automate this merging, it's just not being used correctly.

I know that Subversion stores sufficient info after a sync, and how to derive those merge arrows accurately from it; I'm about to check exactly what mergeinfo it stores after a reintegrate and make sure that's also sufficient.

Locating the youngest common ancestor is a step that needs further 
explanation; it's not trivially obvious.  We mean to follow the "merge 
arrows" that are shown on these graphs, as well as the basic lines of 
history.  Each of these "merge arrows" means that all changes in the 
source branch have been merged into the target branch at that point.  When modify this algorithm to cope with cherry-picking, that is
 to skip revisions that have already been cherry-picked, which we will need to do, then we will also have another kind of merge arrow on our graphs, a kind that doesn't mean all source changes have been merged to the target.  The algorithm will get more complex.  I won't talk about that here.)

- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Branko Čibej <br...@apache.org>.
On 30.01.2012 22:19, Stefan Sperling wrote:
> On Mon, Jan 30, 2012 at 09:38:15PM +0100, Johan Corveleyn wrote:
>> No, AFAIU, Brane's suggestion was not that we shouldn't use the
>> "reintegrate-way" for 3.2, but rather that we should *always* use the
>> "reintegrate-way", also for sync merges. So that a sync merge picks
>> its arguments for the 2-URL merge in the same way as a reintegrate
>> merge. Unless I misunderstood what Brane meant.
>>
>> And I think he's got a point. I don't have the time to write up a
>> detailed example right now, but I think it should work.
>>
>> If that would be the case, then we effectively implement the merges
>> completely symmetrical: always the "reintegrate-way".
> Counter-example:
>
>                  +b@r2---b@r4---b@r5------b@r7-----
>        (branch) /        ^                ^ (merge 2)
>                /         | (merge 1)      |
>          --- a@r1---a@r3-+---------a@r6--+-------
>
> This performs two "sync" merges from a to b.
>
> The first merge can be done the "reintegrate way":
>
>   svn co b
>   svn merge b@r2 a@r3 b
>  
> This merge applies the a@r3 change to b@r2, yielding b@r4. Fine.
>
> But how would you perform the second merge, which applies a@r6 to
> b@r5 yielding b@r7, using the "reintegrate way", without undoing
> b@r5 (a non-merge commit)?

The second mege is exactly the same as if the branch were created from
a@r3, not a@r1. Right? In your example, this is even trivially true,
since there were no changes on the branch between r2 and r4. But given
the slightly more complex example:

                 +b@r2--b@r4-+-b@r5---b@r6---+--b@r8-----
       (branch) /            ^               ^ (merge 2) \
               /             | (merge 1)     |            \ (merge 3)
         --- a@r1---a@r3-----+---------a@r7--+-------------+-a@r9 ---


the results should be, effectively:

  * merge 1:
    diff3 b@r4 a@r1 a@r3 | patch b
  * merge 2:
    diff3 b@r6 a@r3 a@r7 | patch b
  * merge 3:
    diff3 a@r7 b@r2 b@r4 | patch a
    diff3 a@r7 b@r5 b@r6 | patch a

Merge 3 is a cherry-pick merge, of course. But whatever you do, you
always pick your common ancestor so that it's the most recent merge
point from the revision you're merging, and the "myfile" is always the
most recent version on the branch you're merging to (or, effectively, in
the target WC).

You'll be cherry-picking as long as both branches are being actively
modified, but you always have to do the check. The merge algorithm is
symmetric.

Now I don't know how the above merges translate into "svn merge" syntax
and whether or not --reintegrate does it this way, but that's how you do
it manually with diff3. SVN's mergeinfo has all the data that are
required to automate this merging, it's just not being used correctly.

-- Brane


Re: Implicit keep-alive after reintegrate merge

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Jan 30, 2012 at 10:19 PM, Stefan Sperling <st...@elego.de> wrote:
> On Mon, Jan 30, 2012 at 09:38:15PM +0100, Johan Corveleyn wrote:
>> No, AFAIU, Brane's suggestion was not that we shouldn't use the
>> "reintegrate-way" for 3.2, but rather that we should *always* use the
>> "reintegrate-way", also for sync merges. So that a sync merge picks
>> its arguments for the 2-URL merge in the same way as a reintegrate
>> merge. Unless I misunderstood what Brane meant.
>>
>> And I think he's got a point. I don't have the time to write up a
>> detailed example right now, but I think it should work.
>>
>> If that would be the case, then we effectively implement the merges
>> completely symmetrical: always the "reintegrate-way".
>
> Counter-example:
>
>                 +b@r2---b@r4---b@r5------b@r7-----
>       (branch) /        ^                ^ (merge 2)
>               /         | (merge 1)      |
>         --- a@r1---a@r3-+---------a@r6--+-------
>
> This performs two "sync" merges from a to b.
>
> The first merge can be done the "reintegrate way":
>
>  svn co b
>  svn merge b@r2 a@r3 b
>
> This merge applies the a@r3 change to b@r2, yielding b@r4. Fine.
>
> But how would you perform the second merge, which applies a@r6 to
> b@r5 yielding b@r7, using the "reintegrate way", without undoing
> b@r5 (a non-merge commit)?
>
> You cannot do this:
>  svn co b
>  svn merge b@r2 a@r6 b
> because this applies the a@r3 change again (conflict due to diff3)
> since it re-uses the branch point as left anchor for the diff.
>
> But you cannot use the last-synced revision as left anchor either:
>  svn co b
>  svn merge b@r4 a@r6 b
> Because applying this delta reverts b@r5 (this change appears reversed
> in the diff between b@r4 and a@r6 since it isn't present on branch a).

No, I don't think it does. The change b@r5 doesn't appear in this
diff, neither forward nor reversed. Say b@r5 adds a line in file b/X,
I see no reason this change (forward or reversed) would be part of the
difference between b@r4 and a@r6.

> The way to specify the diff you want to merge is thus:
>  svn co b
>  svn merge a@r3 a@r6 b
> Which is what "svn merge ^/trunk" would do.

AFAICS, diff(b@r4, a@r6) should be identical to diff(a@r3, a@r6),
modulo the variation that is specific to b@<4. Which seems to be
effectively desirable, because that makes it easier for diff3 (same as
for reintegrate).

-- 
Johan

Re: Implicit keep-alive after reintegrate merge

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Jan 30, 2012 at 09:38:15PM +0100, Johan Corveleyn wrote:
> No, AFAIU, Brane's suggestion was not that we shouldn't use the
> "reintegrate-way" for 3.2, but rather that we should *always* use the
> "reintegrate-way", also for sync merges. So that a sync merge picks
> its arguments for the 2-URL merge in the same way as a reintegrate
> merge. Unless I misunderstood what Brane meant.
> 
> And I think he's got a point. I don't have the time to write up a
> detailed example right now, but I think it should work.
> 
> If that would be the case, then we effectively implement the merges
> completely symmetrical: always the "reintegrate-way".

Counter-example:

                 +b@r2---b@r4---b@r5------b@r7-----
       (branch) /        ^                ^ (merge 2)
               /         | (merge 1)      |
         --- a@r1---a@r3-+---------a@r6--+-------

This performs two "sync" merges from a to b.

The first merge can be done the "reintegrate way":

  svn co b
  svn merge b@r2 a@r3 b
 
This merge applies the a@r3 change to b@r2, yielding b@r4. Fine.

But how would you perform the second merge, which applies a@r6 to
b@r5 yielding b@r7, using the "reintegrate way", without undoing
b@r5 (a non-merge commit)?

You cannot do this:
  svn co b
  svn merge b@r2 a@r6 b
because this applies the a@r3 change again (conflict due to diff3)
since it re-uses the branch point as left anchor for the diff.

But you cannot use the last-synced revision as left anchor either:
  svn co b
  svn merge b@r4 a@r6 b
Because applying this delta reverts b@r5 (this change appears reversed
in the diff between b@r4 and a@r6 since it isn't present on branch a).

The way to specify the diff you want to merge is thus:
  svn co b
  svn merge a@r3 a@r6 b
Which is what "svn merge ^/trunk" would do.

Maybe there is a flaw and it could be made to work if we somehow
changed the way 'svn merge' operates. But I don't see how.

Re: Implicit keep-alive after reintegrate merge

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Jan 30, 2012 at 02:23:46PM +0100, Stefan Sperling wrote:
>      (3)
>                 +-b@r2--+ b@r3--b@r4-b@r5 ----
>       (branch) /        ^             | (merge 2)
>               /         | (merge 1)   v
>         --- a@r1 ------a@r2-----------+- a@r6 ----
> 
> Merge 1 brings a@r2 into b@r2.
> Merge 2 brings b@r4 into a@r5.

Hmpf. I tweaked this before hitting 'send' and some of the numbers
are off. You get the idea:

      (3)
                 +-b@r2--+ b@r4--b@r5-b@r6 ----
       (branch) /        ^             | (merge 2)
               /         | (merge 1)   v
         --- a@r1 ------a@r3-----------+- a@r7 ----

Re: Implicit keep-alive after reintegrate merge

Posted by Stefan Sperling <st...@elego.de>.
On Mon, Jan 30, 2012 at 02:23:46PM +0100, Stefan Sperling wrote:
> What we use during --reintegrate is (3.2 b).

And here I'm catching myself spreading misinformation again.

There is another tweak we use during reintegrate.
Consider the graph again (fixed version):

      (3)
                 +-b@r2--+ b@r4--b@r5-b@r6 ----
       (branch) /        ^             | (merge 2)
               /         | (merge 1)   v
         --- a@r1 ------a@r3-----------+- a@r7 ----

If we used:

  (3.2 b) svn co a@r6 wc; svn merge b@r4 b@r7 a
 
we'd miss b@r2 during the merge. It won't be applied to branch a.
But we want it in the changeset.

So what really happens is:

  (3.2 b) svn co a@r6 wc; svn merge a@r4 b@r6 a

Note that a is compared to b, rather than b against its former self.
This delta includes b@r2 because this change isn't yet on branch a.

I'll readily admit that my initial statement about reintegrate
taking no shortcuts may not be correct, depending on the definition
of "shortcut". However, this is all about driving diff3 in a way that
produces results without spurious conflicts, rather than a general
mistake in applying some CM theory merge model.

Re: Implicit keep-alive after reintegrate merge

Posted by Johan Corveleyn <jc...@gmail.com>.
On Mon, Jan 30, 2012 at 2:23 PM, Stefan Sperling <st...@elego.de> wrote:
> On Tue, Jan 24, 2012 at 01:12:39AM +0100, Branko Čibej wrote:
>> By the way, I read Stefan's description of why --reintegrate is
>> necessary, and after slogging through the unfortunate terminology (2-URL
>> merge doesn't mean a thing in CM theory :) and one little bit caught my
>> attention:
>>
>> > A sync merge can fill in the all parameters as well, except PATH2.
>> > However, it needs to do so in a different way. With a sync merge
>> > PATH1 and PATH2 are the same
>>
>> I keep reading this in the context of the rest of the reasoning, any my
>> reaction is still: "WTF? Bogus!" This looks like someone /started off/
>> with the assumption that a sync merge can take shortcuts where a
>> reintegrate merge cannot; but, so sorry, that's just plain nonsense.

[ ... ]

> Generally, we want to avoid spurious conflicts from diff3 which happen
> when changes are applied twice because diff3 is not idempotent.
> I.e. we break the nice symmetry to work around a limitation of diff3.
>
> In the following case we can avoid spurious conflicts by picking
> our parameters carefully:
>
>     (3)
>                +-b@r2--+ b@r3--b@r4-b@r5 ----
>      (branch) /        ^             | (merge 2)
>              /         | (merge 1)   v
>        --- a@r1 ------a@r2-----------+- a@r6 ----
>
> Merge 1 brings a@r2 into b@r2.
> Merge 2 brings b@r4 into a@r5.
>
>  (3.1) svn co b@r2 wc; svn merge a@r1 a@r2 b
>
> There are two ways of performing merge 2.
> The first is symmetrical and re-applies a@r2 to a@r6, via b@r3,
> with possible spurious conflicts from diff3:
>
>  (3.2 a) svn co a@r5 wc; svn merge b@r2 b@r5 a
>
> The second does not re-apply a@r2, so there are no possible conflicts
> from diff3 because of a@r2/b@r3. Only b@r4 can conflict.
>
>  (3.2 b) svn co a@r5 wc; svn merge b@r3 b@r5 a
>
> The result is the same, however.
>
> What we use during --reintegrate is (3.2 b).
> You can argue that this approach is broken and we should be using (3.2 a)
> for symmetry, and let users deal with spurious conflicts.

No, AFAIU, Brane's suggestion was not that we shouldn't use the
"reintegrate-way" for 3.2, but rather that we should *always* use the
"reintegrate-way", also for sync merges. So that a sync merge picks
its arguments for the 2-URL merge in the same way as a reintegrate
merge. Unless I misunderstood what Brane meant.

And I think he's got a point. I don't have the time to write up a
detailed example right now, but I think it should work.

If that would be the case, then we effectively implement the merges
completely symmetrical: always the "reintegrate-way".

-- 
Johan

Re: Implicit keep-alive after reintegrate merge

Posted by Mark Mielke <ma...@mark.mielke.cc>.
Stefan: I believe you are agreeing that the merge in either direction is 
the same complexity, and describing how --reintegrate moves the 
responsibility for the complexity to the owner of the private branch, 
and requires resolution before submission. I think you are saying this 
is a good thing because diff3 isn't perfect.

In my experience:

No merge is perfect. The situation is either complex, or it is not 
complex - and moving resolution to the private branch is a matter of 
process - not a matter of algorithm. That is, it is the responsibility 
of the team to decide that "we will always make sure our private branch 
is up to date before submitting to the integration stream."

In particular, if I have a stream with 100 users working in parallel, 
all submitting on a regular basis because this is their full time paid 
job to work on a piece of software, it may be a race to actually get the 
submission - depending on if the algorithm can detect whether the same 
files are being changed or not.

The first thing the tool can do to be genuinely useful in this 
situation, is to accept some of the responsibility of detecting whether 
or not the race is one of these "diff3 is not idempotent" situations, 
and providing automatic handling. If the case has been hit, then 
--reintegrate could be used as a form of "special error checking" where 
it does the same as "merge", except in the case that the merge has a 
true conflict with any particular element of the change set (as opposed 
to a potential conflict with the end result), where the results of diff3 
would need to be "trusted", then it could bail and provide the user with 
the information required to resolve the conflict locally before submission.

The second thing the tool can do to be genuinely useful in this 
situation, is to allow for this check to be overridden. If I didn't 
trust diff3 - I wouldn't use merges at all. Sometimes a source 
management tool just needs to help me resolve conflicts. Especially with 
merge tracking and intelligent designer workflows, many cases of so 
called "conflicts" touch unrelated lines of code, and it *is* safe to 
complete the merge, even to the integration stream. I should have the 
ability to choose to do this, rather than race for submission with 100 
other users.

The worst thing the tool can do is to declare that "diff3 is idempotent 
therefore it should be disabled" during --reintegrate. Yuck. This is a 
partial solution and at least as I understand it - it is even dangerous. 
What happens if I use --reintegrate in a situation that actually does 
require merge resolution? Will every situation be blocked? Or will it 
take --reintegrate as a license to overwrite results, trusting that I 
can do all the necessary conflict checking myself? I have seen nothing 
so far that allows me to conclude that architecturally, Subversion 
requires the --reintegrate behaviour. It's a short cut in providing a 
complete branch merging solution for users of the system. Somebody 
started work on the canvas, and then drafted in the last corner rather 
than finish it. :-)

Cheers,
mark


On 01/30/2012 08:23 AM, Stefan Sperling wrote:
> The same applies to "reintegrate", BTW. It is a Subversion-specific 
> concept that might not be represented in CM theory because it is, as 
> you point out, just a special case of the general merge (you didn't 
> describe what "merge" means in your theory so I'm just going to make 
> assumptions).

>> Just to make sure it's understood: When you create a branch, the origin
>> of the branch is an interesting bit of information. However, for
>> merging, it is entirely irrelevant if branch A was created from B or the
>> other way around. To illustrate:
>>
>>      (1)
>>                 +- b@r2 ---- b@r3 ----
>>       (branch) /              | (merge)
>>               /               v
>>         --- a@r1 -------------+- a@r4 ----
>>
>>      (2)
>>         --- a@r1 ----------- a@r3 ----
>>               \               | (merge)
>>       (branch) \              v
>>                 +- b@r2 ------+- b@r4 ----
>>
>>
>> Cases (1) and (2) are exactly equivalent as far as the merge algorithm
>> is concerned, but Subversion calls the first a reintegrate merge and the
>> second a sync merge, and treats them differently, as if branch (a) were
>> somehow special. It's not.
> If you always use the 2-URL merge syntax all the abstractions go away
> and you'll have symmetry.
>
>   (1) svn co a@r4 wc; svn merge b@r2 b@r3 a
>   (2) svn co b@r4 wc; svn merge a@r1 a@r3 b
>
> See? Perfectly symmetrical.
>
> Your example is too simple, though.
> You only have one change being merged either way, and no cycles.
>
> Generally, we want to avoid spurious conflicts from diff3 which happen
> when changes are applied twice because diff3 is not idempotent.
> I.e. we break the nice symmetry to work around a limitation of diff3.
>
> In the following case we can avoid spurious conflicts by picking
> our parameters carefully:
>
>       (3)
>                  +-b@r2--+ b@r3--b@r4-b@r5 ----
>        (branch) /        ^             | (merge 2)
>                /         | (merge 1)   v
>          --- a@r1 ------a@r2-----------+- a@r6 ----
>
> Merge 1 brings a@r2 into b@r2.
> Merge 2 brings b@r4 into a@r5.
>
>   (3.1) svn co b@r2 wc; svn merge a@r1 a@r2 b
>
> There are two ways of performing merge 2.
> The first is symmetrical and re-applies a@r2 to a@r6, via b@r3,
> with possible spurious conflicts from diff3:
>
>   (3.2 a) svn co a@r5 wc; svn merge b@r2 b@r5 a
>
> The second does not re-apply a@r2, so there are no possible conflicts
> from diff3 because of a@r2/b@r3. Only b@r4 can conflict.
>
>   (3.2 b) svn co a@r5 wc; svn merge b@r3 b@r5 a
>
> The result is the same, however.
>
> What we use during --reintegrate is (3.2 b).
> You can argue that this approach is broken and we should be using (3.2 a)
> for symmetry, and let users deal with spurious conflicts.
>
> But (3.2 b) is always correct and more convenient if diff3 fails to
> produce a conflict-free diff when b@r3 is applied to a@r5.
> So why not use it?
>
> Alternatively, do you know of a diff3 replacement that is idempotent?


-- 
Mark Mielke<ma...@mielke.cc>


Re: Implicit keep-alive after reintegrate merge

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Jan 24, 2012 at 01:12:39AM +0100, Branko Čibej wrote:
> By the way, I read Stefan's description of why --reintegrate is
> necessary, and after slogging through the unfortunate terminology (2-URL
> merge doesn't mean a thing in CM theory :) and one little bit caught my
> attention:
> 
> > A sync merge can fill in the all parameters as well, except PATH2.
> > However, it needs to do so in a different way. With a sync merge
> > PATH1 and PATH2 are the same
> 
> I keep reading this in the context of the rest of the reasoning, any my
> reaction is still: "WTF? Bogus!" This looks like someone /started off/
> with the assumption that a sync merge can take shortcuts where a
> reintegrate merge cannot; but, so sorry, that's just plain nonsense.

Oh, it's not nonsense. And there are no special shortcuts reintegrate
can take. You just misunderstood what I was writing about.

I didn't write about CM theory. I wrote about usage of Subversion.

When using svn, the term "2-URL" merge refers to a specific way of
invoking 'svn merge'. It is the most general type of invocation.
All other forms are syntactic sugar which can be represented by
equivalent 2-URL merge invocations.

Consider: svn merge ^/trunk
with mergeinfo on the current dir being: /trunk:2-6
The following 2-URL merge is the equivalent: 
 svn merge ^/trunk@6 ^/trunk@HEAD .
That's all there is to it.

The same applies to "reintegrate", BTW. It is a Subversion-specific
concept that might not be represented in CM theory because it is, as you
point out, just a special case of the general merge (you didn't describe
what "merge" means in your theory so I'm just going to make assumptions).

> The cases are exactly symmetrical, all edge cases apply to both directions
> of the merge, a sync merge can encounter all the complications of a
> reintegrate merge. I'll be bold enough to assume that the keep-alive
> song-and-dance is a direct result of these invalid assumptions.
> 
> Well, at least this answers the question of whether it's the model or
> the implementation that's wrong ... the answer is, that the
> implementation is misinterpreting the model. :)

Huh? I don't follow. Which model do you think is being misinterpreted?
Does the model you have in mind cleanly map to what Subversion can
represent?

> Just to make sure it's understood: When you create a branch, the origin
> of the branch is an interesting bit of information. However, for
> merging, it is entirely irrelevant if branch A was created from B or the
> other way around. To illustrate:
> 
>     (1)
>                +- b@r2 ---- b@r3 ----
>      (branch) /              | (merge)
>              /               v
>        --- a@r1 -------------+- a@r4 ----
> 
>     (2)
>        --- a@r1 ----------- a@r3 ----
>              \               | (merge)
>      (branch) \              v
>                +- b@r2 ------+- b@r4 ----
> 
> 
> Cases (1) and (2) are exactly equivalent as far as the merge algorithm
> is concerned, but Subversion calls the first a reintegrate merge and the
> second a sync merge, and treats them differently, as if branch (a) were
> somehow special. It's not.

If you always use the 2-URL merge syntax all the abstractions go away
and you'll have symmetry.

 (1) svn co a@r4 wc; svn merge b@r2 b@r3 a
 (2) svn co b@r4 wc; svn merge a@r1 a@r3 b

See? Perfectly symmetrical.

Your example is too simple, though.
You only have one change being merged either way, and no cycles.

Generally, we want to avoid spurious conflicts from diff3 which happen
when changes are applied twice because diff3 is not idempotent.
I.e. we break the nice symmetry to work around a limitation of diff3.

In the following case we can avoid spurious conflicts by picking
our parameters carefully:

     (3)
                +-b@r2--+ b@r3--b@r4-b@r5 ----
      (branch) /        ^             | (merge 2)
              /         | (merge 1)   v
        --- a@r1 ------a@r2-----------+- a@r6 ----

Merge 1 brings a@r2 into b@r2.
Merge 2 brings b@r4 into a@r5.

 (3.1) svn co b@r2 wc; svn merge a@r1 a@r2 b

There are two ways of performing merge 2.
The first is symmetrical and re-applies a@r2 to a@r6, via b@r3,
with possible spurious conflicts from diff3:

 (3.2 a) svn co a@r5 wc; svn merge b@r2 b@r5 a

The second does not re-apply a@r2, so there are no possible conflicts
from diff3 because of a@r2/b@r3. Only b@r4 can conflict.

 (3.2 b) svn co a@r5 wc; svn merge b@r3 b@r5 a

The result is the same, however.

What we use during --reintegrate is (3.2 b).
You can argue that this approach is broken and we should be using (3.2 a)
for symmetry, and let users deal with spurious conflicts.

But (3.2 b) is always correct and more convenient if diff3 fails to
produce a conflict-free diff when b@r3 is applied to a@r5.
So why not use it?

Alternatively, do you know of a diff3 replacement that is idempotent?

Re: Implicit keep-alive after reintegrate merge

Posted by Branko Čibej <br...@apache.org>.
On 23.01.2012 18:53, Julian Foad wrote:
> I've committed my latest version of this work to the new 'reintegrate-keep-alive' branch in r1234919.
>
> I'm working on explaining how this stuff relates to ClearCase's model and the need or non-need for a difference between "reintegrate" and "normal" merges in Subversion.
>

By the way, I read Stefan's description of why --reintegrate is
necessary, and after slogging through the unfortunate terminology (2-URL
merge doesn't mean a thing in CM theory :) and one little bit caught my
attention:

> A sync merge can fill in the all parameters as well, except PATH2.
> However, it needs to do so in a different way. With a sync merge
> PATH1 and PATH2 are the same

I keep reading this in the context of the rest of the reasoning, any my
reaction is still: "WTF? Bogus!" This looks like someone /started off/
with the assumption that a sync merge can take shortcuts where a
reintegrate merge cannot; but, so sorry, that's just plain nonsense. The
cases are exactly symmetrical, all edge cases apply to both directions
of the merge, a sync merge can encounter all the complications of a
reintegrate merge. I'll be bold enough to assume that the keep-alive
song-and-dance is a direct result of these invalid assumptions.

Well, at least this answers the question of whether it's the model or
the implementation that's wrong ... the answer is, that the
implementation is misinterpreting the model. :)

Just to make sure it's understood: When you create a branch, the origin
of the branch is an interesting bit of information. However, for
merging, it is entirely irrelevant if branch A was created from B or the
other way around. To illustrate:

    (1)
               +- b@r2 ---- b@r3 ----
     (branch) /              | (merge)
             /               v
       --- a@r1 -------------+- a@r4 ----

    (2)
       --- a@r1 ----------- a@r3 ----
             \               | (merge)
     (branch) \              v
               +- b@r2 ------+- b@r4 ----


Cases (1) and (2) are exactly equivalent as far as the merge algorithm
is concerned, but Subversion calls the first a reintegrate merge and the
second a sync merge, and treats them differently, as if branch (a) were
somehow special. It's not.

-- Brane

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
I've committed my latest version of this work to the new 'reintegrate-keep-alive' branch in r1234919.

I'm working on explaining how this stuff relates to ClearCase's model and the need or non-need for a difference between "reintegrate" and "normal" merges in Subversion.

- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Branko Čibej <br...@apache.org>.
On 20.01.2012 12:20, Mark Mielke wrote:
> I know that Subversion has these limitations compared to ClearCase:
>
> 1) Merge tracking is done at a commit level rather than at a per file
> or directory level. This means more complex analysis as the per commit
> information is "lossy" compared to what ClearCase is able to achieve.
> However, I think the information is still mostly available - it just
> requires some effort to extract.

I agree with all of your points, except possibly about the lossy nature
of mergeinfo. A revision is just a set of files (and directories), an
that set + mergeinfo is isomorphic to ClearCase's concept of merge
arrows (note that mergeinfo contains paths).

My original assumption that something must be wrong with either
Subversion's model or its implementation also comes from my experience
with ClearCase, by the way. And from a bit of reading on configuration
management theory; just as you said, the higher-level concepts
(promote/demote, or your even more restricted examples, deliver/rebase)
are just special cases of a lower-level, direction-agnostic merge primitive.

> 2) Branch off points are mysteriously hidden. Captured within the
> underlying repository but not exposed to the client? I think I saw
> some discussion about making this information available to the client
> but I didn't follow the conclusion of this?

Branch-off points are one issue, but a more fundamental one IMO is that
the client knows nothing about object identity; IOW, when given two
path/revision pairs, the client must jump through a lot of hoops to
figure out if they refer to the same entity at different times and/or on
different branches.

> Putting all of the above together - I don't understand why
> --reintegrate is required. Maybe I'm just ignorant. :-)

Apparently just about as ignorant of the matter as I am ...

-- Brane

Re: Implicit keep-alive after reintegrate merge

Posted by Mark Mielke <ma...@mark.mielke.cc>.
On 01/20/2012 05:40 AM, Daniel Shahaf wrote:
> Sure, here it is:
>
> 12:37:24 @danielsh | wayita: reintegrate?
> ..
> 12:37:25    wayita | danielsh: If you want to know why --reintegrate is necessary, see http://mail-archives.apache.org/mod_mbox/subversion-dev/201107.mbox/%3C20110720124721.GA7557@ted.stsp.name%3E
>
> Sorry for not including the link originally --- I was tired and couldn't
> recall how to recompute it.

Heh. I'm too tired to comprehend that email. I will try tomorrow. 
However, I do wish to mention that I have looked at this problem in the 
past and I have yet to understand why Subversion seems to have a 
particular problem that requires --reintegrate whereas other solutions 
such as ClearCase do not seem to have this problem.

In ClearCase, it comes down to the following key points:

1) Most recent common ancestor. Discovered by tracking back merge arrows 
and branches to their branch off points until it reaches a common point.
2) The diff of the most recent common ancestor to the target version to 
the target version is merged with the diff of the most recent common 
ancestor to the source version to come up with the target.
3) In the special case where the most recent common ancestor is exactly 
equivalent (byte for byte for files, directory entry for directory entry 
for directories), a "trivial merge" is performed which means that the 
content can be copied from the source version over top of the target 
version with no merge effort required.
4) After completing a merge, a merge arrow is drawn.
5) A branch is a branch. "cleartool merge" doesn't need to treat one 
direction special compared to the opposite direction beyond flipping 
which version is the source and which version is the target. There are 
higher level operations on top that presume a direction ("cleartool 
deliver" sends from the private branch to the integration branch, 
"cleartool rebase" sends from the integration branch to the private 
branch), but these high level operations are implemented using 
"cleartool merge" underneath, and "cleartool merge" is unaware of these 
higher level conventions. It doesn't need to know about them.

I know that ClearCase does exactly the above because we reproduce it and 
manipulate it and various non-standard ways to make it do exactly what 
we want.

I know that Subversion has these limitations compared to ClearCase:

1) Merge tracking is done at a commit level rather than at a per file or 
directory level. This means more complex analysis as the per commit 
information is "lossy" compared to what ClearCase is able to achieve. 
However, I think the information is still mostly available - it just 
requires some effort to extract.
2) Branch off points are mysteriously hidden. Captured within the 
underlying repository but not exposed to the client? I think I saw some 
discussion about making this information available to the client but I 
didn't follow the conclusion of this?

Putting all of the above together - I don't understand why --reintegrate 
is required. Maybe I'm just ignorant. :-)

-- 
Mark Mielke<ma...@mielke.cc>


Re: Implicit keep-alive after reintegrate merge

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Fri, Jan 20, 2012 at 09:52:34 +0100:
> On 20.01.2012 03:23, Daniel Shahaf wrote:
> > To rule out the common case: are you familiar with Stefan's post giving
> > a reasons for this?
> 
> Nope.
> 
> > His argument is that 'cd trunk-wc && svn merge ^/branches/feature-branch'
> > and 'cd 1.7.x-wc && svn merge ^/trunk' are syntactically indistinguishable
> > but need to execute a semantically different algorithm each.
> 
> Huh. Why? How are branches, once created, not partial mirrors of each
> other, with the missing bits noted in mergeinfo? Do you have a link to
> the post? I'm really intrigued about this assertion.

Sure, here it is:

12:37:24 @danielsh | wayita: reintegrate?
..
12:37:25    wayita | danielsh: If you want to know why --reintegrate is necessary, see http://mail-archives.apache.org/mod_mbox/subversion-dev/201107.mbox/%3C20110720124721.GA7557@ted.stsp.name%3E

Sorry for not including the link originally --- I was tired and couldn't
recall how to recompute it.

Re: Implicit keep-alive after reintegrate merge

Posted by Branko Čibej <br...@apache.org>.
On 20.01.2012 03:23, Daniel Shahaf wrote:
> To rule out the common case: are you familiar with Stefan's post giving
> a reasons for this?

Nope.

> His argument is that 'cd trunk-wc && svn merge ^/branches/feature-branch'
> and 'cd 1.7.x-wc && svn merge ^/trunk' are syntactically indistinguishable
> but need to execute a semantically different algorithm each.

Huh. Why? How are branches, once created, not partial mirrors of each
other, with the missing bits noted in mergeinfo? Do you have a link to
the post? I'm really intrigued about this assertion.

-- Brane


Re: Implicit keep-alive after reintegrate merge

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Branko Čibej wrote on Thu, Jan 19, 2012 at 20:11:25 +0100:
> On 19.01.2012 15:38, Julian Foad wrote:
> > Branko Čibej wrote:
> >
> >> Instead of trying to invent ways to not make current reintegrate suck
> >> rocks, I'd suggest taking a look at how other tools handle such repeated
> >> merges between branches. Specifically, since git afficionados have so
> >> much to say about how good merging in git is, especially compared to
> >> Subversion, I'd be really interested to see if -- when you get down to
> >> cases -- it can actually do anything that analyzing the diff3 results
> >> (or, more likely, using diff4) can't already do.
> > I have to ask, are you writing from a point of view of having a mental model in which simply analyzing diffs *could* achieve the requisite tracking results?  Because I can't begin to see how.
> 
> No, I'm just trying to wrap my head around the concept of --reintegrate
> in the first place. :)
> What I mean is, no other tool I've ever used -- and I include CVS with
> all its problems in this list -- has had or needed this concept. So I
> must assume that something must be fundamentally wrong with our model of
> merge tracking if we ended up having to add --reintegrate. I really
> don't know /what/ could be wrong, but comparing what svn does to what
> others do (I pushed git as an example because of its vocal proponents,
> of course) might give a hint.
> 

To rule out the common case: are you familiar with Stefan's post giving
a reasons for this?

His argument is that 'cd trunk-wc && svn merge ^/branches/feature-branch'
and 'cd 1.7.x-wc && svn merge ^/trunk' are syntactically indistinguishable
but need to execute a semantically different algorithm each.

> I grant it could turn out that there's nothing really wrong with the
> model and that --reintegrate simply masks implementation problems that
> you're tackling on a more fundamental level now.
> 
> -- Brane
> 

Re: Implicit keep-alive after reintegrate merge

Posted by Branko Čibej <br...@apache.org>.
On 19.01.2012 15:38, Julian Foad wrote:
> Branko Čibej wrote:
>
>> Instead of trying to invent ways to not make current reintegrate suck
>> rocks, I'd suggest taking a look at how other tools handle such repeated
>> merges between branches. Specifically, since git afficionados have so
>> much to say about how good merging in git is, especially compared to
>> Subversion, I'd be really interested to see if -- when you get down to
>> cases -- it can actually do anything that analyzing the diff3 results
>> (or, more likely, using diff4) can't already do.
> I have to ask, are you writing from a point of view of having a mental model in which simply analyzing diffs *could* achieve the requisite tracking results?  Because I can't begin to see how.

No, I'm just trying to wrap my head around the concept of --reintegrate
in the first place. :)
What I mean is, no other tool I've ever used -- and I include CVS with
all its problems in this list -- has had or needed this concept. So I
must assume that something must be fundamentally wrong with our model of
merge tracking if we ended up having to add --reintegrate. I really
don't know /what/ could be wrong, but comparing what svn does to what
others do (I pushed git as an example because of its vocal proponents,
of course) might give a hint.

I grant it could turn out that there's nothing really wrong with the
model and that --reintegrate simply masks implementation problems that
you're tackling on a more fundamental level now.

-- Brane


Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:

> On 18.01.2012 23:37, Julian Foad wrote:
>> [...] there are certain simple 
>> changes that the three-way text merge function will detect as "looks 
>> like that change has already been made here" and auto-resolve it as a 
>> no-op, but that's beside the point.
> 
> I don't think it's beside the point. Mergeinfo can only give you a hint
> about node-level conflicts some of the time, and you still have to be
> careful about when to believe it. But "svn merge" isn't really about
> nodes, it's about textual changes -- at least as far as the user is
> concerned (and I'm carefully hand-waving away tracking renames).

Hi Brane.  Interesting thoughts, even before you hand-waved away the renames!

We seem to have very different views of merge tracking.  I invite you to read a bit about my line of thinking in the "Identifying Logical Changes vs. Commits" section of <http://wiki.apache.org/subversion/SupportedMergeScenarios>.

> So in
> the end, you have to rely on textual diffs in order to guesstimate what
> actually happened. Even looking at your cherry-pick merge of r8 in the
> diagram example above, all bets are off if only half of the r8 change to
> A was actually committed in r9 (in other words, if the user edited the
> result before committing the merge). It's easy to tell the users "don't 
> do that", but not really realistic. After all, this happens every time 
> you have to manually resolve a text conflict.

Not at all.  In general, whenever a change is merged to another branch, there will be some conflict resolution to make it fit the physical (e.g. source code syntax) and logical (e.g. functional or documentation) requirements of the target branch.  But Subversion records that you've merged "r8", and I believe that means you've merged the *logical substance* of r8.  By no means are all bets off after that; indeed, that is exactly the information we want to track.

Furthermore, it is fundamental to (at least this model of) merge tracking that the user only modifies the merge-commit in ways that preserve the logical substance of the change.  That is what I call "conflict resolution" in its full generality of resolving both the conflicts that Subversion happens to detect (which depends on what diff3 plug-in you're using) and those (functional, whatever) conflicts that it doesn't detect for you.

If a user chooses to edit the merge result so as to introduce another logical change, or to undo part of the logical change, then that isn't going to be recorded and the user can't expect Subversion to help them track such a change.  (If this concept is unfamiliar it may need a bit more explanation with examples to make it clear.)

> Instead of trying to invent ways to not make current reintegrate suck
> rocks, I'd suggest taking a look at how other tools handle such repeated
> merges between branches. Specifically, since git afficionados have so
> much to say about how good merging in git is, especially compared to
> Subversion, I'd be really interested to see if -- when you get down to
> cases -- it can actually do anything that analyzing the diff3 results
> (or, more likely, using diff4) can't already do.

I have to ask, are you writing from a point of view of having a mental model in which simply analyzing diffs *could* achieve the requisite tracking results?  Because I can't begin to see how.

- Julian


> I suspect that the real trick is in finding the correct common ancestor
> for diff3, and properly pruning the diff3 result, rather than hoping
> that merginfo will correctly tell you which cherry-picked revisions to
> skip. Because it certainly won't and can't, in general. What it /can/
> tell you is which ones to be extra careful about.
> 
> -- Brane
> 

Re: Implicit keep-alive after reintegrate merge

Posted by Branko Čibej <br...@apache.org>.
On 18.01.2012 23:37, Julian Foad wrote:
> Branko Čibej wrote:
>
>> On 18.01.2012 14:45, Julian Foad wrote:
>>>>  A@1---r4---r5---r6---r7----------------r11----------->
>>>>    |\                                     ^          |
>>>>    | \                                   |          |
>>>>    |  \                              reintegrate     |
>>>>    |   V                                  |          |
>>>>    |   A_COPY_2-----------------r9---r10---          |
>>>>    |                            ^                sync merge
>>>>    |                           /                      |
>>>>    |                  cherry-pick merge of r8       |
>>>>    V                         /                        V
>>>>    A_COPY-------------------r8------------------------->
>>>  Thanks for this good example of a more complex use case.  I've included 
>>> it in <http://wiki.apache.org/subversion/KeepingReintegratedBranchAlive> 
>>> and written up a discussion of the options there.
>>>
>>>  What do we want in this case?  The options are basically:
> (To be sure we're talking about the same thing, here I'm talking about how Subversion decides whether to merge the change "r11" from A onto A_COPY.)
>
>>>    (1) try to merge (as we do now), despite knowing it will conflict;
>> Actually, you know that it will not conflict, unless r9 or r10
>> introduced changes in the same place as r8. Otherwise the code is the
>> same and diff3 should skip that part, yes?
> No; I mean in general it will conflict.  Yes there are certain simple changes that the three-way text merge function will detect as "looks like that change has already been made here" and auto-resolve it as a no-op, but that's beside the point.

I don't think it's beside the point. Mergeinfo can only give you a hint
about node-level conflicts some of the time, and you still have to be
careful about when to believe it. But "svn merge" isn't really about
nodes, it's about textual changes -- at least as far as the user is
concerned (and I'm carefully hand-waving away tracking renames). So in
the end, you have to rely on textual diffs in order to guesstimate what
actually happened. Even looking at your cherry-pick merge of r8 in the
diagram example above, all bets are off if only half of the r8 change to
A was actually committed in r9 (in other words, if the user edited the
result before committing the merge).

It's easy to tell the users "don't do that", but not really realistic.
After all, this happens every time you have to manually resolve a text
conflict.

Instead of trying to invent ways to not make current reintegrate suck
rocks, I'd suggest taking a look at how other tools handle such repeated
merges between branches. Specifically, since git afficionados have so
much to say about how good merging in git is, especially compared to
Subversion, I'd be really interested to see if -- when you get down to
cases -- it can actually do anything that analyzing the diff3 results
(or, more likely, using diff4) can't already do.

I suspect that the real trick is in finding the correct common ancestor
for diff3, and properly pruning the diff3 result, rather than hoping
that merginfo will correctly tell you which cherry-picked revisions to
skip. Because it certainly won't and can't, in general. What it /can/
tell you is which ones to be extra careful about.

-- Brane

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:

> On 18.01.2012 14:45, Julian Foad wrote:
>>>  A@1---r4---r5---r6---r7----------------r11----------->
>>>    |\                                     ^          |
>>>    | \                                   |          |
>>>    |  \                              reintegrate     |
>>>    |   V                                  |          |
>>>    |   A_COPY_2-----------------r9---r10---          |
>>>    |                            ^                sync merge
>>>    |                           /                      |
>>>    |                  cherry-pick merge of r8       |
>>>    V                         /                        V
>>>    A_COPY-------------------r8------------------------->
>> 
>>  Thanks for this good example of a more complex use case.  I've included 
>> it in <http://wiki.apache.org/subversion/KeepingReintegratedBranchAlive> 
>> and written up a discussion of the options there.
>> 
>>  What do we want in this case?  The options are basically:

(To be sure we're talking about the same thing, here I'm talking about how Subversion decides whether to merge the change "r11" from A onto A_COPY.)

>>    (1) try to merge (as we do now), despite knowing it will conflict;
> 
> Actually, you know that it will not conflict, unless r9 or r10
> introduced changes in the same place as r8. Otherwise the code is the
> same and diff3 should skip that part, yes?

No; I mean in general it will conflict.  Yes there are certain simple changes that the three-way text merge function will detect as "looks like that change has already been made here" and auto-resolve it as a no-op, but that's beside the point.

[sorry, can't reply to the rest just now]

- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Branko Čibej <br...@xbc.nu>.
On 18.01.2012 14:45, Julian Foad wrote:
>> A@1---r4---r5---r6---r7----------------r11----------->
>>   |\                                     ^          |
>>   | \                                    |          |
>>   |  \                              reintegrate     |
>>   |   V                                  |          |
>>   |   A_COPY_2-----------------r9---r10---          |
>>   |                            ^                sync merge
>>   |                           /                     |
>>   |                  cherry-pick merge of r8        |
>>   V                         /                       V
>>   A_COPY-------------------r8------------------------->
>
> Thanks for this good example of a more complex use case.  I've included it in <http://wiki.apache.org/subversion/KeepingReintegratedBranchAlive> and written up a discussion of the options there.
>
> What do we want in this case?  The options are basically:
>
>   (1) try to merge (as we do now), despite knowing it will conflict;

Actually, you know that it will not conflict, unless r9 or r10
introduced changes in the same place as r8. Otherwise the code is the
same and diff3 should skip that part, yes?

>   (2) skip r11 (not usually good in this kind of case);

In fact, always wrong.

>   (3) figure out what really "should" be merged and merge it;

That's what diff3 is for, isn't it. Mergeinfo is always only a hint
about whole revisions. If you want to know what "should" be merged,
you'd have to track the merged changes at least at hunk granularity, if
not actual line level. Specifically, mergeinfo doesn't help at all once
you start cherry-picking. Even a cherry-pick merge of a single revision
will produce /some/ extraneous changes to the target branch, even if
it's only as much as shifting lines around a bit due to patch fuzz.

>   (4) stop and tell the user what's up.

"This bit is a conflict" that comes out of diff3 is exactly that. Using
diff4 would help a bit, but basically, if the user moves the code around
too much, all bets are off.

-- Brane


Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
[....]

> What do we want in this case?  The options are basically:
> 
>   (1) try to merge (as we do now), despite knowing it will conflict;
>   (2) skip r11 (not usually good in this kind of case);

Two clarifications:

The options I'm listing are all the options that I think could serve the user well, with different options being best in different circumstances.

As explained on <http://wiki.apache.org/subversion/KeepingReintegratedBranchAlive>, I mean that option (1) should record the revision as merged and (2) not record it.  Neither of those is what my previous patch did; it skipped but recorded, which is great for the simple case but not for the general case.

>   (3) figure out what really "should" be merged and merge it;
>   (4) stop and tell the user what's up.
[...]

- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
Paul Burba wrote:
> Julian Foad wrote:

>>  I describe the idea of "logical changes" in 
>> <http://wiki.apache.org/subversion/MergeTrackingIdeas>.
> 
> Most of the ideas there make sense, though I'm a bit foggy as to how
> the mergeinfo format will change and how we'll migrate existing
> repositories.

No, I only referred to that ideas page to explain the idea about categorizing changes as 'logical changes' versus 'merges'.  I'm not proposing any mergeinfo format changes or anything that would need migration.

> I know you are still early in the proof-of-concept phase, but have you
> given much thought to performance?  In particular I'm thinking of how
> mergeinfo_graph_populate recursively scans the history of reflective
> merges to build the mergeinfo graph.  In doing so it ultimately calls
> svn_client__get_repos_mergeinfo twice for each revision (operative or
> not) described in the reflected mergeinfo.  That could get slow in a
> hurry.

The 'twice' is obviously and simple avoidable.  As for the rest, I've only thought a little as yet.  In the worst case it is necessary, but in common cases I suspect we will be able to avoid much of it by delaying the graph population and scanning on demand, as we can prune the recursion at some points.

- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Paul Burba <pt...@gmail.com>.
On Wed, Jan 18, 2012 at 8:45 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Hi Paul.
>
> Paul Burba wrote:
>
>> Julian Foad wrote:
>>> I (Julian Foad) wrote:
>> I think I understand now.  Your patch works fine when the only cyclic
>> merges are done via reintegrate (i.e. r24).
>
> [...]
>
>> I thought you were going to tackle more complex reflective merge
>> cases, such as this example:
>
>
> Yes, exactly!
>
>> A@1---r4---r5---r6---r7----------------r11----------->
>>  |\                                     ^          |
>>  | \                                    |          |
>>  |  \                              reintegrate     |
>>  |   V                                  |          |
>>  |   A_COPY_2-----------------r9---r10---          |
>>  |                            ^                sync merge
>>  |                           /                     |
>>  |                  cherry-pick merge of r8        |
>>  V                         /                       V
>>  A_COPY-------------------r8------------------------->
>
>
> Thanks for this good example of a more complex use case.  I've included it in <http://wiki.apache.org/subversion/KeepingReintegratedBranchAlive> and written up a discussion of the options there.
>
> What do we want in this case?  The options are basically:
>
>   (1) try to merge (as we do now), despite knowing it will conflict;
>   (2) skip r11 (not usually good in this kind of case);
>   (3) figure out what really "should" be merged and merge it;
>
>   (4) stop and tell the user what's up.
>
> I'm not aiming for (3) because I think that's much more complex than it sounds.

Actually it *sounds* pretty complex ;-)  Seriously, if you solve #3
then I think you will also have solved issue #2897.

> With any other option, I believe it is essential to tell the user why Subversion can't simply include or exclude r11, with details.
>
>
> Any of these options could be best for the user.  It is not at all the case that (1) is best.  It depends how big the changes in r8/r9 are compared with those in r10, and how much they conflict, and how the user prefers to resolve the situation.
>
> I'm aiming for:
>
>   * Detect whether we have the simple case (the rev is effectively already merged) or the complex case (partially merged).
>   * Simple case: Do The Right Thing which is skip it but record it as merged.
>
>   * Complex case: let user choose (1) or (2) or (4), in advance or interactively.

+1

>   * Complex case: always issue a diagnostic message that clearly explains what's up.
>
>> As your patch stands, the above use-case causes some serious problems.
>
> Sure, my last patch only detects a Go/No-go and chooses No-go because that's sufficient for the simple use case.  But now we can get to the interesting bit: how to detect the "Partially merged" situation.

Sorry, I was initially unclear on how far you intended to take this.
I'm with you now.

> The principle that we are grasping as humans but haven't yet codified formally is that we want the merge to know whether all the "original changes" have been made already (directly or via any series of merges) in the target branch.
>
> I describe the idea of "logical changes" in <http://wiki.apache.org/subversion/MergeTrackingIdeas>.

Most of the ideas there make sense, though I'm a bit foggy as to how
the mergeinfo format will change and how we'll migrate existing
repositories.

> Briefly:  When we make a new change in A@10 and merge that into B as B@12, we get two physical changes in the repository.  B@12 is a physical manifestation of the same logical change as A@10, but it looks a bit different because it is adjusted (automatically and/or manually) to fit the physical form and logical requirements of branch B.  I say that there is one "logical change" there, which we refer to by the co-ordinates where it originated, A@10.
>
> Whenever we ask Subversion to track merges, although the current implementation just tracks physical changes from the immediate source branch, I believe that what we (as users) really expect is for it to track logical changes.  The "partial merge" use case above is an example of where we have to make a distinction and explicitly ask whether all the required logical changes are present.
>
> ALGORITHM
>
> Purpose: To determine whether all logical changes comprising a given physical change (e.g. A@11) are already present in the target.
>
> Step 1: build a tree of merged revisions (i.e. mergeinfo changes), recursing until we reach "logical changes".

I know you are still early in the proof-of-concept phase, but have you
given much thought to performance?  In particular I'm thinking of how
mergeinfo_graph_populate recursively scans the history of reflective
merges to build the mergeinfo graph.  In doing so it ultimately calls
svn_client__get_repos_mergeinfo twice for each revision (operative or
not) described in the reflected mergeinfo.  That could get slow in a
hurry.

> Step 2: traverse the tree of merged revisions, seeing whether all of the logical changes that comprise A@11 are recorded in the mergeinfo of the target (A_COPY).
>
> While doing this we need to ignore no-op revisions because we don't care whether they are recorded in the target or not.
>
> I've written code for step 1  (the attached 'merge-avoid-reflective-5.patch' is my current state) and am starting on step 2.

I look forward to see what you come up with for step 2.

Paul

>> (The attached patch is an update of yours with a new
>
>> merge_reintegrate.py test #20, which demonstrates the above use case)
>
> Thanks; that's very useful.
>
>
> - Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
Hi Paul.

Paul Burba wrote:

> Julian Foad wrote:
>> I (Julian Foad) wrote:
> I think I understand now.  Your patch works fine when the only cyclic
> merges are done via reintegrate (i.e. r24).

[...]

> I thought you were going to tackle more complex reflective merge
> cases, such as this example:


Yes, exactly!

> A@1---r4---r5---r6---r7----------------r11----------->
>  |\                                     ^          |
>  | \                                    |          |
>  |  \                              reintegrate     |
>  |   V                                  |          |
>  |   A_COPY_2-----------------r9---r10---          |
>  |                            ^                sync merge
>  |                           /                     |
>  |                  cherry-pick merge of r8        |
>  V                         /                       V
>  A_COPY-------------------r8------------------------->


Thanks for this good example of a more complex use case.  I've included it in <http://wiki.apache.org/subversion/KeepingReintegratedBranchAlive> and written up a discussion of the options there.

What do we want in this case?  The options are basically:

  (1) try to merge (as we do now), despite knowing it will conflict;
  (2) skip r11 (not usually good in this kind of case);
  (3) figure out what really "should" be merged and merge it;

  (4) stop and tell the user what's up.

I'm not aiming for (3) because I think that's much more complex than it sounds.  With any other option, I believe it is essential to tell the user why Subversion can't simply include or exclude r11, with details.


Any of these options could be best for the user.  It is not at all the case that (1) is best.  It depends how big the changes in r8/r9 are compared with those in r10, and how much they conflict, and how the user prefers to resolve the situation.


I'm aiming for:

  * Detect whether we have the simple case (the rev is effectively already merged) or the complex case (partially merged).
  * Simple case: Do The Right Thing which is skip it but record it as merged.

  * Complex case: let user choose (1) or (2) or (4), in advance or interactively.
  * Complex case: always issue a diagnostic message that clearly explains what's up.


> As your patch stands, the above use-case causes some serious problems.


Sure, my last patch only detects a Go/No-go and chooses No-go because that's sufficient for the simple use case.  But now we can get to the interesting bit: how to detect the "Partially merged" situation.

The principle that we are grasping as humans but haven't yet codified formally is that we want the merge to know whether all the "original changes" have been made already (directly or via any series of merges) in the target branch.

I describe the idea of "logical changes" in <http://wiki.apache.org/subversion/MergeTrackingIdeas>.  Briefly:  When we make a new change in A@10 and merge that into B as B@12, we get two physical changes in the repository.  B@12 is a physical manifestation of the same logical change as A@10, but it looks a bit different because it is adjusted (automatically and/or manually) to fit the physical form and logical requirements of branch B.  I say that there is one "logical change" there, which we refer to by the co-ordinates where it originated, A@10.

Whenever we ask Subversion to track merges, although the current implementation just tracks physical changes from the immediate source branch, I believe that what we (as users) really expect is for it to track logical changes.  The "partial merge" use case above is an example of where we have to make a distinction and explicitly ask whether all the required logical changes are present.



ALGORITHM

Purpose: To determine whether all logical changes comprising a given physical change (e.g. A@11) are already present in the target.

Step 1: build a tree of merged revisions (i.e. mergeinfo changes), recursing until we reach "logical changes".

Step 2: traverse the tree of merged revisions, seeing whether all of the logical changes that comprise A@11 are recorded in the mergeinfo of the target (A_COPY).

While doing this we need to ignore no-op revisions because we don't care whether they are recorded in the target or not.

I've written code for step 1  (the attached 'merge-avoid-reflective-5.patch' is my current state) and am starting on step 2.

> (The attached patch is an update of yours with a new

> merge_reintegrate.py test #20, which demonstrates the above use case)

Thanks; that's very useful.


- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Paul Burba <pt...@gmail.com>.
On Tue, Jan 17, 2012 at 10:12 AM, Julian Foad
<ju...@btopenworld.com> wrote:
> I (Julian Foad) wrote:
>> I think merge_reintegrate_tests.py 19 demonstrates it better than 1000 words so
>> I'll leave it at that for today.
>
> For people who'd like to follow along without having to apply and run the patch, that test does:
>
> Rev  A     A_COPY
>      |
> 02   |--cp-->X     # every X is a commit
> 3~6  X       |
> 07   |       X
> 08   |       |->X  # side-branch 'BRANCH_DEV1'
> 09   |       |  X
> 10   |       X<-|
> 11   X       |
> 12   |->X    |     # side-branch 'TRUNK_DEV1'
> 13   |  X    |
> 14   X<-|    |
>      |       |
> 15   |--mrg->X     # 'merge ^/A A_COPY'
>      |       |
> 16   |       X
> 17   |       |->X  # side-branch 'BRANCH_DEV2'
> 18   |       |  X
> 19   |       X<-|
> 20   X       |
> 21   |->X    |     # side-branch 'TRUNK_DEV2'
> 22   |  X    |
> 23   X<-|    |
>      |       |
> 24   X<-mrg--|     # 'merge --reintegrate ^/A_COPY A'
>      |       |     # no keep-alive dance here
> 25   |       X
> 26   |       |->X  # side-branch 'BRANCH_DEV3'
> 27   |       |  X
> 28   |       X<-|
> 29   X       |
> 30   |->X    |
> 31   |  X    |
> 32   X<-|    |     # side-branch 'TRUNK_DEV3'
>      |       |
> WC   |--mrg->X     # 'merge ^/A A_COPY'

Hi Julian,

I think I understand now.  Your patch works fine when the only cyclic
merges are done via reintegrate (i.e. r24).

When you said this though,

"However, if change A:40 had instead been a different merge into A,
let's say from C, it is still possible that merge might have brought
along some new mergeinfo describing merges from B, because of the way
mergeinfo is propagated from branch to branch.  Therefore, if we find
that change A:40 adds new mergeinfo about merges from B, we cannot
simply say that A:40 describes a reintegrate merge from B.  We need to
look more closely.  That's what I'm currently working on."

I thought you were going to tackle more complex reflective merge
cases, such as this example:

A@1---r4---r5---r6---r7----------------r11----------->
 |\                                     ^          |
 | \                                    |          |
 |  \                              reintegrate     |
 |   V                                  |          |
 |   A_COPY_2-----------------r9---r10---          |
 |                            ^                sync merge
 |                           /                     |
 |                  cherry-pick merge of r8        |
 V                         /                       V
 A_COPY-------------------r8------------------------->

As your patch stands, the above use-case causes some serious problems.

(The attached patch is an update of yours with a new
merge_reintegrate.py test #20, which demonstrates the above use case)

Here is the mergeinfo right before the final sync merge:

>svn pl -vR
Properties on 'A':
  svn:mergeinfo
    /A_COPY:8
    /A_COPY_2:3-10
Properties on 'A_COPY_2':
  svn:mergeinfo
    /A_COPY:8

Now perform the sync merge, your patch skips r11 because the naive
view of the mergeinfo change in that rev makes it look reflective.  So
the change made in r10 on the A_COPY_2 branch, which should be applied
to A_COPY, is not:

  >svn merge ^^/A A_COPY
  DBG: merge.c:8464: r1:11 mi added:
  DBG:   /A_COPY:8
  DBG: merge.c:8464: r6:11 mi added:
  DBG:   /A_COPY:8
  DBG: merge.c:8464: r8:11 mi added:
  DBG:   /A_COPY:8
  DBG: merge.c:8464: r9:11 mi added:
  DBG:   /A_COPY:8
  DBG: merge.c:8464: r10:11 mi added:
  DBG:   /A_COPY:8
  DBG: merge.c:8633: Skipping reflective revision r11
                     ^^^^^^^^
                Oops, there goes r10
  --- Merging r2 through r10 into 'A_COPY':
  U    A_COPY\B\E\beta
  U    A_COPY\D\G\rho
  U    A_COPY\D\H\omega
  U    A_COPY\D\H\psi
  --- Recording mergeinfo for merge of r2 through r11 into 'A_COPY':
   U   A_COPY

Definitely not what the user intended I think.  Worse, if the user
doesn't catch it and commits the sync merge...

  >svn ci -m "Sync merge from /A to A_COPY"
  Sending        A_COPY
  Sending        A_COPY\B\E\beta
  Sending        A_COPY\D\G\rho
  Sending        A_COPY\D\H\omega
  Sending        A_COPY\D\H\psi
  Transmitting file data ....
  Committed revision 12.

...and then reintegrates A_COPY to A, then the change from r10 is
actually removed from A:

  >svn up -q

  >svn merge ^^/A_COPY A --reintegrate
  --- Merging differences between repository URLs into 'A':
  U    A\D\H\chi
  --- Recording mergeinfo for merge between repository URLs into 'A':
   U   A

  >svn diff A\D\H\chi
  Index: A/D/H/chi
  ===================================================================
  --- A/D/H/chi   (revision 12)
  +++ A/D/H/chi   (working copy)
  @@ -1 +1 @@
  -Branch 2 edit.  <--- edit made in r10
  +This is the file 'chi'.

Again, probably not what was intended.

Paul

> and the tail output for the last merge is:
>
> DBG: merge.c:8464: r23:24 mi added:
> DBG:   /A_COPY:2-23
> DBG: merge.c:8633: Skipping reflective revision r24
> --- Merging r15 through r23 into 'A_COPY':
> A    A_COPY/TrunkFix2
> A    A_COPY/NewViaTRUNK_DEV2
>  G   A_COPY
> --- Merging r25 through r32 into 'A_COPY':
> A    A_COPY/TrunkFix3
> A    A_COPY/NewViaTRUNK_DEV3
>  G   A_COPY
> --- Recording mergeinfo for merge of r15 through r32 into 'A_COPY':
>  G   A_COPY
>
>
> mergeinfo on 'A':
>     /A_COPY:2-23
>     /BRANCH_DEV1:8-9
>     /BRANCH_DEV2:17-18
>     /TRUNK_DEV1:12-13
>     /TRUNK_DEV2:21-22
>     /TRUNK_DEV3:30-31
> mergeinfo on 'A_COPY@BASE':
>     /A:2-14
>     /BRANCH_DEV1:8-9
>     /BRANCH_DEV2:17-18
>     /BRANCH_DEV3:26-27
>     /TRUNK_DEV1:12-13
> mergeinfo on 'A_COPY' (working version):
>     /A:2-32
>     /BRANCH_DEV1:8-9
>     /BRANCH_DEV2:17-18
>     /BRANCH_DEV3:26-27
>     /TRUNK_DEV1:12-13
>     /TRUNK_DEV2:21-22
>     /TRUNK_DEV3:30-31
>
>
> - Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
> I think merge_reintegrate_tests.py 19 demonstrates it better than 1000 words so 
> I'll leave it at that for today.

For people who'd like to follow along without having to apply and run the patch, that test does:

Rev  A     A_COPY
     |
02   |--cp-->X     # every X is a commit
3~6  X       |

07   |       X

08   |       |->X  # side-branch 'BRANCH_DEV1'

09   |       |  X
10   |       X<-|

11   X       |
12   |->X    |     # side-branch 'TRUNK_DEV1'
13   |  X    |
14   X<-|    |
     |       |

15   |--mrg->X     # 'merge ^/A A_COPY'

     |       |
16   |       X

17   |       |->X  # side-branch 'BRANCH_DEV2'
18   |       |  X
19   |       X<-|

20   X       |
21   |->X    |     # side-branch 'TRUNK_DEV2'
22   |  X    |
23   X<-|    |
     |       |
24   X<-mrg--|     # 'merge --reintegrate ^/A_COPY A'
     |       |     # no keep-alive dance here

25   |       X

26   |       |->X  # side-branch 'BRANCH_DEV3'
27   |       |  X
28   |       X<-|

29   X       |
30   |->X    |
31   |  X    |
32   X<-|    |     # side-branch 'TRUNK_DEV3'
     |       |

WC   |--mrg->X     # 'merge ^/A A_COPY'



and the tail output for the last merge is:

DBG: merge.c:8464: r23:24 mi added:
DBG:   /A_COPY:2-23
DBG: merge.c:8633: Skipping reflective revision r24
--- Merging r15 through r23 into 'A_COPY':
A    A_COPY/TrunkFix2
A    A_COPY/NewViaTRUNK_DEV2
 G   A_COPY
--- Merging r25 through r32 into 'A_COPY':
A    A_COPY/TrunkFix3
A    A_COPY/NewViaTRUNK_DEV3
 G   A_COPY
--- Recording mergeinfo for merge of r15 through r32 into 'A_COPY':
 G   A_COPY


mergeinfo on 'A':
    /A_COPY:2-23
    /BRANCH_DEV1:8-9
    /BRANCH_DEV2:17-18
    /TRUNK_DEV1:12-13
    /TRUNK_DEV2:21-22
    /TRUNK_DEV3:30-31
mergeinfo on 'A_COPY@BASE':
    /A:2-14
    /BRANCH_DEV1:8-9
    /BRANCH_DEV2:17-18
    /BRANCH_DEV3:26-27
    /TRUNK_DEV1:12-13
mergeinfo on 'A_COPY' (working version):
    /A:2-32
    /BRANCH_DEV1:8-9
    /BRANCH_DEV2:17-18
    /BRANCH_DEV3:26-27
    /TRUNK_DEV1:12-13
    /TRUNK_DEV2:21-22
    /TRUNK_DEV3:30-31


- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
Hi Paul.

Here's an updated patch, which is both simpler (now that I learnt more about how the existing code works) and contains tests that demonstrate the functionality (merge_reintegrate_tests.py 18 and 19).

I think merge_reintegrate_tests.py 19 demonstrates it better than 1000 words so I'll leave it at that for today.

I'll respond to your last response (below) maybe tomorrow.


- Julian



----- Original Message -----
> From: Paul Burba <pt...@gmail.com>
> To: Julian Foad <ju...@btopenworld.com>
> Cc: "dev@subversion.apache.org" <de...@subversion.apache.org>
> Sent: Wednesday, 11 January 2012, 19:57
> Subject: Re: Implicit keep-alive after reintegrate merge
> 
> On Wed, Jan 11, 2012 at 7:43 AM, Julian Foad <ju...@btopenworld.com> 
> wrote:
>>  Hi Paul.  Thanks for your thoughts.
>> 
>>  Paul Burba wrote:
>> 
>>>  Julian Foad wrote:
>>>>   We can say for sure that when we reintegrated B to A (in A:40), 
> that
>>>>  will have added new mergeinfo on A describing merges from B.  
> However,
>>>>  if change A:40 had instead been a different merge into A, let's 
> say
>>>>  from C, it is still possible that merge might have brought along 
> some
>>>>  new mergeinfo describing merges from B, because of the way 
> mergeinfo
>>>>  is propagated from branch to branch.  Therefore, if we find that
>>>>  change A:40 adds new mergeinfo about merges from B, we cannot 
> simply
>>>>  say that A:40 describes a reintegrate merge from B.
>>> 
>>>  Absolutely!  There's also the similar case where 'B' is 
> reintegrated
>>>  to 'A', additional edits are made to 'A', and then the 
> whole thing is
>>>  committed as r40. Now r40 represents both the reintegrate change and
>>>  unrelated edits -- but of course the smallest unit mergetracking works
>>>  with is a revision, so how does it classify r40?
>> 
>>  Actually, I don't think we need to be concerned about the "mixture 
> of merges and new changes" scenario, and here's why.  As we know, 
> merging is not a deterministic operation, and requires user input in general 
> [1].  When the user does a merge and commits the resulting working copy, the 
> result is tracked as a merge.  There's no conceptual way Subversion could 
> distinguish the two kinds of change without the user telling it.  The way to 
> tell Subversion about two separate changes is by putting them in two separate 
> revisions [2].
> 
> Hi Julian,
> 
> I agree that users making changes in addition to a merge in a single
> commit are asking for problems.  But the underlying problem of
> detection doesn't seem any different from what you described as:
> 
> "However, if change A:40 had instead been a different merge into A,
> let's say from C, it is still possible that merge might have brought
> along some new mergeinfo describing merges from B, because of the way
> mergeinfo is propagated from branch to branch.  Therefore, if we find
> that change A:40 adds new mergeinfo about merges from B, we cannot
> simply say that A:40 describes a reintegrate merge from B.  We need to
> look more closely.  That's what I'm currently working on."
> 
> Maybe I need to wait to see what your current work yields before
> commenting much more :-)  Nah, more comments below...
> 
>>  So, "Don't commit new changes with a merge" is a rule we have 
> to embrace if we want to make progress in defining more useful merge behaviour.  
> Or rather the rule is a bit softer: "You shouldn't commit new changes 
> with a merge, because Subversion won't track the new changes separately but 
> will assume they are part of the merge conflict resolution."
>> 
>>  In fact the only cases in which Subversion's behaviour will actually be 
> affected by this rule are multi-path or cyclic merges, that is the cases where 
> we want to avoid merging a change to a branch that already has "it".
>> 
>>  The only form of cyclic merge we've supported up till now has been the 
> reintegrate merge, and that is a special case because it is limited to cases 
> where it doesn't need to track individual changes, it simply compares at the 
> entire states of the two branches.  The effect is that it doesn't matter 
> whether the user included new changes along with any merges into the source 
> (child) branch.
>> 
>>  In a multi-path merge, such as the third step of A->B, A->C, B->C, 
> we don't yet support automatically skipping any changes coming via B that 
> have already been merged directly from A to C.  Subversion will still try to 
> merge that whole change from B to C, and the already-merged part of it will 
> conflict (logically if not physically).  The user's options are to avoid 
> merging that specific revision from B to C (perhaps by record-only merging it), 
> or to  deal with that conflict.  If a new change had been committed along with 
> that A->B merge, neither option successfully merges that new change to C.
>> 
>>  So the reintegrate merge doesn't care if you have mixed a merge with a 
> new change, and for the multi-path scenario the user already needs to keep 
> merges separate from new changes.
>> 
>>  Now, what about the case I'm addressing?  That is, the sync merge into 
> a branch that has already been reintegrated.  If the user wants to carry on 
> using the branch in this way, up till now it has been the user's 
> responsibility to tell Subversion to ignore the reintegrate merge commit, by the 
> record-only merge known as the "keep-alive dance".  If there had been 
> a new change mixed up with that reintegration, the user would be telling 
> Subversion to ignore it.  So here again we already had the rule that you have to 
> keep new changes separate.
>> 
>>  [1] Of course we have automatic text merge tools which give the user a good 
> guess at a likely
>>  answer, which often turns out to need no further editing in simple
>>  cases.
>> 
>>  [2] I don't believe it would be productive to invent a mechanism for
>>  recording two separate changes within one revision, and if we did the
>>  user would still have to tell Subversion about them.
>> 
>> 
>>>>   We need to look more closely.  That's what I'm currently 
> working on.
>>> 
>>>  This runs up against issue #2897 'Reflective merges are 
> faulty'.  You
>>>  seem to be acknowledging this with your comments in the patch itself:
>>> 
>>>    "Note that with all these improvements, the result might 
> approach a more
>>>    flexible merge system in which any reflected or cyclic changes are 
> skipped,
>>>    *BUT WOULD NOT COPE* (emphasis mine) with the case where a revision
>>>    in the source branch contains a mixture of changes merged from the 
> target
>>>    branch and other changes."
>>> 
>>>  But if we can't cope with that case how can we do anything 
> that's an
>>>  improvement over what we do today (i.e. just merge r40)?  We can't
>>>  skip r40 based on mergeinfo alone because there might be other
>>>  legitimate changes.
>> 
>>  There are two potential kinds of "other" changes.  New changes 
> mixed in with the merge are against the rules, as explained above.  Other merges 
> mixed in with that merge are, I think, legitimate.
> 
> Again, I agree that the first case is really one where the user is
> asking for trouble, so I can live without solving that.  Thing is, we
> both agree the second type of change should be detected.  What puzzles
> me is that if we can reliably detect the second type it seems we could
> detect the first type too...
> 
>>  If we can calculate that r40 contains nothing but changes merged directly 
> from branch B, then the Right Thing is to skip it.  If it contains other changes 
> (or if we can't determine for sure), then we can tell the user what's 
> happening (that it would be wrong to try to merge this change because it 
> contains "...") and we can either skip it or stop the merge.
>> 
>>  My contention is that that would be better than what we do today.  Note 
> that what we do today is unconditionally *attempt* to merge r40 even though 
> (with my new code) we could know for sure that it will conflict logically (and 
> will probably but not necessarily conflict physically).
> 
> I agree, sorry if I came across as opposed to improvement in this
> space, I'm not.
> 
>>  Another way of looking at it is that when we merge a revision that will 
> logically conflict with the target, today the default action is to attempt to 
> merge it anyway, and my proposed action is to tell the user what's happening 
> and allow them to take corrective action in advance.  The user could choose a 
> different merge source, or perhaps realize that they were trying to merge into 
> the wrong target branch, or something else.
>> 
>>  Does that make sense?
> 
> A random thought about your patch: The skip/stop logic kicks in when
> performing a cherry pick merge.  It might be better if it only applies
> when doing sync merges.  Reasoning: If a user explicitly states the
> revisions she wants to merge, I think we should assume she knows what
> she wants.  Also, if our detection method isn't foolproof we need a
> way for the user to merge a revision we want to skip (or we error out
> on), but we still want mergetracking active so using --ignore-ancestry
> isn't an option.  If the user is relying on Subversion to select the
> appropriate revisions (i.e. a sync merge) then your patch's behavior
> makes more sense.
> 
> Paul
> 

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
Paul Burba wrote on 2012-01-11:

> A random thought about your patch: The skip/stop logic kicks in when
> performing a cherry pick merge.  It might be better if it only applies
> when doing sync merges.  Reasoning: If a user explicitly states the
> revisions she wants to merge, I think we should assume she knows what
> she wants.

That could make sense, but your reasoning should apply in general and not just to this patch.

It's not the UI or API semantics we currently have for cherry-picks.  Currently, we prune already-merged revisions from a cherry-pick revision range just the same as from a sync merge range.  Nowhere in the code do we codify the difference between a cherry-pick and a sync merge, beyond where we parse the command-line arguments.  From that point on, we simply pass around a list of revision ranges to consider for merging.  The sync-merge UI populates the list with one range; the cherry-pick UI populates the list with one or more ranges.

I'm open to changing that.

>  Also, if our detection method isn't foolproof we need a
> way for the user to merge a revision we want to skip [...]

Agreed in principle, but I intend the detection to be precise.

- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Paul Burba <pt...@gmail.com>.
On Wed, Jan 11, 2012 at 7:43 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Hi Paul.  Thanks for your thoughts.
>
> Paul Burba wrote:
>
>> Julian Foad wrote:
>>>  We can say for sure that when we reintegrated B to A (in A:40), that
>>> will have added new mergeinfo on A describing merges from B.  However,
>>> if change A:40 had instead been a different merge into A, let's say
>>> from C, it is still possible that merge might have brought along some
>>> new mergeinfo describing merges from B, because of the way mergeinfo
>>> is propagated from branch to branch.  Therefore, if we find that
>>> change A:40 adds new mergeinfo about merges from B, we cannot simply
>>> say that A:40 describes a reintegrate merge from B.
>>
>> Absolutely!  There's also the similar case where 'B' is reintegrated
>> to 'A', additional edits are made to 'A', and then the whole thing is
>> committed as r40. Now r40 represents both the reintegrate change and
>> unrelated edits -- but of course the smallest unit mergetracking works
>> with is a revision, so how does it classify r40?
>
> Actually, I don't think we need to be concerned about the "mixture of merges and new changes" scenario, and here's why.  As we know, merging is not a deterministic operation, and requires user input in general [1].  When the user does a merge and commits the resulting working copy, the result is tracked as a merge.  There's no conceptual way Subversion could distinguish the two kinds of change without the user telling it.  The way to tell Subversion about two separate changes is by putting them in two separate revisions [2].

Hi Julian,

I agree that users making changes in addition to a merge in a single
commit are asking for problems.  But the underlying problem of
detection doesn't seem any different from what you described as:

"However, if change A:40 had instead been a different merge into A,
let's say from C, it is still possible that merge might have brought
along some new mergeinfo describing merges from B, because of the way
mergeinfo is propagated from branch to branch.  Therefore, if we find
that change A:40 adds new mergeinfo about merges from B, we cannot
simply say that A:40 describes a reintegrate merge from B.  We need to
look more closely.  That's what I'm currently working on."

Maybe I need to wait to see what your current work yields before
commenting much more :-)  Nah, more comments below...

> So, "Don't commit new changes with a merge" is a rule we have to embrace if we want to make progress in defining more useful merge behaviour.  Or rather the rule is a bit softer: "You shouldn't commit new changes with a merge, because Subversion won't track the new changes separately but will assume they are part of the merge conflict resolution."
>
> In fact the only cases in which Subversion's behaviour will actually be affected by this rule are multi-path or cyclic merges, that is the cases where we want to avoid merging a change to a branch that already has "it".
>
> The only form of cyclic merge we've supported up till now has been the reintegrate merge, and that is a special case because it is limited to cases where it doesn't need to track individual changes, it simply compares at the entire states of the two branches.  The effect is that it doesn't matter whether the user included new changes along with any merges into the source (child) branch.
>
> In a multi-path merge, such as the third step of A->B, A->C, B->C, we don't yet support automatically skipping any changes coming via B that have already been merged directly from A to C.  Subversion will still try to merge that whole change from B to C, and the already-merged part of it will conflict (logically if not physically).  The user's options are to avoid merging that specific revision from B to C (perhaps by record-only merging it), or to  deal with that conflict.  If a new change had been committed along with that A->B merge, neither option successfully merges that new change to C.
>
> So the reintegrate merge doesn't care if you have mixed a merge with a new change, and for the multi-path scenario the user already needs to keep merges separate from new changes.
>
> Now, what about the case I'm addressing?  That is, the sync merge into a branch that has already been reintegrated.  If the user wants to carry on using the branch in this way, up till now it has been the user's responsibility to tell Subversion to ignore the reintegrate merge commit, by the record-only merge known as the "keep-alive dance".  If there had been a new change mixed up with that reintegration, the user would be telling Subversion to ignore it.  So here again we already had the rule that you have to keep new changes separate.
>
> [1] Of course we have automatic text merge tools which give the user a good guess at a likely
> answer, which often turns out to need no further editing in simple
> cases.
>
> [2] I don't believe it would be productive to invent a mechanism for
> recording two separate changes within one revision, and if we did the
> user would still have to tell Subversion about them.
>
>
>>>  We need to look more closely.  That's what I'm currently working on.
>>
>> This runs up against issue #2897 'Reflective merges are faulty'.  You
>> seem to be acknowledging this with your comments in the patch itself:
>>
>>   "Note that with all these improvements, the result might approach a more
>>   flexible merge system in which any reflected or cyclic changes are skipped,
>>   *BUT WOULD NOT COPE* (emphasis mine) with the case where a revision
>>   in the source branch contains a mixture of changes merged from the target
>>   branch and other changes."
>>
>> But if we can't cope with that case how can we do anything that's an
>> improvement over what we do today (i.e. just merge r40)?  We can't
>> skip r40 based on mergeinfo alone because there might be other
>> legitimate changes.
>
> There are two potential kinds of "other" changes.  New changes mixed in with the merge are against the rules, as explained above.  Other merges mixed in with that merge are, I think, legitimate.

Again, I agree that the first case is really one where the user is
asking for trouble, so I can live without solving that.  Thing is, we
both agree the second type of change should be detected.  What puzzles
me is that if we can reliably detect the second type it seems we could
detect the first type too...

> If we can calculate that r40 contains nothing but changes merged directly from branch B, then the Right Thing is to skip it.  If it contains other changes (or if we can't determine for sure), then we can tell the user what's happening (that it would be wrong to try to merge this change because it contains "...") and we can either skip it or stop the merge.
>
> My contention is that that would be better than what we do today.  Note that what we do today is unconditionally *attempt* to merge r40 even though (with my new code) we could know for sure that it will conflict logically (and will probably but not necessarily conflict physically).

I agree, sorry if I came across as opposed to improvement in this
space, I'm not.

> Another way of looking at it is that when we merge a revision that will logically conflict with the target, today the default action is to attempt to merge it anyway, and my proposed action is to tell the user what's happening and allow them to take corrective action in advance.  The user could choose a different merge source, or perhaps realize that they were trying to merge into the wrong target branch, or something else.
>
> Does that make sense?

A random thought about your patch: The skip/stop logic kicks in when
performing a cherry pick merge.  It might be better if it only applies
when doing sync merges.  Reasoning: If a user explicitly states the
revisions she wants to merge, I think we should assume she knows what
she wants.  Also, if our detection method isn't foolproof we need a
way for the user to merge a revision we want to skip (or we error out
on), but we still want mergetracking active so using --ignore-ancestry
isn't an option.  If the user is relying on Subversion to select the
appropriate revisions (i.e. a sync merge) then your patch's behavior
makes more sense.

Paul

Re: Implicit keep-alive after reintegrate merge

Posted by Julian Foad <ju...@btopenworld.com>.
Hi Paul.  Thanks for your thoughts.

Paul Burba wrote:

> Julian Foad wrote:
>>  We can say for sure that when we reintegrated B to A (in A:40), that 
>> will have added new mergeinfo on A describing merges from B.  However, 
>> if change A:40 had instead been a different merge into A, let's say 
>> from C, it is still possible that merge might have brought along some 
>> new mergeinfo describing merges from B, because of the way mergeinfo 
>> is propagated from branch to branch.  Therefore, if we find that 
>> change A:40 adds new mergeinfo about merges from B, we cannot simply 
>> say that A:40 describes a reintegrate merge from B.
> 
> Absolutely!  There's also the similar case where 'B' is reintegrated
> to 'A', additional edits are made to 'A', and then the whole thing is
> committed as r40. Now r40 represents both the reintegrate change and
> unrelated edits -- but of course the smallest unit mergetracking works
> with is a revision, so how does it classify r40?

Actually, I don't think we need to be concerned about the "mixture of merges and new changes" scenario, and here's why.  As we know, merging is not a deterministic operation, and requires user input in general [1].  When the user does a merge and commits the resulting working copy, the result is tracked as a merge.  There's no conceptual way Subversion could distinguish the two kinds of change without the user telling it.  The way to tell Subversion about two separate changes is by putting them in two separate revisions [2].

So, "Don't commit new changes with a merge" is a rule we have to embrace if we want to make progress in defining more useful merge behaviour.  Or rather the rule is a bit softer: "You shouldn't commit new changes with a merge, because Subversion won't track the new changes separately but will assume they are part of the merge conflict resolution."

In fact the only cases in which Subversion's behaviour will actually be affected by this rule are multi-path or cyclic merges, that is the cases where we want to avoid merging a change to a branch that already has "it".

The only form of cyclic merge we've supported up till now has been the reintegrate merge, and that is a special case because it is limited to cases where it doesn't need to track individual changes, it simply compares at the entire states of the two branches.  The effect is that it doesn't matter whether the user included new changes along with any merges into the source (child) branch.

In a multi-path merge, such as the third step of A->B, A->C, B->C, we don't yet support automatically skipping any changes coming via B that have already been merged directly from A to C.  Subversion will still try to merge that whole change from B to C, and the already-merged part of it will conflict (logically if not physically).  The user's options are to avoid merging that specific revision from B to C (perhaps by record-only merging it), or to  deal with that conflict.  If a new change had been committed along with that A->B merge, neither option successfully merges that new change to C.

So the reintegrate merge doesn't care if you have mixed a merge with a new change, and for the multi-path scenario the user already needs to keep merges separate from new changes.

Now, what about the case I'm addressing?  That is, the sync merge into a branch that has already been reintegrated.  If the user wants to carry on using the branch in this way, up till now it has been the user's responsibility to tell Subversion to ignore the reintegrate merge commit, by the record-only merge known as the "keep-alive dance".  If there had been a new change mixed up with that reintegration, the user would be telling Subversion to ignore it.  So here again we already had the rule that you have to keep new changes separate.

[1] Of course we have automatic text merge tools which give the user a good guess at a likely 
answer, which often turns out to need no further editing in simple 
cases.

[2] I don't believe it would be productive to invent a mechanism for 
recording two separate changes within one revision, and if we did the 
user would still have to tell Subversion about them.


>>  We need to look more closely.  That's what I'm currently working on.
> 
> This runs up against issue #2897 'Reflective merges are faulty'.  You
> seem to be acknowledging this with your comments in the patch itself:
> 
>   "Note that with all these improvements, the result might approach a more
>   flexible merge system in which any reflected or cyclic changes are skipped,
>   *BUT WOULD NOT COPE* (emphasis mine) with the case where a revision
>   in the source branch contains a mixture of changes merged from the target
>   branch and other changes."
> 
> But if we can't cope with that case how can we do anything that's an
> improvement over what we do today (i.e. just merge r40)?  We can't
> skip r40 based on mergeinfo alone because there might be other
> legitimate changes.

There are two potential kinds of "other" changes.  New changes mixed in with the merge are against the rules, as explained above.  Other merges mixed in with that merge are, I think, legitimate.

If we can calculate that r40 contains nothing but changes merged directly from branch B, then the Right Thing is to skip it.  If it contains other changes (or if we can't determine for sure), then we can tell the user what's happening (that it would be wrong to try to merge this change because it contains "...") and we can either skip it or stop the merge.

My contention is that that would be better than what we do today.  Note that what we do today is unconditionally *attempt* to merge r40 even though (with my new code) we could know for sure that it will conflict logically (and will probably but not necessarily conflict physically).

Another way of looking at it is that when we merge a revision that will logically conflict with the target, today the default action is to attempt to merge it anyway, and my proposed action is to tell the user what's happening and allow them to take corrective action in advance.  The user could choose a different merge source, or perhaps realize that they were trying to merge into the wrong target branch, or something else.

Does that make sense?

- Julian

Re: Implicit keep-alive after reintegrate merge

Posted by Paul Burba <pt...@gmail.com>.
On Mon, Jan 9, 2012 at 6:14 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Merging would be simpler if the user didn't have to think about doing the "keep-alive dance" after a reintegrate [1].
>
> I'm trying to teach the sync merge to detect when a reintegrate has been done, so that we could avoid the need for r45 in the following typical sequence:
>
> | Rev  Branch Commit
> |      A   B
> |--------------------
> | r10  X
> |      | \
> | r20  |   X  "New branch B, a copy of A."
> |      |   |
> | r30  |   X  "Modify B."
> |      | / |
> | r40  X   |  "Reintegrate from B into A."
> |      | \ |
> | r45* |   X  "Record-only merge to keep B alive."
> |      |   |
> | r50  X   |  "More work in A."
> |      | \ |
> | r60  |   X  "Sync from A into B."
>
> A patch in progress is attached.  It makes 'merge' detect when one of the revision ranges to be merged would include changes that have
> already been merged in the opposite direction.  At present it stops the merge, giving a friendly error message.
>
> I would like to make it skip the particular revision(s) and continue to merge all the other revisions in the range, but first we need to get the detection right.

Hi Julian,

Let me first say that I am all for trying to make this work.  The need
to delete and recreate a branch or use --record-only to block it,
after reintegrating is not very user-friendly.  That said, I think
you'll find this trickier than it looks at first glance...

> Using the above example, if we don't do the r45 record-only merge, then during the last sync merge (which will be committed in r60), the normal merge tracking code first determines that the revision range to be merged is r11-r50 [2].  Then the new code detects that this incoming change brings in new mergeinfo about merges *from* the current target branch B.  Therefore the new code will stop and complain.
>
> That's all right for simple cases but it's not good enough.
>
> We can say for sure that when we reintegrated B to A (in A:40), that will have added new mergeinfo on A describing merges from B.  However, if change A:40 had instead been a different merge into A, let's say from C, it is still possible that merge might have brought along some new mergeinfo describing merges from B, because of the way mergeinfo is propagated from branch to branch.  Therefore, if we find that change A:40 adds new mergeinfo about merges from B, we cannot simply say that A:40 describes a reintegrate merge from B.

Absolutely!  There's also the similar case where 'B' is reintegrated
to 'A', additional edits are made to 'A', and then the whole thing is
committed as r40. Now r40 represents both the reintegrate change and
unrelated edits -- but of course the smallest unit mergetracking works
with is a revision, so how does it classify r40?

> We need to look more closely.  That's what I'm currently working on.

This runs up against issue #2897 'Reflective merges are faulty'.  You
seem to be acknowledging this with your comments in the patch itself:

  "Note that with all these improvements, the result might approach a more
  flexible merge system in which any reflected or cyclic changes are skipped,
  *BUT WOULD NOT COPE* (emphasis mine) with the case where a revision
  in the source branch contains a mixture of changes merged from the target
  branch and other changes."

But if we can't cope with that case how can we do anything that's an
improvement over what we do today (i.e. just merge r40)?  We can't
skip r40 based on mergeinfo alone because there might be other
legitimate changes.  If we merge r40 that means we determined that
*some* of the changes in that revision are legitimate.  Do we apply
only the fractional part of r40 that is legitimate?  In other words,
this whole thing appears to be a non-starter unless we can fix issue
#2897.  Is that your intention?

Paul

> Comments?
>
> - Julian
>
>
> [1] See the end of <http://wiki.apache.org/subversion/KeepingReintegratedBranchAlive>.
> [2] Assume HEAD is r50 at this time.