You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Julian Foad <ju...@btopenworld.com> on 2012/07/09 22:16:59 UTC

Re: Subtree mergeinfo -- what I learnt at the Hackathon

To move forward and decide what behaviour is right, we need to be able to compare the 1.7 behaviour with the proposed behaviour in *specific* scenarios.  So we need to be able to enumerate the specific scenarios that we mean by the general term "merging with subtree mergeinfo".  This is what I am doing currently.

The twist is that the best way to enumerate all the possibilities for 1.7 merges, and the best way to enumerate all the possibilities for an ideal symmetric merge, are different.  For example, the 1.7 non-reintegrate merge (let's say from branch A to branch B) doesn't look at any mergeinfo indicating merges from B to A, so the primary way to categorize those cases is by their last complete A->B merge (of the root of the merge, and of a subtree).  B->A merges can also affect the result if performed after the last A->B merge, so we categorize secondarily by their B->A mergeinfo.  By contrast, an ideal symmetric merge only cares about the last complete merge (A->B or B->A); any earlier merges in the other direction make no difference at all.

After separately enumerating the 1.7 cases and the (ideal) symmetric merge cases, we can split or combine categories as necessary to merge these cases into a single list.  Then we can decide what constitutes "same" or "backward-compatible" behaviour in each case.  Don't be suspicious -- I'm not trying to twist the word "compatible" to mean something else -- rather, what I mean is that not all of the possible scenarios are ones in which the 1.7 behaviour is "good".  For example, we know that in 1.7 if the last complete merge was A->B (in r10, say) and then you reintegrate a subtree (B/foo -> A/foo in r20), then try to sync A->B again, Subversion will not notice that r20 merge and will attempt to re-merge r19:20 of A/foo into B/foo, which "works" only in rather trivial cases (due to auto-resolving of some types of duplicate change) and I trust we can agree it is in general wrong.

So far, I've only been compiling this categorization from my head; the next step is to write tests to confirm or correct this.


CATEGORIZING SUBTREE MERGES

For the purposes of this categorization, we consider:

  * The merging history of the root 'R' (the root node of the requested merge source and target trees), and of a subtree 'S' (a single node or subtree whose merging history differs from that of the root node in a significant way).  Only one subtree is considered; multiple subtrees are assumed to be handled independently, even if they are nested (such as root 'A', subtree 'A/D' whose history differs from 'A', and subtree 'A/D/foo' whose history differs from 'A/D').

  * The "last complete merge" of the Root (in one or both directions), and the "last complete merge" of the Subtree (in one or both directions).  The "last complete merge" in direction A->B means the last revision R for which all revisions on A, up to and including R, are (currently) recorded as having been merged from A to B.  This state could have been reached through any kind of merge or sequence of merges; all that matters is what the current mergeinfo says has been merged.

  * There is an assumption that the actual content changes were in fact merged in accordance with what the mergeinfo says, subject to any editing that was required to resolve any conflicts detected by the merge process and any semantic conflicts.


CATEGORIZING SUBTREE MERGES: 1.7 Non-Reintegrate

These cases are for a reintegrate merge A->B.

In each row of this table, up to two Root merges are indicated, and their relative ordering is significant; similarly for Subtree merges.  The ordering of R merges relative to S merges is not significant.

    Root     | Subtree   | Behaviour
    ---------+-----------+-----------------------------------
 1. never    | same      | OK (not a subtree scenario)
             +-----------+-----------------------------------
 2.          | [S<] S>   | Merge all needed changes
             +-----------+-----------------------------------
 3.          | [S>] S<   | All needed; & some duplicates in S
    ---------+-----------+-----------------------------------
 4. [R<] R>  | same      | OK (not a subtree scenario)
             +-----------+-----------------------------------
 5.          | never     | Merge all needed changes 
 6.          | [S<] S> * |
             +-----------+-----------------------------------
 7.          | [S>] S<   | All needed; & some duplicates in S
    ---------+-----------+-----------------------------------
 8. [R>] R<  | same      | All needed; & some duplicates in R
             |           | (not a subtree scenario)
             +-----------+-----------------------------------
 9.          | none      | All needed; & some duplicates in R 
10.          | [S<] S>   |
             +-----------+-----------------------------------
11.          | [S>] S< * | All needed; & some duplicates in R and S
    ---------+-----------+----------------------------------- 

Key:
  *      -- S> not at same revision as R>, or S< not same as R<.
  R>     -- last complete Root merge in direction A->B.
  R<     -- last complete Root merge in direction B->A.
  never  -- never been merged in either direction since the YCA of A and B.
  [S<]   -- shorthand for both of the cases: no 'S<' merge and an 'S<' merge.
  duplicates -- changes that are already present in the target.

Example:  Row 9 represents the case where the Root's last complete merge was in the B->A direction, and its last complete A->B merge was earlier or never; and the Subtree likewise.  The root's last complete merge was before or after but not the same as the subtree's. 


CATEGORIZING SUBTREE MERGES: 1.7 Reintegrate

These cases are for a reintegrate merge B->A.

Treat this table as just a rough first draft for now; I'm not sure if this is the best way to categorize the reintegrate cases, and I need to investigate this more thoroughly and test it.

    Root     | Subtree   | Behaviour
    ---------+-----------+-----------------------------------
 1. never    | same      | OK (not a subtree scenario) 
             +-----------+-----------------------------------
 2.          | [S<] S>   | Rejected
 3.          |    S<     | Rejected??
 4.          |  S>  S<   | Rejected
    ---------+-----------+-----------------------------------
 5. [R<] R>  | same      | OK (not a subtree scenario) 
             +-----------+-----------------------------------
 6.          | never     | Rejected 
 7.          | [S<] S> * | Rejected
 8.          |    S<     | Rejected?? 
 9.          |  S>  S<   | Rejected
    ---------+-----------+-----------------------------------
10. [R>] R<  | same      | ?? (not a subtree scenario)
             +-----------+-----------------------------------
11.          | never     | Rejected?? )
12.          | [S<] S>   | Rejected   > non-reint. cases
13.          | [S>] S<   | Rejected   )
    ---------+-----------+----------------------------------- 


CATEGORIZING SUBTREE MERGES: Ideal Symmetric Merge

Now we consider the ideal symmetric merge (not what's currently implemented).

This is primarily concerned with the last complete merge of the root, in whichever direction that was, and similarly the last complete merge of the subtree.  The (earlier) last complete merge of the root in the other direction is not significant, nor is that of the subtree.

    Root  | Subtree       | Ideal behaviour
    ------+---------------+----------------------------------
 1.       | same          | OK (not a subtree scenario)
    R> or +---------------+----------------------------------
 2. never | S> later      | Merge all needed changes
 3.       | S< later      | 
    ------+---------------+ 
 4. R>    | never         |
 5.       | S> earlier    |
 6.       | S< earlier    |
    ------+---------------+----------------------------------
 7.       | same          | OK (not a subtree scenario)
    R<    +---------------+----------------------------------
 8.       | S> later      | Merge all needed changes
 9.       | S< later      |
10.       | never         |
11.       | S> earlier    |
12.       | S< earlier    |
    ------+---------------+---------------------------------- 

Key:
  R>       The last complete merge of Root was in the A->B direction.
  R<       The last complete merge of Root was in the B->A direction.
  never    Has never been merged between A & B in either direction.
  same     Same time and same direction as last complete merge of Root.
  earlier  Earlier than the last complete merge of the Root.
  later    Later than the last complete merge of the Root.


I'm working on integarting these tables, and also on writing tests (or rather scenarios using the test suite framework).

Please let me know if this seems like the approach we need, or any other thoughts.

- Julian
--
Certified & Supported Apache Subversion Downloads: http://www.wandisco.com/subversion/download



I (Julian Foad) wrote:
> [...] I'm not at all demanding we break backward
> compatibility.  Sorry if it sounded like it.  I'm just saying that
> we're proposing to change the behaviour of the plain merge command,
> and in doing that we need to work out what the details of the new
> behaviour will be, and this thread is helping us to do just that.
> I ended up with a bias towards trying to move toward a more rename-
> friendly approach, but I recognise we can't get there yet so the
> "follow each node's own ancestry" idea is just an idea for the
> future.  We need a simpler approach for now.
> 
> Stefan Sperling wrote:
[...]
>> Agreed. Ideally, the symmetric merge will support all currently supported
>> use cases, without throwing errors at users or requiring new command-line
>> switches.
>> 
>> I haven't yet made up my mind about interim measures for 1.8 though.
>> I suppose if symmetric merge won't support all currently supported use cases
>> in 1.8, we could keep the --symmetric option in place for 1.8, and drop it 
>> in 1.9 or later once the symmetric merge code can handle all use cases?


Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Paul Burba <pt...@gmail.com>.
On Wed, Jul 11, 2012 at 7:44 AM, Julian Foad <ju...@btopenworld.com> wrote:
> Branko Čibej wrote:
>> On 10.07.2012 23:40, Julian Foad wrote:
>>> I think the essence of this line of thought is:
>
>>>
>>> We set up all of the possible mergeinfo scenarios, and we see what 'merge' does, and we see what 1.7 'merge --reintegrate' does, and we debate what cases are 'supported' versus knowingly 'unsupported', and we see what an ideal symmetric merge would do.  Then we evaluate the current 'symmetric'[1] implementation against that: does it do "the same as" or "better than" or "worse than" what a user can do with the existing merge (assuming he/she chooses the "reintegrate" option appropriately).
>>>
>>> It's not easy to set up all of the possible scenarios: reintegrating the root but excluding a subtree, for example, can't be done in a single merge command.  Does that mean that scenario is uninteresting?  No, I don't think so.  What we can do is eliminate the existing merge command from the set-up phases of the test scenarios, and set up the desired mergeinfo directly: "faking it", if you will.  Doing it that way separates the concerns: the question of what the merge command will do, given a certain scenario, is a separate question from how we can use merges to create that scenario.
>>>
>>> The question I have at the moment is: Does this approach to evaluation make sense to you?
>>
>> Am I correct in assuming that most of this discussion is a consequence
>> of the current implementation of mergeinfo inheritance? I.e., that there
>> are a certain number of hoops one needs to jump through in order to
>> determine which, if any, mergeinfo applies to a particular file (or
>> subtree)?
>
> No; the inheritance algorithm is simple enough.
>
> This discussion is about how to compare what "works" now (in 1.7) versus what "works the same" or "works better" or "works worse" in a proposed "symmetric" implementation.  It's a matter of devising a way to enumerate all the possible the cases systematically, so that we have a way to answer the question "do all the cases that work in 1.7 still work in the proposed implementation?"

Totally agree with what you are doing here, particularly the "do all
the cases that work in 1.7 still work in the proposed implementation?"
part.  What remains an open question is what we do if something that
worked in 1.7 no longer works in the proposed implementation -- and
obviously we know about some of these cases already.

-- 
Paul T. Burba
CollabNet, Inc. -- www.collab.net -- Enterprise Cloud Development
Skype: ptburba

Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Branko Čibej <br...@wandisco.com>.
On 11.07.2012 12:44, Julian Foad wrote:
> Branko Čibej wrote:
>> Am I correct in assuming that most of this discussion is a consequence
>> of the current implementation of mergeinfo inheritance? I.e., that there
>> are a certain number of hoops one needs to jump through in order to
>> determine which, if any, mergeinfo applies to a particular file (or
>> subtree)?
> No; the inheritance algorithm is simple enough.
>
> This discussion is about how to compare what "works" now (in 1.7) versus what "works the same" or "works better" or "works worse" in a proposed "symmetric" implementation.  It's a matter of devising a way to enumerate all the possible the cases systematically, so that we have a way to answer the question "do all the cases that work in 1.7 still work in the proposed implementation?"

Thanks; and sorry for the misunderstanding. I plead airport lounge at 5
a.m. after 3 hours' sleep.

-- Brane


-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Julian Foad <ju...@btopenworld.com>.
Branko Čibej wrote:
> On 10.07.2012 23:40, Julian Foad wrote:
>> I think the essence of this line of thought is:

>>
>> We set up all of the possible mergeinfo scenarios, and we see what 'merge' does, and we see what 1.7 'merge --reintegrate' does, and we debate what cases are 'supported' versus knowingly 'unsupported', and we see what an ideal symmetric merge would do.  Then we evaluate the current 'symmetric'[1] implementation against that: does it do "the same as" or "better than" or "worse than" what a user can do with the existing merge (assuming he/she chooses the "reintegrate" option appropriately).
>>
>> It's not easy to set up all of the possible scenarios: reintegrating the root but excluding a subtree, for example, can't be done in a single merge command.  Does that mean that scenario is uninteresting?  No, I don't think so.  What we can do is eliminate the existing merge command from the set-up phases of the test scenarios, and set up the desired mergeinfo directly: "faking it", if you will.  Doing it that way separates the concerns: the question of what the merge command will do, given a certain scenario, is a separate question from how we can use merges to create that scenario.
>>
>> The question I have at the moment is: Does this approach to evaluation make sense to you?
>
> Am I correct in assuming that most of this discussion is a consequence
> of the current implementation of mergeinfo inheritance? I.e., that there
> are a certain number of hoops one needs to jump through in order to
> determine which, if any, mergeinfo applies to a particular file (or
> subtree)?

No; the inheritance algorithm is simple enough.

This discussion is about how to compare what "works" now (in 1.7) versus what "works the same" or "works better" or "works worse" in a proposed "symmetric" implementation.  It's a matter of devising a way to enumerate all the possible the cases systematically, so that we have a way to answer the question "do all the cases that work in 1.7 still work in the proposed implementation?"

- Julian


> Assuming that retrieving merginfo is expensive (which I gather it is
> right now) and you want to minimize the number of explicit mergeinfo
> lookups, your approach makes sense to me.
>
>> [1] 'symmetric' -- Urgh! -- we must change the name as the current incarnation is not very symmetric at all.
>
> You can always try --symmetricer ...
>
> -- Brane

Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Julian Foad <ju...@btopenworld.com>.
Johan Corveleyn wrote:

> On Wed, Jul 11, 2012 at 6:19 AM, Branko Čibej wrote:
>>  Am I correct in assuming that most of this discussion is a consequence
>>  of the current implementation of mergeinfo inheritance? I.e., that there
>>  are a certain number of hoops one needs to jump through in order to
>>  determine which, if any, mergeinfo applies to a particular file (or
>>  subtree)?
>> 
>>  Assuming that retrieving merginfo is expensive (which I gather it is
>>  right now) and you want to minimize the number of explicit mergeinfo
>>  lookups, your approach makes sense to me.
> 
> I assume Julian's remarks are not related to "expensiveness of
> retrieving mergeinfo", but to the difficulty of *creating* certain
> mergeinfo situations (difficult to create through normal UI commands),
> although these configurations are still interesting to have as a test
> fixture (and the merge logic should be able to handle them in a sane
> way -- and we should be able to specify what the behavior should be).
> 
> +1, I'd say.
> 
> Perhaps we should have both:
> 
> - Tests with artificially created mergeinfo (a bit like "unit tests":
> we give the merge algorithm some input, and expect certain output; we
> don't care about how a user would create that input -- also a bit
> comparable to "synthetic benchmarks").
> 
> - Tests with a full scenario of user commands (a bit like "integration
> tests": we consider the entire application, and treat it like a black
> box; we only interact with actual user commands -- a bit comparable to
> "real-world benchmarks").
> 
> 
> BTW, great effort, Julian, in trying to categorize these scenarios a
> bit, and trying to describe the current and desired behaviors, in a
> rather concise manner. I agree this is the best way forward right now
> ...
> 

Thanks, Johan, and yes you've described my position exactly.

- Julian

Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Jul 11, 2012 at 6:19 AM, Branko Čibej <br...@wandisco.com> wrote:
> On 10.07.2012 23:40, Julian Foad wrote:
>> I think the essence of this line of thought is:
>>
>> We set up all of the possible mergeinfo scenarios, and we see what 'merge' does, and we see what 1.7 'merge --reintegrate' does, and we debate what cases are 'supported' versus knowingly 'unsupported', and we see what an ideal symmetric merge would do.  Then we evaluate the current 'symmetric'[1] implementation against that: does it do "the same as" or "better than" or "worse than" what a user can do with the existing merge (assuming he/she chooses the "reintegrate" option appropriately).
>>
>> It's not easy to set up all of the possible scenarios: reintegrating the root but excluding a subtree, for example, can't be done in a single merge command.  Does that mean that scenario is uninteresting?  No, I don't think so.  What we can do is eliminate the existing merge command from the set-up phases of the test scenarios, and set up the desired mergeinfo directly: "faking it", if you will.  Doing it that way separates the concerns: the question of what the merge command will do, given a certain scenario, is a separate question from how we can use merges to create that scenario.
>>
>>
>> The question I have at the moment is: Does this approach to evaluation make sense to you?
>
> Am I correct in assuming that most of this discussion is a consequence
> of the current implementation of mergeinfo inheritance? I.e., that there
> are a certain number of hoops one needs to jump through in order to
> determine which, if any, mergeinfo applies to a particular file (or
> subtree)?
>
> Assuming that retrieving merginfo is expensive (which I gather it is
> right now) and you want to minimize the number of explicit mergeinfo
> lookups, your approach makes sense to me.

I assume Julian's remarks are not related to "expensiveness of
retrieving mergeinfo", but to the difficulty of *creating* certain
mergeinfo situations (difficult to create through normal UI commands),
although these configurations are still interesting to have as a test
fixture (and the merge logic should be able to handle them in a sane
way -- and we should be able to specify what the behavior should be).

+1, I'd say.

Perhaps we should have both:

- Tests with artificially created mergeinfo (a bit like "unit tests":
we give the merge algorithm some input, and expect certain output; we
don't care about how a user would create that input -- also a bit
comparable to "synthetic benchmarks").

- Tests with a full scenario of user commands (a bit like "integration
tests": we consider the entire application, and treat it like a black
box; we only interact with actual user commands -- a bit comparable to
"real-world benchmarks").


BTW, great effort, Julian, in trying to categorize these scenarios a
bit, and trying to describe the current and desired behaviors, in a
rather concise manner. I agree this is the best way forward right now
...

-- 
Johan

Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Branko Čibej <br...@wandisco.com>.
On 10.07.2012 23:40, Julian Foad wrote:
> I think the essence of this line of thought is:
>
> We set up all of the possible mergeinfo scenarios, and we see what 'merge' does, and we see what 1.7 'merge --reintegrate' does, and we debate what cases are 'supported' versus knowingly 'unsupported', and we see what an ideal symmetric merge would do.  Then we evaluate the current 'symmetric'[1] implementation against that: does it do "the same as" or "better than" or "worse than" what a user can do with the existing merge (assuming he/she chooses the "reintegrate" option appropriately).
>
> It's not easy to set up all of the possible scenarios: reintegrating the root but excluding a subtree, for example, can't be done in a single merge command.  Does that mean that scenario is uninteresting?  No, I don't think so.  What we can do is eliminate the existing merge command from the set-up phases of the test scenarios, and set up the desired mergeinfo directly: "faking it", if you will.  Doing it that way separates the concerns: the question of what the merge command will do, given a certain scenario, is a separate question from how we can use merges to create that scenario.
>
>
> The question I have at the moment is: Does this approach to evaluation make sense to you?

Am I correct in assuming that most of this discussion is a consequence
of the current implementation of mergeinfo inheritance? I.e., that there
are a certain number of hoops one needs to jump through in order to
determine which, if any, mergeinfo applies to a particular file (or
subtree)?

Assuming that retrieving merginfo is expensive (which I gather it is
right now) and you want to minimize the number of explicit mergeinfo
lookups, your approach makes sense to me.

> [1] 'symmetric' -- Urgh! -- we must change the name as the current incarnation is not very symmetric at all.

You can always try --symmetricer ...

-- Brane


-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download


Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Julian Foad <ju...@btopenworld.com>.
I think the essence of this line of thought is:

We set up all of the possible mergeinfo scenarios, and we see what 'merge' does, and we see what 1.7 'merge --reintegrate' does, and we debate what cases are 'supported' versus knowingly 'unsupported', and we see what an ideal symmetric merge would do.  Then we evaluate the current 'symmetric'[1] implementation against that: does it do "the same as" or "better than" or "worse than" what a user can do with the existing merge (assuming he/she chooses the "reintegrate" option appropriately).

It's not easy to set up all of the possible scenarios: reintegrating the root but excluding a subtree, for example, can't be done in a single merge command.  Does that mean that scenario is uninteresting?  No, I don't think so.  What we can do is eliminate the existing merge command from the set-up phases of the test scenarios, and set up the desired mergeinfo directly: "faking it", if you will.  Doing it that way separates the concerns: the question of what the merge command will do, given a certain scenario, is a separate question from how we can use merges to create that scenario.


The question I have at the moment is: Does this approach to evaluation make sense to you?

 
[1] 'symmetric' -- Urgh! -- we must change the name as the current incarnation is not very symmetric at all.

- Julian

--

Certified & Supported Apache Subversion Downloads: http://www.wandisco.com/subversion/download




----- Original Message -----
> From: Julian Foad <ju...@btopenworld.com>
> To: Paul Burba <pt...@gmail.com>
> Cc: Subversion Development <de...@subversion.apache.org>; C. Michael Pilato <cm...@collab.net>; Branko Čibej <br...@apache.org>; Stefan Sperling <st...@elego.de>
> Sent: Tuesday, 10 July 2012, 12:15
> Subject: Re: Subtree mergeinfo -- what I learnt at the Hackathon
> 
> Hi Paul.  Thanks for your comments.
> 
> Paul Burba wrote:
> 
>>  On Mon, Jul 9, 2012 at 4:16 PM, Julian Foad wrote:
> [...]
>>>     * The merging history of the root 'R' (the root node of the 
>>>  requested merge source and target trees), and of a subtree 'S' 
> (a
>>>  single  node or subtree
>> 
>>  Minor point: I've always considered a single node as a subtree as far
>>  as mergetracking goes; so file merges are "subtree merges". I 
> mention
>>  this only because most of the code and testing comments I've written
>>  over the years assume this.  Again, it's a minor point, but AFAICT we
>>  can largely ignore the node type for this level of conversation.
> 
> Agreed; my parenthesis was meant to clarify this.
> 
>>>   whose merging history differs from that of the root node in a 
> significant 
>>>  way).
>> 
>>  You might want to define this a bit more explicitly.  I'm pretty sure
>>  I know what you mean, but it might not be clear to others.  Something
>>  like: "Mergeinfo[1] differences between the root and the subtree, if
>>  they exist, describe revisions that are inoperative on both the root
>>  and the subtree".
> 
> Yes, something like that.
> 
>>  [1] "Mergeinfo" interpreted here in the most general way, 
> including
>>  explicit mergeinfo, inherited mergeinfo, and implicit mergeinfo (i.e.
>>  natural history).
> 
> [...]
>>>   CATEGORIZING SUBTREE MERGES: 1.7 Non-Reintegrate
>>> 
>>>   These cases are for a reintegrate merge A->B.
>> 
>>  A bit confused, the category is "1.7 Non-Reintegrate" but the 
> cases
>>  are for a reintegrate merge A->B?  Could you clarify?
> 
> Sorry, typo: I meant "These cases are for a non-reintegrate merge 
> A->B."
> 
>>  For example,
>>  given row 3 below, are we talking about the third merge here:
>> 
>>  A--------------------------->
>>    \            ^     \
>>  subtree      /       \
>>    merge    subtree     ?
>>       \     merge        \
>>        V    /             V
>>  B--------------------------->
> 
> Yes, that.
> 
>>  or here?
>> 
>>  A--------------------------->
>>    \            ^       ^
>>  subtree      /       /
>>    merge    subtree   ?
>>       \     merge    /
>>        V    /       /
>>  B--------------------------->
> 
> Not that.
> 
>>>   In each row of this table, up to two Root merges are indicated, and 
> their 
>>>  relative ordering is significant; similarly for Subtree merges.  The 
> ordering of 
>>>  R merges relative to S merges is not significant.
>>> 
>>>       Root     | Subtree   | Behaviour
>>>       ---------+-----------+-----------------------------------
>>>    1. never    | same      | OK (not a subtree scenario)
>>>                +-----------+-----------------------------------
>>>    2.          | [S<] S>   | Merge all needed changes
>>>                +-----------+-----------------------------------
>>>    3.          | [S>] S<   | All needed; & some duplicates in 
> S
>>>       ---------+-----------+-----------------------------------
>>>    4. [R<] R>  | same      | OK (not a subtree scenario)
>>>                +-----------+-----------------------------------
>>>    5.          | never     | Merge all needed changes
>>>    6.          | [S<] S> * |
>>>                +-----------+-----------------------------------
>>>    7.          | [S>] S<   | All needed; & some duplicates in 
> S
>>>       ---------+-----------+-----------------------------------
>>>    8. [R>] R<  | same      | All needed; & some duplicates in 
> R
>>>                |           | (not a subtree scenario)
>>>                +-----------+-----------------------------------
>>>    9.          | none      | All needed; & some duplicates in R
>> 
>>  "none" is the same as "never" in the key right?
> 
> Yes; sorry, just another typo.
> 
>>>   10.          | [S<] S>   |
>>>                +-----------+-----------------------------------
>>>   11.          | [S>] S< * | All needed; & some duplicates in 
> R and S
>>>       ---------+-----------+-----------------------------------
>>> 
>>>   Key:
>>>     *      -- S> not at same revision as R>, or S< not same as 
> R<.
>>>     R>     -- last complete Root merge in direction A->B.
>>>     R<     -- last complete Root merge in direction B->A.
>>>     never  -- never been merged in either direction since the YCA of A 
> and B.
>>>     [S<]   -- shorthand for both of the cases: no 'S<' 
> merge and an 'S<' merge.
>> 
>>  To clarify, do you mean "shorthand for *EITHER* of these cases: no
>>  'S<' merge *OR* an 'S<' merge." right?
> 
> Yes, if you like.  The row expands to two cases: one with no 'S<' in 
> this position, another with 'S<' in this position.  So this row:
> 
>   3.          | [S>] S<   | All needed; & some duplicates in S
> 
> expands to the following two cases:
> 
>   3a.         |      S<   | All needed; & some duplicates in S
> 
> (that is, subtree was never merged from A to B, but was merged from B to A)
> 
>   3b.         |  S>  S<   | All needed; & some duplicates in S
> 
> (that is, the last time the subtree was merged in the A to B direction was 
> before the last time it was merged in the B to A direction).
> 
>>  Either the last complete merge
>>  of the subtree was in the A->B direction or it was in the B->A
>>  direction.
> 
> No, I don't think that describes it.
> 
>> 
>>>     duplicates -- changes that are already present in the target.
>>> 
>>>   Example:  Row 9 represents the case where the Root's last complete 
> 
>>  merge was in the B->A direction, and its last complete A->B merge was 
> 
>>  earlier or never; and the Subtree likewise.  The root's last complete 
> merge 
>>  was before or after but not the same as the subtree's.
>> 
>>  But the last subtree in row 9 is listed as "none"?  Did you mean 
> row 
>>  11?
> 
> Sigh, yes, you're right and I'm confusing.  (I edited the table after 
> writing that.)
> 
>>  I'll hold off on further comments/questions until I get the above 
> straight.
> 
> Thanks.
> - Julian
> 

Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Julian Foad <ju...@btopenworld.com>.
Hi Paul.  Thanks for your comments.

Paul Burba wrote:

> On Mon, Jul 9, 2012 at 4:16 PM, Julian Foad wrote:
[...]
>>    * The merging history of the root 'R' (the root node of the 
>> requested merge source and target trees), and of a subtree 'S' (a
>> single  node or subtree
> 
> Minor point: I've always considered a single node as a subtree as far
> as mergetracking goes; so file merges are "subtree merges". I mention
> this only because most of the code and testing comments I've written
> over the years assume this.  Again, it's a minor point, but AFAICT we
> can largely ignore the node type for this level of conversation.

Agreed; my parenthesis was meant to clarify this.

>>  whose merging history differs from that of the root node in a significant 
>> way).
> 
> You might want to define this a bit more explicitly.  I'm pretty sure
> I know what you mean, but it might not be clear to others.  Something
> like: "Mergeinfo[1] differences between the root and the subtree, if
> they exist, describe revisions that are inoperative on both the root
> and the subtree".

Yes, something like that.

> [1] "Mergeinfo" interpreted here in the most general way, including
> explicit mergeinfo, inherited mergeinfo, and implicit mergeinfo (i.e.
> natural history).

[...]
>>  CATEGORIZING SUBTREE MERGES: 1.7 Non-Reintegrate
>> 
>>  These cases are for a reintegrate merge A->B.
> 
> A bit confused, the category is "1.7 Non-Reintegrate" but the cases
> are for a reintegrate merge A->B?  Could you clarify?

Sorry, typo: I meant "These cases are for a non-reintegrate merge A->B."

> For example,
> given row 3 below, are we talking about the third merge here:
> 
> A--------------------------->
>   \            ^     \
> subtree      /       \
>   merge    subtree     ?
>      \     merge        \
>       V    /             V
> B--------------------------->

Yes, that.

> or here?
> 
> A--------------------------->
>   \            ^       ^
> subtree      /       /
>   merge    subtree   ?
>      \     merge    /
>       V    /       /
> B--------------------------->

Not that.

>>  In each row of this table, up to two Root merges are indicated, and their 
>> relative ordering is significant; similarly for Subtree merges.  The ordering of 
>> R merges relative to S merges is not significant.
>> 
>>      Root     | Subtree   | Behaviour
>>      ---------+-----------+-----------------------------------
>>   1. never    | same      | OK (not a subtree scenario)
>>               +-----------+-----------------------------------
>>   2.          | [S<] S>   | Merge all needed changes
>>               +-----------+-----------------------------------
>>   3.          | [S>] S<   | All needed; & some duplicates in S
>>      ---------+-----------+-----------------------------------
>>   4. [R<] R>  | same      | OK (not a subtree scenario)
>>               +-----------+-----------------------------------
>>   5.          | never     | Merge all needed changes
>>   6.          | [S<] S> * |
>>               +-----------+-----------------------------------
>>   7.          | [S>] S<   | All needed; & some duplicates in S
>>      ---------+-----------+-----------------------------------
>>   8. [R>] R<  | same      | All needed; & some duplicates in R
>>               |           | (not a subtree scenario)
>>               +-----------+-----------------------------------
>>   9.          | none      | All needed; & some duplicates in R
> 
> "none" is the same as "never" in the key right?

Yes; sorry, just another typo.

>>  10.          | [S<] S>   |
>>               +-----------+-----------------------------------
>>  11.          | [S>] S< * | All needed; & some duplicates in R and S
>>      ---------+-----------+-----------------------------------
>> 
>>  Key:
>>    *      -- S> not at same revision as R>, or S< not same as R<.
>>    R>     -- last complete Root merge in direction A->B.
>>    R<     -- last complete Root merge in direction B->A.
>>    never  -- never been merged in either direction since the YCA of A and B.
>>    [S<]   -- shorthand for both of the cases: no 'S<' merge and an 'S<' merge.
> 
> To clarify, do you mean "shorthand for *EITHER* of these cases: no
> 'S<' merge *OR* an 'S<' merge." right?

Yes, if you like.  The row expands to two cases: one with no 'S<' in this position, another with 'S<' in this position.  So this row:

  3.          | [S>] S<   | All needed; & some duplicates in S

expands to the following two cases:

  3a.         |      S<   | All needed; & some duplicates in S

(that is, subtree was never merged from A to B, but was merged from B to A)

  3b.         |  S>  S<   | All needed; & some duplicates in S

(that is, the last time the subtree was merged in the A to B direction was before the last time it was merged in the B to A direction).

> Either the last complete merge
> of the subtree was in the A->B direction or it was in the B->A
> direction.

No, I don't think that describes it.

> 
>>    duplicates -- changes that are already present in the target.
>> 
>>  Example:  Row 9 represents the case where the Root's last complete 
> merge was in the B->A direction, and its last complete A->B merge was 
> earlier or never; and the Subtree likewise.  The root's last complete merge 
> was before or after but not the same as the subtree's.
> 
> But the last subtree in row 9 is listed as "none"?  Did you mean row 
> 11?

Sigh, yes, you're right and I'm confusing.  (I edited the table after writing that.)

> I'll hold off on further comments/questions until I get the above straight.

Thanks.
- Julian

Re: Subtree mergeinfo -- what I learnt at the Hackathon

Posted by Paul Burba <pt...@gmail.com>.
On Mon, Jul 9, 2012 at 4:16 PM, Julian Foad <ju...@btopenworld.com> wrote:
> To move forward and decide what behaviour is right, we need to be able to compare the 1.7 behaviour with the proposed behaviour in *specific* scenarios.  So we need to be able to enumerate the specific scenarios that we mean by the general term "merging with subtree mergeinfo".  This is what I am doing currently.
>
> The twist is that the best way to enumerate all the possibilities for 1.7 merges, and the best way to enumerate all the possibilities for an ideal symmetric merge, are different.  For example, the 1.7 non-reintegrate merge (let's say from branch A to branch B) doesn't look at any mergeinfo indicating merges from B to A, so the primary way to categorize those cases is by their last complete A->B merge (of the root of the merge, and of a subtree).  B->A merges can also affect the result if performed after the last A->B merge, so we categorize secondarily by their B->A mergeinfo.  By contrast, an ideal symmetric merge only cares about the last complete merge (A->B or B->A); any earlier merges in the other direction make no difference at all.
>
> After separately enumerating the 1.7 cases and the (ideal) symmetric merge cases, we can split or combine categories as necessary to merge these cases into a single list.  Then we can decide what constitutes "same" or "backward-compatible" behaviour in each case.  Don't be suspicious -- I'm not trying to twist the word "compatible" to mean something else -- rather, what I mean is that not all of the possible scenarios are ones in which the 1.7 behaviour is "good".  For example, we know that in 1.7 if the last complete merge was A->B (in r10, say) and then you reintegrate a subtree (B/foo -> A/foo in r20), then try to sync A->B again, Subversion will not notice that r20 merge and will attempt to re-merge r19:20 of A/foo into B/foo, which "works" only in rather trivial cases (due to auto-resolving of some types of duplicate change) and I trust we can agree it is in general wrong.
>
> So far, I've only been compiling this categorization from my head; the next step is to write tests to confirm or correct this.
>
>
> CATEGORIZING SUBTREE MERGES
>
> For the purposes of this categorization, we consider:
>
>   * The merging history of the root 'R' (the root node of the requested merge source and target trees), and of a subtree 'S' (a single node or subtree

Minor point: I've always considered a single node as a subtree as far
as mergetracking goes; so file merges are "subtree merges".  I mention
this only because most of the code and testing comments I've written
over the years assume this.  Again, it's a minor point, but AFAICT we
can largely ignore the node type for this level of conversation.

> whose merging history differs from that of the root node in a significant way).

You might want to define this a bit more explicitly.  I'm pretty sure
I know what you mean, but it might not be clear to others.  Something
like: "Mergeinfo[1] differences between the root and the subtree, if
they exist, describe revisions that are inoperative on both the root
and the subtree".

[1] "Mergeinfo" interpreted here in the most general way, including
explicit mergeinfo, inherited mergeinfo, and implicit mergeinfo (i.e.
natural history).

> Only one subtree is considered; multiple subtrees are assumed to be handled independently, even if they are nested (such as root 'A', subtree 'A/D' whose history differs from 'A', and subtree 'A/D/foo' whose history differs from 'A/D').
>
>   * The "last complete merge" of the Root (in one or both directions), and the "last complete merge" of the Subtree (in one or both directions).  The "last complete merge" in direction A->B means the last revision R for which all revisions on A, up to and including R, are (currently) recorded as having been merged from A to B.  This state could have been reached through any kind of merge or sequence of merges; all that matters is what the current mergeinfo says has been merged.
>
>   * There is an assumption that the actual content changes were in fact merged in accordance with what the mergeinfo says, subject to any editing that was required to resolve any conflicts detected by the merge process and any semantic conflicts.
>
>
> CATEGORIZING SUBTREE MERGES: 1.7 Non-Reintegrate
>
> These cases are for a reintegrate merge A->B.

A bit confused, the category is "1.7 Non-Reintegrate" but the cases
are for a reintegrate merge A->B?  Could you clarify?  For example,
given row 3 below, are we talking about the third merge here:

A--------------------------->
  \            ^     \
 subtree      /       \
  merge    subtree     ?
     \     merge        \
      V    /             V
B--------------------------->

or here?

A--------------------------->
  \            ^       ^
 subtree      /       /
  merge    subtree   ?
     \     merge    /
      V    /       /
B--------------------------->

> In each row of this table, up to two Root merges are indicated, and their relative ordering is significant; similarly for Subtree merges.  The ordering of R merges relative to S merges is not significant.
>
>     Root     | Subtree   | Behaviour
>     ---------+-----------+-----------------------------------
>  1. never    | same      | OK (not a subtree scenario)
>              +-----------+-----------------------------------
>  2.          | [S<] S>   | Merge all needed changes
>              +-----------+-----------------------------------
>  3.          | [S>] S<   | All needed; & some duplicates in S
>     ---------+-----------+-----------------------------------
>  4. [R<] R>  | same      | OK (not a subtree scenario)
>              +-----------+-----------------------------------
>  5.          | never     | Merge all needed changes
>  6.          | [S<] S> * |
>              +-----------+-----------------------------------
>  7.          | [S>] S<   | All needed; & some duplicates in S
>     ---------+-----------+-----------------------------------
>  8. [R>] R<  | same      | All needed; & some duplicates in R
>              |           | (not a subtree scenario)
>              +-----------+-----------------------------------
>  9.          | none      | All needed; & some duplicates in R

"none" is the same as "never" in the key right?

> 10.          | [S<] S>   |
>              +-----------+-----------------------------------
> 11.          | [S>] S< * | All needed; & some duplicates in R and S
>     ---------+-----------+-----------------------------------
>
> Key:
>   *      -- S> not at same revision as R>, or S< not same as R<.
>   R>     -- last complete Root merge in direction A->B.
>   R<     -- last complete Root merge in direction B->A.
>   never  -- never been merged in either direction since the YCA of A and B.
>   [S<]   -- shorthand for both of the cases: no 'S<' merge and an 'S<' merge.

To clarify, do you mean "shorthand for *EITHER* of these cases: no
'S<' merge *OR* an 'S<' merge." right?  Either the last complete merge
of the subtree was in the A->B direction or it was in the B->A
direction.

>   duplicates -- changes that are already present in the target.
>
> Example:  Row 9 represents the case where the Root's last complete merge was in the B->A direction, and its last complete A->B merge was earlier or never; and the Subtree likewise.  The root's last complete merge was before or after but not the same as the subtree's.

But the last subtree in row 9 is listed as "none"?  Did you mean row 11?

I'll hold off on further comments/questions until I get the above straight.

-- 
Paul T. Burba
CollabNet, Inc. -- www.collab.net -- Enterprise Cloud Development
Skype: ptburba

P.S. Home Toronto is treating you right -- I assume the move was
successful since you're back on this thread ;-)

> CATEGORIZING SUBTREE MERGES: 1.7 Reintegrate
>
> These cases are for a reintegrate merge B->A.
>
> Treat this table as just a rough first draft for now; I'm not sure if this is the best way to categorize the reintegrate cases, and I need to investigate this more thoroughly and test it.
>
>     Root     | Subtree   | Behaviour
>     ---------+-----------+-----------------------------------
>  1. never    | same      | OK (not a subtree scenario)
>              +-----------+-----------------------------------
>  2.          | [S<] S>   | Rejected
>  3.          |    S<     | Rejected??
>  4.          |  S>  S<   | Rejected
>     ---------+-----------+-----------------------------------
>  5. [R<] R>  | same      | OK (not a subtree scenario)
>              +-----------+-----------------------------------
>  6.          | never     | Rejected
>  7.          | [S<] S> * | Rejected
>  8.          |    S<     | Rejected??
>  9.          |  S>  S<   | Rejected
>     ---------+-----------+-----------------------------------
> 10. [R>] R<  | same      | ?? (not a subtree scenario)
>              +-----------+-----------------------------------
> 11.          | never     | Rejected?? )
> 12.          | [S<] S>   | Rejected   > non-reint. cases
> 13.          | [S>] S<   | Rejected   )
>     ---------+-----------+-----------------------------------
>
>
> CATEGORIZING SUBTREE MERGES: Ideal Symmetric Merge
>
> Now we consider the ideal symmetric merge (not what's currently implemented).
>
> This is primarily concerned with the last complete merge of the root, in whichever direction that was, and similarly the last complete merge of the subtree.  The (earlier) last complete merge of the root in the other direction is not significant, nor is that of the subtree.
>
>     Root  | Subtree       | Ideal behaviour
>     ------+---------------+----------------------------------
>  1.       | same          | OK (not a subtree scenario)
>     R> or +---------------+----------------------------------
>  2. never | S> later      | Merge all needed changes
>  3.       | S< later      |
>     ------+---------------+
>  4. R>    | never         |
>  5.       | S> earlier    |
>  6.       | S< earlier    |
>     ------+---------------+----------------------------------
>  7.       | same          | OK (not a subtree scenario)
>     R<    +---------------+----------------------------------
>  8.       | S> later      | Merge all needed changes
>  9.       | S< later      |
> 10.       | never         |
> 11.       | S> earlier    |
> 12.       | S< earlier    |
>     ------+---------------+----------------------------------
>
> Key:
>   R>       The last complete merge of Root was in the A->B direction.
>   R<       The last complete merge of Root was in the B->A direction.
>   never    Has never been merged between A & B in either direction.
>   same     Same time and same direction as last complete merge of Root.
>   earlier  Earlier than the last complete merge of the Root.
>   later    Later than the last complete merge of the Root.
>
>
> I'm working on integarting these tables, and also on writing tests (or rather scenarios using the test suite framework).
>
> Please let me know if this seems like the approach we need, or any other thoughts.
>
> - Julian
> --
> Certified & Supported Apache Subversion Downloads: http://www.wandisco.com/subversion/download
>
>
>
> I (Julian Foad) wrote:
>> [...] I'm not at all demanding we break backward
>> compatibility.  Sorry if it sounded like it.  I'm just saying that
>> we're proposing to change the behaviour of the plain merge command,
>> and in doing that we need to work out what the details of the new
>> behaviour will be, and this thread is helping us to do just that.
>> I ended up with a bias towards trying to move toward a more rename-
>> friendly approach, but I recognise we can't get there yet so the
>> "follow each node's own ancestry" idea is just an idea for the
>> future.  We need a simpler approach for now.
>>
>> Stefan Sperling wrote:
> [...]
>>> Agreed. Ideally, the symmetric merge will support all currently supported
>>> use cases, without throwing errors at users or requiring new command-line
>>> switches.
>>>
>>> I haven't yet made up my mind about interim measures for 1.8 though.
>>> I suppose if symmetric merge won't support all currently supported use cases
>>> in 1.8, we could keep the --symmetric option in place for 1.8, and drop it
>>> in 1.9 or later once the symmetric merge code can handle all use cases?