You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Stefan Fuhrmann <st...@wandisco.com> on 2012/07/12 11:17:24 UTC

Incidence of criss-cross merges

Hi all,

In Berlin, Julian raised the question how relevant the criss-cross
merge case actually. I think I found a reasonable merge policy
where those cases become the norm rather than an exception.

Most people seem to do what one might call "unqualified" catch-up
merges, i.e. "merge everything up to HEAD" regardless of HEAD's
state with respect to stability, features, side-effects etc.

>From a process perspective, it seems much more prudent to
do "qualified" merges like "merge from /trunk up to the last
fully tested nightly build revision" and "merge from branch up
to the point that I think is safe". In both directions, there will
be changes between the catch-up source from A to B and
the merge commit form B to A (and vice versa). Even if it was
the same person doing the merge in both directions, this
situation could not be avoided.

Am I missing something or is that analysis correct? If it is,
criss-cross issues should be about as common as conflicts
themselves (depending on the relative size of to-be-merged
to  not-yet-to-be-merged history).

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: Incidence of criss-cross merges

Posted by Julian Foad <ju...@btopenworld.com>.

Stefan Fuhrmann wrote:
> Julian Foad wrote:
[...]
>> At this point, I think the issue boils down to:
>> 
>>   * Yes, the criss-cross situation can be produced in plausible real life usage.
>> 
>>   * The current handling is sub-optimal, though not worse than we've had before.
>> 
>>   * We might want to look at more sophisticated ways to handle it.
>> 
>>   * We might want to look at ways to help users avoid the situation.
> 
> The last part has been my concern but I think the following
> policy should work and be feasible in most environments that
> care about qualified merges. Assume we want to update "branch"
> from "trunk" and vice versa.
> 
> * before merging, bring HEAD of branch into a working state
> 
> * catch-up merge from latest known-good rev of "trunk" to "branch"
>   in revision R1
> 
> * consolidate "branch" and bring it to a known-good state
> 
> * merge from latest known-good revision R2 "branch" to "trunk"

So, like this (for those able to read the mail in a fixed-width type face):

  trunk
    --o-.....-o-.........-o-...
       \       \         /
        \       \       /
         o-.....-o-...-o-......
  branch        R1    R2

> Ensuring R2 > R1 prevents the criss-cross.

Strictly, R2 >= R1.

> R2 does not need
> to be HEAD, i.e. the user may run expensive validation on
> "branch" to qualify it as known-good.

Yes.

> So, merging from "branch" to "trunk" will become harder and
> there will be fewer such merges. Catching up with "trunk" is
> not affected.

No, the problem is symmetrical: catching up with "trunk" is affected in just the same way as merging back the other way.

There are two ways to enforce such a policy.

  * Using only Subversion mechanisms, a pre-commit hook can reject a commit that would create a criss-crossing merge if committed, as defined above.

  * If we have a working environment that knows when any merge is being performed between these two branches, the tooling could complain if a user starts a merge where the source revision R2 is less than R1 (where R1 is the committed revision of the last merge in the opposite direction) ...

  trunk         R0
    --o-........-o-........
       \          \  ___...
        \          \/
         \         /\
          \       /  \
           o-...-o-..-o-...
  branch        R2   R1

    ... and also complain if the user starts a merge while a merge in the opposite direction has been 
started but not yet committed.

  trunk         R0
    --o-........-o-........
       \          \  ___...
        \          \/
         \         /\___...
          \       /
           o-...-o-........
  branch        R2

(I label the source revision of the first merge as "R0" and show it at the same time-position as R2, but in fact it doesn't matter whether R0 is before, at the same revision as or after R2.)

- Julian

Re: Incidence of criss-cross merges

Posted by Stefan Fuhrmann <st...@wandisco.com>.

On Fri, Jul 13, 2012 at 7:11 PM, Julian Foad <ju...@btopenworld.com>wrote:

<snip>

> As far as I have been able to discover, there is no generally agreed
> solution to the problem.  Most of the references I can find are, like [3],
> a commentary on one or more of the attempts that have been made to find a
> better solution.  I found one summary [2] of how different merge algorithms
> attempt to handle the criss-cross scenario.  It is a bit too light on
> detail, but it does mention some interesting solutions and their
> limitations.
>
> At this point, I think the issue boils down to:
>
>   * Yes, the criss-cross situation can be produced in plausible real life
> usage.
>
>   * The current handling is sub-optimal, though not worse than we've had
> before.
>
>   * We might want to look at more sophisticated ways to handle it.
>
>   * We might want to look at ways to help users avoid the situation.
>

The last part has been my concern but I think the following
policy should work and be feasible in most environments that
care about qualified merges. Assume we want to update "branch"
from "trunk" and vice versa.

* before merging, bring HEAD of branch into a working state

* catch-up merge from latest known-good rev of "trunk" to "branch"
  in revision R1

* consolidate "branch" and bring it to a known-good state

* merge from latest known-good revision R2 "branch" to "trunk"

Ensuring R2 > R1 prevents the criss-cross. R2 does not need
to be HEAD, i.e. the user may run expensive validation on
"branch" to qualify it as known-good.

So, merging from "branch" to "trunk" will become harder and
there will be fewer such merges. Catching up with "trunk" is
not affected.

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: Incidence of criss-cross merges

Posted by Julian Foad <ju...@btopenworld.com>.

Stefan Fuhrmann wrote:
> On Thu, Jul 12, 2012 at 3:43 PM, Julian Foad wrote:
>> Stefan Fuhrmann wrote:
>>> From a process perspective, it seems much more prudent to
>>> do "qualified" merges like "merge from /trunk up to the last
>>> fully tested nightly build revision" and "merge from branch up
>>> to the point that I think is safe".
>> 
>> Yes, that makes sense.
[...]
>>  A = /trunk
>>  Tested versions   *                   *
>>  A o---o---o---o---o---o---o---o---o---o---o---o---o
>>     \               \____     /
>>      \               ____\ __/
>>       \             /     \
>>  B     o---o---o---o---o---o---o---o---o---o---o
>>  Safe versions     *                   *
>>                    1       2   3
>>
>>  Sequence of events:
>>  1. Nightly testing.
>>  2. Catch up branch from latest stable trunk.
>>  3. Reintegrate branch to trunk, from latest tested version of branch.
>>
>> That sort of scenario?
> 
> Exactly. The thing is that at this point the graph does
> not even depend on whether the changes to a given
> node actually produce a conflict. If the history processing
> code had any problems with this kind of merge graph,
> that problem would surface more often than actual conflicts.

Let's first clarify for other readers that we're not concerned with how the two criss-crossing merges themselves work: each of those is a straightforward merge and the fact that they cross over is not relevant to either of them.  We might want to build some sort of system for detecting this criss-crossing as it happens, with the aim of alerting the user or preventing the second merge from being committed, but that is not what we are talking about here.  Rather, we assume those two criss-crossing merges have already happened, and we are talking now about how a third merge (afterwards) will behave.

Currently, the "symmetric" merge code treats a criss-cross merge by choosing the later of the two possible common ancestors.  In fact, this is just a fall-out from choosing the youngest common ancestor: it doesn't currently fetch enough information to notice that the latest merges in each direction cross over.

The result of choosing one of the common ancestors will be a sub-optimal merge.  Typically, all needed changes are merged, but some unneeded changes are merged too (changes that had already been made in the target branch, either originally or through an earlier merge).  Therefore the merge will, in general, have "spurious" conflicts due to making the same change again.  In simple cases the duplicate changes may be auto-resolved: in particular, Subversion's internal text merge auto-resolves duplicate hunks of text added to a file, and Subversion auto-resolves duplicate added properties, but that is only going to happen when such additions were clean additions.

There may be other ways in which the result is sub-optimal, such as failing to report a conflict and just choosing one of the alternatives, where the user really should have been given a conflict to resolve.  Note also that "sub-optimal" doesn't necessarily mean "just a bit worse that ideal", it can mean "unusably bad" in some cases.

It so happens that in trying to make automated test scenarios for all possible merges [1], I am running in to the issue of criss-cross merges.  In preparation for attempting a merge from A to B after a scenario such as this one:

      A1      A2
    ---o---x---o------  mergeinfo = B:1
   /      /      \
   \     /        \
    ---o-------o---x--  mergeinfo = A:1-2
      B1      B2

my latest attempt to fake the scenario does this:

      A1      A2  AF    where AF makes changes B1
    ---o-------o---o--           and sets mergeinfo = B:1
   /
   \
    ---o-------o---o--  where BF makes change A1,A2
      B1      B2  BF             and sets mergeinfo = A:1-2

but I have just realized (after seeing unexpected conflicts in the following merge) that that actually represents a criss-cross merge:

      A1      A2
    ---o-------o---x--  mergeinfo = B:1-2
   /      ______\ _/
   \     /       \
    ---o-------o---x--  mergeinfo = A:1
      B1      B2

The end points (the tips of the two branches) are in the same state as what I was trying to achieve (the first diagram), but the difference is that the state labaled as 'A2' does not have change 'B1' in it in this second case.  The merge code will use that state as the 'base' of the three-way merge, and thus the result it produces will be different from the result produced by a merge after the scenario set up in the first diagram.  The state of A2 is the only relevant difference between the non-criss-cross (first diagram) and criss-cross (third diagram) scenarios; the mergeinfo on A2 will also differ and the states and mergeinfo of some other nodes may also differ, but those other differences are not noticed by the current algorithms ("symmetric" and 1.7 non-reintegrate, that is; 1.7 reintegrate would look at different nodes).

As far as I have been able to discover, there is no generally agreed solution to the problem.  Most of the references I can find are, like [3], a commentary on one or more of the attempts that have been made to find a better solution.  I found one summary [2] of how different merge algorithms attempt to handle the criss-cross scenario.  It is a bit too light on detail, but it does mention some interesting solutions and their limitations.

At this point, I think the issue boils down to:

  * Yes, the criss-cross situation can be produced in plausible real life usage.

  * The current handling is sub-optimal, though not worse than we've had before.

  * We might want to look at more sophisticated ways to handle it.

  * We might want to look at ways to help users avoid the situation.

References: 

[1] See the thread  "Subtree mergeinfo -- what I learnt..." on this list.

[2] "CrissCrossMerge" on revctrl.org:
<http://revctrl.org/CrissCrossMerge?action=recall&rev=34>
(Sadly, the revctrl.org Wiki is full of spam; you have to look at old versions of the pages to find the relevant content.)

[3] "Does Criss-Cross Merge Matter?" on vestasys.org:
<http://wiki.vestasys.org/MergingFuture/Food4Thought/TrickyCases#head-5ef046f558b5a8924506ec82d315e2ca7b9782a8>

- Julian
--
Certified & Supported Apache Subversion Downloads: http://www.wandisco.com/subversion/download

Re: Incidence of criss-cross merges

Posted by Stefan Fuhrmann <st...@wandisco.com>.

On Thu, Jul 12, 2012 at 3:43 PM, Julian Foad <ju...@btopenworld.com>wrote:

> Stefan Fuhrmann wrote:
> > In Berlin, Julian raised the question how relevant the criss-cross
> > merge case actually. I think I found a reasonable merge policy
> > where those cases become the norm rather than an exception.
> >
> > Most people seem to do what one might call "unqualified" catch-up
> > merges, i.e. "merge everything up to HEAD" regardless of HEAD's
> > state with respect to stability, features, side-effects etc.
> >
> > From a process perspective, it seems much more prudent to
> > do "qualified" merges like "merge from /trunk up to the last
> > fully tested nightly build revision" and "merge from branch up
> > to the point that I think is safe".
>
> Yes, that makes sense.
>
> > In both directions, there will
> > be changes between the catch-up source from A to B and
> > the merge commit form B to A (and vice versa).
>
> I don't quite follow that last sentence.
>

Taking your diagram, I was trying to say that there are revisions
between the qualified / starred revisions and the revisions where
the merges got committed.

> Even if it was
> > the same person doing the merge in both directions, this
> > situation could not be avoided.
>
>
> So, in a bit more detail, the situation is like this:
>
>   A = /trunk  Tested versions   *                   *
>
>   A o---o---o---o---o---o---o---o---o---o---o---o---o
>      \               \____     /
>
>       \               ____\ __/
>
>        \             /     \
>
>   B     o---o---o---o---o---o---o---o---o---o---o
>   Safe versions     *                   *
>
>                     1       2   3
>
>   Sequence of events:
>   1. Nightly testing.
>
>   2. Catch up branch from latest stable trunk.
>   3. Reintegrate branch to trunk, from latest tested version of branch.
>
> That sort of scenario?
>

Exactly. The thing is that at this point the graph does
not even depend on whether the changes to a given
node actually produce a conflict. If the history processing
code had any problems with this kind of merge graph,
that problem would surface more often than actual conflicts.

-- Stefan^2.

-- 
Certified & Supported Apache Subversion Downloads:
http://www.wandisco.com/subversion/download

Re: Incidence of criss-cross merges

Posted by Julian Foad <ju...@btopenworld.com>.

Stefan Fuhrmann wrote:
> In Berlin, Julian raised the question how relevant the criss-cross
> merge case actually. I think I found a reasonable merge policy
> where those cases become the norm rather than an exception.
> 
> Most people seem to do what one might call "unqualified" catch-up
> merges, i.e. "merge everything up to HEAD" regardless of HEAD's
> state with respect to stability, features, side-effects etc.
> 
> From a process perspective, it seems much more prudent to
> do "qualified" merges like "merge from /trunk up to the last
> fully tested nightly build revision" and "merge from branch up
> to the point that I think is safe".

Yes, that makes sense.

> In both directions, there will
> be changes between the catch-up source from A to B and
> the merge commit form B to A (and vice versa).

I don't quite follow that last sentence.

> Even if it was
> the same person doing the merge in both directions, this
> situation could not be avoided.


So, in a bit more detail, the situation is like this:

  A = /trunk  Tested versions   *                   *

  A o---o---o---o---o---o---o---o---o---o---o---o---o
     \               \____     /

      \               ____\ __/

       \             /     \

  B     o---o---o---o---o---o---o---o---o---o---o
  Safe versions     *                   *

                    1       2   3

  Sequence of events:
  1. Nightly testing.

  2. Catch up branch from latest stable trunk.
  3. Reintegrate branch to trunk, from latest tested version of branch.

That sort of scenario?


- Julian


> Am I missing something or is that analysis correct? If it is,
> criss-cross issues should be about as common as conflicts
> themselves (depending on the relative size of to-be-merged
> to  not-yet-to-be-merged history).