Posted to dev@subversion.apache.org by Andy Singleton <an...@assembla.com> on 2011/07/11 17:46:11 UTC

It's time to fix Subversion Merge

Many developers are moving from Subversion to other SCM systems that
have better merge capabilities. I have posted an article with a proposal
to fix this problem, here:

http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx

SUMMARY
I propose to add a “newmerge” command, which will be compatible with
existing svn servers and clients. It will be different from the existing
merge command in the following ways:

* It has a simple form “newmerge <from source>”, and a cherrypick form
“newmerge <from source> <specific changes>”. This pulls new changes from
the source into the working copy, and merges them as automatically as
possible. Changes can travel through multiple branches or repositories.
The merge should work the same way, and work reliably, from branch to
trunk, from branch to branch, or from trunk to branch.

* We eliminate instructions like merge ranges, “reintegrate”, and
“depth”. They add complexity and opportunities for human error.

* It does not support subtree merges, merges to mixed working copies, or
subtree/file mergeinfo. I think those cases cause a lot of complexity and
are usually unwise. If you want to work on a subtree, then you can
branch or clone the subtree and merge it back to the complete tree.

* It can merge from foreign repositories, and track changes as they move
through foreign repositories (e.g. clones). This is a common case in
modern workflows. Changes have a global ID.

* It will track mergeinfo (“merge_history”) in a versioned file that is
committed into the branch. We want to have room to save any information
that we need. It will have an extensible data structure, so that we can
continuously improve this type of merge.

I think that we can build a newmerge prototype by stripping down the
existing merge to remove the subtree options, and moving to the
extensible mergeinfo format. It will be useful to get advice about this
from experienced team members.

--
Andy Singleton
Founder/CEO, Assembla: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: It's time to fix Subversion Merge

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Mark Phippard wrote on Mon, Jul 11, 2011 at 12:37:52 -0400:
> On Mon, Jul 11, 2011 at 12:29 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> > Mark Phippard wrote on Mon, Jul 11, 2011 at 12:09:51 -0400:
> >> On Mon, Jul 11, 2011 at 11:46 AM, Andy Singleton <an...@assembla.com> wrote:
> >> > * It can merge from foreign repositories, and track changes as they move
> >> > through foreign repositories (eg clones). This is a common case in modern
> >> > workflows. Changes have a global ID.
> >>
> >> Changes have a global ID?  This sounds like you are proposing a
> >> product that does not exist?
> >
> > <repository UUID, revision number, branch root>
> 
> Are you proposing a user enter that format to merge a revision?

Users enter the URL.  If that URL refers to a foreign repository, then
the UUID of that repository could be stored in the svn:mergeinfo2
property.
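
As a rough illustration of the <repository UUID, revision number, branch
root> triple discussed above, here is a minimal Python sketch. The class
name ChangeId, its field names, and the text encoding are all hypothetical;
in particular, the encoding is not a proposed svn:mergeinfo2 syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeId:
    """Hypothetical global change ID: (repository UUID, revision, branch root)."""
    repos_uuid: str
    revision: int
    branch_root: str

    def encode(self) -> str:
        # Illustrative text form only; not an actual property syntax.
        return f"{self.repos_uuid}:{self.branch_root}@{self.revision}"

    @classmethod
    def decode(cls, text: str) -> "ChangeId":
        # The UUID contains no ':', so split on the first one;
        # the revision follows the last '@'.
        uuid, rest = text.split(":", 1)
        branch_root, rev = rest.rsplit("@", 1)
        return cls(uuid, int(rev), branch_root)


# The same revision number in two repositories yields two distinct IDs.
local = ChangeId("11111111-2222-3333-4444-555555555555", 42, "/trunk")
clone = ChangeId("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", 42, "/trunk")
```

The user would still only type a URL; a client could derive the triple from
it and store the encoded form alongside the merge record.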

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Mon, Jul 11, 2011 at 12:29 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Mark Phippard wrote on Mon, Jul 11, 2011 at 12:09:51 -0400:
>> On Mon, Jul 11, 2011 at 11:46 AM, Andy Singleton <an...@assembla.com> wrote:
>> > * It can merge from foreign repositories, and track changes as they move
>> > through foreign repositories (eg clones). This is a common case in modern
>> > workflows. Changes have a global ID.
>>
>> Changes have a global ID?  This sounds like you are proposing a
>> product that does not exist?
>
> <repository UUID, revision number, branch root>

Are you proposing a user enter that format to merge a revision?

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Mark Phippard wrote on Mon, Jul 11, 2011 at 12:09:51 -0400:
> On Mon, Jul 11, 2011 at 11:46 AM, Andy Singleton <an...@assembla.com> wrote:
> > * It can merge from foreign repositories, and track changes as they move
> > through foreign repositories (eg clones). This is a common case in modern
> > workflows. Changes have a global ID.
> 
> Changes have a global ID?  This sounds like you are proposing a
> product that does not exist?

<repository UUID, revision number, branch root>

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Mon, Jul 11, 2011 at 11:46 AM, Andy Singleton <an...@assembla.com> wrote:
>  Many developers are moving from Subversion to other SCM systems that have
> better merge capabilities. I have posted an article with a proposal to fix
> this problem, here:
>
> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx

Sorry, but I think this post is pretty hand-wavy in that it lacks much
substance or detail.  It frames itself around the notion that merge is
the problem and that fixing the problem is as simple as making a new
command that does not allow the complex options of the old command,
but it ignores the fact that almost everywhere the proposal provides
some details it is describing what would have to be a totally new
product or redesign of core SVN.

> SUMMARY
> I propose to add a “newmerge” command, which will be compatible with
> existing svn servers and clients. It will be different from the existing
> merge command in the following ways:
>
> * It has a simple form “newmerge <from source>”

SVN already has this.

>, and a cherrypick form  “newmerge <fromsource> <specific changes>.

It also has this, although "specific changes" is pretty vague.  So
hard to say for certain.

> This pulls new changes from the
> source into the working copy, and merges them as automatically as possible.
> Changes can travel through multiple branches or repositories. The merge
> should work the same way, and work reliably, from branch to trunk, from
> branch to branch, or from trunk to branch.
>
> * We eliminate instructions like merge ranges,

You want to eliminate ranges?  Then how do you propose to implement
cherry picks?

> “reintegrate”, and “depth”.
> They add complexity and opportunities for human error.
>
> * It does not support subtree merges, merges to mixed working copies, or
> subtree/file mergeinfo. I think those cases cause a lot of complexity and are
> usually unwise. If you want to work on a subtree, then you can branch or
> clone the subtree and merge it back to the complete tree.

These things definitely can make merge more complex and it probably
would have been good if we made some of the decisions back during 1.5
development, but most of the issues around them have been solved now.
So this actually gives SVN some fairly powerful merge capabilities.
In your blog post you talk about people coming from tools with
powerful merge but then propose a solution that removes the power?

How does having a new merge command change the fact that some users
might use the old merge command and create these kinds of merge issues
in your repository?  Obviously you have to invent some kind of way for
the server to dictate the types of merge you want to support, in which
case you do not need a new command at all and the current merge
command could provide what you ask for as is.  In other words, the
server could tell the current merge command not to allow these
options.

> * It can merge from foreign repositories, and track changes as they move
> through foreign repositories (eg clones). This is a common case in modern
> workflows. Changes have a global ID.

Changes have a global ID?  This sounds like you are proposing a
product that does not exist?

> * It will track mergeinfo (“merge_history”) in a versioned file that is
> committed into the branch. We want to have room to save any information that
> we need. It will have an extensible data structure, so that we can
> continuously improve this type of merge.

Why get into this level of implementation detail here when everything
else is so vague?  The current mergeinfo property handling has been
annoying, but that has been fixed in 1.7.

> I think that we can build a newmerge prototype by stripping down the
> existing merge to remove the subtree options, and moving to the extensible
> mergeinfo format. It will be useful to get advice about this from experienced
> team members.

There are two remaining problems in SVN merge:

1) The need for a --reintegrate option, and the limitations that come with it
2) The lack of support for automatically handling moved/renamed paths

Neither of these is a failure in the merge code; they are failures in
the core design of SVN.  You are not going to be able to solve these
problems with a new merge command; you are going to need a new
design for SVN or some major changes in the current design.  Again, once
that is done, there is no reason the current merge cannot take that
in.

I hope you are planning to devote some resources to this, I think that
would be great.  And I look forward to seeing a prototype move
forward.  I think you are approaching this wrong, but that should not
stop you and maybe some new ideas will come out of it.  I would like
to see more details though and more of an acknowledgement of how the
core SVN design is to be addressed to solve these issues.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Greg Hudson <gh...@MIT.EDU>.

On Mon, 2011-07-11 at 12:48 -0400, Mark Phippard wrote:
> 2. Subversion does not handle move/rename well (tree conflicts)
[...]
> When this problem was first approached (before we came up
> with tree conflicts) it hit a brick wall where it seemed a new
> repository design was needed:

It's worth considering that git has a reputation for good merge support
even though it has no commit-time copy/rename history whatsoever in its
history model.  By contrast, bzr paid a lot of attention to container
history and merge support in the face of tree reorgs, and it clearly
isn't as much of a killer feature as its designers had expected
(http://www.markshuttleworth.com/archives/123).

So, one possible way forward is to decide that copy history is just a
hint for "svn log" and that merging should ignore it.
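
To make the "ignore copy history" idea concrete, here is a minimal Python
sketch of guessing renames purely from content similarity, pairing paths
deleted on one side with paths added on the other. It is not git's actual
algorithm; the function name guess_renames and the 0.6 threshold are
assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def guess_renames(deleted, added, threshold=0.6):
    """Pair each deleted path with the most similar added path.

    `deleted` and `added` map path -> file content (str).
    The threshold is an arbitrary illustrative value.
    """
    renames = {}
    unclaimed = dict(added)
    for old_path, old_text in sorted(deleted.items()):
        best_path, best_score = None, threshold
        for new_path, new_text in unclaimed.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
            # Each added file can be the target of only one rename.
            del unclaimed[best_path]
    return renames


deleted = {"src/util.c": "int add(int a, int b) { return a + b; }\n"}
added = {
    "lib/util.c": "int add(int a, int b) { return a + b; }\n// moved\n",
    "README": "Totally unrelated text.\n",
}
renames = guess_renames(deleted, added)
```

Being a heuristic, this can mispair files that happen to look alike, which
is the trade-off against recorded copy history.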

Re: It's time to fix Subversion Merge

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Jul 11, 2011 at 12:48:27PM -0400, Mark Phippard wrote:
> If you want to fix Subversion merge there are two issues that have to
> be addressed.  If you are not addressing these issues you are just
> rearranging the deck chairs on the Titanic.
> 
> 1. Cyclic merges (the reason we added --reintegrate).
> 
> http://subversion.tigris.org/issues/show_bug.cgi?id=2837
> 
> This is really a core design issue in SVN that is going to be hard to
> work around.
>
> kamesh explored some new merge algorithm ideas in:
> 
> http://subversion.tigris.org/issues/show_bug.cgi?id=2897
> 
> He may or may not have been on the right track.  The solution is just
> so complicated no one was able to really review it.

I believe that conflict-free cyclic merges are never going to work
in any version control system, in the general case.

I have no proper theory to back this up. I just have a hunch that
no algorithm exists to determine, during a cyclic merge, which part of
an incoming change was originally made on the branch which is now the
merge target, and how the incoming change needs to be tweaked in
order to guarantee a conflict-free merge, in the general case.

If someone knows about an algorithm for solving this problem
please let me know. I think the best you can do is using heuristics,
and there will always be cases where these fail miserably.

> 2. Subversion does not handle move/rename well (tree conflicts)
> 
> This is not just a merge issue, update/switch are also impacted by
> this problem.  I do not know what the current state of the art
> thinking is on this problem.  Can we auto-resolve tree conflicts at
> some point?  When this problem was first approached (before we came up
> with tree conflicts) it hit a brick wall where it seemed a new
> repository design was needed:
> 
> http://subversion.tigris.org/issues/show_bug.cgi?id=898
>
> Fix these two problems without killing performance and we would have
> solved the problem.

We will be able to auto-resolve a lot of tree conflicts involving renames
within Subversion's existing model as soon as we can follow copy-history
both ways on the client and server.

There is an existing prototype implementation by Nico Schellingerhout
and Piet-Hein Peeters at http://trumerge.open.collab.net/ that backs
up this claim (and which is used in production by aforementioned authors).

I am looking forward to exploring how to auto-resolve tree-conflicts
involving client-side renames, maybe for 1.8. This is a small part of
the general problem, but should be easy to do with wc-ng.

I think we should take a close look at trumerge and try to bring as
much of its functionality as possible into Subversion's core. 
This will likely require some server-side changes if we want to keep
performance up, but it will be well worth it.

The trumerge authors have also done useful theoretical work.
Nico once showed me a table of possible rename conflicts, and how he
decided to auto-resolve each on a case-by-case basis. This is very
valuable work we should build on.

In the meantime, if Andy has fresh ideas to bring to the table,
I'm all ears. I agree with Mark's and Mike's concerns about lack
of focus on the really hard problems. But I am also convinced that
there is still a lot of room for usability improvements in svn merge
even without tackling the underlying design issues.

Re: It's time to fix Subversion Merge

Posted by Andy Singleton <an...@assembla.com>.

  On 7/11/2011 3:33 PM, Bob Archer wrote:
>> If you want to fix Subversion merge there are two issues that have
>> to
>> be addressed.  If you are not addressing these issues you are just
>> rearranging the deck chairs on the Titanic.
>>
>> 1. Cyclic merges (the reason we added --reintegrate).
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2837
>>
>> This is really a core design issue in SVN that is going to be hard
>> to
>> work around.  Kamesh explored some new merge algorithm ideas in:
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2897
>>
>> He may or may not have been on the right track.  The solution is
>> just
>> so complicated no one was able to really review it.  Maybe a team
>> of
>> two or three people (so that there is peer review) starting over
>> with
>> the same basic idea could pull this off?  This seems the only
>> possible
>> way to solve this with our current design.
>>
>> 2. Subversion does not handle move/rename well (tree conflicts)
>>
>> This is not just a merge issue, update/switch are also impacted by
>> this problem.  I do not know what the current state of the art
>> thinking is on this problem.  Can we auto-resolve tree conflicts at
>> some point?  When this problem was first approached (before we came
>> up
>> with tree conflicts) it hit a brick wall where it seemed a new
>> repository design was needed:
>>
>> http://subversion.tigris.org/issues/show_bug.cgi?id=898
>>
>> Fix these two problems without killing performance and we would
>> have
>> solved the problem.  The syntactic sugar type changes could come
>> later
>> or on the side.  But I see little real benefit in any of those
>> proposals without a plan and roadmap to address these two items.
> Well, Subversion was a better CVS; perhaps it is time for svn.next to be a new Subversion. It seems like the discussion on merge always comes with the baggage of the underlying design. I think the same issue occurred with CVS... it was just too much to retrofit.
>
> git is fine and all, but it is hard to get corporate buy-in... svn with a central repo that allows authorization and server-side hooks and stuff is quite important there. It also seems that many people want offline commits and private repos too. However, svn.next could probably learn much from git and Mercurial. It seems people don't have a problem with opinionated software if it is fast and works.
>
> BOb
>
If you want offline commit and private repositories, you can use git or 
mercurial.  We use both of them at Assembla.  Subversion with those 
features would not add much because it would look like a variant of 
mercurial.  It would lose one big advantage of subversion, which is 
simplicity for the user.  It would load the client up with new commands 
and mental models to move changes between the client repository and the 
server repository, and maintain local repositories.

I am proposing to preserve Subversion as a completely different kind of 
beast.  People seem to like it, because it is simple to use (checkout, 
update, commit, and it looks like a filesystem), and it supports big 
consolidated central repositories and big files, and you only need the 
working copy locally.  I am proposing to keep the server and central 
repository unchanged, and just make smarter clients.

This philosophy can be described as "dumb server, smart client".  You 
use the server, with the existing behavior, to keep revisions and serve 
up some patches.  Then you make clients that are smart enough to do 
things like merges, and save their own data structures in the 
revisions.  You can also add Web apps that do things like cloning and 
log/blame change reporting.  This preserves subversion as something 
unique, and it gives you an evolutionary path (just add clients) that is 
faster and less ponderous than revising all servers, and with them, all 
clients.  Using this philosophy, we can improve svn merge enough to 
enable some new, modern workflows.

Some of my suggestions come directly from this philosophy.
* Keeping merge_history as a normal file that servers and other clients 
do not need to understand.  We just use the server to update it 
incrementally.
* Handling file moves on the client side with pattern recognition and 
memory.  We don't try to fix the server-side "copy + delete" operation.
* Adding "clone" and foreign merge operations in Web applications and 
clients that work with existing svn server software.
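
As a rough sketch of what an extensible, versioned merge_history file could
look like, the Python below keeps the history as JSON text that a client
reads, updates, and commits like any other file. The JSON layout, the key
names, and the functions record_merge and is_merged are all assumptions made
for illustration; no such format exists in Subversion.

```python
import json


def record_merge(history_text, source, revision, extra=None):
    """Return updated merge_history text with one more merge recorded."""
    if history_text.strip():
        history = json.loads(history_text)
    else:
        history = {"format": 1, "merges": []}
    entry = {"source": source, "revision": revision}
    # Room for anything a smarter client wants to remember,
    # e.g. how a rename was resolved during this merge.
    entry.update(extra or {})
    if entry not in history["merges"]:
        history["merges"].append(entry)
    # Keys this client does not understand are written back untouched.
    return json.dumps(history, indent=2, sort_keys=True) + "\n"


def is_merged(history_text, source, revision):
    """Report whether a given source revision is already recorded."""
    history = json.loads(history_text) if history_text.strip() else {"merges": []}
    return any(
        m["source"] == source and m["revision"] == revision
        for m in history["merges"]
    )


text = record_merge("", "^/trunk", 120)
text = record_merge(text, "^/trunk", 125, {"renames": {"a.c": "b.c"}})
```

Because the whole document is loaded and written back, an older client
preserves fields added by a newer one, which is one way to get the
"just add clients" evolutionary path.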

-- 
Andy Singleton

RE: It's time to fix Subversion Merge

Posted by Bob Archer <Bo...@amsi.com>.

> If you want to fix Subversion merge there are two issues that have
> to
> be addressed.  If you are not addressing these issues you are just
> rearranging the deck chairs on the Titanic.
> 
> 1. Cyclic merges (the reason we added --reintegrate).
> 
> http://subversion.tigris.org/issues/show_bug.cgi?id=2837
> 
> This is really a core design issue in SVN that is going to be hard
> to
> work around.  Kamesh explored some new merge algorithm ideas in:
> 
> http://subversion.tigris.org/issues/show_bug.cgi?id=2897
> 
> He may or may not have been on the right track.  The solution is
> just
> so complicated no one was able to really review it.  Maybe a team
> of
> two or three people (so that there is peer review) starting over
> with
> the same basic idea could pull this off?  This seems the only
> possible
> way to solve this with our current design.
> 
> 2. Subversion does not handle move/rename well (tree conflicts)
> 
> This is not just a merge issue, update/switch are also impacted by
> this problem.  I do not know what the current state of the art
> thinking is on this problem.  Can we auto-resolve tree conflicts at
> some point?  When this problem was first approached (before we came
> up
> with tree conflicts) it hit a brick wall where it seemed a new
> repository design was needed:
> 
> http://subversion.tigris.org/issues/show_bug.cgi?id=898
> 
> Fix these two problems without killing performance and we would
> have
> solved the problem.  The syntactic sugar type changes could come
> later
> or on the side.  But I see little real benefit in any of those
> proposals without a plan and roadmap to address these two items.

Well, Subversion was a better CVS; perhaps it is time for svn.next to be a new Subversion. It seems like the discussion on merge always comes with the baggage of the underlying design. I think the same issue occurred with CVS... it was just too much to retrofit.

git is fine and all, but it is hard to get corporate buy-in... svn with a central repo that allows authorization and server-side hooks and stuff is quite important there. It also seems that many people want offline commits and private repos too. However, svn.next could probably learn much from git and Mercurial. It seems people don't have a problem with opinionated software if it is fast and works.

BOb

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

If you want to fix Subversion merge there are two issues that have to
be addressed.  If you are not addressing these issues you are just
rearranging the deck chairs on the Titanic.

1. Cyclic merges (the reason we added --reintegrate).

http://subversion.tigris.org/issues/show_bug.cgi?id=2837

This is really a core design issue in SVN that is going to be hard to
work around.  Kamesh explored some new merge algorithm ideas in:

http://subversion.tigris.org/issues/show_bug.cgi?id=2897

He may or may not have been on the right track.  The solution is just
so complicated no one was able to really review it.  Maybe a team of
two or three people (so that there is peer review) starting over with
the same basic idea could pull this off?  This seems the only possible
way to solve this with our current design.

2. Subversion does not handle move/rename well (tree conflicts)

This is not just a merge issue, update/switch are also impacted by
this problem.  I do not know what the current state of the art
thinking is on this problem.  Can we auto-resolve tree conflicts at
some point?  When this problem was first approached (before we came up
with tree conflicts) it hit a brick wall where it seemed a new
repository design was needed:

http://subversion.tigris.org/issues/show_bug.cgi?id=898

Fix these two problems without killing performance and we would have
solved the problem.  The syntactic sugar type changes could come later
or on the side.  But I see little real benefit in any of those
proposals without a plan and roadmap to address these two items.

Mark

Re: It's time to fix Subversion Merge

Posted by Stefan Sperling <st...@elego.de>.

On Mon, Jul 11, 2011 at 11:46:11AM -0400, Andy Singleton wrote:
>  Many developers are moving from Subversion to other SCM systems
> that have better merge capabilities. I have posted an article with a
> proposal to fix this problem, here:
> 
> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx
> 
> SUMMARY

Interesting ideas, overall.

Have you considered trying to improve the existing merge code to
address specific problems you are having, instead of rewriting
it or making substantial changes?  I would guess that such an
approach could lead to improvements much sooner and would carry
less risk of destabilising the existing code base.

Are you aware of the various usability improvements we added to the
merge feature for Subversion 1.7?
See http://subversion.apache.org/docs/release-notes/1.7.html#merge-tracking-enhancements
They might not address all your problems. But I think they are good
examples of small steps that can be taken towards a much better
merge experience.

Have you considered how complex situations such as tree conflicts
will affect your design?

I am looking forward to seeing a prototype that implements your ideas.
Please feel free to ask questions on this list during design and
implementation of your prototype.

Re: It's time to fix Subversion Merge

Posted by Paul Burba <pt...@gmail.com>.

On Mon, Jul 11, 2011 at 11:46 AM, Andy Singleton <an...@assembla.com> wrote:
>  Many developers are moving from Subversion to other SCM systems that have
> better merge capabilities. I have posted an article with a proposal to fix
> this problem, here:
>
> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx
>
> SUMMARY
> I propose to add a “newmerge” command, which will be compatible with
> existing svn servers and clients. It will be different from the existing
> merge command in the following ways:
>
> * It has a simple form “newmerge <from source>”, and a cherrypick form
> “newmerge <fromsource> <specific changes>. This pulls new changes from the
> source into the working copy, and merges them as automatically as possible.
> Changes can travel through multiple branches or repositories. The merge
> should work the same way, and work reliably, from branch to trunk, from
> branch to branch, or from trunk to branch.
>
> * We eliminate instructions like merge ranges, “reintegrate”, and “depth”.
> They add complexity and opportunities for human error.
>
> * It does not support subtree merges, merges to mixed working copies, or
> subtree/file mergeinfo. I think those cases cause a lot of complexity and are
> usually unwise. If you want to work on a subtree, then you can branch or
> clone the subtree and merge it back to the complete tree.
>
> * It can merge from foreign repositories, and track changes as they move
> through foreign repositories (eg clones). This is a common case in modern
> workflows. Changes have a global ID.
>
> * It will track mergeinfo (“merge_history”) in a versioned file that is
> committed into the branch. We want to have room to save any information that
> we need. It will have an extensible data structure, so that we can
> continuously improve this type of merge.
>
> I think that we can build a newmerge prototype by stripping down the
> existing merge to remove the subtree options, and moving to the extensible
> mergeinfo format. It will be useful to get advice about this from experienced
> team members.

Hi Andy,

I don't have a lot to add to what Stefan and Mark have already said,
but I do have a question:

Can't much of what you desire be achieved simply through policy, e.g.
merges are only done at the branch root level, no shallow depth
merges, no mixed-rev merge targets (already prevented by default in
1.7)?  Or is it that you want these policies enforced?
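
As an illustration of checking such a policy after the fact, the Python
sketch below parses svn:mergeinfo property values (source path, a colon,
then a comma-separated list of revisions or ranges, with a trailing '*'
marking a non-inheritable range) and reports any path below the branch root
that carries its own mergeinfo. The function names and the shape of the
input mapping are assumptions; the property values would have to be
gathered separately, for example with "svn propget svn:mergeinfo -R".

```python
def parse_mergeinfo(value):
    """Parse an svn:mergeinfo value into {source_path: [(start, end), ...]}."""
    result = {}
    for line in value.splitlines():
        line = line.strip()
        if not line:
            continue
        # The revision list follows the last ':' on the line.
        path, _, rangelist = line.rpartition(":")
        ranges = []
        for item in rangelist.split(","):
            item = item.rstrip("*")  # '*' marks a non-inheritable range
            start, _, end = item.partition("-")
            ranges.append((int(start), int(end or start)))
        result[path] = ranges
    return result


def subtree_mergeinfo_paths(props, branch_root):
    """List paths other than the branch root that carry mergeinfo.

    `props` maps a path to its svn:mergeinfo property value.
    """
    return sorted(
        path for path, value in props.items()
        if path != branch_root and parse_mergeinfo(value)
    )


props = {
    "/branches/feature": "/trunk:100-120",
    "/branches/feature/src/a.c": "/trunk/src/a.c:115*",
}
offenders = subtree_mergeinfo_paths(props, "/branches/feature")
```

A check like this only detects violations; enforcing the policy would still
need something like a server-side hook.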

One thing you can do that will help regardless of whether we go in the
direction you propose or not: Submit some new test patches.  A few
well written tests that clearly demonstrate where the current design
falls down will help the rest of us understand what it is you are
trying to accomplish (and might get some parts of what you desire
fixed within the current design).

Paul

Re: It's time to fix Subversion Merge

Posted by Paul Burba <pt...@gmail.com>.

On Mon, Jul 11, 2011 at 1:57 PM, Andy Singleton <an...@assembla.com> wrote:
>  I received a lot of good comments, and I will batch up my responses in this
> note.
>
> From Stefan, essentially "Can you improve the existing merge"?  Yes, I think
> that we can start with the existing merge code.
>
> However, I also think that any implementation that uses subtree mergeinfo,
> and does not have extensible mergeinfo, is doomed.  Too much effort goes into
> fixing up the subtree merge feature, and it makes the tree change problems
> insoluble.  So, we need to decisively cut off the subtree options and move
> to a bigger and more extensible data structure.  That's why I proposed
> adding a new command, "newmerge".  The existing code won't be destabilized.
>
> Paul notes that we need test cases. Yes, exactly.  The first step in this
> project is to make some test cases, and see how they perform with the
> existing merge, and describe what users report as the problem with these
> cases.  This will settle the debate about whether the existing merge is good
> enough.  We can classify an alternate merge implementation according to how
> many additional cases it handles correctly.  I think a test cases is more
> than a patch.  It is a series of commit and merge operations.

Hi Andy,

To be clear: I mean new tests for our test suite that demonstrate the
problems you allude to (assuming we don't already have a test).  I'm
not sure what you mean by "I think a test cases is more than a patch.
It is a series of commit and merge operations."  Could you clarify?

Paul

> Mark and C. Michael Pilato raise the most serious issue.  Subversion merge
> problems come from the core architecture and have persisted over many years.
>  A complete fix may require a more radical change. And, it is possible that
> SVN needs a bigger redesign even to meet the goals I put out today.  You
> have more experience with that than I do.  We will see.  At this point, I
> think that merge can be significantly improved for the existing server
> architecture.
>
> Yes, the "cyclic merge" problem is a big one, and along with the tree change
> problem, it accounts for most of the frustrating behavior of Subversion
> merge - http://subversion.tigris.org/issues/show_bug.cgi?id=2837
>
> I believe that cyclic merges can be handled with a bigger merge_history /
> mergeinfo file. When you do a merge, you make some edits to resolve problems.
>  Then, you commit the changes - all of the merged changesets, plus the
> edits.  You also write the instructions for resolving this merge into the
> merge_history / mergeinfo file.  The next time you go to do a merge, you can
> replay any of the changes that you need. The new merge_history will be a big
> file with a complete history.
>
> This won't be a simple implementation, but the inside of a merge is never
> simple.  We need to add intelligence to the merge so that it looks simple to
> the user.  This intelligence can be incrementally improved through test
> cases and the open source process.
>
> New architecture might be required for handling moved and renamed paths.
>  This is a problem that comes up frequently in merges.  However, it also
> comes up in normal updates.  From a merge point of view, moved files should
> actually move and drag their changes with them, rather than appear as new
> files with copy+delete.
> * After we map to new files (manually, or with an algorithm) in an update or
> a merge, we should remember the change in the merge_history.  That's why we
> make the history extensible.
> * To automate this process, I think that moved files should be identified by
> filename and tree structure, not by file ID.  Yes, this is a change in the
> way that Subversion thinks, but it is clearly a problem that needs to be
> fixed.  Other SCM systems like git use an algorithm that makes a best guess
> on tree matches.  As noted by Greg, git doesn't do any other type of move
> tracking, and git merge works well.
>
> The work noted by Stefan on truMerge is a good example of this strategy.  We
> can do the same thing - http://trumerge.open.collab.net/ . I completely
> agree with the major points in this implementation:
> 1) It uses "heuristics" to map trees together
> 2) "All merges are done at the root of the branch" and "All merges are
> complete (no merges in sparse working copies, etc.)"
>
> You can see that getting rid of the subtree merges is a necessary and
> probably sufficient step for fixing the tree change problems.
>
> Mark asks where we get the GUID/UUID for foreign merges.  It already exists,
> because we have a server UUID, as Daniel wrote:
> <repository_UUID-revision_number>.  We just need to keep track of it.
>
> In systems like git, if the user wants to cherrypick, the user must enter
> the complete GUID/UUID.  However, it is probably not relevant for
> Subversion.  You can only cherrypick complete commits from the source, not
> from other sources.  So, you can leave out the UUID and just specify the
> revision number.  You can get complete merge commits with this technique.
>  Unfortunately, you are not guaranteed to have access to individual commits
> that were inside the merge. Because of this, changesets inside merge commits
> will be vulnerable to "conflation", you will have to sort through cases
> where you already have some but not all of the changes that were in a merge
> commit you are merging, and you won't be able to cherrypick inside the merge
> commit.  I need to think more about this case, and whether we should track
> individual commits that were merged.  That could be an extension.
>
>
> On 7/11/2011 12:51 PM, C. Michael Pilato wrote:
>>
>> On 07/11/2011 11:46 AM, Andy Singleton wrote:
>>>
>>>  Many developers are moving from Subversion to other SCM systems that
>>> have
>>> better merge capabilities. I have posted an article with a proposal to
>>> fix
>>> this problem, here:
>>>
>>>
>>> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx
>>
>> [...]
>>
>>> I think that we can build a newmerge prototype by stripping down the
>>> existing merge to remove the subtree options, and moving to the
>>> extensible
>>> merginfo format. It will be useful to get advice about this from
>>> experienced
>>> team members.
>>
>> Your optimism is lovely (and welcome, even!), but I am not as convinced as
>> you that the reason why Subversion's merge functionality is subpar is as
>> superficial as the items you call out (and which are implied by your
>> prototyping plan above).
>>
>> Very little (if anything) about your proposal touches on the *real*
>> problems, such as Subversion's handling of moved/renamed objects, tree
>> conflict detection/handling/resolution, changeset conflation (caused by the
>> fundamental diff+patch approach Subversion takes to merges rather than
>> first-class changeset support), etc.  These real problems with merging
>> were
>> documented many years before the merge tracking feature was ever
>> conceived,
>> and neither that feature nor its skin-deep-only warts you aim to address
>> made a dent in solving those very real problems.
>>
>> I don't aim to discourage -- far from it!  On the contrary, I want to
>> encourage a deeper review of the situation.  It's entirely possible that,
>> in
>> doing so, you will find solutions for the deeper core problems here, and
>> obviously the Subversion community (devs and users alike) would love that!
>>
>> -- C-Mike
>>
>> [1] I'll grant that in your blog post, you at least acknowledge the tree
>> changes problem and place great stock in your extensible merge tracking
>> format toward some future solution.
>>
>
>
> --
> Andy Singleton
> Founder/CEO, Assembla Online: http://www.assembla.com
> Phone: 781-328-2241
> Skype: andysingleton
>

Re: It's time to fix Subversion Merge

Posted by Andy Singleton <an...@assembla.com>.

  It would be quite nice to have a complete record of all patches 
included in a merge.  This would be real changeset passing.  However, 
this would obviously be big and start to reproduce the complete 
repository.  I think that cyclic merges could be improved by recording 
only the edits that you make on the merge.  So, if you merge r2 and r3, 
make some edits, and commit them, the merge_history would show that you 
merged r2 and r3, and it would include a diff, but only for your new 
edits.  It's worth a try.  The ticket about problems with cyclical 
merges specifically mentions the problem with tracking new edits on the 
merged working copy.
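
A minimal sketch of what such a merge_history record could look like,
assuming one JSON object per line in the versioned file.  The field
names ("source", "merged", "edits_diff", "extensions") and the helper
functions are hypothetical and are not part of any Subversion format:

```python
import json


def make_merge_entry(source, merged_revs, edits_diff, extensions=None):
    """Build one hypothetical merge_history record.

    Only the diff of the edits made on top of the merge is stored,
    not the merged changesets themselves.
    """
    return {
        "source": source,
        "merged": sorted(merged_revs),
        "edits_diff": edits_diff,
        # Free-form room for data that later versions may want to add.
        "extensions": extensions or {},
    }


def append_entry(history_text, entry):
    """Append a record to merge_history (one JSON object per line)."""
    return history_text + json.dumps(entry, sort_keys=True) + "\n"


def merged_revisions(history_text, source):
    """Return every revision already recorded as merged from source."""
    revs = set()
    for line in history_text.splitlines():
        record = json.loads(line)
        if record["source"] == source:
            revs.update(record["merged"])
    return revs
```

With this layout, merging r2 and r3 and then committing some manual
edits adds a single line to the file, and a later merge can look up
which revisions from the same source are already present.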

On 7/11/2011 2:42 PM, Daniel Shahaf wrote:
> Andy Singleton wrote on Mon, Jul 11, 2011 at 13:57:58 -0400:
>> Yes, the "cyclic merge" problem is a big one, and along with the
>> tree change problem, it accounts for most of the frustrating
>> behavior of Subversion merge -
>> http://subversion.tigris.org/issues/show_bug.cgi?id=2837
>>
>> I believe that cyclic merges can be handled with a bigger
>> merge_history / merginfo file. When you do a merge, you make some
>> edits to resolve problems.  Then, you commit the changes - all of
>> the merged changesets, plus the edits.  You also write the
>> instructions
> Define "instructions".
>
>
> If the algorithm is
>
>     When trying to merge to branch T a patch rN from branch S, where rN
>     added mergeinfo identifying changes that are already present on T,
>     diff (the tree resulting from merging the mergeinfo-delta described
>     by rN to S) to S@rN and apply the resulting patch to T,
>
> then perhaps you mean, precompute the parenthetical part at merge time
> and record it somewhere in the repository...?
>
>
>> for resolving this merge into the merge_history /
>> merginfo file.  The next time you go to do a merge, you can replay
>> any of the changes that you need. The new merge_history will be a
>> big file with a complete history.

-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: It's time to fix Subversion Merge

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Andy Singleton wrote on Mon, Jul 11, 2011 at 13:57:58 -0400:
> Yes, the "cyclic merge" problem is a big one, and along with the
> tree change problem, it accounts for most of the frustrating
> behavior of Subversion merge -
> http://subversion.tigris.org/issues/show_bug.cgi?id=2837
> 
> I believe that cyclic merges can be handled with a bigger
> merge_history / merginfo file. When you do a merge, you make some
> edits to resolve problems.  Then, you commit the changes - all of
> the merged changesets, plus the edits.  You also write the
> instructions

Define "instructions".


If the algorithm is

   When trying to merge to branch T a patch rN from branch S, where rN
   added mergeinfo identifying changes that are already present on T,
   diff (the tree resulting from merging the mergeinfo-delta described
   by rN to S) to S@rN and apply the resulting patch to T,

then perhaps you mean, precompute the parenthetical part at merge time
and record it somewhere in the repository...?


> for resolving this merge into the merge_history /
> merginfo file.  The next time you go to do a merge, you can replay
> any of the changes that you need. The new merge_history will be a
> big file with a complete history.

Re: It's time to fix Subversion Merge

Posted by Andy Singleton <an...@assembla.com>.

  I received a lot of good comments, and I will batch up my responses in 
this note.

From Stefan, essentially "Can you improve the existing merge"?  Yes, I
think that we can start with the existing merge code.

However, I also think that any implementation that uses subtree
mergeinfo, and does not have extensible mergeinfo, is doomed.  Too much
effort goes into fixing up the subtree merge feature, and it makes the 
tree change problems insoluble.  So, we need to decisively cut off the 
subtree options and move to a bigger and more extensible data 
structure.  That's why I proposed adding a new command, "newmerge".  The 
existing code won't be destabilized.

Paul notes that we need test cases. Yes, exactly.  The first step in 
this project is to make some test cases, and see how they perform with 
the existing merge, and describe what users report as the problem with 
these cases.  This will settle the debate about whether the existing 
merge is good enough.  We can classify an alternate merge implementation 
according to how many additional cases it handles correctly.  I think a 
test case is more than a patch.  It is a series of commit and merge
operations.

Mark and C. Michael Pilato raise the most serious issue.  Subversion
merge problems come from the core architecture and have persisted over 
many years.  A complete fix may require a more radical change. And, it 
is possible that SVN needs a bigger redesign even to meet the goals I 
put out today.  You have more experience with that than I do.  We will 
see.  At this point, I think that merge can be significantly improved 
for the existing server architecture.

Yes, the "cyclic merge" problem is a big one, and along with the tree 
change problem, it accounts for most of the frustrating behavior of 
Subversion merge - http://subversion.tigris.org/issues/show_bug.cgi?id=2837

I believe that cyclic merges can be handled with a bigger merge_history
/ mergeinfo file. When you do a merge, you make some edits to resolve
problems.  Then, you commit the changes - all of the merged changesets,
plus the edits.  You also write the instructions for resolving this
merge into the merge_history / mergeinfo file.  The next time you go to
do a merge, you can replay any of the changes that you need. The new
merge_history will be a big file with a complete history.

This won't be a simple implementation, but the inside of a merge is 
never simple.  We need to add intelligence to the merge so that it looks 
simple to the user.  This intelligence can be incrementally improved 
through test cases and the open source process.

New architecture might be required for handling moved and renamed 
paths.  This is a problem that comes up frequently in merges.  However, 
it also comes up in normal updates.  From a merge point of view, moved 
files should actually move and drag their changes with them, rather than 
appear as new files with copy+delete.
* After we map to new files (manually, or with an algorithm) in an 
update or a merge, we should remember the change in the merge_history.  
That's why we make the history extensible.
* To automate this process, I think that moved files should be 
identified by filename and tree structure, not by file ID.  Yes, this is 
a change in the way that Subversion thinks, but it is clearly a problem 
that needs to be fixed.  Other SCM systems like git use an algorithm 
that makes a best guess on tree matches.  As noted by Greg, git doesn't 
do any other type of move tracking, and git merge works well.

The work noted by Stefan on truMerge is a good example of this 
strategy.  We can do the same thing - http://trumerge.open.collab.net/ . 
I completely agree with the major points in this implementation:
1) It uses "heuristics" to map trees together
2) "All merges are done at the root of the branch" and "All merges are 
complete (no merges in sparse working copies, etc.)"

You can see that getting rid of the subtree merges is a necessary and 
probably sufficient step for fixing the tree change problems.

Mark asks where we get the GUID/UUID for foreign merges.  It already 
exists, because we have a server UUID, as Daniel wrote: 
<repository_UUID-revision_number>.  We just need to keep track of it.
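
A minimal sketch of such an identifier, assuming the text form
"<repository_UUID>-r<revision>".  The "-r" separator, the ChangeId
class and parse_change_id() are hypothetical and only illustrate the
idea; they are not an existing Subversion API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeId:
    """Hypothetical global ID for a commit: repository UUID plus revision."""
    repos_uuid: str
    revision: int

    def __str__(self) -> str:
        return f"{self.repos_uuid}-r{self.revision}"


def parse_change_id(text: str) -> ChangeId:
    """Parse the text form produced by ChangeId.__str__()."""
    # The UUID itself contains hyphens, so split on the last "-r" only.
    uuid, sep, rev = text.rpartition("-r")
    if not sep or not uuid or not rev.isdigit():
        raise ValueError(f"not a change ID: {text!r}")
    return ChangeId(uuid, int(rev))
```

Two commits with the same revision number in different repositories
then compare as different changes, which is what tracking changes
across foreign repositories needs.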

In systems like git, if the user wants to cherrypick, the user must 
enter the complete GUID/UUID.  However, it is probably not relevant for 
Subversion.  You can only cherrypick complete commits from the source, 
not from other sources.  So, you can leave out the UUID and just specify 
the revision number.  You can get complete merge commits with this 
technique.  Unfortunately, you are not guaranteed to have access to 
individual commits that were inside the merge. Because of this,
changesets inside merge commits will be vulnerable to "conflation": you
will have to sort through cases where you already have some but not all 
of the changes that were in a merge commit you are merging, and you 
won't be able to cherrypick inside the merge commit.  I need to think 
more about this case, and whether we should track individual commits 
that were merged.  That could be an extension.

On 7/11/2011 12:51 PM, C. Michael Pilato wrote:
> On 07/11/2011 11:46 AM, Andy Singleton wrote:
>>   Many developers are moving from Subversion to other SCM systems that have
>> better merge capabilities. I have posted an article with a proposal to fix
>> this problem, here:
>>
>> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx
> [...]
>
>> I think that we can build a newmerge prototype by stripping down the
>> existing merge to remove the subtree options, and moving to the extensible
>> merginfo format. It will be useful to get advice about this from experienced
>> team members.
> Your optimism is lovely (and welcome, even!), but I am not as convinced as
> you that the reason why Subversion's merge functionality is subpar is as
> superficial as the items you call out (and which are implied by your
> prototyping plan above).
>
> Very little (if anything) about your proposal touches on the *real*
> problems, such as Subversion's handling of moved/renamed objects, tree
> conflict detection/handling/resolution, changeset conflation (caused by the
> fundamental diff+patch approach Subversion takes to merges rather than
> first-class changeset support), etc.  These real problems with merging were
> documented many years before the merge tracking feature was ever conceived,
> and neither that feature nor its skin-deep-only warts you aim to address
> made a dent in solving those very real problems.
>
> I don't aim to discourage -- far from it!  On the contrary, I want to
> encourage a deeper review of the situation.  It's entirely possible that, in
> doing so, you will find solutions for the deeper core problems here, and
> obviously the Subversion community (devs and users alike) would love that!
>
> -- C-Mike
>
> [1] I'll grant that in your blog post, you at least acknowledge the tree
> changes problem and place great stock in your extensible merge tracking
> format toward some future solution.
>

-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: It's time to fix Subversion Merge

Posted by "C. Michael Pilato" <cm...@collab.net>.

On 07/12/2011 10:52 AM, Julian Foad wrote:
> Note that there are other possible ways of working, which can be tried
> as well as or instead of the branch.  An external script is another good
> option, like the way 'svnmerge.py' existed before the current built-in
> merge tracking.  I was talking to Andy the other day and I got the
> strong impression that his focus (and the reason he mentioned an
> "extensible" data format and some implementation details such as using a
> file rather than a property) was specifically to encourage
> experimentation and extension and participation, including by people who
> might not normally work in the core C code.

I almost suggested a script as a form of prototype earlier in the thread,
but got derailed by my concerns that, again, only the most superficial of
merge problems could be reasonably, performantly solved by a wrapper script.
 Still, it's a fine approach if it fits the plan of action.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: It's time to fix Subversion Merge

Posted by Geoff Rowell <ge...@gmail.com>.

On Jul 12, 2011, at 11:29 AM, Andy Singleton <an...@assembla.com> wrote:

> My original idea was to make a new executable file called "newmerge".  It would be an external script, and if you want to use it, you just need that extra script.  However, I was planning on building it from the C code that is in "svn merge" right now, rather than python or perl.  Using the existing code will put us ahead on patch libraries and protocols.  It will also give us access to the experts that have worked on "svn merge" in the past.  And, it will be easier to integrate into GUI clients.  So, the code would start as a cut-down version of "svn merge".  Later it could be added back as "svn newmerge".
> 
> If you want to keep it as a mergeable branch (clearly relevant), then maybe it's better just to add on as "svn newmerge" from the beginning.  If that approach is recommended, then maybe someone can help by adding the stub for this command to "svn".
> 
> I don't think that we will need to force anyone to give up the old merge.  If and when the newmerge is better, they will migrate on their own.  I think merge is an important concern for many users and they will migrate quickly if they can get a better result.
> 
> 
> On 7/12/2011 11:00 AM, Mark Phippard wrote:
>> On Tue, Jul 12, 2011 at 10:52 AM, Julian Foad<ju...@wandisco.com>  wrote:
>> 
>>> A script has the advantage that it could be tried and even rolled out by
>>> people who are still using 1.6.x.
>>> 
>>> None of that is a reason not to start a branch.
>> +1 on the branch.  But wouldn't it make sense to defer creating the
>> branch for as long as possible?  If Andy's team wants to explore
>> scripts first why create a branch that is just going to immediately
>> start getting stale?  Maybe we should start with patches to trunk and
>> make the branch when there is something to not include on trunk?
>> 
>> FWIW, I would also like to see more dev@ discussion and less private
>> discussion.  Before we start working on a new command seems we ought
>> to discuss the ideas more.  As I mentioned a new command is going to
>> need a way to force users to use the new command or else all the same
>> issues have to be addressed.  If we can force users to do something,
>> then I am not sure we need a new command.  We can just have a way to
>> not allow users to use the features of existing merge that we do not
>> want them to use.  The existing merge command already supports the
>> proposed simple syntax.
>> 
According to that vision, what happens when everyone decides to use "newmerge"? What happens to the old "merge" command?

-Geoff

Re: It's time to fix Subversion Merge

Posted by Paul Burba <pt...@gmail.com>.

On Tue, Jul 12, 2011 at 11:55 AM, Andy Singleton <an...@assembla.com> wrote:
>  On 7/12/2011 11:43 AM, Stefan Sperling wrote:
>>
>> On Tue, Jul 12, 2011 at 11:29:57AM -0400, Andy Singleton wrote:
>>>
>>> If you want to keep it as a mergeable branch (clearly relevant),
>>> then maybe it's better just to add on as "svn newmerge" from the
>>> beginning.  If that approach is recommended, then maybe someone can
>>> help by adding the stub for this command to "svn".
>>
>> Adding such a stub will be the easiest of all tasks that lie before
>> you and your team. So if you need a new subcommand, I would suggest
>> to add the necessary stub code yourself, if only to get your feet
>> wet with the Subversion code base.
>>
>> I don't mind new subcommands in principle, but I would oppose
>> 'svn newmerge' as a name for a new subcommand. 'newmerge' is a
>> good working title but not a good name for a subcommand.
>>
>> Overall, I'd prefer solutions which change 'svn merge' in a
>> backwards-compatible manner.
>
> In my proposal, if you decide to switch from "merge" (with subtree
> merginfo properties) to "newmerge" (with merge_history file), you would just
> make a new branch that has the merge_history, and then use newmerge from
> that point on.  Three points help you make the transition:

Hi Andy,

I'm curious as to what others think, but in my experience both of
these first two points are not universally valid and thus not very
safe assumptions:

> * On most Subversion teams merging is done by relative experts.  They can
> make a choice about which merge to use.

I've seen plenty of cases where the personnel charged with merging are
anything but experts.  If the 'mergers' are experts why can't a policy
solution of "only merge to branch roots" solve most of the problems
you raise?  Because if you do that the subtree mergeinfo problem
neatly disappears (because there is none[1]).  Ok, I won't ask this
question again (today at least ;-)

[1] I'm hand waving a bit if you have pre-existing subtree mergeinfo,
but for newly created branches that shouldn't be an issue.

> * Branches in svn are often fairly short-lived, because of the cyclic merge
> problem. So, you frequently get an opportunity to make this change.

Maybe in terms of sheer branch count, but there are plenty of folks
using long-lived feature branches.

> It is not possible to make the new architecture compatible with the old
> merginfo.  The subtree merginfo has been the cause of many problems,
> complexities, and fixes.

I'm still not sure which existing problems inherent to subtree
mergeinfo you are talking about.  Have you tried some of your problem
cases with a recent trunk (1.7) client?  This is why I asked for some
new tests for our test suite.

At the start of your thread you said, "* It does not support subtree
merges, merges to mixed working copies, or subtree/file merginfo. I
think those cases cause a lot of complexity and are usually unwise. If
you want to work on a subtree, then you can branch or clone the
subtree and merge it back to the complete tree."

I agree on the complexity, that is why we suggest merging to the roots
of branches only.  I'll even agree on using "subtree mergeinfo" as a
curse, but I'd like to see more examples of why subtree merges are
"unwise" in 1.7.

Thanks,

Paul

> It is not extensible to track more information for
> cyclic merges or tree mapping.  It makes tree mapping more difficult.
>  Abandoning merginfo will lift a big weight from the people working on
> merge, and make them more successful.
>
> It is possible to force the conversion by detecting old merginfo and writing
> some of it into the new branch-based merge_history file.
>
> This Apache team will decide when to deprecate the merge with the
> old/existing merginfo format.  In 1.7 you have a warning about merging to
> mixed-revision working copies.  You could use a similar approach for
> merging.
>
> Yes, I think it is important for users to be able to make their own
> variations and improvements of merge, especially early in the lifecycle.  Is
> it easier for people to build "svn" with "svn  newmerge", or is it easier
> for them to build from the packaging as a stand-alone executable?
>
> --
> Andy Singleton
> Founder/CEO, Assembla Online: http://www.assembla.com
> Phone: 781-328-2241
> Skype: andysingleton
>

Re: It's time to fix Subversion Merge

Posted by Andy Singleton <an...@assembla.com>.

  On 7/12/2011 11:43 AM, Stefan Sperling wrote:
> On Tue, Jul 12, 2011 at 11:29:57AM -0400, Andy Singleton wrote:
>> If you want to keep it as a mergeable branch (clearly relevant),
>> then maybe it's better just to add on as "svn newmerge" from the
>> beginning.  If that approach is recommended, then maybe someone can
>> help by adding the stub for this command to "svn".
> Adding such a stub will be the easiest of all tasks that lie before
> you and your team. So if you need a new subcommand, I would suggest
> to add the necessary stub code yourself, if only to get your feet
> wet with the Subversion code base.
>
> I don't mind new subcommands in principle, but I would oppose
> 'svn newmerge' as a name for a new subcommand. 'newmerge' is a
> good working title but not a good name for a subcommand.
>
> Overall, I'd prefer solutions which change 'svn merge' in a
> backwards-compatible manner.

In my proposal, if you decide to switch from "merge" (with subtree
mergeinfo properties) to "newmerge" (with merge_history file), you would
just make a new branch that has the merge_history, and then use newmerge
from that point on.  Three points help you make the transition:
* On most Subversion teams merging is done by relative experts.  They 
can make a choice about which merge to use.
* Branches in svn are often fairly short-lived, because of the cyclic 
merge problem. So, you frequently get an opportunity to make this change.

It is not possible to make the new architecture compatible with the old
mergeinfo.  The subtree mergeinfo has been the cause of many problems,
complexities, and fixes.  It is not extensible to track more information
for cyclic merges or tree mapping.  It makes tree mapping more
difficult.  Abandoning mergeinfo will lift a big weight from the people
working on merge, and make them more successful.

It is possible to force the conversion by detecting old mergeinfo and
writing some of it into the new branch-based merge_history file.

This Apache team will decide when to deprecate the merge with the
old/existing mergeinfo format.  In 1.7 you have a warning about merging
to mixed-revision working copies.  You could use a similar approach for
merging.

Yes, I think it is important for users to be able to make their own 
variations and improvements of merge, especially early in the 
lifecycle.  Is it easier for people to build "svn" with "svn  newmerge", 
or is it easier for them to build from the packaging as a stand-alone 
executable?

-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: It's time to fix Subversion Merge

Posted by Stefan Sperling <st...@elego.de>.

On Tue, Jul 12, 2011 at 11:29:57AM -0400, Andy Singleton wrote:
> If you want to keep it as a mergeable branch (clearly relevant),
> then maybe it's better just to add on as "svn newmerge" from the
> beginning.  If that approach is recommended, then maybe someone can
> help by adding the stub for this command to "svn".

Adding such a stub will be the easiest of all tasks that lie before
you and your team. So if you need a new subcommand, I would suggest
to add the necessary stub code yourself, if only to get your feet
wet with the Subversion code base.

I don't mind new subcommands in principle, but I would oppose
'svn newmerge' as a name for a new subcommand. 'newmerge' is a
good working title but not a good name for a subcommand.

Overall, I'd prefer solutions which change 'svn merge' in a
backwards-compatible manner.

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Tue, Jul 12, 2011 at 12:40 PM, Andy Singleton <an...@assembla.com> wrote:

> Good point.  If we allow foreign merges, then "log" and "blame" are not
> going to work well.  They will show changes coming from the merge, rather
> than from the original commit.  Fine.  I'm willing to give up those details,
> because merge is important.  People will be happy with that if they give up
> some detail and get a merge that works.  If you want log and blame details,
> then don't do foreign merges.

I was not even speaking about foreign merges.  I would be fine with
supporting those but not with log/blame.  I am just saying if you
invent a new merge tracking format that you have to support it in
log/blame/mergeinfo too.

> I have made a specific proposal for handling moves and renames that is not
> very complicated.  Moves and renames can be identified by file name pattern
> matching.  This is the tactic that is used by git and subversion trumerge,
> so we know it works well in practice.  Once moves are spotted by the merge,
> we can write the changes back in our merge commit, using the existing
> copy+delete mechanism for writing moves.  And, we can record the move in our
> merge_history file so that changes can be mapped automatically in future
> merges between branches that don't have the move, and branches that
> do have the move.

Hmm, I guess I did not see your proposal as being specific.  For
example, trumerge works by doing some fairly extensive examinations of
the entire repository history to build a cache of moves.  It then
constructs a merge script that it runs that does many merges to
implement it.  I am still not convinced this is easy.  That said, IMO,
this is the most important feature we need.  So I am totally willing
to give you some runway to go solve it.

> This proposal is straightforward, but is not compatible with subtree
> merginfo.  It's pretty easy to do this mapping on one branch.  It starts to
> get complex and slow and problematic if you have to do it recursively on
> subtrees.  And, we can't record the mapping in the existing merginfo format.

I get that you do not want to support subtree mergeinfo.  I am simply
saying why can't we invent a mechanism for a project to indicate that
it does not want to allow subtree merges so that they can enjoy these
new merge features?  I think you have to invent this even if you
create a new command, and I am just saying that once you invent this,
you do not need a new command.
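
A minimal sketch of what such a check might look like, assuming the
client already has a mapping from path to svn:mergeinfo value for the
branch.  Both functions and the mapping are hypothetical; nothing like
this exists in svn today:

```python
def subtree_mergeinfo_paths(mergeinfo_by_path, branch_root):
    """Return paths below branch_root that carry their own mergeinfo."""
    root = branch_root.rstrip("/")
    return sorted(
        path
        for path, value in mergeinfo_by_path.items()
        if value.strip() and path.startswith(root + "/")
    )


def check_merge_target(target, branch_root, mergeinfo_by_path):
    """Raise ValueError unless the merge is a whole-branch merge."""
    if target.rstrip("/") != branch_root.rstrip("/"):
        raise ValueError(f"merge target {target!r} is not the branch root")
    subtrees = subtree_mergeinfo_paths(mergeinfo_by_path, branch_root)
    if subtrees:
        raise ValueError("subtree mergeinfo present on: " + ", ".join(subtrees))
```

A project that opts in would run the check before every merge, so the
restriction lives in the existing command rather than in a new one.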

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Andy Singleton <an...@assembla.com>.

  On 7/12/2011 12:25 PM, Mark Phippard wrote:
> On Tue, Jul 12, 2011 at 12:16 PM, Andy Singleton<an...@assembla.com>  wrote:
>
>> Mark, I agree with you that the existing merge will work better if we apply
>> some restrictions.  I can see that the project is already going that way,
>> and maybe it is good to continue in that direction.  As a user, I would find
>> that helpful.
>>
>> However, we cannot get good results with the subtree merginfo format.  It is
>> a failed architecture.  A lot of smart people have worked on it, and they
>> have not achieved good results.  It is the source of many annoying problems.
>>   Even on this list, nobody defends it.  They only express a desire for
>> "compatibility."  You yourself have argued that basically, it can't be done,
>> that cyclic merge won't work.  Yes, if you stick with the subtree merginfo,
>> IT CAN'T BE DONE.  You are guaranteed to be right.  However, if we free
>> ourselves from this one restriction, we can do something good.  I invite
>> everyone to stop beating their heads against this wall.
>>
>> In my proposal, you can basically keep everything about Subversion the same:
>> the servers, most of the clients, the old merge command if you choose to use
>> it.  It's all compatible.  I am only asking you to give up ONE THING:
>> subtree merginfo.  To succeed, we have to get rid of both parts of it.  We
>> have to get rid of the subtree info, and we have to get rid of the fussy
>> little merginfo format.  If newmerge is successful and we want people to
>> move to it, we can force the conversion by asking "svn" to read the old
>> merginfo and write some information into the new format.
> I really do not know if I agree with this, and I think you might be
> being a bit naive if you think that simply by creating a new format
> you are going to not have problems and you are going to even create an
> improvement.  That said, I think some of the ideas around storing
> patches is at least interesting.
>
> I actually do not think mergeinfo is our problem or even our limiter.
> I agree it would have been easier to implement if we did not support
> all the odd cases we do, but at the same time, I think all of those
> odd cases are working pretty well now.  So at best, removing support
> for that stuff might just make future work a bit easier.
>
> The problem in merge is in core SVN.  Merge tracking was always based
> on the premise that merge in SVN worked fine, but it was annoying to
> have to manually track merges and specify the revisions to merge.  So
> merge tracking tracks that and allows the simple merge syntax you are
> advocating.  In that sense, merge tracking accomplished its goal and
> as of 1.7 the last of the warts in the implementation should be
> resolved.
>
> The problem is that SVN merge does not really work fine.  The design
> of SVN makes cyclic merges really difficult and perhaps impossible.
> The lack of support for moves and renames makes merge not useful for
> people that refactor a lot etc.  None of these are problems with merge
> tracking.
>
> You seem to want to extend merge tracking to provide new ways to solve
> these problems.  Great, design it and propose it.  I do not
> necessarily see why that info cannot be stored in properties, or even
> the existing property.  Nor do I see why we cannot migrate the info we
> have to a new format.  We need real design to have something to
> discuss.
>
> Finally, in your new design do not forget about things like log -g and
> blame -g, as well as the mergeinfo command.  These features are all
> necessary parts of a merge tracking plan and must have answers from
> the first release.
>
Good point.  If we allow foreign merges, then "log" and "blame" are not 
going to work well.  They will show changes coming from the merge, 
rather than from the original commit.  Fine.  I'm willing to give up 
those details, because merge is important.  People will be happy with 
that if they give up some detail and get a merge that works.  If you 
want log and blame details, then don't do foreign merges.

I have made a specific proposal for handling moves and renames that is 
not very complicated.  Moves and renames can be identified by file name 
pattern matching.  This is the tactic that is used by git and subversion 
trumerge, so we know it works well in practice.  Once moves are spotted 
by the merge, we can write the changes back in our merge commit, using 
the existing copy+delete mechanism for writing moves.  And, we can 
record the move in our merge_history file so that changes can be mapped 
automatically in future merges between branches that don't have
the move, and branches that do have the move.
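
As a rough illustration of the name-matching idea above, here is a minimal
sketch. It assumes the merge already knows which paths were deleted and which
were added, pairs them by basename only, and keeps a pair only when it is
unambiguous. The function names and the shape of the mapping are hypothetical;
nothing here is an existing Subversion API, and a real tool would probably
also compare file contents.

```python
import posixpath
from collections import defaultdict


def detect_moves(deleted, added):
    """Pair deleted and added paths that share a basename.

    Only unambiguous pairs (exactly one deletion and one addition with
    the same basename) are reported; anything else is left unmapped.
    """
    deleted_by_name = defaultdict(list)
    added_by_name = defaultdict(list)
    for path in deleted:
        deleted_by_name[posixpath.basename(path)].append(path)
    for path in added:
        added_by_name[posixpath.basename(path)].append(path)

    moves = {}
    for name, old_paths in deleted_by_name.items():
        new_paths = added_by_name.get(name, [])
        if len(old_paths) == 1 and len(new_paths) == 1:
            moves[old_paths[0]] = new_paths[0]
    return moves


def map_path(path, moves):
    """Translate a path from a branch without the move to a branch with it."""
    return moves.get(path, path)


# Hypothetical example data: one real move, plus an unrelated delete and add.
moves = detect_moves(
    deleted=["src/util/strings.c", "src/old/README"],
    added=["lib/text/strings.c", "docs/NOTES"],
)
print(moves)
```

The resulting mapping is the kind of record that could be written into the
proposed merge_history file, so that a later merge can call something like
map_path() before applying a change to a file that has moved.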

This proposal is straightforward, but is not compatible with subtree
mergeinfo.  It's pretty easy to do this mapping on one branch.  It starts
to get complex and slow and problematic if you have to do it recursively
on subtrees.  And, we can't record the mapping in the existing mergeinfo
format.

-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

RE: It's time to fix Subversion Merge

Posted by Bob Archer <Bo...@amsi.com>.

> On Tue, Jul 12, 2011 at 12:52 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote:
> > On Tue, Jul 12, 2011 at 11:47 AM, Mark Phippard <ma...@gmail.com> wrote:
> >> On Tue, Jul 12, 2011 at 12:45 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote:
> >>> On Tue, Jul 12, 2011 at 11:25 AM, Mark Phippard <ma...@gmail.com> wrote:
> >>>>
> >>>> Finally, in your new design do not forget about things like log -g and
> >>>> blame -g, as well as the mergeinfo command.  These features are all
> >>>> necessary parts of a merge tracking plan and must have answers from
> >>>> the first release.
> >>>
> >>> Really?  I think we should take whatever improvements we can get,
> >>> rather than saying "oh, and you need to support all this legacy
> >>> baggage as well."  While they are useful to some folks, I don't think
> >>> they are can't-live-without-absolutely-must-have features.  I mean,
> >>> if we're thinking outside the box, let's think Outside the BOX, and
> >>> try not to pigeon hole ourselves.
> >>>
> >>> It would be interesting to see the usage of 'log -g' and 'blame -g'.
> >>> I believe Tortoise uses them as the default under-the-hood, so that
> >>> probably inflates the actual usage numbers quite a bit.
> >>
> >> A new merge tracking design that does not support these features, or
> >> at least have a very definite plan for supporting them, would be dead
> >> on arrival.  If the design does not support these options then go back
> >> to the drawing board.
> >>
> >> These are absolute must have features.
> >
> > With all due respect, that's not your decision to make.  This
> > consensus-based community gets to determine what are must-have
> > features and what aren't.
> >
> > In reading this thread, it almost feels like there are two classes of
> > merge users: power users and others, and they have different sets of
> > requirements.  Certainly it's not a discrete set, but a continuum.
> > Unfortunately, Subversion tries to serve the needs of both with a
> > single paradigm, and it's not working too well.
>
> I see us heading down the path of a classic engineering anti-pattern.
> There is a tough problem to solve, so you decide to rewrite and focus
> on that problem and along the way you toss aside existing things you
> already support because you think they are not important.

This is so true... one person's "important" is another person's "too complex".

> 
> Users of Subversion before we had merge tracking, and users of
> Subversion since we have had merge tracking have made it very clear
> that these options are important to them.  You cannot come up with a
> new merge design that does not account for these needs.

There has been much talk on this list about finding a way to define a project root and only allowing certain commands to work at the root, or to work as if they were issued at the root. Actually, I expect (I could be wrong) that many subtree merges are just a mistake. People say, "But I just want to merge the changes in this one file." OK, so just merge that one changeset... oh, they didn't realize they could do that.

I also wonder whether the majority of people want to copy/branch from project roots too.

So, should there be consideration of only allowing merge/branch from a project root... or defaulting to the project root if you are in a child node? There would probably need to be some type of marker property on a project root. I'm not sure how that would get onto a project. Perhaps when you import or something. Or perhaps those commands would give a warning:

svn merge /branch/version1.2.5
(if your target isn't a project root or one isn't found walking up the tree)
error: No project root defined in merge target
(if your merge source isn't a project root)
error: No project root defined in merge source

Then copy/merge could have a --force option that would allow them to work outside a project root (current behavior).

The issue of cyclic merges seems highly complicated. I don't see an easy way you guys could solve that.

Granted, I haven't done any svn coding... but from reading the dev list for a while, and as a user, it seems that there could be much less change to the code base if:

1. There was a way to designate a project root
2. svn copy and merge only worked on a project root walking up the WC tree to find one.
3. --force on svn copy / merge would work as it does now, no project root needed
4. Automatically add merge info to a merge source when you do a reintegrate merge from it so the branch is still usable.
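
To make points 1 to 3 concrete, here is a minimal sketch of the lookup. It is
only an illustration under loud assumptions: the marker is a plain file named
".project-root" (a hypothetical stand-in for the marker property, since
Subversion has no such feature), and the function names are made up for this
example.

```python
import os
import tempfile

ROOT_MARKER = ".project-root"  # hypothetical marker; not a real Subversion feature


def find_project_root(start, marker=ROOT_MARKER):
    """Walk up from `start`; return the first directory holding the marker, else None."""
    current = os.path.abspath(start)
    while True:
        if os.path.exists(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without finding one
            return None
        current = parent


def resolve_target(path, force=False):
    """Pick the directory a copy/merge would operate on."""
    if force:
        # Point 3: with --force, behave as today and use the path as given.
        return os.path.abspath(path)
    root = find_project_root(path)
    if root is None:
        raise ValueError("No project root defined in merge target")
    return root


# Small demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as wc:
    project = os.path.join(wc, "project")
    subdir = os.path.join(project, "src", "lib")
    os.makedirs(subdir)
    open(os.path.join(project, ROOT_MARKER), "w").close()

    expected_root = os.path.abspath(project)
    expected_subdir = os.path.abspath(subdir)
    found_root = resolve_target(subdir)                 # walks up to the project root
    forced_target = resolve_target(subdir, force=True)  # stays on the subdirectory
```

Run from a child directory, the default path resolves to the project root,
while the forced path stays where the user pointed it.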

I guess the above amounts to steering people into the pit of success when they are branching and merging, while the --force option still allows them to point that gun at their foot.

BOb

RE: It's time to fix Subversion Merge

Posted by Bob Jenkins <rj...@collab.net>.

I am reticent to comment to the developer community directly (and you
can see it has only happened a couple of times in 10 years), but as
someone who's been focused for the life of Subversion on its
implementation for enterprises and has been the source of impetus for
much of what CollabNet has contributed post 1.1 (including to some
extent the push to implement merge tracking as the key feature of 1.5);
I feel compelled to speak to this issue.

Merge tracking was implemented out of a need from enterprises to create an
audit trail and to have the tool make merging require less user
knowledge to execute (within obvious limits). It was a logical next step
to going beyond the original focus (being a viable replacement for CVS)
to becoming a tool that supported the additional needs that enterprises
have that open source development either doesn't have or doesn't have to
the same extent.

While in hindsight I still regret not understanding the tree conflict
issue and getting it addressed first, there is no question that the
presence of this functionality has opened up a larger community of users
for Subversion than it could have had without it. For all its warts (and
believe me I appreciate the extent of its warts), it has made a HUGE
difference in Subversion's adoption.

It is also important to note that there has been a significant shift
over the history of this project from being very open source focused and
heavily utilized for that purpose to being largely utilized by
enterprise customers. That inherently brings many use cases that those
focused on open source development don't face and commonly struggle to
appreciate (I recall originally educating committers on what tree
conflicts were and why fixing them was critical to enterprises). That is
completely understandable, but the project must provide for its
user/customer base and for Subversion that is enterprises.

It must also be understood that this customer base is not appropriately
represented by voices in the community because of a reluctance by many
enterprises to be public about their use of open source technology (or
frankly even proprietary technology though in that case the issue of
representation is handled very differently). There are some who've
commented that there must not be pent up demand for 1.7 because there
isn't a huge outcry for its continued delay. This is a
misinterpretation. Enterprises desperately want to see Subversion move
forward and are frustrated with the amount of time that 1.7 has taken,
but again, the users in those enterprises are constrained by those
enterprises from publicly expressing that. Putting forth designs and
functionality suggestions are going to get the same responses (i.e.,
more open source development feedback) even though enterprises have
strong opinions. This list is not going to be indicative, at all, of
what the user community necessarily feels about topics like this (beyond
the obvious, we want merging to be faster and work better).

I see a lot of dismissing of various pieces of existing functionality
that I don't think reflect what the true user base uses and requires.
Simplicity is a wonderful goal and wherever possible it should be
pursued, but Subversion should not abandon its users just because
something is complex as long as it is achievable. As has been suggested
before, best practices should be trumpeted (I know we do at CollabNet)
and potentially even enforced when an organization chooses to enforce
it. Removing functionality is something that must not be taken lightly
and must be done only after diligent efforts to truly understand the
will of the user community.

I have great admiration for Subversion's developers past and present,
but with the widespread use of Subversion today, impactful changes to
functionality need to be taken cautiously and with feedback sought at
the idea, design, and implementation levels. There are many features
that bring value to some, but others will want to be able to opt out of
those features (which is to their credit as many might just try to block
the features they don't want to see used in their enterprise). The
ability to do that I see as a requirement to many things that people
want to implement and for me, a precursor to adding other functionality.

I am not opposed to new approaches and ideas to merges and merge
tracking. I welcome new input on these topics and look forward to
improvements as a result of them. I am opposed to dropping functionality
without extended efforts to get feedback (not by just asking the members
of this email list) and without proven returns that include having
whatever functionality is determined to be absolutely required.

I am also a HUGE proponent of branching for this work as with most of
what is non-trivial development. Users are not happy with a release that
will take at least 28 months. We heard that cry when 1.5 took 21 months
and people swore that would not happen again. This project must find a
way to utilize branching and do reviews on branches versus trapping
releases by having to complete large pieces of functionality because
they are being developed on the trunk. Releases must come at a more
predictable and reasonable pace. I've had to even respond to whether
Subversion has sunsetted due to the long wait for this release. I know
this is not a popular idea, but we can't expect different results by
doing things the same way.

Thanks for listening,

Bob Jenkins
Director, Subversion Services
CollabNet

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Tue, Jul 12, 2011 at 1:05 PM, Andy Singleton <an...@assembla.com> wrote:
>  Log and blame will not be problems.  My proposal does not change log or
> blame.  They will still work fine if you apply newmerge.  The revisions and
> authors and commit messages are still in the repository in the same place,
> and log and blame will still show them.

I am specifically talking about the -g options.  I should be able to
run blame -g and see the revision and author that changed the line of
code, not the revision and author that merged the changes to the
current branch.  Likewise, when I run log, I want the revisions that
are merges to be able to show me the details of the revisions that
were merged.  Users have been clear that these are essential features
of merge.

Likewise, the existing mergeinfo command (which could be improved) is
necessary.  It lets someone ask what has been merged to this branch or
what is eligible to merge to my branch.  I know there are some people
that use the command line, but the GUI tools like Subclipse, AnkhSVN
and TortoiseSVN use the underlying API to drive their merge-friendly
GUI tools.
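
For readers unfamiliar with what "eligible to merge" means in practice, the
following is a simplified sketch of the bookkeeping, not the actual
implementation. It parses a value in the style of the svn:mergeinfo property
(lines of "path:revision-list") and subtracts the merged revisions from a
list of revisions on the source. The function names and sample data are
hypothetical, and the sketch deliberately ignores inheritance and treats
non-inheritable ranges (marked with "*") as plainly merged.

```python
def parse_revision_list(text):
    """Expand a list such as '5-8,12,20-21*' into a set of revision numbers."""
    revs = set()
    for item in text.split(","):
        item = item.strip().rstrip("*")  # '*' marks a non-inheritable range
        if not item:
            continue
        if "-" in item:
            start, end = item.split("-")
            revs.update(range(int(start), int(end) + 1))
        else:
            revs.add(int(item))
    return revs


def parse_mergeinfo(prop_value):
    """Map each merge source path to the set of revisions already merged from it."""
    merged = {}
    for line in prop_value.splitlines():
        if not line.strip():
            continue
        # The revision list follows the last colon; the path itself may contain colons.
        path, _, revlist = line.rpartition(":")
        merged.setdefault(path, set()).update(parse_revision_list(revlist))
    return merged


def eligible_revisions(source_path, source_revs, mergeinfo):
    """Revisions on the source that have not yet been merged to the target."""
    already_merged = mergeinfo.get(source_path, set())
    return sorted(set(source_revs) - already_merged)


# Hypothetical property value and source history.
prop = "/branches/feature:5-8,12\n/trunk:2-4*"
info = parse_mergeinfo(prop)
eligible = eligible_revisions("/branches/feature", [5, 6, 7, 8, 9, 12, 14], info)
print(eligible)
```

Any replacement for mergeinfo would need to answer the same two questions,
"what has been merged" and "what is eligible", for the GUI tools to keep
working.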

I am only pointing these out so that you plan for them in your design.

> There are only two cases that seem relevant to the newmerge proposal:
> * If you do a move, you lose history prior to the move.  This is not new.

Huh? I am not aware of this.  This is the only feature that SVN
actually supports.  It does NOT lose history.  When SVN 1.0 bragged
about its merge support it was because of this feature.  The problem
with move is that we do not have any real record of where something
was moved to, only where it was moved from.  So we do not have a good way
to show moves and of course we do not automatically handle them in
update and merge.

> * If you have foreign merges, you can lose some detail about individual
> commits that were done in the foreign repository.  Instead, you might get
> information about the merge commit that included them.  This is graceful
> degradation, and it's less than people are already tolerating with moves.

Yes, and fwiw, I am fine with the limitations in this case.  Improving
this in the long term would be cool, but I do not personally care if
we make any improvements in this area in the near term.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Andy Singleton <an...@assembla.com>.

Log and blame will not be problems.  My proposal does not change log
or blame.  They will still work fine if you apply newmerge.  The 
revisions and authors and commit messages are still in the repository in 
the same place, and log and blame will still show them.

There are only two cases that seem relevant to the newmerge proposal:
* If you do a move, you lose history prior to the move.  This is not 
new.  This is a problem with the existing subversion implementation of 
file moves.  So, if people are living with it now, they will live with 
the same thing in the future.  It only becomes more obvious if merge can 
follow these changes, but log and blame cannot.  You could upgrade log 
and blame to follow the changes, and then the system will be better than 
it is now.

* If you have foreign merges, you can lose some detail about individual 
commits that were done in the foreign repository.  Instead, you might 
get information about the merge commit that included them.  This is 
graceful degradation, and it's less than people are already tolerating 
with moves.  There are ways to get the information back if you want to 
do some work on merge and log and blame.

On 7/12/2011 12:58 PM, Mark Phippard wrote:
> On Tue, Jul 12, 2011 at 12:52 PM, Hyrum K Wright<hy...@hyrumwright.org>  wrote:
>> On Tue, Jul 12, 2011 at 11:47 AM, Mark Phippard<ma...@gmail.com>  wrote:
>>> On Tue, Jul 12, 2011 at 12:45 PM, Hyrum K Wright<hy...@hyrumwright.org>  wrote:
>>>> On Tue, Jul 12, 2011 at 11:25 AM, Mark Phippard<ma...@gmail.com>  wrote:
>>>>> Finally, in your new design do not forget about things like log -g and
>>>>> blame -g, as well as the mergeinfo command.  These features are all
>>>>> necessary parts of a merge tracking plan and must have answers from
>>>>> the first release.
>>>> Really?  I think we should take whatever improvements we can get,
>>>> rather than saying "oh, and you need to support all this legacy
>>>> baggage as well."  While they are useful to some folks, I don't think
>>>> they are can't-live-without-absolutely-must-have features.  I mean,
>>>> if we're thinking outside the box, let's think Outside the BOX, and
>>>> try not to pigeon hole ourselves.
>>>>
>>>> It would be interesting to see the usage of 'log -g' and 'blame -g'.
>>>> I believe Tortoise uses them as the default under-the-hood, so that
>>>> probably inflates the actual usage numbers quite a bit.
>>> A new merge tracking design that does not support these features, or
>>> at least have a very definite plan for supporting them, would be dead
>>> on arrival.  If the design does not support these options then go back
>>> to the drawing board.
>>>
>>> These are absolute must have features.
>> With all due respect, that's not your decision to make.  This
>> consensus-based community gets to determine what are must-have
>> features and what aren't.
>>
>> In reading this thread, it almost feels like there are two classes of
>> merge users: power users and others, and they have different sets of
>> requirements.  Certainly it's not a discrete set, but a continuum.
>> Unfortunately, Subversion tries to serve the needs of both with a
>> single paradigm, and it's not working too well.
> I see us heading down the path of a classic engineering anti-pattern.
> There is a tough problem to solve, so you decide to rewrite and focus
> on that problem and along the way you toss aside existing things you
> already support because you think they are not important.
>
> Users of Subversion before we had merge tracking, and users of
> Subversion since we have had merge tracking have made it very clear
> that these options are important to them.  You cannot come up with a
> new merge design that does not account for these needs.
>

-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Tue, Jul 12, 2011 at 12:52 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote:
> On Tue, Jul 12, 2011 at 11:47 AM, Mark Phippard <ma...@gmail.com> wrote:
>> On Tue, Jul 12, 2011 at 12:45 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote:
>>> On Tue, Jul 12, 2011 at 11:25 AM, Mark Phippard <ma...@gmail.com> wrote:
>>>>
>>>> Finally, in your new design do not forget about things like log -g and
>>>> blame -g, as well as the mergeinfo command.  These features are all
>>>> necessary parts of a merge tracking plan and must have answers from
>>>> the first release.
>>>
>>> Really?  I think we should take whatever improvements we can get,
>>> rather than saying "oh, and you need to support all this legacy
>>> baggage as well."  While they are useful to some folks, I don't think
>>> they are can't-live-without-absolutely-must-have features.  I mean,
>>> if we're thinking outside the box, let's think Outside the BOX, and
>>> try not to pigeon hole ourselves.
>>>
>>> It would be interesting to see the usage of 'log -g' and 'blame -g'.
>>> I believe Tortoise uses them as the default under-the-hood, so that
>>> probably inflates the actual usage numbers quite a bit.
>>
>> A new merge tracking design that does not support these features, or
>> at least have a very definite plan for supporting them, would be dead
>> on arrival.  If the design does not support these options then go back
>> to the drawing board.
>>
>> These are absolute must have features.
>
> With all due respect, that's not your decision to make.  This
> consensus-based community gets to determine what are must-have
> features and what aren't.
>
> In reading this thread, it almost feels like there are two classes of
> merge users: power users and others, and they have different sets of
> requirements.  Certainly it's not a discrete set, but a continuum.
> Unfortunately, Subversion tries to serve the needs of both with a
> single paradigm, and it's not working too well.

I see us heading down the path of a classic engineering anti-pattern.
There is a tough problem to solve, so you decide to rewrite and focus
on that problem and along the way you toss aside existing things you
already support because you think they are not important.

Users of Subversion before we had merge tracking, and users of
Subversion since we have had merge tracking have made it very clear
that these options are important to them.  You cannot come up with a
new merge design that does not account for these needs.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Hyrum K Wright <hy...@hyrumwright.org>.

On Tue, Jul 12, 2011 at 11:47 AM, Mark Phippard <ma...@gmail.com> wrote:
> On Tue, Jul 12, 2011 at 12:45 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote:
>> On Tue, Jul 12, 2011 at 11:25 AM, Mark Phippard <ma...@gmail.com> wrote:
>>>
>>> Finally, in your new design do not forget about things like log -g and
>>> blame -g, as well as the mergeinfo command.  These features are all
>>> necessary parts of a merge tracking plan and must have answers from
>>> the first release.
>>
>> Really?  I think we should take whatever improvements we can get,
>> rather than saying "oh, and you need to support all this legacy
>> baggage as well."  While they are useful to some folks, I don't think
>> they are can't-live-without-absolutely-must-have features.  I mean,
>> if we're thinking outside the box, let's think Outside the BOX, and
>> try not to pigeon hole ourselves.
>>
>> It would be interesting to see the usage of 'log -g' and 'blame -g'.
>> I believe Tortoise uses them as the default under-the-hood, so that
>> probably inflates the actual usage numbers quite a bit.
>
> A new merge tracking design that does not support these features, or
> at least have a very definite plan for supporting them, would be dead
> on arrival.  If the design does not support these options then go back
> to the drawing board.
>
> These are absolute must have features.

With all due respect, that's not your decision to make.  This
consensus-based community gets to determine what are must-have
features and what aren't.

In reading this thread, it almost feels like there are two classes of
merge users: power users and others, and they have different sets of
requirements.  Certainly it's not a discrete set, but a continuum.
Unfortunately, Subversion tries to serve the needs of both with a
single paradigm, and it's not working too well.

-Hyrum

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Tue, Jul 12, 2011 at 12:45 PM, Hyrum K Wright <hy...@hyrumwright.org> wrote:
> On Tue, Jul 12, 2011 at 11:25 AM, Mark Phippard <ma...@gmail.com> wrote:
>>
>> Finally, in your new design do not forget about things like log -g and
>> blame -g, as well as the mergeinfo command.  These features are all
>> necessary parts of a merge tracking plan and must have answers from
>> the first release.
>
> Really?  I think we should take whatever improvements we can get,
> rather than saying "oh, and you need to support all this legacy
> baggage as well."  While they are useful to some folks, I don't think
> they are can't-live-without-absolutely-must-have features.  I mean,
> if we're thinking outside the box, let's think Outside the BOX, and
> try not to pigeon hole ourselves.
>
> It would be interesting to see the usage of 'log -g' and 'blame -g'.
> I believe Tortoise uses them as the default under-the-hood, so that
> probably inflates the actual usage numbers quite a bit.

A new merge tracking design that does not support these features, or
at least have a very definite plan for supporting them, would be dead
on arrival.  If the design does not support these options then go back
to the drawing board.

These are absolute must have features.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Hyrum K Wright <hy...@hyrumwright.org>.

On Tue, Jul 12, 2011 at 11:25 AM, Mark Phippard <ma...@gmail.com> wrote:
>
> Finally, in your new design do not forget about things like log -g and
> blame -g, as well as the mergeinfo command.  These features are all
> necessary parts of a merge tracking plan and must have answers from
> the first release.

Really?  I think we should take whatever improvements we can get,
rather than saying "oh, and you need to support all this legacy
baggage as well."  While they are useful to some folks, I don't think
they are can't-live-without-absolutely-must-have features.  I mean,
if we're thinking outside the box, let's think Outside the BOX, and
try not to pigeon hole ourselves.

It would be interesting to see the usage of 'log -g' and 'blame -g'.
I believe Tortoise uses them as the default under-the-hood, so that
probably inflates the actual usage numbers quite a bit.

-Hyrum

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Tue, Jul 12, 2011 at 12:16 PM, Andy Singleton <an...@assembla.com> wrote:

> Mark, I agree with you that the existing merge will work better if we apply
> some restrictions.  I can see that the project is already going that way,
> and maybe it is good to continue in that direction.  As a user, I would find
> that helpful.
>
> However, we cannot get good results with the subtree mergeinfo format.  It is
> a failed architecture.  A lot of smart people have worked on it, and they
> have not achieved good results.  It is the source of many annoying problems.
> Even on this list, nobody defends it.  They only express a desire for
> "compatibility."  You yourself have argued that basically, it can't be done,
> that cyclic merge won't work.  Yes, if you stick with the subtree mergeinfo,
> IT CAN'T BE DONE.  You are guaranteed to be right.  However, if we free
> ourselves from this one restriction, we can do something good.  I invite
> everyone to stop beating their heads against this wall.
>
> In my proposal, you can basically keep everything about Subversion the same:
> the servers, most of the clients, the old merge command if you choose to use
> it.  It's all compatible.  I am only asking you to give up ONE THING:
> subtree mergeinfo.  To succeed, we have to get rid of both parts of it.  We
> have to get rid of the subtree info, and we have to get rid of the fussy
> little mergeinfo format.  If newmerge is successful and we want people to
> move to it, we can force the conversion by asking "svn" to read the old
> mergeinfo and write some information into the new format.

I really do not know if I agree with this, and I think you might be
being a bit naive if you think that simply by creating a new format
you are going to not have problems and you are going to even create an
improvement.  That said, I think some of the ideas around storing
patches is at least interesting.

I actually do not think mergeinfo is our problem or even our limiter.
I agree it would have been easier to implement if we did not support
all the odd cases we do, but at the same time, I think all of those
odd cases are working pretty well now.  So at best, removing support
for that stuff might just make future work a bit easier.

The problem in merge is in core SVN.  Merge tracking was always based
on the premise that merge in SVN worked fine, but it was annoying to
have to manually track merges and specify the revisions to merge.  So
merge tracking tracks that and allows the simple merge syntax you are
advocating.  In that sense, merge tracking accomplished its goal and
as of 1.7 the last of the warts in the implementation should be
resolved.

The problem is that SVN merge does not really work fine.  The design
of SVN makes cyclic merges really difficult and perhaps impossible.
The lack of support for moves and renames makes merge not useful for
people that refactor a lot etc.  None of these are problems with merge
tracking.

You seem to want to extend merge tracking to provide new ways to solve
these problems.  Great, design it and propose it.  I do not
necessarily see why that info cannot be stored in properties, or even
the existing property.  Nor do I see why we cannot migrate the info we
have to a new format.  We need real design to have something to
discuss.

Finally, in your new design do not forget about things like log -g and
blame -g, as well as the mergeinfo command.  These features are all
necessary parts of a merge tracking plan and must have answers from
the first release.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Paul Burba <pt...@gmail.com>.

On Tue, Jul 12, 2011 at 1:12 PM, Julian Foad <ju...@wandisco.com> wrote:
> I see three main thrusts behind the whole proposal, that are each much
> more significant than any of the specific concrete ideas.  The first is
> that it's time to try some experiments in merge tracking.

+1

> The second is
> that restricting merges (primarily to the scope of "a whole branch") is
> key to reducing complexity and thus both design/implementation error and
> user error.

Once again I agree on the complexity, but there seems to be a general
understanding that subtree merges/mergeinfo are catastrophically
broken; all I'd like is some specifics before we use that as one of
the primary reasons for a complete merge revamp.

Paul

> The third is that we need to make it all much easier to
> *develop* before we can make much progress.
>
> What we have been doing so far is improving the merge incrementally
> while trying to keep it at all times backward compatible and almost
> completely flexible.  We have quite a good result now, it is extremely
> useful in the real world, and we have learned a lot from it.  But it has
> reached the point where it isn't inviting any new development because
> the bar to entry is much too high.  By starting work on something that
> we call "new", although it may start off as a copy of the old, we avoid
> the requirement to keep it working, and so make it inviting and feasible
> to work on.  Of course it wouldn't become an official replacement for
> the old one until it became mature and stable and had some kind of
> upgrade path, but it would be available as an unofficial alternative for
> those who want to try it.
>
> I think all the suggestions about data format, scripts, eliminating big
> chunks of code complexity, and so on are not hard requirements but simply
> ideas for how we developers can make it easier for ourselves to
> understand what's left and try out improvements.  And to be able to do
> that without holding up a 1.8 release until it's "finished".  That's the
> attraction of using either a branch or a parallel subcommand.
>
> After reaching this state from which we can go forward, that's when I
> see the "foreign merges" proposal could be tackled as a possible first
> extension.  Alternatively at that time we might tackle some kind of
> rename handling or some other improvement.
>
> - Julian
>
>
>

Re: It's time to fix Subversion Merge

Posted by Julian Foad <ju...@wandisco.com>.

I see three main thrusts behind the whole proposal, that are each much
more significant than any of the specific concrete ideas.  The first is
that it's time to try some experiments in merge tracking.  The second is
that restricting merges (primarily to the scope of "a whole branch") is
key to reducing complexity and thus both design/implementation error and
user error.  The third is that we need to make it all much easier to
*develop* before we can make much progress.

What we have been doing so far is improving the merge incrementally
while trying to keep it at all times backward compatible and almost
completely flexible.  We have quite a good result now, it is extremely
useful in the real world, and we have learned a lot from it.  But it has
reached the point where it isn't inviting any new development because
the bar to entry is much too high.  By starting work on something that
we call "new", although it may start off as a copy of the old, we avoid
the requirement to keep it working, and so make it inviting and feasible
to work on.  Of course it wouldn't become an official replacement for
the old one until it became mature and stable and had some kind of
upgrade path, but it would be available as an unofficial alternative for
those who want to try it.

I think all the suggestions about data format, scripts, eliminating big
chunks of code complexity, and so on are not hard requirements but simply
ideas for how we developers can make it easier for ourselves to
understand what's left and try out improvements.  And to be able to do
that without holding up a 1.8 release until it's "finished".  That's the
attraction of using either a branch or a parallel subcommand.

Once we reach this state from which we can go forward, I think the
"foreign merges" proposal could be tackled as a possible first
extension.  Alternatively, at that time we might tackle some kind of
rename handling or some other improvement.

- Julian

Re: It's time to fix Subversion Merge

Posted by Geoff Rowell <ge...@gmail.com>.

On Jul 12, 2011, at 12:34 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Why can't you --- as Paul already said --- just enforce a policy "Don't
> do subtree merges"?
> 
>> If newmerge is successful and we want people to
>> move to it, we can force the conversion by asking "svn" to read the
>> old mergeinfo and write some information into the new format.
> 
> And rewriting of the mergeinfo throughout the repository's history?

I implemented this as a pre-commit hook script, then manually cleaned up the mergeinfo. That solved it for my users.

-Geoff
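
Geoff's actual hook script is not shown in the thread.  As a rough
illustration only, a pre-commit hook enforcing a "no subtree mergeinfo"
policy might look like the sketch below.  It assumes a conventional
trunk/branches/tags layout (the BRANCH_ROOT pattern is an assumption, not
something stated in the thread) and uses only the standard
"svnlook changed" and "svnlook propget" subcommands.  The helper names
are hypothetical.

```python
#!/usr/bin/env python3
"""Pre-commit hook sketch: reject svn:mergeinfo below a branch root."""
import re
import subprocess
import sys

# Assumption: branch roots are trunk, branches/<name> and tags/<name>.
BRANCH_ROOT = re.compile(r"^(trunk|branches/[^/]+|tags/[^/]+)/?$")


def candidate_paths(changed_output):
    """Yield added or property-modified paths from `svnlook changed` output."""
    for line in changed_output.splitlines():
        # The first two columns are status flags; the path starts at column 5.
        if len(line) > 4 and (line[0] == "A" or line[1] == "U"):
            yield line[4:]


def has_mergeinfo(repos, txn, path):
    """Return True if svn:mergeinfo is set on `path` in the transaction."""
    result = subprocess.run(
        ["svnlook", "propget", "-t", txn, repos, "svn:mergeinfo", path],
        capture_output=True, text=True)
    # svnlook exits non-zero when the property is absent on the path.
    return result.returncode == 0


def violations(changed_output, has_prop):
    """Return non-root paths for which `has_prop(path)` is true."""
    return [p for p in candidate_paths(changed_output)
            if not BRANCH_ROOT.match(p) and has_prop(p)]


def main(repos, txn):
    changed = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        capture_output=True, text=True, check=True).stdout
    bad = violations(changed, lambda p: has_mergeinfo(repos, txn, p))
    if bad:
        sys.stderr.write("Subtree mergeinfo is not allowed on:\n")
        for path in bad:
            sys.stderr.write("  %s\n" % path)
        return 1
    return 0


# Subversion invokes the hook as: pre-commit REPOS TXN-NAME
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

A hook like this only stops new subtree mergeinfo from being committed;
existing subtree mergeinfo would still need the manual cleanup Geoff
mentions.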

Re: It's time to fix Subversion Merge

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Andy Singleton wrote on Tue, Jul 12, 2011 at 12:16:26 -0400:
>  I am only asking you to give up ONE THING: subtree mergeinfo.  To
>  succeed, we have to get rid of both parts of it.  We have to get rid
>  of the subtree info, and we have to get rid of the fussy little
>  mergeinfo format.

Why is the /format/ relevant?

Why can't you --- as Paul already said --- just enforce a policy "Don't
do subtree merges"?

>  If newmerge is successful and we want people to
>  move to it, we can force the conversion by asking "svn" to read the
>  old mergeinfo and write some information into the new format.

And rewriting of the mergeinfo throughout the repository's history?

Re: It's time to fix Subversion Merge

Posted by Andy Singleton <an...@assembla.com>.

  On 7/12/2011 11:54 AM, Mark Phippard wrote:
> On Tue, Jul 12, 2011 at 11:29 AM, Andy Singleton<an...@assembla.com>  wrote:
>
>> I don't think that we will need to force anyone to give up the old merge.
>>   If and when the newmerge is better, they will migrate on their own.  I
>> think merge is an important concern for many users and they will migrate
>> quickly if they can get a better result.
> Setting aside for a minute the global ID idea which I realize is
> probably important to you to support your proprietary repository fork
> solution, the only real change in your proposal is to limit what
> someone can do in merge.  IOW, do not allow subtree merges, switched
> merges etc.  The other stuff you proposed can already be done and is
> already widely used today.
>
> I assume you would agree that you cannot have a situation where there
> is a "newmerge" command that one person uses and someone else uses
> "oldmerge" to do a subtree merge with the same tree.  So you have to
> come up with some way for a given project to decide they want to use
> "newmerge" and disallow "oldmerge".
>
> All I am saying is that if you figure out a way to impose these
> restrictions we do not need a new command or subcommand.   We can
> simply make the existing merge honor those restrictions and you have
> accomplished the goal of the simplified syntax.  This is something we
> have already discussed on this list in the past in the context of
> adding an svn branch command to mark the root of a project tree.
>
> To me, this is probably step 1 (and it is a fairly big but important
> first step).  Once we have this, we could start examining the
> mergeinfo syntax and where/how it is best stored and how it could
> possibly be supplemented to support reflective merges or renames or
> ...
>
Mark, I agree with you that the existing merge will work better if we 
apply some restrictions.  I can see that the project is already going 
that way, and maybe it is good to continue in that direction.  As a 
user, I would find that helpful.

However, we cannot get good results with the subtree mergeinfo format.  
It is a failed architecture.  A lot of smart people have worked on it, 
and they have not achieved good results.  It is the source of many 
annoying problems.  Even on this list, nobody defends it.  They only 
express a desire for "compatibility."  You yourself have argued that 
basically, it can't be done, that cyclic merge won't work.  Yes, if you 
stick with the subtree mergeinfo, IT CAN'T BE DONE.  You are guaranteed 
to be right.  However, if we free ourselves from this one restriction, 
we can do something good.  I invite everyone to stop beating their heads 
against this wall.

In my proposal, you can basically keep everything about Subversion the 
same: the servers, most of the clients, the old merge command if you 
choose to use it.  It's all compatible.  I am only asking you to give up 
ONE THING: subtree mergeinfo.  To succeed, we have to get rid of both 
parts of it.  We have to get rid of the subtree info, and we have to get 
rid of the fussy little mergeinfo format.  If newmerge is successful and 
we want people to move to it, we can force the conversion by asking 
"svn" to read the old mergeinfo and write some information into the new 
format.
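
The thread does not specify what the new format would look like.  Purely
as an illustration of the "read the old mergeinfo and write the new
format" step, the sketch below parses an existing svn:mergeinfo property
value (lines of "/path:revlist", where a trailing "*" marks a
non-inheritable range) and emits it as JSON.  The JSON layout, the
"version" field and the function names are all hypothetical.

```python
import json


def parse_rev_element(elem):
    """Parse one revision-list element such as '15', '1-10' or '20-25*'."""
    inheritable = not elem.endswith("*")
    elem = elem.rstrip("*")
    start, _, end = elem.partition("-")
    return {"start": int(start),
            "end": int(end or start),
            "inheritable": inheritable}


def parse_mergeinfo(text):
    """Map each merge source path to its list of merged revision ranges."""
    sources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon, since a path may itself contain one.
        source, revs = line.rsplit(":", 1)
        sources[source] = [parse_rev_element(e)
                           for e in revs.split(",") if e]
    return sources


def to_merge_history(text):
    """Serialize parsed mergeinfo into a hypothetical extensible record."""
    record = {"version": 1, "sources": parse_mergeinfo(text)}
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_merge_history("/trunk:1-10,15,20-25*\n/branches/feature:30\n"))
```

This covers only the property value on a single path; it says nothing
about Daniel's question of what to do with mergeinfo recorded throughout
a repository's history.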

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Tue, Jul 12, 2011 at 11:29 AM, Andy Singleton <an...@assembla.com> wrote:

> I don't think that we will need to force anyone to give up the old merge.
>  If and when the newmerge is better, they will migrate on their own.  I
> think merge is an important concern for many users and they will migrate
> quickly if they can get a better result.

Setting aside for a minute the global ID idea which I realize is
probably important to you to support your proprietary repository fork
solution, the only real change in your proposal is to limit what
someone can do in merge.  IOW, do not allow subtree merges, switched
merges etc.  The other stuff you proposed can already be done and is
already widely used today.

I assume you would agree that you cannot have a situation where there
is a "newmerge" command that one person uses and someone else uses
"oldmerge" to do a subtree merge with the same tree.  So you have to
come up with some way for a given project to decide they want to use
"newmerge" and disallow "oldmerge".

All I am saying is that if you figure out a way to impose these
restrictions we do not need a new command or subcommand.   We can
simply make the existing merge honor those restrictions and you have
accomplished the goal of the simplified syntax.  This is something we
have already discussed on this list in the past in the context of
adding an svn branch command to mark the root of a project tree.

To me, this is probably step 1 (and it is a fairly big but important
first step).  Once we have this, we could start examining the
mergeinfo syntax and where/how it is best stored and how it could
possibly be supplemented to support reflective merges or renames or
...

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Andy Singleton <an...@assembla.com>.

  My original idea was to make a new executable file called "newmerge".  
It would be an external script, and if you want to use it, you just need 
that extra script.  However, I was planning on building it from the C 
code that is in "svn merge" right now, rather than Python or Perl.  
Using the existing code will put us ahead on patch libraries and 
protocols.  It will also give us access to the experts that have worked 
on "svn merge" in the past.  And, it will be easier to integrate into 
GUI clients.  So, the code would start as a cut-down version of "svn 
merge".  Later it could be added back as "svn newmerge".

If you want to keep it as a mergeable branch (clearly relevant), then 
maybe it's better just to add it on as "svn newmerge" from the beginning.  
If that approach is recommended, then maybe someone can help by adding 
the stub for this command to "svn".

I don't think that we will need to force anyone to give up the old 
merge.  If and when the newmerge is better, they will migrate on their 
own.  I think merge is an important concern for many users and they will 
migrate quickly if they can get a better result.

On 7/12/2011 11:00 AM, Mark Phippard wrote:
> On Tue, Jul 12, 2011 at 10:52 AM, Julian Foad<ju...@wandisco.com>  wrote:
>
>> A script has the advantage that it could be tried and even rolled out by
>> people who are still using 1.6.x.
>>
>> None of that is a reason not to start a branch.
> +1 on the branch.  But wouldn't it make sense to defer creating the
> branch for as long as possible?  If Andy's team wants to explore
> scripts first why create a branch that is just going to immediately
> start getting stale?  Maybe we should start with patches to trunk and
> make the branch when there is something to not include on trunk?
>
> FWIW, I would also like to see more dev@ discussion and less private
> discussion.  Before we start working on a new command, it seems we ought
> to discuss the ideas more.  As I mentioned, a new command is going to
> need a way to force users to use the new command or else all the same
> issues have to be addressed.  If we can force users to do something,
> then I am not sure we need a new command.  We can just have a way to
> not allow users to use the features of existing merge that we do not
> want them to use.  The existing merge command already supports the
> proposed simple syntax.
>

-- 
Andy Singleton
Founder/CEO, Assembla Online: http://www.assembla.com
Phone: 781-328-2241
Skype: andysingleton

Re: It's time to fix Subversion Merge

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.

Mark Phippard wrote on Tue, Jul 12, 2011 at 11:00:10 -0400:
> We can just have a way to not allow users to use the features of
> existing merge that we do not want them to use.  The existing merge
> command already supports the proposed simple syntax.

+1

Re: It's time to fix Subversion Merge

Posted by Mark Phippard <ma...@gmail.com>.

On Tue, Jul 12, 2011 at 10:52 AM, Julian Foad <ju...@wandisco.com> wrote:

> A script has the advantage that it could be tried and even rolled out by
> people who are still using 1.6.x.
>
> None of that is a reason not to start a branch.

+1 on the branch.  But wouldn't it make sense to defer creating the
branch for as long as possible?  If Andy's team wants to explore
scripts first why create a branch that is just going to immediately
start getting stale?  Maybe we should start with patches to trunk and
make the branch when there is something to not include on trunk?

FWIW, I would also like to see more dev@ discussion and less private
discussion.  Before we start working on a new command, it seems we ought
to discuss the ideas more.  As I mentioned, a new command is going to
need a way to force users to use the new command or else all the same
issues have to be addressed.  If we can force users to do something,
then I am not sure we need a new command.  We can just have a way to
not allow users to use the features of existing merge that we do not
want them to use.  The existing merge command already supports the
proposed simple syntax.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

Re: It's time to fix Subversion Merge

Posted by Julian Foad <ju...@wandisco.com>.

C. Michael Pilato wrote:
> On 07/12/2011 09:40 AM, Hyrum K Wright wrote:
> > We should probably consider having Andrew and his group do their work
> > on a branch in our repository.
> 
> +1

+1.  I can create a branch, called ... well, the basic nature of this
proposal is simplifying the scope of what merging is allowed/supported,
with the aim of simplifying the lives of users who currently sometimes
mess up, yes?  So the branch could be called "simpler-merge".

Note that there are other possible ways of working, which can be tried
as well as or instead of the branch.  An external script is another good
option, like the way 'svnmerge.py' existed before the current built-in
merge tracking.  I was talking to Andy the other day and I got the
strong impression that his focus (and the reason he mentioned an
"extensible" data format and some implementation details such as using a
file rather than a property) was specifically to encourage
experimentation and extension and participation, including by people who
might not normally work in the core C code.

A script has the advantage that it could be tried and even rolled out by
people who are still using 1.6.x.

None of that is a reason not to start a branch.

- Julian

Re: It's time to fix Subversion Merge

Posted by "C. Michael Pilato" <cm...@collab.net>.

On 07/12/2011 09:40 AM, Hyrum K Wright wrote:
> We should probably consider having Andrew and his group do their work
> on a branch in our repository.

+1

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

Re: It's time to fix Subversion Merge

Posted by Hyrum K Wright <hy...@hyrumwright.org>.

On Mon, Jul 11, 2011 at 11:51 AM, C. Michael Pilato <cm...@collab.net> wrote:
> On 07/11/2011 11:46 AM, Andy Singleton wrote:
>>
>>  Many developers are moving from Subversion to other SCM systems that have
>> better merge capabilities. I have posted an article with a proposal to fix
>> this problem, here:
>>
>> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx
>
> [...]
>
>> I think that we can build a newmerge prototype by stripping down the
>> existing merge to remove the subtree options, and moving to the extensible
>> merginfo format. It will be useful to get advice about this from experienced
>> team members.
>
> Your optimism is lovely (and welcome, even!), but I am not as convinced as
> you that the reason why Subversion's merge functionality is subpar is as
> superficial as the items you call out (and which are implied by your
> prototyping plan above).
>
> Very little (if anything) about your proposal touches on the *real*
> problems, such as Subversion's handling of moved/renamed objects, tree
> conflict detection/handling/resolution, changeset conflation (caused by the
> fundamental diff+patch approach Subversion takes to merges rather than
> first-class changeset support), etc.  These real problems with merging were
> documented many years before the merge tracking feature was ever conceived,
> and neither that feature nor its skin-deep-only warts you aim to address
> made a dent in solving those very real problems.
>
> I don't aim to discourage -- far from it!  On the contrary, I want to
> encourage a deeper review of the situation.  It's entirely possible that, in
> doing so, you will find solutions for the deeper core problems here, and
> obviously the Subversion community (devs and users alike) would love that!
>
> -- C-Mike
>
> [1] I'll grant that in your blog post, you at least acknowledge the tree
> changes problem and place great stock in your extensible merge tracking
> format toward some future solution.

Improving merge correctness, performance and ease is one of WANdisco's
priorities over the 1.8 release cycle.  I don't know that anybody
knows what the ideal solution is here, but Andrew's ideas are at least
worth exploring.

We should probably consider having Andrew and his group do their work
on a branch in our repository.

[ None of this in any way disparages the work done by folks on the
current merge and merge tracking implementations.  I know that Paul in
particular has done yeoman's work in maintaining that dark corner of
our code, and I have great respect for that.  I hope we can improve on
those efforts, not throw them away. ]

-Hyrum

Re: It's time to fix Subversion Merge

Posted by "C. Michael Pilato" <cm...@collab.net>.

On 07/11/2011 11:46 AM, Andy Singleton wrote:
> 
>  Many developers are moving from Subversion to other SCM systems that have
> better merge capabilities. I have posted an article with a proposal to fix
> this problem, here:
> 
> http://blog.assembla.com/assemblablog/tabid/12618/bid/58122/It-s-Time-to-Fix-Subversion-Merge.aspx

[...]

> I think that we can build a newmerge prototype by stripping down the
> existing merge to remove the subtree options, and moving to the extensible
> merginfo format. It will be useful to get advice about this from experienced
> team members.

Your optimism is lovely (and welcome, even!), but I am not as convinced as
you that the reason why Subversion's merge functionality is subpar is as
superficial as the items you call out (and which are implied by your
prototyping plan above).

Very little (if anything) about your proposal touches on the *real*
problems, such as Subversion's handling of moved/renamed objects, tree
conflict detection/handling/resolution, changeset conflation (caused by the
fundamental diff+patch approach Subversion takes to merges rather than
first-class changeset support), etc.  These real problems with merging were
documented many years before the merge tracking feature was ever conceived,
and neither that feature nor its skin-deep-only warts you aim to address
made a dent in solving those very real problems.

I don't aim to discourage -- far from it!  On the contrary, I want to
encourage a deeper review of the situation.  It's entirely possible that, in
doing so, you will find solutions for the deeper core problems here, and
obviously the Subversion community (devs and users alike) would love that!

-- C-Mike

[1] I'll grant that in your blog post, you at least acknowledge the tree
changes problem and place great stock in your extensible merge tracking
format toward some future solution.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand