You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@subversion.apache.org by Paul Burba <pt...@gmail.com> on 2009/07/09 21:17:34 UTC

[RFC] Greatly improve merge performance, but break(?) edge cases

A thinly veiled plea for some eyes on issue #3443...

Currently merge performance is quite poor when merging to a target
with lots (think 100's+) of subtrees with explicit mergeinfo. One of
the main reasons for this is because the implicit mergeinfo of each
subtree is obtained from the server. This is described in detail in
issue #3443 http://subversion.tigris.org/issues/show_bug.cgi?id=3443
and http://svn.collab.net/repos/svn/branches/subtree-mergeinfo/notes/subtree-mergeinfo/the-performance-problem.txt
but the essential bit is this:

When performing a merge tracking aware merge to a target with subtrees
with explicit mergeinfo, the merge logic looks at the explicit
mergeinfo on the merge target and each subtree:

* For forward merges, any revisions from the merge source
that are already represented are *not* merged.

* For reverse merges, only revisions from the merge source
which are represented are reverse merged.

The logic also considers the implicit mergeinfo (a.k.a. natural history) of each
target and subtree in a similar way. The problem is, for each subtree this
means a call to svn_client__get_history_as_mergeinfo() and the expense of a
network round trip. The slower the network and/or the more subtrees with
mergeinfo, the slower the merge becomes. If the merge target has hundreds or
thousands of subtrees with explicit mergeinfo then even simple merges can become
excruciatingly slow.

In issue #3443 I've posted a patch in which the subtrees inherit the
implicit mergeinfo of the target root the same way a subtree without
explicit mergeinfo would inherit explicit mergeinfo from a parent.
This patch passes all of our tests and can result in dramatic merge
performance improvements - again see issue #3443 for an example.

The only problem is that this patch changes the behavior of merge when
a subtree with explicit mergeinfo has different history, e.g. a
subtree is deleted and replaced with a copy from a different branch or
from the same branch but at a different point in it's history. In
these cases revisions might be included or exlcuded for merging
differently than if we ask the repos for the actual implicit mergeinfo
(what we do today).

I'm left with two basic questions:

1) Are the cases where subtrees in a merge target have different
implicit mergeinfo than the rest of the target common use cases or
highly contrived edge cases?. I think the latter is true, certainly
the vanilla release and feature branch models aren't affected, but am
I missing something obvious? I've spent so much time close to this
that I might be missing the forest for the trees.

2) Assuming subtrees with differing implicit mergeinfo are edge cases,
is there any reason *not* to make the changes suggested by the patch
in issue #3443?

Any thoughts are appreciated,

Paul

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369501

Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatly improve merge performance...]

Posted by "Hyrum K. Wright" <hy...@hyrumwright.org>.

>  -------Original Message-------
>  From: Julian Foad <ju...@btopenworld.com>
>  Subject: Report mergeinfo-only changes less verbosely [was: [RFC] *Greatly* improve merge performance...]
>  Sent: 10 Jul '09 07:31
>  
>  Geoff Rowell wrote:
>  > Aside from performance, I wouldn't have such a problem with distributed
>  > mergeinfo if it didn't tend to alarm my users. They try to merge a few
>  > files and Subversion reports a massive number of changes.
>  >
>  > A serious review of where and when mergeinfo property changes are
>  > reported is needed.
>  
>  We talked once about hiding mergeinfo-only changes from the user, and
>  that was rightly rejected. Did we talk about "sidelining" or
>  "summarising" their display?
>  
>  It seems clear now, with hindsight and lots of mergeinfo, that UIs, both
>  our "svn" CLI and GUIs like TortoiseSvn, should help the user to see the
>  significant changes without the clutter of mergeinfo-only changes.
>  
>  "svn status" could by default summarize all the mergeinfo-only changes
>  in one line at the end:
>  
>  [[[
>  $ svn status
>  M    myfile
>  M    dir/myfile2
>  43 items with changes to mergeinfo are also present but not shown.
>  ]]]
>  
>  "svn commit" when generating a log message could list them separately:
>  
>  [[[
>  
>  --This line, and those below, will be ignored--
>  
>  M    myfile
>  MM   dir/myfile2
>  
>  The following items have only mergeinfo changes:
>  M   .
>  M   file1
>  M   dir
>  M   dir/file4
>  M   dir/file5
>  M   dir/file6
>  ...
>  M   dir/file43
>  ]]]
>  
>  This would require modifications to the "status" and "commit"
>  notification methods, of course. And there are command-line
>  compatibility questions, but it's not a big deal to decide on a solution
>  to those.

I started hacking some of these ideas several months ago on the ignore-mergeinfo branch.  I did some client-side filtering, but stopped short of extending this to the repository APIs.  It could be done, though.  Feel free to use the branch if you want to start working on this idea.

>  Worth doing?
>  
>  Where are the other places that should have this separation/summarising?
>  
>  - Julian
>  
>  ------------------------------------------------------
>  http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369736
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2371537

Re: confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

Posted by Erik Huelsmann <eh...@gmail.com>.

On Tue, Jul 21, 2009 at 10:17 PM, Stefan Sperling<st...@elego.de> wrote:
> On Tue, Jul 21, 2009 at 03:55:19PM +0200, Neels Janosch Hofmeyr wrote:
>> > This email and any files transmitted with it are confidential and intended
>> > solely for the use of the individual or entity to whom they are addressed.
>> > If you have received this email in error please notify the sender.
>>
>> Consider yourself notified.
>>
>> I've seen this before and the answer was "yes, sorry, it's company policy,
>> can't get rid of it." It bugs me nevertheless.
>
> See also http://producingoss.com/en/communications.html#face
>
>> It seems to me that, with a mail footer like this, your company policy is
>> incompatible with contributing to dev@ discussions, which should be
>> addressed. A private mail account could be used, for example.
>
> Some companies lock their people in so much that they can't even
> get at other mail at work...

In which case they're probably not meant to communicate outside the
company. I suggest they do not do so, or revert contributions to this
group to their evenings where they hopefully do have the rights to log
in to hotmail, yahoo or gmail.


Bye,

Erik.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2374399

Re: confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

Posted by Stefan Sperling <st...@elego.de>.

On Tue, Jul 21, 2009 at 03:55:19PM +0200, Neels Janosch Hofmeyr wrote:
> > This email and any files transmitted with it are confidential and intended
> > solely for the use of the individual or entity to whom they are addressed.
> > If you have received this email in error please notify the sender.
>
> Consider yourself notified.
> 
> I've seen this before and the answer was "yes, sorry, it's company policy,
> can't get rid of it." It bugs me nevertheless.

See also http://producingoss.com/en/communications.html#face
 
> It seems to me that, with a mail footer like this, your company policy is
> incompatible with contributing to dev@ discussions, which should be
> addressed. A private mail account could be used, for example.

Some companies lock their people in so much that they can't even
get at other mail at work...

Stefan

confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

Posted by Neels Janosch Hofmeyr <ne...@elego.de>.

Sorry if I'm interrupting "important stuff", but sending this to an
open-source mailing list simply makes absolutely no sense at all:

Geoff Rowell wrote:
> This email and any files transmitted with it are confidential and intended
> solely for the use of the individual or entity to whom they are addressed.
> If you have received this email in error please notify the sender.
> 
> ------------------------------------------------------
> http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369778

Consider yourself notified.

I've seen this before and the answer was "yes, sorry, it's company policy,
can't get rid of it." It bugs me nevertheless.

It seems to me that, with a mail footer like this, your company policy is
incompatible with contributing to dev@ discussions, which should be
addressed. A private mail account could be used, for example.

Couldn't this actually be causing a legal nightmare at some point?

~Neels

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2372981

Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

Posted by David Glasser <gl...@davidglasser.net>.

On Fri, Jul 10, 2009 at 8:09 AM, Geoff Rowell<ge...@varolii.com> wrote:
> Julian Foad wrote:
>> Geoff Rowell wrote:
>> > Aside from performance, I wouldn't have such a problem with
> distributed
>> > mergeinfo if it didn't tend to alarm my users. They try to merge a
> few
>> > files and Subversion reports a massive number of changes.
>> >
>> > A serious review of where and when mergeinfo property changes are
>> > reported is needed.
>>
>> We talked once about hiding mergeinfo-only changes from the user, and
>> that was rightly rejected. Did we talk about "sidelining" or
>> "summarising" their display?
>>
>> It seems clear now, with hindsight and lots of mergeinfo, that UIs,
> both
>> our "svn" CLI and GUIs like TortoiseSvn, should help the user to see
> the
>> significant changes without the clutter of mergeinfo-only changes.
>>
>> "svn status" could by default summarize all the mergeinfo-only changes
>> in one line at the end:
>>
>> [[[
>> $ svn status
>> M    myfile
>> M    dir/myfile2
>> 43 items with changes to mergeinfo are also present but not shown.
>> ]]]
>>
>> "svn commit" when generating a log message could list them separately:
>>
>> [[[
>>
>> --This line, and those below, will be ignored--
>>
>> M    myfile
>> MM   dir/myfile2
>>
>> The following items have only mergeinfo changes:
>>  M   .
>>  M   file1
>>  M   dir
>>  M   dir/file4
>>  M   dir/file5
>>  M   dir/file6
>> ...
>>  M   dir/file43
>> ]]]
>>
>> This would require modifications to the "status" and "commit"
>> notification methods, of course. And there are command-line
>> compatibility questions, but it's not a big deal to decide on a
> solution
>> to those.
>>
>> Worth doing?
>>
>> Where are the other places that should have this
> separation/summarising?
>>
>
> I'd also suggest modifying the output of "svn log -v".

That was my first thought as well.  That does ~require a repository
format change, though.

--dave

-- 
glasser@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369944

RE: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

Posted by Geoff Rowell <ge...@varolii.com>.

Julian Foad wrote:
> Geoff Rowell wrote:
> > Aside from performance, I wouldn't have such a problem with
distributed
> > mergeinfo if it didn't tend to alarm my users. They try to merge a
few
> > files and Subversion reports a massive number of changes.
> > 
> > A serious review of where and when mergeinfo property changes are
> > reported is needed.
> 
> We talked once about hiding mergeinfo-only changes from the user, and
> that was rightly rejected. Did we talk about "sidelining" or
> "summarising" their display?
> 
> It seems clear now, with hindsight and lots of mergeinfo, that UIs,
both
> our "svn" CLI and GUIs like TortoiseSvn, should help the user to see
the
> significant changes without the clutter of mergeinfo-only changes.
> 
> "svn status" could by default summarize all the mergeinfo-only changes
> in one line at the end:
> 
> [[[
> $ svn status
> M    myfile
> M    dir/myfile2
> 43 items with changes to mergeinfo are also present but not shown.
> ]]]
> 
> "svn commit" when generating a log message could list them separately:
> 
> [[[
> 
> --This line, and those below, will be ignored--
> 
> M    myfile
> MM   dir/myfile2
> 
> The following items have only mergeinfo changes:
>  M   .
>  M   file1
>  M   dir
>  M   dir/file4
>  M   dir/file5
>  M   dir/file6
> ...
>  M   dir/file43
> ]]]
> 
> This would require modifications to the "status" and "commit"
> notification methods, of course. And there are command-line
> compatibility questions, but it's not a big deal to decide on a
solution
> to those.
> 
> Worth doing?
> 
> Where are the other places that should have this
separation/summarising?
>

I'd also suggest modifying the output of "svn log -v".

-Geoff


This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369778

Report mergeinfo-only changes less verbosely [was: [RFC] Greatly improve merge performance...]

Posted by Julian Foad <ju...@btopenworld.com>.

Geoff Rowell wrote:
> Aside from performance, I wouldn't have such a problem with distributed
> mergeinfo if it didn't tend to alarm my users. They try to merge a few
> files and Subversion reports a massive number of changes.
> 
> A serious review of where and when mergeinfo property changes are
> reported is needed.

We talked once about hiding mergeinfo-only changes from the user, and
that was rightly rejected. Did we talk about "sidelining" or
"summarising" their display?

It seems clear now, with hindsight and lots of mergeinfo, that UIs, both
our "svn" CLI and GUIs like TortoiseSvn, should help the user to see the
significant changes without the clutter of mergeinfo-only changes.

"svn status" could by default summarize all the mergeinfo-only changes
in one line at the end:

[[[
$ svn status
M    myfile
M    dir/myfile2
43 items with changes to mergeinfo are also present but not shown.
]]]

"svn commit" when generating a log message could list them separately:

[[[

--This line, and those below, will be ignored--

M    myfile
MM   dir/myfile2

The following items have only mergeinfo changes:
 M   .
 M   file1
 M   dir
 M   dir/file4
 M   dir/file5
 M   dir/file6
...
 M   dir/file43
]]]

This would require modifications to the "status" and "commit"
notification methods, of course. And there are command-line
compatibility questions, but it's not a big deal to decide on a solution
to those.

Worth doing?

Where are the other places that should have this separation/summarising?

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369736

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Julian Foad <ju...@btopenworld.com>.

I (Julian Foad) wrote:
[...]
> I think therefore it's right to do this. If it should cause any problem,
> it will almost certainly be a problem that already existed in some
> corner cases (where there's differing implicit mergeinfo but no explicit
> mergeinfo), and we would want to know about such a problem anyway, and
> this will make it easier to find and diagnose.

Put simply:
Consistency is a necessary step on the road to correctness.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2370930

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Paul Burba <pt...@gmail.com>.

On Mon, Jul 13, 2009 at 5:25 PM, Paul Burba<pt...@gmail.com> wrote:

> I'll commit this change later tonight unless I hear any objections.

Done r38417.

> Paul
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2371141

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Paul Burba <pt...@gmail.com>.

On Mon, Jul 13, 2009 at 9:42 AM, Mark Phippard<ma...@gmail.com> wrote:

> On Fri, Jul 10, 2009 at 12:01 PM, Paul Burba<pt...@gmail.com> wrote:
>
>> TODAY - If a subtree has no explicit mergeinfo and differing implicit
>> mergeinfo, the difference is ignored.  If a subtree has explicit
>> mergeinfo and differing implicit mergeinfo, the difference is
>> accounted for.  This means that merging into a target with
>> semantically equivalent subtree mergeinfo can produce different
>> results.
>>
>> MY PATCH - If a subtree has differing implicit mergeinfo this
>> difference is always ignored, regardless of whether the subtree has
>> explicit mergeinfo or not.
>>
>> So the real question becomes: Should we consider differences in a
>> subtree's implicit mergeinfo when calculating what needs to be merged?
>
> It almost sounds like it would be better to apply your patch so that
> merge is consistent.  If some real world use cases start to show up
> where the implicit mergeinfo of subtrees needs to be considered, then
> it would make sense to tackle this for all merge scenarios.  It sounds
> like today we have the worse possible solution:
>
> 1) If the implicit mergeinfo really needs to be calculated, then in
> many merges it isn't, so users are probably not consistently
> experiencing or noticing problems, which means it will also be more
> difficult for us to recognize that there is a problem.  We do not know
> if users are experiencing this problem.

Exactly!  I guess I finally explained this issue correctly :-P

> 2) If the implicit mergeinfo typically does not need to be calculated,
> then users are experiencing a huge performance cost for no real
> reason.  We know for certain that users are experiencing this problem.
>
> Given how bad we know that #2 is for people that experience it, the
> idea of expanding this logic to cover all merge scenarios does not
> seem like the right solution to put in place.  So I'd be in favor of
> applying your patch so that there is consistency and we can learn if
> #1 is really an issue or not.

On Mon, Jul 13, 2009 at 10:33 AM, Julian Foad<ju...@btopenworld.com> wrote:

> Yes, this sounds better than keeping the present (inconsistent and slow)
> behaviour. (My earlier reply was based on misunderstood implications.)
>
> The only thing to beware of is if the real-world use cases that create
> differing implicit mergeinfo also happen to be the cases that create
> subtree mergeinfo. If that were so, then the (inconsistent) algorithm
> would have actually been producing consistently correct results in those
> cases, whereas changing it might produce wrong results. The cases that
> generate differing implicit mergeinfo are copies, I think. Copies used
> to create sub-tree mergeinfo, but not any more. That means... I don't
> know: that's as far as my brain goes today.

Copies haven't created mergeinfo since 1.5.5 (12/22/08), so it has
probably been long enough that if users were getting bit by differing
subtree implicit mergeinfo, that we probably would have started to
hear about it by now.

> I think therefore it's right to do this. If it should cause any problem,
> it will almost certainly be a problem that already existed in some
> corner cases (where there's differing implicit mergeinfo but no explicit
> mergeinfo), and we would want to know about such a problem anyway, and
> this will make it easier to find and diagnose.

I'll commit this change later tonight unless I hear any objections.

Paul

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2371107

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Julian Foad <ju...@btopenworld.com>.

On Mon, 2009-07-13 at 09:42 -0400, Mark Phippard wrote:
> On Fri, Jul 10, 2009 at 12:01 PM, Paul Burba<pt...@gmail.com> wrote:
> 
> > TODAY - If a subtree has no explicit mergeinfo and differing implicit
> > mergeinfo, the difference is ignored.  If a subtree has explicit
> > mergeinfo and differing implicit mergeinfo, the difference is
> > accounted for.  This means that merging into a target with
> > semantically equivalent subtree mergeinfo can produce different
> > results.
> >
> > MY PATCH - If a subtree has differing implicit mergeinfo this
> > difference is always ignored, regardless of whether the subtree has
> > explicit mergeinfo or not.
> >
> > So the real question becomes: Should we consider differences in a
> > subtree's implicit mergeinfo when calculating what needs to be merged?
> 
> It almost sounds like it would be better to apply your patch so that
> merge is consistent.  If some real world use cases start to show up
> where the implicit mergeinfo of subtrees needs to be considered, then
> it would make sense to tackle this for all merge scenarios.  It sounds
> like today we have the worse possible solution:
> 
> 1) If the implicit mergeinfo really needs to be calculated, then in
> many merges it isn't, so users are probably not consistently
> experiencing or noticing problems, which means it will also be more
> difficult for us to recognize that there is a problem.  We do not know
> if users are experiencing this problem.
> 
> 2) If the implicit mergeinfo typically does not need to be calculated,
> then users are experiencing a huge performance cost for no real
> reason.  We know for certain that users are experiencing this problem.
> 
> Given how bad we know that #2 is for people that experience it, the
> idea of expanding this logic to cover all merge scenarios does not
> seem like the right solution to put in place.  So I'd be in favor of
> applying your patch so that there is consistency and we can learn if
> #1 is really an issue or not.

Yes, this sounds better than keeping the present (inconsistent and slow)
behaviour. (My earlier reply was based on misunderstood implications.)

The only thing to beware of is if the real-world use cases that create
differing implicit mergeinfo also happen to be the cases that create
subtree mergeinfo. If that were so, then the (inconsistent) algorithm
would have actually been producing consistently correct results in those
cases, whereas changing it might produce wrong results. The cases that
generate differing implicit mergeinfo are copies, I think. Copies used
to create sub-tree mergeinfo, but not any more. That means... I don't
know: that's as far as my brain goes today.

I think therefore it's right to do this. If it should cause any problem,
it will almost certainly be a problem that already existed in some
corner cases (where there's differing implicit mergeinfo but no explicit
mergeinfo), and we would want to know about such a problem anyway, and
this will make it easier to find and diagnose.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2370928

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Mark Phippard <ma...@gmail.com>.

On Fri, Jul 10, 2009 at 12:01 PM, Paul Burba<pt...@gmail.com> wrote:

> TODAY - If a subtree has no explicit mergeinfo and differing implicit
> mergeinfo, the difference is ignored.  If a subtree has explicit
> mergeinfo and differing implicit mergeinfo, the difference is
> accounted for.  This means that merging into a target with
> semantically equivalent subtree mergeinfo can produce different
> results.
>
> MY PATCH - If a subtree has differing implicit mergeinfo this
> difference is always ignored, regardless of whether the subtree has
> explicit mergeinfo or not.
>
> So the real question becomes: Should we consider differences in a
> subtree's implicit mergeinfo when calculating what needs to be merged?

It almost sounds like it would be better to apply your patch so that
merge is consistent.  If some real world use cases start to show up
where the implicit mergeinfo of subtrees needs to be considered, then
it would make sense to tackle this for all merge scenarios.  It sounds
like today we have the worse possible solution:

1) If the implicit mergeinfo really needs to be calculated, then in
many merges it isn't, so users are probably not consistently
experiencing or noticing problems, which means it will also be more
difficult for us to recognize that there is a problem.  We do not know
if users are experiencing this problem.

2) If the implicit mergeinfo typically does not need to be calculated,
then users are experiencing a huge performance cost for no real
reason.  We know for certain that users are experiencing this problem.

Given how bad we know that #2 is for people that experience it, the
idea of expanding this logic to cover all merge scenarios does not
seem like the right solution to put in place.  So I'd be in favor of
applying your patch so that there is consistency and we can learn if
#1 is really an issue or not.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2370911

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Paul Burba <pt...@gmail.com>.

Thanks all for your time and thoughts.

Julian - re other solutions, I agree.  In pursuing this particular
path I was trying to find a solution that could be backported to 1.6.
Other solutions to this and other general merge problems are
definitely being considered:
http://svn.collab.net/repos/svn/branches/subtree-mergeinfo/notes/subtree-mergeinfo/solutions-whiteboard.txt.

Re everyone's concern about correctness: I do agree that sacrificing
correctness on the alter of performance is wrong.  However, I'm not
sure the current behavior actually is correct.  My wording here was
poorly chosen:

> The only problem is that this patch changes the behavior of merge when
> a subtree with explicit mergeinfo has different history, e.g. a
> subtree is deleted and replaced with a copy from a different branch or
> from the same branch but at a different point in it's history.  In
> these cases revisions might be included or excluded for merging
> differently than if we ask the repos for the actual implicit mergeinfo
> (what we do today).

I didn't mean that with the patch might be inconsistent in its
behavior, I meant that it will differ from our current behavior (which
is inconsistent).

As I pointed out in issue #3443, currently implicit subtree mergeinfo
which differs from the root of the merge target's implicit mergeinfo
is *only* considered *if* the subtree has explicit mergeinfo.  It is
quite possible for a subtree's implicit mergeinfo to differ but for
the subtree to not have explicit mergeinfo, in which case the implicit
mergeinfo difference is completely ignored.  Put another way, the
current trunk/1.6 behavior is inconsistent in this situation.  So no
matter how we define the correct behavior, at least some of the time,
merge presently does the wrong thing in this "edge" case.

My change would make every subtree's implicit mergeinfo a function of
the merge target's implicit mergeinfo.  A subtree's implicit mergeinfo
would not potentially change, as it effectively does today, depending
on whether the subtree has explicit mergeinfo or not.  (/me wonders if
anyone actually follows this)

So the patch brings consistency...

...but maybe it makes things consistently wrong.  As I rethink this,
I'm hard pressed to say if this is better or worse than how things
work today (ignoring the performance gains).

TODAY - If a subtree has no explicit mergeinfo and differing implicit
mergeinfo, the difference is ignored.  If a subtree has explicit
mergeinfo and differing implicit mergeinfo, the difference is
accounted for.  This means that merging into a target with
semantically equivalent subtree mergeinfo can produce different
results.

MY PATCH - If a subtree has differing implicit mergeinfo this
difference is always ignored, regardless of whether the subtree has
explicit mergeinfo or not.

So the real question becomes: Should we consider differences in a
subtree's implicit mergeinfo when calculating what needs to be merged?

If the answer is yes, then my patch is wrong as it would take us from
sometimes wrong, to always wrong.  If the answer is no, then it makes
things always right.

Beware the easy "yes" and keep in mind that a subtree may *easily*
have differing implicit mergeinfo but have no explicit mergeinfo.
That means every path under a merge target potentially has differing
implicit mergeinfo and we need to detect this*

Paul

* That said, the answer probably is "yes" and this whole avenue is a
dead end (ending at a cliff edge above a river of lava).

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369822

RE: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Geoff Rowell <ge...@varolii.com>.

On Fri, Jul 10, 2009 at 9:32 AM, Stefan Sperling wrote:
> Since in the end of the day, all Trumerge is doing is using
Subversion's
> merge feature in a way that is possible and technically correct.
Albeit
> this use case has not been envisioned by the designers of the
merge-tracking
> feature. But the fact that what Trumerge does is technically correct
is
> quite important once you look at things in a larger perspective and
put
> merge-tracking and tree-conflict detection next to each other and ask
> yourself if those two features are 100% compatible at the design
level.
> Because eventually, we hope to bring Subversion's tree-conflict
detection
> on par with trumerge (or make it even better). And if we can't do that
> without causing even more performance problems for Subversion, well,
> people won't like it because Subversion will become really unusable.
> 
> I'd rather hope that trumerge can be changed to not require subtree
> mergeinfo on every file though, as this thread seems to indicate that
> fixing the subtree mergeinfo problem at the merge-tracking design
> level is hard without harming merge correctness.
> 

Aside from performance, I wouldn't have such a problem with distributed
mergeinfo if it didn't tend to alarm my users. They try to merge a few
files and Subversion reports a massive number of changes.

A serious review of where and when mergeinfo property changes are
reported is needed.

-Geoff


This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369718

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by "C. Michael Pilato" <cm...@collab.net>.

Stefan Sperling wrote:
> Paul, I don't think that taking the risk of merging incorrect changes
> into subtress, or not merging changes into them which should be merged,
> is a good answer to the performance problem...
> I think we should rather look into minimising network round trips and
> see how much that helps.

I agree.  For obvious reasons, I'm sympathetic to the performance issues
that plague TruMerge's user base, but I can't endorse sacrificing
correctness for (mere) speed.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369715

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Stefan Sperling <st...@elego.de>.

On Fri, Jul 10, 2009 at 07:52:12AM -0400, Greg Troxel wrote:
> Is there an example somewhere on the web about a use case where large
> amounts of subtree mergeinfo really makes sense?

The way e.g. TruMerge does merging (one merge per file) will cause
subtree mergeinfo all over the place. http://trumerge.open.collab.net

Trumerge runs a Subversion merge in such a manner that tree conflicts
are more likely to be detected than during a standard Subversion merge.
This is a desirable property if you have large trees and many branches
where things get moved around a lot and you do a lot of merging between
branches. Trumerge also treats renames much smarter than Subversion does.

This is probably the most "edge-case" use of Subversion's merge feature
we've ever heard of. They have actually hit the theoretical worst-case in
practice: virtually every file has subtree mergeinfo.

Trumerge's authors have informed us of huge performance problems they
are seeing when using trumerge, and this caused Paul to look more closely
into what could be done about it.

Since in the end of the day, all Trumerge is doing is using Subversion's
merge feature in a way that is possible and technically correct. Albeit
this use case has not been envisioned by the designers of the merge-tracking
feature. But the fact that what Trumerge does is technically correct is
quite important once you look at things in a larger perspective and put
merge-tracking and tree-conflict detection next to each other and ask
yourself if those two features are 100% compatible at the design level.
Because eventually, we hope to bring Subversion's tree-conflict detection
on par with trumerge (or make it even better). And if we can't do that
without causing even more performance problems for Subversion, well,
people won't like it because Subversion will become really unusable.

I'd rather hope that trumerge can be changed to not require subtree
mergeinfo on every file though, as this thread seems to indicate that
fixing the subtree mergeinfo problem at the merge-tracking design
level is hard without harming merge correctness.

Paul, I don't think that taking the risk of merging incorrect changes
into subtress, or not merging changes into them which should be merged,
is a good answer to the performance problem...
I think we should rather look into minimising network round trips and
see how much that helps.

Stefan

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Greg Troxel <gd...@ir.bbn.com>.

Julian Foad <ju...@btopenworld.com> writes:

> So basically, IF I understood you correctly and IF the behaviour today
> is definitely correct and the proposed behaviour definitely wrong, then
> I am not in favour of such a change. I value the ability to hit "merge"
> and trust the tool to do its job more than I value the ability to hit
> "merge" and have the result quickly but then have to worry about whether
> it did what I expected.

+1

Also, in my world I essentially have a rule against subtree mergeinfo -
we have TTB points for each module, and all tagging/branching/merging is
supposed to be at those points.  So while I value correctness in all
cases, I am only concerned about efficiency for reasonable use cases.

Is there an example somewhere on the web about a use case where large
amounts of subtree mergeinfo really makes sense?

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369678

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Posted by Julian Foad <ju...@btopenworld.com>.

Paul Burba wrote:
> A thinly veiled plea for some eyes on issue #3443...

[...]
> 1) Are the cases where subtrees in a merge target have different
> implicit mergeinfo than the rest of the target common use cases or
> highly contrived edge cases?.  I think the latter is true, certainly
> the vanilla release and feature branch models aren't affected, but am
> I missing something obvious?  I've spent so much time close to this
> that I might be missing the forest for the trees.
> 
> 2) Assuming subtrees with differing implicit mergeinfo are edge cases,
> is there any reason *not* to make the changes suggested by the patch
> in issue #3443?

Hi Paul.

I can't answer (1) with empirical evidence. All I can do is offer my
perspective. Please forgive me if I'm labouring my point in this long
email.

Before you read on, be aware that I am not 100% sure that the behaviour
change you propose is actually a change from correctness to
incorrectness, but it sounds like it is so that's what I am assuming.

We are in a position where we fully support a certain limited kind of
merging scenario, and we also claim that Subversion can to some extent
cope with a much wider range of scenarios, and we hope to improve this.

When a tool does a complex job (merging) and is used in a complex
setting (software development, typically), users need to understand as
best they can what it does, and then apply it over and over. Because
they had to choose from the set of available version control systems,
none of which completely fulfills all of their needs, they are
inevitably using Subversion in some capacity that is a stretch for its
suitability in one way or another. Some of them are stretching the merge
capabilities and not able to stick to the most basic and fully supported
merging scenarios.

We tell them it has this complex but describable behaviour. If we then
tell them that in another scenario, with similar recent history but
different ancient history, it may omit parts of the merge (without even
telling the user that anything is amiss), then the user will rightly
lose confidence, or will go ahead and use it and then be bitterly
disappointed one day when they discover the problem. We can say, "You
liked it because it did the merges quickly." And they may reply, "It was
working fine, and then we used it in a an obviously similar scenario and
apparently it didn't work, so now we have to recall and fix our product.
We know that it has limitations, but we've discovered you did this
deliberately. Thanks a bunch."

When a software tool doesn't behave predictably and consistently, it
frustrates efforts to integrate it into a larger system. I would say
building GUIs around Subversion is such a case.

I think there is an equally important point on the Subversion
development side, too. From my point of view as a Subversion developer,
I think designing WHAT it does - the behaviour - is the more complex
task, and implementing that behaviour, though time consuming and still
complex, is the more accessible task.

If we design and implement the right behaviour, but it is slow, a new
developer can look at it and find ways to implement that same behaviour
more efficiently. (For example, pipelining the network requests to
eliminate the multiple-turnaround delay.) In our line of work, almost
any algorithm that executes slowly can be speeded up immensely by a
better implementation. That may be a complex task to many of us, but
there are a reasonable number of people who step up to do such tasks. It
is a self-contained task, with regression tests already in place, and
the roll-out brings only goodness to the users.

If we knowingly implement something that is not the correct behaviour,
then it is a much harder task for anyone later to redesign the correct
behaviour while also having to contemplate the effects on the user base
of making such a change. It is not sufficient to leave a comment in the
source saying, "The correct behavior would be X. Here, we knowingly do
Y." Changing the behaviour requires not just changing the source but
also testing, writing new tests, understanding old tests, and managing
users' experiences with the upgrade. And the re-implementation of X
would surely be as difficult as it was the first time around.

The definition of an "edge case" is that which we don't see happening
very often or which can easily be avoided. The trouble with dismissing
"edge cases" is that (a) although rare as an average across the
experience we have encountered so far, when a user hits one it often
becomes common for that user; and (b) when an edge case occurs in one
sub-set of a set of complex behaviours, as the combinations multiply up,
a whole large class of scenarios becomes unusable or unreliable.

The fact that the current regression tests don't catch the shortcomings
is just a result of their simplicity and we can be sure that real users
will run into problems. Haven't you spent, um, a *bit* of time fixing
"edge cases" that turned out to be actually rather important even if
they are still uncommon on average?

You have not found a way to speed it up immensely while keeping the
behaviour, but I think you should put this proposal out as a kick-off
point for other people to tackle that problem, and you should not feel
you have to implement something fast even at the expense of correctness.

All the foregoing assumes that the current behaviour w.r.t. sub-trees is
"correct" and skipping them is "wrong". That is a definition we could
consider changing. One alternative to keeping the "correct" behaviour is
to explicitly not support arbitrary merges with mixed sub-tree
mergeinfo. I can't see us choosing that option but maybe the line of
thought will lead somewhere.

Or maybe there are opportunities for higher level redesign of the
mergeinfo storage scheme that would make the implementation easier. For
example, if we were to rearrange the mergeinfo so that it explicitly
holds some info about "implicit" mergeinfo as well, maybe that could
enable major optimisations to be made in the look-up algorithms. (That's
not a thought-out proposal, just a top-of-of-my-head example.)

So basically, IF I understood you correctly and IF the behaviour today
is definitely correct and the proposed behaviour definitely wrong, then
I am not in favour of such a change. I value the ability to hit "merge"
and trust the tool to do its job more than I value the ability to hit
"merge" and have the result quickly but then have to worry about whether
it did what I expected.

Regards,
- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369619

[RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: Report mergeinfo-only changes less verbosely [was: [RFC] *Greatly* improve merge performance...]

Re: confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] *Greatly*improve merge performance...]

Re: confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] *Greatly*improve merge performance...]

confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] *Greatly*improve merge performance...]

Re: Report mergeinfo-only changes less verbosely [was: [RFC] *Greatly*improve merge performance...]

RE: Report mergeinfo-only changes less verbosely [was: [RFC] *Greatly*improve merge performance...]

Report mergeinfo-only changes less verbosely [was: [RFC] *Greatly* improve merge performance...]

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

RE: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

[RFC] Greatly improve merge performance, but break(?) edge cases

Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatly improve merge performance...]

Re: confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

Re: confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

confidential mails -- was: Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

Re: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

RE: Report mergeinfo-only changes less verbosely [was: [RFC] Greatlyimprove merge performance...]

Report mergeinfo-only changes less verbosely [was: [RFC] Greatly improve merge performance...]

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

RE: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases

Re: [RFC] Greatly improve merge performance, but break(?) edge cases