You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Paul Burba <pt...@gmail.com> on 2009/07/10 16:01:39 UTC

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Thanks all for your time and thoughts.

Julian - re other solutions, I agree.  In pursuing this particular
path I was trying to find a solution that could be backported to 1.6.
Other solutions to this and other general merge problems are
definitely being considered:
http://svn.collab.net/repos/svn/branches/subtree-mergeinfo/notes/subtree-mergeinfo/solutions-whiteboard.txt.

Re everyone's concern about correctness: I do agree that sacrificing
correctness on the alter of performance is wrong.  However, I'm not
sure the current behavior actually is correct.  My wording here was
poorly chosen:

> The only problem is that this patch changes the behavior of merge when
> a subtree with explicit mergeinfo has different history, e.g. a
> subtree is deleted and replaced with a copy from a different branch or
> from the same branch but at a different point in it's history.  In
> these cases revisions might be included or excluded for merging
> differently than if we ask the repos for the actual implicit mergeinfo
> (what we do today).

I didn't mean that with the patch might be inconsistent in its
behavior, I meant that it will differ from our current behavior (which
is inconsistent).

As I pointed out in issue #3443, currently implicit subtree mergeinfo
which differs from the root of the merge target's implicit mergeinfo
is *only* considered *if* the subtree has explicit mergeinfo.  It is
quite possible for a subtree's implicit mergeinfo to differ but for
the subtree to not have explicit mergeinfo, in which case the implicit
mergeinfo difference is completely ignored.  Put another way, the
current trunk/1.6 behavior is inconsistent in this situation.  So no
matter how we define the correct behavior, at least some of the time,
merge presently does the wrong thing in this "edge" case.

My change would make every subtree's implicit mergeinfo a function of
the merge target's implicit mergeinfo.  A subtree's implicit mergeinfo
would not potentially change, as it effectively does today, depending
on whether the subtree has explicit mergeinfo or not.  (/me wonders if
anyone actually follows this)

So the patch brings consistency...

...but maybe it makes things consistently wrong.  As I rethink this,
I'm hard pressed to say if this is better or worse than how things
work today (ignoring the performance gains).

TODAY - If a subtree has no explicit mergeinfo and differing implicit
mergeinfo, the difference is ignored.  If a subtree has explicit
mergeinfo and differing implicit mergeinfo, the difference is
accounted for.  This means that merging into a target with
semantically equivalent subtree mergeinfo can produce different
results.

MY PATCH - If a subtree has differing implicit mergeinfo this
difference is always ignored, regardless of whether the subtree has
explicit mergeinfo or not.

So the real question becomes: Should we consider differences in a
subtree's implicit mergeinfo when calculating what needs to be merged?

If the answer is yes, then my patch is wrong as it would take us from
sometimes wrong, to always wrong.  If the answer is no, then it makes
things always right.

Beware the easy "yes" and keep in mind that a subtree may *easily*
have differing implicit mergeinfo but have no explicit mergeinfo.
That means every path under a merge target potentially has differing
implicit mergeinfo and we need to detect this*

Paul

* That said, the answer probably is "yes" and this whole avenue is a
dead end (ending at a cliff edge above a river of lava).

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2369822

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Posted by Julian Foad <ju...@btopenworld.com>.
I (Julian Foad) wrote:
[...]
> I think therefore it's right to do this. If it should cause any problem,
> it will almost certainly be a problem that already existed in some
> corner cases (where there's differing implicit mergeinfo but no explicit
> mergeinfo), and we would want to know about such a problem anyway, and
> this will make it easier to find and diagnose.

Put simply:
Consistency is a necessary step on the road to correctness.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2370930

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Posted by Paul Burba <pt...@gmail.com>.
On Mon, Jul 13, 2009 at 5:25 PM, Paul Burba<pt...@gmail.com> wrote:

> I'll commit this change later tonight unless I hear any objections.

Done r38417.

> Paul
>

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2371141

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Posted by Paul Burba <pt...@gmail.com>.
On Mon, Jul 13, 2009 at 9:42 AM, Mark Phippard<ma...@gmail.com> wrote:

> On Fri, Jul 10, 2009 at 12:01 PM, Paul Burba<pt...@gmail.com> wrote:
>
>> TODAY - If a subtree has no explicit mergeinfo and differing implicit
>> mergeinfo, the difference is ignored.  If a subtree has explicit
>> mergeinfo and differing implicit mergeinfo, the difference is
>> accounted for.  This means that merging into a target with
>> semantically equivalent subtree mergeinfo can produce different
>> results.
>>
>> MY PATCH - If a subtree has differing implicit mergeinfo this
>> difference is always ignored, regardless of whether the subtree has
>> explicit mergeinfo or not.
>>
>> So the real question becomes: Should we consider differences in a
>> subtree's implicit mergeinfo when calculating what needs to be merged?
>
> It almost sounds like it would be better to apply your patch so that
> merge is consistent.  If some real world use cases start to show up
> where the implicit mergeinfo of subtrees needs to be considered, then
> it would make sense to tackle this for all merge scenarios.  It sounds
> like today we have the worse possible solution:
>
> 1) If the implicit mergeinfo really needs to be calculated, then in
> many merges it isn't, so users are probably not consistently
> experiencing or noticing problems, which means it will also be more
> difficult for us to recognize that there is a problem.  We do not know
> if users are experiencing this problem.

Exactly!  I guess I finally explained this issue correctly :-P

> 2) If the implicit mergeinfo typically does not need to be calculated,
> then users are experiencing a huge performance cost for no real
> reason.  We know for certain that users are experiencing this problem.
>
> Given how bad we know that #2 is for people that experience it, the
> idea of expanding this logic to cover all merge scenarios does not
> seem like the right solution to put in place.  So I'd be in favor of
> applying your patch so that there is consistency and we can learn if
> #1 is really an issue or not.

On Mon, Jul 13, 2009 at 10:33 AM, Julian Foad<ju...@btopenworld.com> wrote:

> Yes, this sounds better than keeping the present (inconsistent and slow)
> behaviour. (My earlier reply was based on misunderstood implications.)
>
> The only thing to beware of is if the real-world use cases that create
> differing implicit mergeinfo also happen to be the cases that create
> subtree mergeinfo. If that were so, then the (inconsistent) algorithm
> would have actually been producing consistently correct results in those
> cases, whereas changing it might produce wrong results. The cases that
> generate differing implicit mergeinfo are copies, I think. Copies used
> to create sub-tree mergeinfo, but not any more. That means... I don't
> know: that's as far as my brain goes today.

Copies haven't created mergeinfo since 1.5.5 (12/22/08), so it has
probably been long enough that if users were getting bit by differing
subtree implicit mergeinfo, that we probably would have started to
hear about it by now.

> I think therefore it's right to do this. If it should cause any problem,
> it will almost certainly be a problem that already existed in some
> corner cases (where there's differing implicit mergeinfo but no explicit
> mergeinfo), and we would want to know about such a problem anyway, and
> this will make it easier to find and diagnose.

I'll commit this change later tonight unless I hear any objections.

Paul

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2371107

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Posted by Julian Foad <ju...@btopenworld.com>.
On Mon, 2009-07-13 at 09:42 -0400, Mark Phippard wrote:
> On Fri, Jul 10, 2009 at 12:01 PM, Paul Burba<pt...@gmail.com> wrote:
> 
> > TODAY - If a subtree has no explicit mergeinfo and differing implicit
> > mergeinfo, the difference is ignored.  If a subtree has explicit
> > mergeinfo and differing implicit mergeinfo, the difference is
> > accounted for.  This means that merging into a target with
> > semantically equivalent subtree mergeinfo can produce different
> > results.
> >
> > MY PATCH - If a subtree has differing implicit mergeinfo this
> > difference is always ignored, regardless of whether the subtree has
> > explicit mergeinfo or not.
> >
> > So the real question becomes: Should we consider differences in a
> > subtree's implicit mergeinfo when calculating what needs to be merged?
> 
> It almost sounds like it would be better to apply your patch so that
> merge is consistent.  If some real world use cases start to show up
> where the implicit mergeinfo of subtrees needs to be considered, then
> it would make sense to tackle this for all merge scenarios.  It sounds
> like today we have the worse possible solution:
> 
> 1) If the implicit mergeinfo really needs to be calculated, then in
> many merges it isn't, so users are probably not consistently
> experiencing or noticing problems, which means it will also be more
> difficult for us to recognize that there is a problem.  We do not know
> if users are experiencing this problem.
> 
> 2) If the implicit mergeinfo typically does not need to be calculated,
> then users are experiencing a huge performance cost for no real
> reason.  We know for certain that users are experiencing this problem.
> 
> Given how bad we know that #2 is for people that experience it, the
> idea of expanding this logic to cover all merge scenarios does not
> seem like the right solution to put in place.  So I'd be in favor of
> applying your patch so that there is consistency and we can learn if
> #1 is really an issue or not.

Yes, this sounds better than keeping the present (inconsistent and slow)
behaviour. (My earlier reply was based on misunderstood implications.)

The only thing to beware of is if the real-world use cases that create
differing implicit mergeinfo also happen to be the cases that create
subtree mergeinfo. If that were so, then the (inconsistent) algorithm
would have actually been producing consistently correct results in those
cases, whereas changing it might produce wrong results. The cases that
generate differing implicit mergeinfo are copies, I think. Copies used
to create sub-tree mergeinfo, but not any more. That means... I don't
know: that's as far as my brain goes today.

I think therefore it's right to do this. If it should cause any problem,
it will almost certainly be a problem that already existed in some
corner cases (where there's differing implicit mergeinfo but no explicit
mergeinfo), and we would want to know about such a problem anyway, and
this will make it easier to find and diagnose.

- Julian

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2370928

Re: [RFC] *Greatly* improve merge performance, but break(?) edge cases

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Jul 10, 2009 at 12:01 PM, Paul Burba<pt...@gmail.com> wrote:

> TODAY - If a subtree has no explicit mergeinfo and differing implicit
> mergeinfo, the difference is ignored.  If a subtree has explicit
> mergeinfo and differing implicit mergeinfo, the difference is
> accounted for.  This means that merging into a target with
> semantically equivalent subtree mergeinfo can produce different
> results.
>
> MY PATCH - If a subtree has differing implicit mergeinfo this
> difference is always ignored, regardless of whether the subtree has
> explicit mergeinfo or not.
>
> So the real question becomes: Should we consider differences in a
> subtree's implicit mergeinfo when calculating what needs to be merged?

It almost sounds like it would be better to apply your patch so that
merge is consistent.  If some real world use cases start to show up
where the implicit mergeinfo of subtrees needs to be considered, then
it would make sense to tackle this for all merge scenarios.  It sounds
like today we have the worse possible solution:

1) If the implicit mergeinfo really needs to be calculated, then in
many merges it isn't, so users are probably not consistently
experiencing or noticing problems, which means it will also be more
difficult for us to recognize that there is a problem.  We do not know
if users are experiencing this problem.

2) If the implicit mergeinfo typically does not need to be calculated,
then users are experiencing a huge performance cost for no real
reason.  We know for certain that users are experiencing this problem.

Given how bad we know that #2 is for people that experience it, the
idea of expanding this logic to cover all merge scenarios does not
seem like the right solution to put in place.  So I'd be in favor of
applying your patch so that there is consistency and we can learn if
#1 is really an issue or not.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=462&dsMessageId=2370911