You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Hudson <gh...@MIT.EDU> on 2005/02/27 11:47:19 UTC

Auto-merging without ancestry checks

If you've been following
http://svn.haxx.se/dev/archive-2005-02/0897.shtml ("ugly problem found
while trying to test KDE SVN"), then you've seen me conclude that our
auto-merging code can be ruinously inefficient on big repositories.
Although Eric has fixed the auto-merge memory leaks in FSFS (his
change will need to be back-ported to BDB), doing multiple complete
predecessor walks in a repository like kde-i18n is prohibitively slow.

I'm not prepared to rewrite the auto-merging code.  But I can present
a design which I think is simpler, more maintainable, and doesn't
involve asking whether any node is an ancestor of any other node.

Here it is, then.  We want to merge three directories: ANCESTOR (which
is the revision our transaction is based off of), SOURCE (which is the
current head revision, containing changes made to the repository while
the transaction was in progress), and TARGET (which is the transaction
to be committed).

  For each entry NAME in the directory ANCESTOR:

    Let ANCESTOR-ENTRY, SOURCE-ENTRY, and TARGET-ENTRY be the IDs of
    the name within ANCESTOR, SOURCE, and TARGET respectively.
    (Possibly null if NAME does not exist in SOURCE or TARGET.)

    If ANCESTOR-ENTRY == SOURCE-ENTRY, then:
      No changes were made to this entry while the transaction was in
      progress, so do nothing to the target.

    Else if ANCESTOR-ENTRY == TARGET-ENTRY, then:
      A change was made to this entry while the transaction was in
      process, but the transaction did not touch this entry.  Replace
      TARGET-ENTRY with SOURCE-ENTRY.

    Else:
      Changes were made to this entry both within the transaction and
      to the repository while the transaction was in progress.  They
      must be merged or declared to be in conflict.

      If SOURCE-ID and TARGET-ID are both null, do nothing.  (This is
      a double delete.  We could flag a conflict, but the current
      auto-merge allows this case.)  If either SOURCE-ID or TARGET-ID
      is null but not the other one, flag a conflict.  A deletion is
      incompatible with a modification.

      If any of the three entries is of type file, declare a conflict.

      If either SOURCE-ENTRY or TARGET-ENTRY is not a direct
      modification of ANCESTOR-ENTRY (determine by comparing the
      node-id and copy-id fields), declare a conflict.  A replacement
      is incompatible with a modification or other replacement--even
      an identical replacement.

      Direct modifications were made to the directory ANCESTOR-ENTRY
      in both SOURCE and TARGET.  Recursively merge these
      modifications.

  For each leftover entry NAME in the directory SOURCE:

    If NAME exists in TARGET, declare a conflict.  Even if SOURCE and
    TARGET are adding exactly the same thing, two additions are not
    auto-mergeable with each other.  (This would be a change from the
    current auto-merge code, which allows adds of identical
    node-revs.)

    Add NAME to TARGET with the entry from SOURCE.

  Now that we are done merging the changes from SOURCE into the
  directory TARGET, update TARGET's predecessor to be SOURCE.

If you can follow the above algorithm, it most likely makes sense, but
you're probably wondering why the existing auto-merge code asks
questions about ancestry.  Here is what I could determine:

  * If changes were made to an entry in both SOURCE and TARGET, the
    current auto-merge code checks if TARGET is a descendent of
    SOURCE, and does nothing in that case.  But that's essentially
    impossible; it could only happen if SOURCE replaces the entry with
    blah and TARGET replaces the entry with some descendent of blah,
    and that really ought to be a conflict.

  * The current auto-merge code only updates the predecessor of a
    merged directory if there is no existing ordering between those
    two entries.  Since my algorithm only attempts to merge
    directories if SOURCE and TARGET are direct modifications of
    ANCESTOR, there is never an existing ordering between SOURCE and
    TARGET.  So there is no need to check.

Another thing I've omitted from my algorithm is described in issue
#418 (http://subversion.tigris.org/issues/show_bug.cgi?id=418).  For
reasons I do not understand, the current auto-merge code allows a
deletion in SOURCE to be merged with a replacement in TARGET (by
ignoring the deletion).  Supposedly, this is for ease of use, but
neither the code nor issue #418 describes the use case which is being
made easier.  Perhaps this dates back to when commits were made
against the working copy base revision, not against HEAD?  If we do
need the allowance described in issue #418, it doesn't require any
ancestry checks, just a relatedness check.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Auto-merging without ancestry checks

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2005-02-27 at 16:21, Eric Gillespie wrote:
> > I'm not prepared to rewrite the auto-merging code.
> 
> Since the fix you propose is more involved than either of us care
> to take on at the moment, do you mind if i go ahead and commit
> the pools usage fix?

Not at all.

> I also want to nominate this for 1.1.x.

I agree.  The algorithmic rewrite would have to wait until 1.2 (not for
compatibility reasons, just due to the impact), and fixing the memory
leak will go a long way.

(I'll reiterate that someone should backport the memory leak fixes to
BDB and nominate that for 1.1.x as well.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Auto-merging without ancestry checks

Posted by Eric Gillespie <ep...@pretzelnet.org>.
Ben Collins-Sussman <su...@collab.net> writes:

> Can you port it to libsvn_fs_base also?  The general logic
> should be the same (since fs_fs essentially started life as a
> branch of fs_base).  I expect the port would be trivial.

This is volunteer time for me, not paid time, and i have zero
interest in bdb :)  But i will take a quick look.

--  
Eric Gillespie <*> epg@pretzelnet.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Auto-merging without ancestry checks

Posted by Ben Collins-Sussman <su...@collab.net>.
On Feb 27, 2005, at 3:21 PM, Eric Gillespie wrote:
>
> Since the fix you propose is more involved than either of us care
> to take on at the moment, do you mind if i go ahead and commit
> the pools usage fix?
>

Can you port it to libsvn_fs_base also?  The general logic should be 
the same (since fs_fs essentially started life as a branch of fs_base). 
  I expect the port would be trivial.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Auto-merging without ancestry checks

Posted by John Szakmeister <jo...@szakmeister.net>.
On Sunday 27 February 2005 16:21, Eric Gillespie wrote:
> Greg Hudson <gh...@MIT.EDU> writes:
> > Although Eric has fixed the auto-merge memory leaks in FSFS (his
> > change will need to be back-ported to BDB), doing multiple complete
> > predecessor walks in a repository like kde-i18n is prohibitively
> > slow.
> >
> > I'm not prepared to rewrite the auto-merging code.
>
> Since the fix you propose is more involved than either of us care
> to take on at the moment, do you mind if i go ahead and commit
> the pools usage fix?

Julian Foad had some concern about the pool usage.  You probably want to 
check his follow up to your patch before committing.

-John

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Auto-merging without ancestry checks

Posted by Eric Gillespie <ep...@pretzelnet.org>.
Greg Hudson <gh...@MIT.EDU> writes:

> Although Eric has fixed the auto-merge memory leaks in FSFS (his
> change will need to be back-ported to BDB), doing multiple complete
> predecessor walks in a repository like kde-i18n is prohibitively slow.
> 
> I'm not prepared to rewrite the auto-merging code.

Since the fix you propose is more involved than either of us care
to take on at the moment, do you mind if i go ahead and commit
the pools usage fix?

As for being prohibitively slow, we have 27,000 revisions of a
70,000 file tree, and the auto-merge only takes a few seconds
after my change.  Sure, it could be better, but this is a MAJOR
improvement over spinning for nearly a minute and then dumping
core, as it does today.

I also want to nominate this for 1.1.x.

(And no, i haven't forgotten about my Darwin DSO patch, i'll
commit that soon).

--  
Eric Gillespie <*> epg@pretzelnet.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Auto-merging without ancestry checks

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2005-02-27 at 09:42, Ben Collins-Sussman wrote:
> Greg, thanks not only for doing the legwork in identifying the problem, 
> but for proposing a new algorithm.  I have to admit that I'm really 
> surprised that such a large performance/scalability bug exists in such 
> a fundamental algorithm right at the core of our commit logic... I'm 
> equally surprised that we've never noticed it.

Well, it only happens when two commits happen at the same time.  Our
command-line client test suite can't trigger that (fs-test has some
merge tests, I think, but not many), so we have very little coverage of
it.

And while the auto-merge scales poorly, computers are fast and huge
workloads are rare.  Still, I think we have seen occasional user reports
of ginormous memory usage on the server, which we've been unable to
diagnose.

> I remember Karl and Mike spending a good couple of days (back in
> early 2001) discussing and carefully implementing the current
> algorithm [...]

Upon further consideration, I think a lot of the complexity in the
auto-merge comes from life being harder by then.  Instead of committing
against the base revision, we committed against the base rev of the
working copy (if I understand right).  So issue #418 was probably
addressing the case where you delete a file, commit, and then (without
updating your working directory) attempt to copy a new file into place. 
The auto-merge would then, in the past commit world, be faced with a
directory ancestor where the file still exists--which is not the proper
ancestor from the perspective of the target transaction!  With today's
commit system, I think the auto-merge can be much simpler.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Auto-merging without ancestry checks

Posted by Ben Collins-Sussman <su...@collab.net>.
On Feb 27, 2005, at 5:47 AM, Greg Hudson wrote:

> If you've been following
> http://svn.haxx.se/dev/archive-2005-02/0897.shtml ("ugly problem found
> while trying to test KDE SVN"), then you've seen me conclude that our
> auto-merging code can be ruinously inefficient on big repositories.
> Although Eric has fixed the auto-merge memory leaks in FSFS (his
> change will need to be back-ported to BDB), doing multiple complete
> predecessor walks in a repository like kde-i18n is prohibitively slow.
>

Greg, thanks not only for doing the legwork in identifying the problem, 
but for proposing a new algorithm.  I have to admit that I'm really 
surprised that such a large performance/scalability bug exists in such 
a fundamental algorithm right at the core of our commit logic... I'm 
equally surprised that we've never noticed it.  I remember Karl and 
Mike spending a good couple of days (back in early 2001) discussing and 
carefully implementing the current algorithm, so maybe they'll be able 
to comment on those questions you asked.

Wow.  Has an issue been filed?  This seems pretty high-priority to me.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Auto-merging without ancestry checks

Posted by Eric Gillespie <ep...@pretzelnet.org>.
The FSFS change is committed, and i have a preliminary patch for
libsvn_fs_base.  The test suite is running now, and i'm about to
try my simultaneous commit scenarios on it.  If you want to see
the as-yet-untested patch:

  http://pretzelnet.org/~epg/tmp/auto-merge.diff

Alternatively, since i wore my dunce cap home from work, you can
look at r13224.

--  
Eric Gillespie <*> epg@pretzelnet.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org