Posted to dev@subversion.apache.org by Stefan Sperling <st...@elego.de> on 2010/03/23 13:28:09 UTC

Re: Why merge is so slow? (was Re: svn commit: r926210 - /subversion/trunk/notes/meetings/svn-vision-agenda)

On Tue, Mar 23, 2010 at 01:13:11PM +0100, Johan Corveleyn wrote:
> On Mon, Mar 22, 2010 at 11:03 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:
> > On Tue, Mar 23, 2010 at 00:53, Mark Phippard <ma...@gmail.com> wrote:
> >> On Mon, Mar 22, 2010 at 5:14 PM, Ivan Zhakov <iv...@visualsvn.com> wrote:
> >>> On Mon, Mar 22, 2010 at 20:37,  <hw...@apache.org> wrote:
> >>>> +Some other random stuff Hyrum would like to talk about:
> >>>> + * Why is merge slow (compared to $OTHER_SYSTEM)?
> >>>> +   - Is it endemic to Subversion's architecture, or can it be fixed?
> >>> My opinion is that merge is slow because it's client-driven. The client
> >>> performs a lot of requests to decide what revisions and files to merge.
> >>> Just an idea: move this logic to server side and use slightly extended
> >>> reporter/editor to apply changes on client.
> >>
> >> Whether it is merge or blame or something else, the reason I have
> >> heard given in the past is that SVN was designed this way for
> >> scalability.  The server was supposed to just serve up revisions and
> >> leave the more expensive parts for the client.  Given the amount of
> >> RAM the client can spike to at times, I cannot see this ever scaling
> >> if it were done on the server.
> >>
> > Scalability is a good reason to move operations to the client, and I
> > understand how the blame operation would impact the server. But I don't
> > see why merge should take more resources than update/switch/diff
> > operations. As I understand it, during a merge we retrieve mergeinfo
> > from several locations, then perform some set math on it and apply
> > revisions to the working tree.
> 
> I agree. I can certainly understand that general design principle, but
> I think in general the answer is: it depends. Obviously it pays off
> that the server does _some_ work, and doesn't shove everything off to
> the client (otherwise, the server could also just stream the entire
> repository to the client for every read operation, and let it sort out
> for itself which revisions it needs, and what parts of it ;), then it
> would hardly use any RAM on the server).
> 
> So I think that, for every use case, one needs to carefully balance
> the scalability of the server against the efficiency and performance
> of the operation as a whole.
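
The merge steps Ivan describes above (retrieve mergeinfo, do set math,
apply revisions) can be illustrated with a minimal sketch. It assumes
mergeinfo is modelled as a plain mapping from source path to a set of
revision numbers; the function names and the sample paths are hypothetical
and are not Subversion's API, which works on revision ranges.

```python
# Hypothetical model: mergeinfo as {source_path: set of merged revisions}.
# Real mergeinfo stores revision ranges; sets keep the sketch short.

def eligible_revisions(source_revs, target_mergeinfo, source_path):
    """Return revisions of source_path not yet merged into the target."""
    already_merged = target_mergeinfo.get(source_path, set())
    return sorted(set(source_revs) - already_merged)


def to_ranges(revs):
    """Collapse a sorted list of revisions into inclusive (start, end) ranges."""
    ranges = []
    for rev in revs:
        if ranges and rev == ranges[-1][1] + 1:
            # Extend the current range by one revision.
            ranges[-1] = (ranges[-1][0], rev)
        else:
            ranges.append((rev, rev))
    return ranges


# Sample data (made up for illustration).
target_mergeinfo = {"/branches/feature": {10, 11, 12, 15}}
pending = eligible_revisions(range(10, 19), target_mergeinfo,
                             "/branches/feature")
print(to_ranges(pending))  # [(13, 14), (16, 18)]
```

The set subtraction itself is cheap; the cost discussed in this thread
comes from the number of requests needed to collect the inputs.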

In most setups I've seen the server hardware is much beefier than
the client hardware, so unless we do things that scale really badly
(say more than O(n^2)) I don't see a problem.

It looks like we cannot avoid pushing more work onto the server anyway
in the long run.
E.g. with editorv2, assuming we don't store copy-to information somehow,
the server will have to do some rename maths on the revision ranges it
serves so it can tell the client whether a delete is part of a move,
and what the other half of the move is. This will have to be done on the
server if the editorv2 API stays as it currently stands (it's still being
designed).
This might involve the server having to keep track of a mapping
{deleted paths -> added paths} while driving the editor, i.e. while
a client operation like merge or update is running. But I guess we
can get that to scale well if we do it right, even for very busy
repositories.
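
A minimal sketch of the {deleted paths -> added paths} mapping, assuming
the server sees each change as a delete or an add with an optional
copy-from path. The Change type and pair_moves function are hypothetical
names for illustration only; they are not part of editorv2 or any
existing Subversion API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    action: str                      # "delete" or "add"
    path: str
    copyfrom: Optional[str] = None   # source path if the add is a copy


def pair_moves(changes):
    """Map each deleted path to the added path that was copied from it."""
    deleted = {c.path for c in changes if c.action == "delete"}
    moves = {}
    for c in changes:
        # An add copied from a deleted path is treated as the other half
        # of a move; only the first such add is paired with the delete.
        if c.action == "add" and c.copyfrom in deleted and c.copyfrom not in moves:
            moves[c.copyfrom] = c.path
    return moves


# Sample changes (made up for illustration).
changes = [
    Change("delete", "/trunk/a.c"),
    Change("add", "/trunk/b.c", copyfrom="/trunk/a.c"),
    Change("add", "/trunk/c.c"),
    Change("delete", "/trunk/old.h"),
]
print(pair_moves(changes))  # {'/trunk/a.c': '/trunk/b.c'}
```

In this sketch the mapping grows with the number of deleted paths in the
range being served, which is the part that would need care on busy
repositories.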

Stefan

Re: Why merge is so slow? (was Re: svn commit: r926210 - /subversion/trunk/notes/meetings/svn-vision-agenda)

Posted by "C. Michael Pilato" <cm...@collab.net>.
Bert Huijben wrote:
> If we don't want to change the editor just now we could just use the 'entry
> property' infrastructure to communicate the information that a specific copy
> is actually a move. (This would fix the cases of moving files and doesn't
> require any editor or implementation fixes.  Directory moves are not
> communicated as copies over the update editor in the current editorV1 code,
> so that would require a separate fix. But we can easily work around this
> using the capability negotiation we added in 1.5).

I've only just now gotten around to reading this thread, but I'm chuckling
over here at this part because I suggested exactly the same thing in NYC
last week!  It's not the cleanest way to communicate non-editor info from
the server to the client, but we have precedent for it already.  I'd be +1
on tossing an extra "entry prop" into the protocol if it means helping out
the tree conflict stuffs while stopping well short of a massive editor v2
rewrite.

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


RE: Why merge is so slow? (was Re: svn commit: r926210 - /subversion/trunk/notes/meetings/svn-vision-agenda)

Posted by Bert Huijben <be...@qqmail.nl>.

> -----Original Message-----
> From: Stefan Sperling [mailto:stsp@elego.de]
> Sent: Tuesday, 23 March 2010 14:15
> To: Mark Phippard
> Cc: Johan Corveleyn; Ivan Zhakov; hwright@apache.org;
> dev@subversion.apache.org
> Subject: Re: Why merge is so slow? (was Re: svn commit: r926210 -
> /subversion/trunk/notes/meetings/svn-vision-agenda)
> 
> On Tue, Mar 23, 2010 at 08:32:32AM -0400, Mark Phippard wrote:
> > On Tue, Mar 23, 2010 at 8:28 AM, Stefan Sperling <st...@elego.de> wrote:
> >
> > > In most setups I've seen the server hardware is much beefier than
> > > the client hardware, so unless we do things that scale really badly
> > > (say more than O(n^2)) I don't see a problem.
> >
> > Think of a hosting site like sf.net with thousands of SVN repos being
> > hit by many thousands of users.  How many of these operations do you
> > think the Apache server could manage before it ran out of RAM?
> 
> If such sites run a single server only and don't use write-through
> proxies to balance the load their setup is seriously wrong.
> 
> And I'd say users will happily accept more load on the server if that
> means that they get working renames in return. You can throw more
> machines at the performance problem, but not at the rename problem.

Handling true renames is a completely different issue than this performance
issue and/or editor v2. We want all three issues to be resolved, but the
issues don't depend on each other.

WC-NG makes the working copy ready for recording moves, but what is still
missing is support of true moves in the repository and filesystem.


If we don't want to change the editor just now we could just use the 'entry
property' infrastructure to communicate the information that a specific copy
is actually a move. (This would fix the cases of moving files and doesn't
require any editor or implementation fixes.  Directory moves are not
communicated as copies over the update editor in the current editorV1 code,
so that would require a separate fix. But we can easily work around this
using the capability negotiation we added in 1.5).
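
A rough sketch of how a client might interpret such an entry property,
assuming the server attaches it to the copied file. The "svn:entry:"
prefix is the one existing entry props use; the "moved-from" name and
the classify_add function are hypothetical and only illustrate the idea.

```python
ENTRY_PROP_PREFIX = "svn:entry:"                    # prefix of existing entry props
MOVED_FROM_PROP = ENTRY_PROP_PREFIX + "moved-from"  # hypothetical property name


def classify_add(path, copyfrom_path, props):
    """Classify an incoming add as a move, a copy, or a plain add."""
    moved_from = props.get(MOVED_FROM_PROP)
    if moved_from is not None and moved_from == copyfrom_path:
        # The server marked this copy as the destination of a move.
        return ("move", moved_from, path)
    if copyfrom_path is not None:
        return ("copy", copyfrom_path, path)
    return ("add", None, path)


# Sample calls (paths made up for illustration).
print(classify_add("/trunk/b.c", "/trunk/a.c",
                   {MOVED_FROM_PROP: "/trunk/a.c"}))
print(classify_add("/trunk/b.c", "/trunk/a.c", {}))
print(classify_add("/trunk/new.c", None, {}))
```

A client written this way falls back to treating the change as an
ordinary copy when the property is absent.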


But then we need to update merge itself to handle the renames. This is most
likely the biggest chunk of work, but I'm not a merge expert.

	Bert

Re: Why merge is so slow? (was Re: svn commit: r926210 - /subversion/trunk/notes/meetings/svn-vision-agenda)

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Mar 23, 2010 at 08:32:32AM -0400, Mark Phippard wrote:
> On Tue, Mar 23, 2010 at 8:28 AM, Stefan Sperling <st...@elego.de> wrote:
> 
> > In most setups I've seen the server hardware is much beefier than
> > the client hardware, so unless we do things that scale really badly
> > (say more than O(n^2)) I don't see a problem.
> 
> Think of a hosting site like sf.net with thousands of SVN repos being
> hit by many thousands of users.  How many of these operations do you
> think the Apache server could manage before it ran out of RAM?

If such sites run a single server only and don't use write-through
proxies to balance the load their setup is seriously wrong.

And I'd say users will happily accept more load on the server if that
means that they get working renames in return. You can throw more
machines at the performance problem, but not at the rename problem.

Stefan

Re: Why merge is so slow? (was Re: svn commit: r926210 - /subversion/trunk/notes/meetings/svn-vision-agenda)

Posted by Johan Corveleyn <jc...@gmail.com>.
On Tue, Mar 23, 2010 at 1:32 PM, Mark Phippard <ma...@gmail.com> wrote:
> On Tue, Mar 23, 2010 at 8:28 AM, Stefan Sperling <st...@elego.de> wrote:
>
>> In most setups I've seen the server hardware is much beefier than
>> the client hardware, so unless we do things that scale really badly
>> (say more than O(n^2)) I don't see a problem.
>
> Think of a hosting site like sf.net with thousands of SVN repos being
> hit by many thousands of users.  How many of these operations do you
> think the Apache server could manage before it ran out of RAM?  I am
> not saying we cannot ask the server to do more, but if you start
> having it hang on to a list of paths, as you suggest in the rename
> case, I do not see how it will avoid scenarios that involve
> significant amounts of memory usage.

But surely there are some operations right now, in current svn, that
already consume _some_ RAM. Maybe someone has an idea of who the
current "bad boys" are, and what kind of memory usage they have in
normal scenarios and in worst-case scenarios? Of course, it also
depends on whether it's something that every user does all the time, or
something that's only executed relatively rarely.

Personally, I think that e.g. merging currently falls into the
"relatively rarely" category. I'm guessing that no more than 0.1%
(maybe even much less) of those thousands of users may be merging at
the same time. Maybe that could change if merge were (a lot) faster,
but then we've got a chicken-and-egg on our hands :).

-- 
Johan

Re: Why merge is so slow? (was Re: svn commit: r926210 - /subversion/trunk/notes/meetings/svn-vision-agenda)

Posted by Mark Phippard <ma...@gmail.com>.
On Tue, Mar 23, 2010 at 8:28 AM, Stefan Sperling <st...@elego.de> wrote:

> In most setups I've seen the server hardware is much beefier than
> the client hardware, so unless we do things that scale really badly
> (say more than O(n^2)) I don't see a problem.

Think of a hosting site like sf.net with thousands of SVN repos being
hit by many thousands of users.  How many of these operations do you
think the Apache server could manage before it ran out of RAM?  I am
not saying we cannot ask the server to do more, but if you start
having it hang on to a list of paths, as you suggest in the rename
case, I do not see how it will avoid scenarios that involve
significant amounts of memory usage.

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/