Posted to dev@subversion.apache.org by "D.J. Heap" <dj...@shadyvale.net> on 2005/05/08 15:57:58 UTC

Quick analysis of svn log --limit

I'm not terribly familiar with the code in libsvn_repos, but there are two 
places that seem to be the source of slowness even with --limit turned on:

subversion/libsvn_repos/log.c:svn_repos_get_logs3 -
   1.  When passing one path such as /trunk (or / which mod_dav_svn always 
seems to do even if no deeper path was specified) it performs the 
svn_repos_history2 call and spends a lot of time there gathering all revs 
for the path (which is nearly all the revs in the repo for /trunk and *is* 
all revs in the repo for /).  So, svn_repos_history needs to be revved to 
take a limit parameter?

   2.  Further down in the for loop to process the revs, there is a check to 
see if the rev was one identified by the previous svn_repos_history2 call 
(line 348).  This check is a brute force linear search.  While probably not 
a huge issue, it seems like it could be changed to use a binary search since 
the revs array has been sorted above?  It could make a noticeable difference 
on repos with many thousands of revisions.
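
As a rough illustration of point 2, here is a minimal sketch in Python rather than the C used in log.c. It assumes the revs array is sorted in ascending order, which I have not verified against the actual code; the function names are made up for this example.

```python
from bisect import bisect_left


def rev_in_linear(revs, rev):
    """Brute-force membership test: O(n) per lookup."""
    for r in revs:
        if r == rev:
            return True
    return False


def rev_in_sorted(revs, rev):
    """Binary-search membership test: O(log n) per lookup.

    Assumes revs is sorted in ascending order (an assumption of this
    sketch, not a statement about the real revs array).
    """
    i = bisect_left(revs, rev)
    return i < len(revs) and revs[i] == rev
```

Both functions give the same answer on sorted input; only the number of comparisons per lookup differs.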

It seems like #1 is the major perf killer since it is essentially reading 
almost all revs in the common case of 'svn log --limit 10' on a working copy 
of trunk.

This is confirmed by the fact that 'svn log --limit 10 
svn://localhost/svn-repo' returns nearly instantly, whereas 'svn log 
--limit 10 svn://localhost/svn-repo/trunk' takes about 10 seconds for me. 
The first case is an empty path in svn_repos_get_logs3 and so skips the 
svn_repos_history2 call.  Curiously, mod_dav_svn always passes at least '/' 
in the path and so never bypasses the svn_repos_history2 call.

DJ


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Quick analysis of svn log --limit

Posted by Chia-liang Kao <cl...@clkao.org>.
Peter N. Lundblad <peter <at> famlundblad.se> writes:
> > A potentially better change would be to rewrite svn_repos_get_logs3 to
> > not use svn_repos_history at all.  The new design would use the
> > svn_fs_history interface directly.  It would create a history object for
> > each provided path, and would step through them in parallel to produce a
> > sequence of revisions, stopping when it hits the limit.  I think that
> > would require the least amount of I/O work.
> >
> That's the solution I've had in mind for some time... I like it. You need
> a pair of iteration pools per path, don't you?

I think you can change repos_history to respect the return value
from the callback, and store the limit in the baton.  This seems
more general and is what I had:

http://cpansearch.bulknews.net/markup/SVK-0.994/lib/SVK/Util.pm#l784
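
A minimal sketch of that idea, in Python for brevity. Both walk_history and collect_with_limit are hypothetical stand-ins, not the Subversion or SVK API, and how the real history walker would signal an early stop is an assumption here (a boolean return value in this sketch).

```python
def walk_history(revs, callback, baton):
    """Hypothetical history walker.

    Calls callback(baton, rev) for each revision, youngest first, and
    stops as soon as the callback returns False.
    """
    for rev in revs:
        if not callback(baton, rev):
            break


def collect_with_limit(baton, rev):
    """Record rev in the baton; return False once the limit is reached."""
    baton["revs"].append(rev)
    return len(baton["revs"]) < baton["limit"]
```

The walker itself knows nothing about limits; the caller decides when to stop, which is why this is more general than adding a limit parameter.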

Cheers,
CLK




Re: Quick analysis of svn log --limit

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sun, 8 May 2005, Greg Hudson wrote:

> A potentially better change would be to rewrite svn_repos_get_logs3 to
> not use svn_repos_history at all.  The new design would use the
> svn_fs_history interface directly.  It would create a history object for
> each provided path, and would step through them in parallel to produce a
> sequence of revisions, stopping when it hits the limit.  I think that
> would require the least amount of I/O work.
>
That's the solution I've had in mind for some time... I like it. You need
a pair of iteration pools per path, don't you?

Best,
//Peter


Re: Quick analysis of svn log --limit

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2005-05-08 at 09:57 -0600, D.J. Heap wrote:
> subversion/libsvn_repos/log.c:svn_repos_get_logs3 -
>    1.  When passing one path such as /trunk (or / which mod_dav_svn always 
> seems to do even if no deeper path was specified) it performs the 
> svn_repos_history2 call and spends a lot of time there gathering all revs 
> for the path (which is nearly all the revs in the repo for /trunk and *is* 
> all revs in the repo for /).  So, svn_repos_history needs to be revved to 
> take a limit parameter?

That would probably be the simplest change, although it's not as
efficient as it could be.  (If I provide fifty paths and one of them
is /trunk, and the limit is 10, there's no reason to chase down ten
revisions of /trunk/this/file/hasnt/been/changed/in/ages.)

A potentially better change would be to rewrite svn_repos_get_logs3 to
not use svn_repos_history at all.  The new design would use the
svn_fs_history interface directly.  It would create a history object for
each provided path, and would step through them in parallel to produce a
sequence of revisions, stopping when it hits the limit.  I think that
would require the least amount of I/O work.
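
To make the parallel stepping concrete, here is a minimal sketch in Python. It does not use the svn_fs_history interface; each path's history is modelled as a plain iterator that yields revision numbers youngest first, and merged_log_revs is a made-up name. Treat it as an illustration of the merge, not as the eventual implementation.

```python
import heapq


def merged_log_revs(histories, limit):
    """Return up to `limit` distinct revisions, youngest first.

    histories: list of iterators, one per path, each yielding revision
    numbers in descending order. Each iterator is only advanced when its
    current revision is the youngest one outstanding.
    """
    heap = []
    for idx, hist in enumerate(histories):
        rev = next(hist, None)
        if rev is not None:
            # Negate so the min-heap pops the youngest revision first.
            heapq.heappush(heap, (-rev, idx))

    result = []
    while heap and len(result) < limit:
        neg_rev, idx = heapq.heappop(heap)
        rev = -neg_rev
        # Revisions touching several paths pop consecutively; keep one.
        if not result or result[-1] != rev:
            result.append(rev)
        nxt = next(histories[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (-nxt, idx))
    return result
```

A path whose latest change is older than everything needed to fill the limit is read exactly once and never stepped again, which is where the I/O saving would come from.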



Re: Quick analysis of svn log --limit

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
D.J. Heap wrote:
> Garrett Rooney wrote:
> [snip]
> 
>>
>> That seems likely to fix the problem, or if we don't want to rev 
>> svn_repos_history2 (perhaps so that such a change could be merged into 
>> a subsequent 1.2.x release) we could also change svn_repos_get_logs3 
>> to call svn_repos_history2 iteratively with progressively further back 
>> ranges of revisions until it finds enough revisions to satisfy the 
>> limit argument.
>>
>> -garrett
> 
> 
> Yes, both your and Greg's ideas would remove the need for an interface 
> change and significantly improve efficiency.  It seems like Greg's 
> method would be somewhat more efficient although a slightly more complex 
> change?

Yes, that seems to be the case.

-garrett


Re: Quick analysis of svn log --limit

Posted by "D.J. Heap" <dj...@shadyvale.net>.
Garrett Rooney wrote:
[snip]
> 
> That seems likely to fix the problem, or if we don't want to rev 
> svn_repos_history2 (perhaps so that such a change could be merged into a 
> subsequent 1.2.x release) we could also change svn_repos_get_logs3 to 
> call svn_repos_history2 iteratively with progressively further back 
> ranges of revisions until it finds enough revisions to satisfy the limit 
> argument.
> 
> -garrett

Yes, both your and Greg's ideas would remove the need for an interface 
change and significantly improve efficiency.  It seems like Greg's method 
would be somewhat more efficient although a slightly more complex change?

I will work on this, but it will likely not be ready for a couple of days -- 
I have some experimenting/learning to do.  So if someone else wants to get 
it done sooner, feel free.

DJ


Re: Quick analysis of svn log --limit

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
D.J. Heap wrote:
> I'm not terribly familiar with the code in libsvn_repos, but there are 
> two places that seem to be the source of slowness even with --limit 
> turned on:
> 
> subversion/libsvn_repos/log.c:svn_repos_get_logs3 -
>   1.  When passing one path such as /trunk (or / which mod_dav_svn 
> always seems to do even if no deeper path was specified) it performs the 
> svn_repos_history2 call and spends a lot of time there gathering all 
> revs for the path (which is nearly all the revs in the repo for /trunk 
> and *is* all revs in the repo for /).  So, svn_repos_history needs to be 
> revved to take a limit parameter?

That seems likely to fix the problem, or if we don't want to rev 
svn_repos_history2 (perhaps so that such a change could be merged into a 
subsequent 1.2.x release) we could also change svn_repos_get_logs3 to 
call svn_repos_history2 iteratively with progressively further back 
ranges of revisions until it finds enough revisions to satisfy the limit 
argument.
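
A minimal sketch of the iterative approach, in Python for brevity. history_in_range is a hypothetical stand-in for a ranged history call, and the starting window size and the doubling policy are assumptions of this sketch, not something decided in this thread.

```python
def limited_history(history_in_range, youngest, limit, window=100):
    """Collect up to `limit` revisions, youngest first.

    history_in_range(lo, hi) is a hypothetical callable returning the
    revisions in lo..hi (inclusive) that touched the path, in any order.
    The range walks backwards from `youngest`, doubling the window each
    round so sparse histories do not need too many calls.
    """
    found = []
    hi = youngest
    while hi >= 0 and len(found) < limit:
        lo = max(0, hi - window + 1)
        found.extend(sorted(history_in_range(lo, hi), reverse=True))
        hi = lo - 1
        window *= 2
    return found[:limit]
```

For a busy path like /trunk the first window usually satisfies the limit, so only a small slice of the repository history is read.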

-garrett
