Posted to dev@subversion.apache.org by Leonardo Fernandes <le...@outsystems.com> on 2008/06/17 14:37:31 UTC

log -g performance

Hi.
I just wanted to know why log -g operations are much slower than normal
log, even when the revisions don't have any merge-info change?

A quick data point: revision 46237 in my repository changed two files
and no properties at all. This is the common case in development, so
probably 90% (or even more) of the revisions in a repository will fall
into this category.

> log -r 46237 $REPO
takes less than a second

> log -g -r 46237 $REPO
takes 12 seconds, with the server CPU at 100%
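
For anyone who wants to reproduce the comparison, here is a minimal
timing sketch in Python. It assumes the svn command-line client is on
the PATH. The helper names time_command and compare_log_timings, and
the script name in the usage comment, are made up for illustration.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (elapsed_seconds, exit_code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


def compare_log_timings(repo_url, revision, svn="svn"):
    """Time plain 'svn log' against 'svn log -g' for one revision."""
    plain = [svn, "log", "-r", str(revision), repo_url]
    merged = [svn, "log", "-g", "-r", str(revision), repo_url]
    return {"log": time_command(plain), "log -g": time_command(merged)}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python time_log.py REPO_URL REVISION
    timings = compare_log_timings(sys.argv[1], sys.argv[2])
    for name, (seconds, code) in timings.items():
        print(f"{name}: {seconds:.2f}s (exit {code})")
```

A single run measures client wall-clock time only, so server-side
caching and network latency can skew it; repeating the run a few times
gives a fairer picture.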

Is any work planned to improve this performance?
Thanks.
Leonardo Fernandes

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: log -g performance

Posted by Marc Strapetz <ma...@syntevo.com>.
Karl Fogel wrote:
> So when a commit does both (that is, commits the result of a merge or
> merges, *and* includes new changes), should the marker be set?  What's
> the use case?  (Quoted context didn't say.)

To summarize and motivate the RFE(s) briefly: especially for GUI
clients it can be useful to retrieve mergeinfo for a range of revisions
(even for the whole repository), but currently there is no efficient
way to do this. The best approach I can currently see is to use log -g,
but that carries a rather large overhead for this task. So the following
enhancements could be helpful here (and they can more or less
substitute for each other):

* log (without -g) reports a "marker" for *every* revision, indicating
   whether mergeinfo is present in the repository for the logged path
   (and its children). In this way the client knows for which revisions
   it should query for mergeinfo.

* log -g has an additional --depth parameter; --depth=0 would report
   no merged revisions, only the same marker as suggested above.

* The get-mergeinfo command (I'm referring to that of the svnserve
   protocol and its DAV counterpart) supports fetching mergeinfo for
   a range of revisions.

--
Best regards,
Marc Strapetz
_____________
SyntEvo GmbH
www.syntevo.com




Re: log -g performance

Posted by Karl Fogel <kf...@red-bean.com>.
Marc Strapetz <ma...@syntevo.com> writes:
>> One of the things I was thinking was that it would be nice (assuming
>> it is not expensive), if a normal svn log could just return some kind
>> of boolean for each revision that indicates if the revision was the
>> commit of a merge.
>
> That would be helpful as well. The suggested depth parameter requires
> a protocol change from client to server and in case of depth=0
> probably also from server to client. How about this mergeinfo marker
> -- can it be introduced safely without breaking older clients?
>
> Can we get this topic into the issue tracker?

Let's discuss it here first, and if we decide to do it, then file an
issue.

So when a commit does both (that is, commits the result of a merge or
merges, *and* includes new changes), should the marker be set?  What's
the use case?  (Quoted context didn't say.)

Best,
-Karl


Re: log -g performance

Posted by Marc Strapetz <ma...@syntevo.com>.
> One of the things I was thinking was that it would be nice (assuming
> it is not expensive), if a normal svn log could just return some kind
> of boolean for each revision that indicates if the revision was the
> commit of a merge.

That would be helpful as well. The suggested depth parameter requires a 
protocol change from client to server and in case of depth=0 probably 
also from server to client. How about this mergeinfo marker -- can it be 
introduced safely without breaking older clients?

Can we get this topic into the issue tracker?

--
Best regards,
Marc Strapetz
_____________
SyntEvo GmbH
www.syntevo.com




Re: log -g performance

Posted by Mark Phippard <ma...@gmail.com>.
On Wed, Jun 18, 2008 at 9:16 AM, Marc Strapetz
<ma...@syntevo.com> wrote:
>>> I just wanted to know why log -g operations are much slower than normal
>>> log, even when the revisions don't have any merge-info change?
>>
>> I suspect this is just the overhead of checking -- for each revision --
>> whether or not the revision has a mergeinfo change.
>
> For SmartSVN's local log cache, we use log -g to detect the revisions
> that have mergeinfo; afterwards (or concurrently) we query for the
> mergeinfo of those revisions. The actual merged revision numbers are not
> used, so that's quite an overhead, but AFAIK there is no more efficient
> way to retrieve complete mergeinfo for a range of revisions?
>
> In this case I would like to make an RFE to provide more efficient access to
> the mergeinfo. Currently I can see two alternative approaches:
>
> (1) Have a range parameter for the "get-mergeinfo" command
>
> (2) Have an additional "recursion depth" parameter for log -g. Depth 0 would
> mean that the server should just signal that there is mergeinfo (that is,
> that there are merged revisions) but should not send them.
>
> While (1) would probably be more efficient, (2) would definitely be a
> benefit for command line users too: from the (rather limited) experience we
> have had with log -g so far, it's likely to report more than a user
> needs. For example, we have a shared code base with three active and
> persistent "feature branches". A normal log reports 462 revisions; log -g
> reports 19542 revisions, up to recursion depth 8 or so ...

One of the things I was thinking was that it would be nice (assuming
it is not expensive), if a normal svn log could just return some kind
of boolean for each revision that indicates if the revision was the
commit of a merge.  In the normal output, maybe an * could be added
somewhere.  Anyway, API users could use this boolean to do a followup
request for those revisions which are merges to get the mergeinfo for
that revision.  Presumably this would make it easier to build a
responsive UI as you could defer getting the mergeinfo until you
needed it.
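
To make the idea concrete, here is a rough client-side sketch in
Python. Nothing in it is an existing Subversion API: LogEntry, its
is_merge flag, and LazyMergeinfo are hypothetical names, and the fetch
callable stands in for whatever follow-up request would return the
mergeinfo for one revision.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LogEntry:
    revision: int
    message: str
    is_merge: bool  # hypothetical per-revision marker from a plain log


class LazyMergeinfo:
    """Fetch mergeinfo only for marked revisions, and only on demand."""

    def __init__(self, entries: List[LogEntry], fetch: Callable[[int], str]):
        self._merge_revs = {e.revision for e in entries if e.is_merge}
        self._fetch = fetch  # stand-in for the follow-up server request
        self._cache: Dict[int, str] = {}

    def get(self, revision: int) -> Optional[str]:
        if revision not in self._merge_revs:
            return None  # ordinary commit: no round trip at all
        if revision not in self._cache:
            self._cache[revision] = self._fetch(revision)
        return self._cache[revision]
```

A UI could build the log view from the entries alone and call get()
only when the user expands a revision that carries the marker.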

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/


Re: log -g performance

Posted by Marc Strapetz <ma...@syntevo.com>.
>> I just wanted to know why log -g operations are much slower than normal
>> log, even when the revisions don't have any merge-info change?
> 
> I suspect this is just the overhead of checking -- for each revision -- 
> whether or not the revision has a mergeinfo change.

For SmartSVN's local log cache, we use log -g to detect the revisions
that have mergeinfo; afterwards (or concurrently) we query for the
mergeinfo of those revisions. The actual merged revision numbers are not
used, so that's quite an overhead, but AFAIK there is no more efficient
way to retrieve complete mergeinfo for a range of revisions?

In this case I would like to make an RFE to provide more efficient
access to the mergeinfo. Currently I can see two alternative approaches:

(1) Have a range parameter for the "get-mergeinfo" command

(2) Have an additional "recursion depth" parameter for log -g. Depth 0
would mean that the server should just signal that there is mergeinfo
(that is, that there are merged revisions) but should not send them.

While (1) would probably be more efficient, (2) would definitely be a
benefit for command line users too: from the (rather limited) experience
we have had with log -g so far, it's likely to report more than a user
needs. For example, we have a shared code base with three active and
persistent "feature branches". A normal log reports 462 revisions;
log -g reports 19542 revisions, up to recursion depth 8 or so ...
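
As a rough illustration of approach (2), the Python sketch below caps
how far a log expansion descends into merged revisions. It is a toy
model, not how the server implements log -g: the merge graph is a
plain dictionary supplied by the caller, it is assumed to be acyclic,
and expand_log and max_depth are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical input: revision -> revisions it merged in (assumed acyclic).
MergeGraph = Dict[int, List[int]]


def expand_log(
    revisions: List[int], merged: MergeGraph, max_depth: int
) -> List[Tuple[int, int, bool]]:
    """Return (revision, depth, has_merges) rows, descending at most
    max_depth levels into merged revisions."""
    rows: List[Tuple[int, int, bool]] = []

    def visit(rev: int, depth: int) -> None:
        children = merged.get(rev, [])
        rows.append((rev, depth, bool(children)))
        if depth < max_depth:
            for child in children:
                visit(child, depth + 1)

    for rev in revisions:
        visit(rev, 0)
    return rows
```

With max_depth=0 each requested revision yields exactly one row whose
flag only signals that merged revisions exist, so the output stays the
size of a normal log.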

--
Best regards,
Marc Strapetz
_____________
SyntEvo GmbH
www.syntevo.com




Re: log -g performance

Posted by "C. Michael Pilato" <cm...@collab.net>.
By the way, I've filed issue #3220 to track improvements to 'svn log -g' 
performance.



-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand


Re: log -g performance

Posted by David Glasser <gl...@davidglasser.net>.
On Tue, Jun 17, 2008 at 7:51 AM, C. Michael Pilato <cm...@collab.net> wrote:
> Leonardo Fernandes wrote:
>>
>> Hi.
>> I just wanted to know why log -g operations are much slower than normal
>> log, even when the revisions don't have any merge-info change?
>
> I suspect this is just the overhead of checking -- for each revision --
> whether or not the revision has a mergeinfo change.  This penalty is small
> where no mergeinfo exists at all, but gets larger once mergeinfo appears in
> the repository (and grows depending on the depth, directory-wise, of the
> places where mergeinfo is set).
>
> (Is that right, David?)

I don't actually know much about log -g (other than making sure that
the APIs it used at one point existed); Hyrum probably has more
insight here.

> I think we could improve this by changing the way we detect these
> differences.  Today, we do so by fetching all the mergeinfo in one revision,
> then all the mergeinfo in another revision, and then comparing them.  We'd
> probably be better served by adding an svn_fs_mergeinfo_diff() function that
> could avoid crawling into unchanged regions of the respective revision trees
> to even find mergeinfo (since the mergeinfo found in both subtrees would be
> identical, and thus not present in a diff of the two anyway).

That wouldn't be too tough to implement and, if you're accurately
diagnosing the problem, would probably help a lot.

--dave

-- 
David Glasser | glasser@davidglasser.net | http://www.davidglasser.net/


Re: log -g performance

Posted by "C. Michael Pilato" <cm...@collab.net>.
Leonardo Fernandes wrote:
> Hi.
> I just wanted to know why log -g operations are much slower than normal
> log, even when the revisions don't have any merge-info change?

I suspect this is just the overhead of checking -- for each revision -- 
whether or not the revision has a mergeinfo change.  This penalty is small 
where no mergeinfo exists at all, but gets larger once mergeinfo appears in 
the repository (and grows depending on the depth, directory-wise, of the 
places where mergeinfo is set).

(Is that right, David?)

I think we could improve this by changing the way we detect these 
differences.  Today, we do so by fetching all the mergeinfo in one revision, 
then all the mergeinfo in another revision, and then comparing them.  We'd 
probably be better served by adding an svn_fs_mergeinfo_diff() function that 
could avoid crawling into unchanged regions of the respective revision trees 
to even find mergeinfo (since the mergeinfo found in both subtrees would be 
identical, and thus not present in a diff of the two anyway).
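
A minimal Python sketch of that pruning idea, on a toy tree rather
than the real filesystem API: it assumes an unchanged subtree is
represented by the very same node object in both revisions, so an
identity check is enough to skip it. Node and mergeinfo_diff here are
illustrative names only, not a proposal for the actual signature of
svn_fs_mergeinfo_diff().

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Toy tree node; an unchanged subtree is the same object in both trees."""
    mergeinfo: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def mergeinfo_diff(
    old: Optional[Node],
    new: Optional[Node],
    path: str = "/",
    visited: Optional[List[str]] = None,
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {path: (old_mergeinfo, new_mergeinfo)} where the two differ."""
    if old is new:
        return {}  # shared (or absent on both sides): nothing below can differ
    if visited is not None:
        visited.append(path)  # record which nodes were actually crawled
    changes = {}
    old_mi = old.mergeinfo if old else None
    new_mi = new.mergeinfo if new else None
    if old_mi != new_mi:
        changes[path] = (old_mi, new_mi)
    old_kids = old.children if old else {}
    new_kids = new.children if new else {}
    for name in sorted(set(old_kids) | set(new_kids)):
        child_path = path.rstrip("/") + "/" + name
        changes.update(
            mergeinfo_diff(old_kids.get(name), new_kids.get(name),
                           child_path, visited)
        )
    return changes
```

The cost then follows the number of changed nodes between the two
revisions rather than the number of places where mergeinfo is set.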


-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand