Posted to dev@subversion.apache.org by Daniel Berlin <db...@dberlin.org> on 2007/05/09 15:44:50 UTC

log --limit not as good as it should be?

(This is running against a 1.5.0 dev build server, so the server
definitely supports the limit stuff.  The same behavior is also found
running against http)

svn log -r 1:400 --limit 400 svn://gcc.gnu.org/svn/gcc/trunk

This will respond immediately and produce 400 revisions of log output

svn log -r 1:HEAD --limit 400 svn://gcc.gnu.org/svn/gcc/trunk

This will take about 3-4 minutes before starting to respond, and
produce the same 400 revisions of log output.

It looks like something is touching all the revisions that *might* be
logged before we start producing log output at all.

This is still true even if I use peg revisions (my first guess was that
we were trying to figure out which paths to actually log before we
logged them).

I would have expected the two commands above to take the same amount
of time, assuming there are at least <limit> revisions to log.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: log --limit not as good as it should be?

Posted by Daniel Berlin <db...@dberlin.org>.
On 5/9/07, David James <ja...@cs.toronto.edu> wrote:
> [... snip ...]
>
> Great! In that case, +1 on the optimization. It sounds like it will be
> very helpful when users need to grab the revisions in reverse order
> for a big repository.

Actually, you have it backwards.

It will be very helpful when they grab the revs in normal order.
When they ask for the revs in reverse order we can already stream it :)


Re: log --limit not as good as it should be?

Posted by David James <ja...@cs.toronto.edu>.
On 5/9/07, Daniel Berlin <db...@dberlin.org> wrote:
> [... snip ...]
>
> If you haven't had performance problems, you haven't tried it on a
> large enough repo.  My suggestion should make those commands very
> slightly slower.  Let's say you had a million revisions, and we
> picked a batch size of 20000.
>
> 1. If the file was changed 4 times, all of them in the last 20k
> revisions, we will have 49 useless calls (1-20k, 20k-40k, ...) taking
> O(1) time each.
> 2. If the file was changed 4 times, spread evenly among the batches,
> we do the same amount of work we used to.
> 3. If the file was changed thousands of times, spread evenly among
> the batches, we do a lot less work than we used to.
>
> I expect some combination of cases 2 and 3 is the common case.  Even
> for case 1, it shouldn't drop the performance very much.

Great! In that case, +1 on the optimization. It sounds like it will be
very helpful when users need to grab the revisions in reverse order
for a big repository.

Cheers,

David


Re: log --limit not as good as it should be?

Posted by Daniel Berlin <db...@dberlin.org>.
On 5/9/07, David James <ja...@cs.toronto.edu> wrote:
> On 5/9/07, Daniel Berlin <db...@dberlin.org> wrote:
> > [... snip ...]
> > Yeah, it may slow down if you log obscure directories with almost no
> > changes.  But I don't believe this is the common case.
>
> How dramatic is the slowdown from your optimization if you run 'svn
> log' on a file or directory which has few changes? From reading your
> email, it seems like the slowdown will be very minor, but correct me
> if I am wrong.

Minor, because discover_changed_paths will see there is nothing there
*really* quickly (it should be O(1), I think).

It's just that you are going to have, say, 10-20 more O(1) calls that
take a few milliseconds each.

>
> I often run 'svn log --limit N' on an individual file or directory
> to read about its last N changes. I also find it handy to run "svn
> log -r1:HEAD --limit 1 --stop-on-copy" to find the last revision in
> which a file was copied, or "svn log -r1:HEAD --limit 1" to find the
> revision in which a file was created. So far I haven't noticed any
> performance problems with these operations, but if your change will
> have a dramatic effect on these cases you might want to think about
> that.

If you haven't had performance problems, you haven't tried it on a
large enough repo.  My suggestion should make those commands very
slightly slower.  Let's say you had a million revisions, and we
picked a batch size of 20000.

1. If the file was changed 4 times, all of them in the last 20k
revisions, we will have 49 useless calls (1-20k, 20k-40k, ...) taking
O(1) time each.
2. If the file was changed 4 times, spread evenly among the batches,
we do the same amount of work we used to.
3. If the file was changed thousands of times, spread evenly among
the batches, we do a lot less work than we used to.

I expect some combination of cases 2 and 3 is the common case.  Even
for case 1, it shouldn't drop the performance very much.
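
To put concrete numbers on those three cases (a throwaway Python
sketch of the arithmetic above, not Subversion code):

    total_revs = 1_000_000
    batch_size = 20_000
    num_batches = total_revs // batch_size   # 50 batches: 1-20k, 20001-40k, ...

    # Case 1: all four changes sit in the last batch, so the first 49
    # batch queries find nothing (each roughly O(1)).
    useless_calls = num_batches - 1          # 49

    # Case 3: thousands of changes spread evenly means a --limit 400
    # log is satisfied inside the first batch instead of tracing the
    # whole range up front.
    print(num_batches, useless_calls)        # prints: 50 49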


Re: log --limit not as good as it should be?

Posted by David James <ja...@cs.toronto.edu>.
On 5/9/07, Daniel Berlin <db...@dberlin.org> wrote:
> [... snip ...]
> Yeah, it may slow down if you log obscure directories with almost no
> changes.  But I don't believe this is the common case.

How dramatic is the slowdown from your optimization if you run 'svn
log' on a file or directory which has few changes? From reading your
email, it seems like the slowdown will be very minor, but correct me
if I am wrong.

I often run 'svn log --limit N' on an individual file or directory
to read about its last N changes. I also find it handy to run "svn
log -r1:HEAD --limit 1 --stop-on-copy" to find the last revision in
which a file was copied, or "svn log -r1:HEAD --limit 1" to find the
revision in which a file was created. So far I haven't noticed any
performance problems with these operations, but if your change will
have a dramatic effect on these cases you might want to think about
that.

Cheers,

David


Re: log --limit not as good as it should be?

Posted by Daniel Berlin <db...@dberlin.org>.
On 5/9/07, C. Michael Pilato <cm...@collab.net> wrote:
> Daniel Berlin wrote:
> > [... snip ...]
>
> Yes.  We can only trace history backwards, so anytime you run a log request
> with oldest-to-youngest direction in your range, the revisions have to be
> determined up front, then reported in reverse.

Well, they don't, actually.

To get the history from 1 to 124000, you can do the history from 10000
down to 1, reverse and send, then from 20000 down to 10001, reverse
and send, etc.

This makes no sense when you don't have a limit, but when you have a
limit that is relatively small compared to the greatest revnum you are
asking about, this will win.
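
To sketch what that batched streaming might look like (illustrative
Python, not the actual C implementation; history_backwards() is a
hypothetical helper standing in for the FS history walk, which can
only run youngest-to-oldest):

    def forward_log(path, start, end, limit, batch_size=10000):
        # Stream log entries oldest-to-youngest without gathering the
        # whole range up front.
        sent = 0
        bottom = start
        while bottom <= end and sent < limit:
            top = min(bottom + batch_size - 1, end)
            # Walk this batch in the only direction the FS supports,
            # then reverse it before sending.
            batch = list(history_backwards(path, top, bottom))
            for rev in reversed(batch):
                yield rev
                sent += 1
                if sent == limit:
                    return
            bottom = top + 1

The client starts seeing output after the first batch, and with a
small limit the loop usually stops long before reaching the top of
the range.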
>
> Your first command gathers all the changed revs between 400 and the revision
> in which the path came into being, then reports the first 400 of them in
> reverse.
>
> You second command gathers all the changed revs between HEAD and the
> revision in which the path came into being, then ports the first 400 of them
> in reverse.
>
>
Yes, we discovered this by tracing through the code on IRC.
Doing O(revisions changed between firstrev and lastrev) work up front,
instead of incrementally reversing in batches of, say, 10000, is the
wrong tradeoff to make.

The first will end up doing less work if you log some random obscure
directory with a limit, but in the common case, where a *lot* of
revisions have changed on the path, the limit is what matters.

If the user gives a low limit relative to the total number of
revisions (say < log_end_rev / 10), we should be doing the reversing
in batches.

This will be O(greatest_rev / batch_size) work worst case, but it
won't take 5 minutes to respond, *and* it's very likely we will hit
the limit before we hit the end anyway.
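
As a sketch of that decision rule (illustrative only; the divisor 10
just mirrors the "< log_end_rev / 10" figure above and is not a tuned
constant):

    def should_batch(limit, end_rev):
        # Batch-and-reverse only when the caller's limit is small
        # relative to the range being logged; otherwise the up-front
        # gather is no worse.
        return limit > 0 and limit < end_rev / 10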

5 minutes is not a guess here.  That's really how long it takes to
discover the path changed 90000 times, then start to log 400 of them.

Yeah, it may slow down if you log obscure directories with almost no
changes.  But I don't believe this is the common case.

When the path has changed a lot, you end up doing less work by doing
it incrementally when there is a limit.

--Dan


Re: log --limit not as good as it should be?

Posted by "C. Michael Pilato" <cm...@collab.net>.
Daniel Berlin wrote:
> [... snip ...]
> 
> svn log -r 1:400 --limit 400 svn://gcc.gnu.org/svn/gcc/trunk
> 
> This will respond immediately and produce 400 revisions of log output
> 
> svn log -r 1:HEAD --limit 400 svn://gcc.gnu.org/svn/gcc/trunk
> 
> This will take about 3-4 minutes before starting to respond, and
> produce the same 400 revisions of log output.
> 
> It looks like something is touching all the revisions that *might* be
> logged before we start producing log output at all.

Yes.  We can only trace history backwards, so anytime you run a log request
with oldest-to-youngest direction in your range, the revisions have to be
determined up front, then reported in reverse.

Your first command gathers all the changed revs between 400 and the revision
in which the path came into being, then reports the first 400 of them in
reverse.

Your second command gathers all the changed revs between HEAD and the
revision in which the path came into being, then reports the first 400
of them in reverse.
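
A minimal sketch of that gather-then-report behavior (illustrative
Python, not the actual C code; history_backwards() is a hypothetical
helper that walks history youngest-to-oldest):

    def forward_log_upfront(path, start, end, limit):
        # Trace the whole range backwards first; nothing reaches the
        # client until this finishes, hence the long pause for 1:HEAD.
        all_revs = list(history_backwards(path, end, start))
        all_revs.reverse()            # now oldest-to-youngest
        for rev in all_revs[:limit]:
            yield rev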

-- 
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand