Posted to dev@subversion.apache.org by Garrett Rooney <ro...@electricjellyfish.net> on 2006/02/01 00:37:35 UTC

Peg revisions, diff, and repository roots

So I was using 'svn diff' today, and it struck me that the usual way I
run repos->repos diffs (i.e. svn diff -r FOO:BAR http://server/repos/)
is really really slow with large repositories.  If I add a @FOO to the
end of the repository URL the speed goes way the hell up.

Investigating a bit more, the time seems to go to the location tracing
code: without the @FOO at the end we spend forever and a day sitting
around resolving our URLs to make sure they haven't moved around on us.

More investigation reveals that for repos->repos diffs it's pretty
much just -r FOO:BAR URL that's slow; if I do diff URL@FOO URL@BAR
it's fast, if I use the --new/--old syntax it's fast, etc.

I'm a little fuzzy on the logic behind peg revisions and diff, so bear
with me, but why the heck is there even a difference between -r
FOO:BAR URL and URL@FOO URL@BAR?  Isn't one just a shorthand for the
other?  And if there is a difference, why does the much more commonly
used case default to a behavior that results in terminally slow
results on large repositories?

Additionally, it seems awfully silly that we do history tracing at all
when pointed at the repository root.  It's not like it's going to move
on us if we go back through history...  Can we just special case that
to avoid this pain in at least that case?

-garrett

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Peg revisions, diff, and repository roots

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/1/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:

> Note that the two repositories I saw this on have not yet been
> upgraded to 1.3.x, so it may very well be the case that this is no
> longer a significant issue.  I'll attempt to find a 1.3.x repository
> with a large enough data set to test on later today.

I tried this with the GCC repository (which is both large and running
1.3.x) and wasn't able to reproduce the speed problem.  So either GCC
is running on some monster hardware, or we fixed the issue in 1.3.x
;-)

I guess that means I now have a concrete justification for moving
svn.apache.org to 1.3.x...

-garrett



Re: Peg revisions, diff, and repository roots

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 2/1/06, Daniel Berlin <db...@dberlin.org> wrote:
> Garrett Rooney wrote:
> > On 1/31/06, Ben Collins-Sussman <su...@red-bean.com> wrote:
> >> On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> >>
> >>> It just seems like we should be going out of our way to make sure our
> >>> users have trouble entering commands that seem simple at first glance
> >>> but actually take a huge amount of time to complete.
> >>>
> >> I feel your pain.  It makes me wonder if ... don't kill me for saying
> >> this... if the results of a history-trace couldn't be cached in the
> >> working copy somehow.  It means the first time you run your diff
> >> command (or other command that does history-tracing), it will be slow.
> >>  But if you ever run the same command (or similar command) on the same
> >> path or URL again, the history data would be pulled out of the wc
> >> cache instead.  It would be safe to cache, because history is
> >> immutable, after all.
> >
> > FWIW, at least half the time, maybe more, when I do this sort of thing
> > it's from outside a working copy.
> >
> > -garrett
> >
>
> So uh, this shouldn't be slow anymore, anyway.
>
> If it is, we probably missed somewhere we need to be using closest_copy.
> Unless you really have a tree of 100k revisions where each revision is a
> copy of the previous one or something.
>
> In general, the history tracing should now be O(number of renames/copies
> of file).
>
>
> I'd start by seeing where we are going into a simple "walk every
> revision backwards to find renames" loop, and fix it to use closest copy.

Note that the two repositories I saw this on have not yet been
upgraded to 1.3.x, so it may very well be the case that this is no
longer a significant issue.  I'll attempt to find a 1.3.x repository
with a large enough data set to test on later today.

-garrett



Re: Peg revisions, diff, and repository roots

Posted by Daniel Berlin <db...@dberlin.org>.
Garrett Rooney wrote:
> On 1/31/06, Ben Collins-Sussman <su...@red-bean.com> wrote:
>> On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
>>
>>> It just seems like we should be going out of our way to make sure our
>>> users have trouble entering commands that seem simple at first glance
>>> but actually take a huge amount of time to complete.
>>>
>> I feel your pain.  It makes me wonder if ... don't kill me for saying
>> this... if the results of a history-trace couldn't be cached in the
>> working copy somehow.  It means the first time you run your diff
>> command (or other command that does history-tracing), it will be slow.
>>  But if you ever run the same command (or similar command) on the same
>> path or URL again, the history data would be pulled out of the wc
>> cache instead.  It would be safe to cache, because history is
>> immutable, after all.
> 
> FWIW, at least half the time, maybe more, when I do this sort of thing
> it's from outside a working copy.
> 
> -garrett
> 

So uh, this shouldn't be slow anymore, anyway.

If it is, we probably missed somewhere we need to be using closest_copy.
Unless you really have a tree of 100k revisions where each revision is a
copy of the previous one or something.

In general, the history tracing should now be O(number of renames/copies
of file).


I'd start by seeing where we are going into a simple "walk every
revision backwards to find renames" loop, and fix it to use closest copy.
--Dan
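
The difference between the two strategies described above can be sketched
with a toy model. This is not Subversion's code: the COPIES table, the
function names, and the closest_copy() signature here are all made up for
illustration, and the model assumes at most one copy per revision.

```python
# Toy copy history: (revision, source_path, dest_path).
# Hypothetical data; assumes at most one copy per revision.
COPIES = [
    (20000, "/trunk/a.c", "/trunk/b.c"),
    (90000, "/trunk/b.c", "/trunk/c.c"),
]


def trace_naive(path, start_rev, end_rev, copies=COPIES):
    """Walk every revision backwards, undoing copies as they are met."""
    by_rev = {rev: (src, dst) for rev, src, dst in copies}
    steps = 0
    for rev in range(start_rev, end_rev, -1):
        steps += 1
        if rev in by_rev and by_rev[rev][1] == path:
            path = by_rev[rev][0]
    return path, steps


def closest_copy(path, rev, copies=COPIES):
    """Return the most recent copy at or before rev that created path."""
    candidates = [c for c in copies if c[0] <= rev and c[2] == path]
    return max(candidates) if candidates else None


def trace_by_copies(path, start_rev, end_rev, copies=COPIES):
    """Jump from copy to copy instead of visiting every revision."""
    steps = 0
    rev = start_rev
    while True:
        copy = closest_copy(path, rev, copies)
        if copy is None or copy[0] <= end_rev:
            break
        steps += 1
        copy_rev, src, _dst = copy
        path, rev = src, copy_rev - 1
    return path, steps
```

Both functions arrive at the same old name, but the first one does work
proportional to the number of revisions walked while the second does work
proportional to the number of copies in the path's history.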


Re: Peg revisions, diff, and repository roots

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/31/06, Ben Collins-Sussman <su...@red-bean.com> wrote:
> On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
>
> > It just seems like we should be going out of our way to make sure our
> > users have trouble entering commands that seem simple at first glance
> > but actually take a huge amount of time to complete.
> >
>
> I feel your pain.  It makes me wonder if ... don't kill me for saying
> this... if the results of a history-trace couldn't be cached in the
> working copy somehow.  It means the first time you run your diff
> command (or other command that does history-tracing), it will be slow.
>  But if you ever run the same command (or similar command) on the same
> path or URL again, the history data would be pulled out of the wc
> cache instead.  It would be safe to cache, because history is
> immutable, after all.

FWIW, at least half the time, maybe more, when I do this sort of thing
it's from outside a working copy.

-garrett



Re: Peg revisions, diff, and repository roots

Posted by Vincent Lefevre <vi...@vinc17.org>.
On 2006-01-31 19:19:24 -0600, kfogel@collab.net wrote:
> If we're going to be caching the results of queries against immutable
> data, methinks the repository is the place to do it...

Why not cache them in both the repository and the working copy?

-- 
Vincent Lefèvre <vi...@vinc17.org> - Web: <http://www.vinc17.org/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.org/blog/>
Work: CR INRIA - computer arithmetic / SPACES project at LORIA


Re: Peg revisions, diff, and repository roots

Posted by kf...@collab.net.
Ben Collins-Sussman <su...@red-bean.com> writes:
> On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> > It just seems like we should be going out of our way to make sure our
> > users have trouble entering commands that seem simple at first glance
> > but actually take a huge amount of time to complete.
> 
> I feel your pain.  It makes me wonder if ... don't kill me for saying
> this... if the results of a history-trace couldn't be cached in the
> working copy somehow.  It means the first time you run your diff
> command (or other command that does history-tracing), it will be slow.
>  But if you ever run the same command (or similar command) on the same
> path or URL again, the history data would be pulled out of the wc
> cache instead.  It would be safe to cache, because history is
> immutable, after all.

If we're going to be caching the results of queries against immutable
data, methinks the repository is the place to do it...

-K


Re: Peg revisions, diff, and repository roots

Posted by Ben Collins-Sussman <su...@red-bean.com>.
On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:

> It just seems like we should be going out of our way to make sure our
> users have trouble entering commands that seem simple at first glance
> but actually take a huge amount of time to complete.
>

I feel your pain.  It makes me wonder if ... don't kill me for saying
this... if the results of a history-trace couldn't be cached in the
working copy somehow.  It means the first time you run your diff
command (or other command that does history-tracing), it will be slow.
 But if you ever run the same command (or similar command) on the same
path or URL again, the history data would be pulled out of the wc
cache instead.  It would be safe to cache, because history is
immutable, after all.
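
A rough sketch of what such a cache might look like, wherever it ends up
living. Everything here is hypothetical (the HistoryTraceCache class and
the slow_trace stand-in are invented for illustration). The one design
point it tries to show is that the key has to use concrete revision
numbers: HEAD moves, so a result keyed on "HEAD" would not be safe to keep.

```python
def slow_trace(path, peg_rev, target_rev):
    """Stand-in for the expensive history trace; returns path unchanged."""
    return path


class HistoryTraceCache:
    """Memoize trace results keyed on (path, peg_rev, target_rev)."""

    def __init__(self, tracer):
        self._tracer = tracer   # function that does the real (slow) trace
        self._results = {}      # (path, peg_rev, target_rev) -> old path
        self.misses = 0         # how many times the tracer was invoked

    def lookup(self, path, peg_rev, target_rev):
        key = (path, peg_rev, target_rev)
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._tracer(path, peg_rev, target_rev)
        return self._results[key]
```

The first lookup for a given key pays for the trace; repeating the same
lookup is answered from the dictionary without calling the tracer again.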



Re: Peg revisions, diff, and repository roots

Posted by "C. Michael Pilato" <cm...@collab.net>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Garrett Rooney wrote:
> The problem, at least in this case, is that our policy of following
> peg revisions is terribly, painfully, horribly slow when presented
> with a repository of significant size.  Like, let's go get a cup of
> coffee and perhaps go purchase a new home, maybe it'll be done when we
> get back.

Oh, no -- trust me.  Purchasing a new home can take *much* longer than
any 'svn diff' I've ever run...

- --
C. Michael Pilato <cm...@collab.net>
CollabNet   <>   www.collab.net   <>   Distributed Development On Demand
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFD4LklokEGqRcG/W4RAjBCAKCTjhm7khHITJRtiwBp0CsyItiuJQCeJn6h
mkMXNtbhh2EN5DAue2X210s=
=NCq9
-----END PGP SIGNATURE-----


Re: Peg revisions, diff, and repository roots

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/31/06, Ben Collins-Sussman <su...@red-bean.com> wrote:
> On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
>
> > > Consistency with other commands.  That's the party line, at least.  :-)
> >
> > /me still fails to see why defaulting to HEAD makes sense for the
> > other commands anyway...
>
> Ah!  The history of pegrevs!  It all started with this use-case:
>
> $ svn cat -rN foo.c
> svn: error:  foo.c doesn't exist in rN
>
> // user curses
>
> $ svn ls -rN URL
> svn: error:  URL doesn't exist in rN
>
> // user curses even more.
>
> $ svn log -v
> [user visually parses lots of junk, figures out old name]
>
> $ svn ls -rN slightly-older-URL
> [command works]
>
> [user sends angry mail to list saying "why don't you trace history?!?"]
>
>
> So we decided:  OK, fine.  If the user *ever* leaves off the @REV
> syntax on a path or URL argument, then it's our job to trace that path
> back in time.  We assume that 'path' really means 'path@BASE' and
> 'URL' really means 'URL@HEAD'.
>
> We began by adding this peg-behavior to 'cat' and 'ls', and from there
> it slowly spread to the other commands.  It may be slightly
> inconvenient (i.e. not optimized for the most common use-case in each
> command), but the theory is that at least the behavior is
> consistent... and thus predictable, easy to learn, easy to understand.

And yet, every time I have to deal with pegrevs I get a headache...

The problem, at least in this case, is that our policy of following
peg revisions is terribly, painfully, horribly slow when presented
with a repository of significant size.  Like, let's go get a cup of
coffee and perhaps go purchase a new home, maybe it'll be done when we
get back.

It just seems like we should be going out of our way to make sure our
users have trouble entering commands that seem simple at first glance
but actually take a huge amount of time to complete.

> > Sure, but it seems like this history trace code should be able to say
> > "duh, this is the root node, it's never going to move!" and bail early
> > or something...  And even the client side could check for the repos
> > root URL and bail early if we wanted to, although I'm not sure a round
> > trip is worth it.
>
> Yeah, actually, the WC is now storing the repository root url in the
> entries file.  So I guess it would work!

I'll look into this...

-garrett



Re: Peg revisions, diff, and repository roots

Posted by Ben Collins-Sussman <su...@red-bean.com>.
On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:

> > Consistency with other commands.  That's the party line, at least.  :-)
>
> /me still fails to see why defaulting to HEAD makes sense for the
> other commands anyway...

Ah!  The history of pegrevs!  It all started with this use-case:

$ svn cat -rN foo.c
svn: error:  foo.c doesn't exist in rN

// user curses

$ svn ls -rN URL
svn: error:  URL doesn't exist in rN

// user curses even more.

$ svn log -v
[user visually parses lots of junk, figures out old name]

$ svn ls -rN slightly-older-URL
[command works]

[user sends angry mail to list saying "why don't you trace history?!?"]


So we decided:  OK, fine.  If the user *ever* leaves off the @REV
syntax on a path or URL argument, then it's our job to trace that path
back in time.  We assume that 'path' really means 'path@BASE' and
'URL' really means 'URL@HEAD'.

We began by adding this peg-behavior to 'cat' and 'ls', and from there
it slowly spread to the other commands.  It may be slightly
inconvenient (i.e. not optimized for the most common use-case in each
command), but the theory is that at least the behavior is
consistent... and thus predictable, easy to learn, easy to understand.
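
As a simplified sketch of that defaulting rule (not the client's actual
parsing code, which has more cases to worry about; split_peg and
default_peg are hypothetical names):

```python
def split_peg(target):
    """Split 'target@REV' into (target, rev); rev is None when absent."""
    base, sep, rev = target.rpartition("@")
    # Require a non-empty revision with no '/' in it, so that the '@' in
    # something like 'svn+ssh://user@host/repos' is not taken for a peg.
    if sep and base and rev and "/" not in rev:
        return base, rev
    return target, None


def default_peg(target):
    """Apply the rule: bare path means path@BASE, bare URL means URL@HEAD."""
    path, rev = split_peg(target)
    if rev is None:
        rev = "HEAD" if "://" in path else "BASE"
    return path, rev
```

So default_peg("foo.c") yields ("foo.c", "BASE"), while a URL with no @REV
suffix comes back pegged at "HEAD", which is what kicks off the history
trace.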


> Sure, but it seems like this history trace code should be able to say
> "duh, this is the root node, it's never going to move!" and bail early
> or something...  And even the client side could check for the repos
> root URL and bail early if we wanted to, although I'm not sure a round
> trip is worth it.

Yeah, actually, the WC is now storing the repository root url in the
entries file.  So I guess it would work!
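
Assuming the repository root URL is available locally like that, the
client-side check could be as small as the following sketch.
needs_history_trace is a hypothetical helper, not an existing function,
and it ignores URL canonicalization beyond trailing slashes.

```python
def needs_history_trace(url, repos_root):
    """Return False when url is the repository root, which cannot move."""
    return url.rstrip("/") != repos_root.rstrip("/")
```

Anything below the root still gets traced as before; only the root itself
skips the trace.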



Re: Peg revisions, diff, and repository roots

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 1/31/06, Ben Collins-Sussman <su...@red-bean.com> wrote:
> On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
>
> > I'm a little fuzzy on the logic behind peg revisions and diff, so bear
> > with me, but why the heck is there even a difference between -r
> > FOO:BAR URL and URL@FOO URL@BAR?
>
> You know why, because one is doing peg-tracing, and one isn't.  :-)
>
> The first form begins with URL@HEAD, then traces history backwards to
> discover whatever URL used to be called in rFOO, and then repeats the
> trace backwards to rBAR as well.  The ultimate result is that the
> command may end up comparing two totally different paths in rFOO and
> rBAR.
>
> The second form skips the history tracing altogether;  you've already
> pinpointed the exact (rev, path) coordinates to compare.

Sure, I know that's the difference, I'm failing to see WHY there's a
difference though.  From a UI perspective, why is it a good thing that
-rFOO:BAR URL is different from URL@FOO URL@BAR?  I'm perfectly willing
to stipulate that we probably can't change this now, I'm just trying
to understand what the underlying reason for it was in the beginning.

> > And if there is a difference, why does the much more commonly
> > used case default to a behavior that results in terminally slow
> > results on large repositories?
>
> Consistency with other commands.  That's the party line, at least.  :-)

/me still fails to see why defaulting to HEAD makes sense for the
other commands anyway...

> > Additionally, it seems awfully silly that we do history tracing at all
> > when pointed at the repository root.  It's not like it's going to move
> > on us if we go back through history...  Can we just special case that
> > to avoid this pain in at least that case?
>
> How do we know it's the repository root?
>
> The logic which decides "do a history trace or not" is client-side,
> not server-side.

Sure, but it seems like this history trace code should be able to say
"duh, this is the root node, it's never going to move!" and bail early
or something...  And even the client side could check for the repos
root URL and bail early if we wanted to, although I'm not sure a round
trip is worth it.

-garrett



Re: Peg revisions, diff, and repository roots

Posted by Ben Collins-Sussman <su...@red-bean.com>.
On 1/31/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:

> I'm a little fuzzy on the logic behind peg revisions and diff, so bear
> with me, but why the heck is there even a difference between -r
> FOO:BAR URL and URL@FOO URL@BAR?

You know why, because one is doing peg-tracing, and one isn't.  :-)

The first form begins with URL@HEAD, then traces history backwards to
discover whatever URL used to be called in rFOO, and then repeats the
trace backwards to rBAR as well.  The ultimate result is that the
command may end up comparing two totally different paths in rFOO and
rBAR.

The second form skips the history tracing altogether;  you've already
pinpointed the exact (rev, path) coordinates to compare.
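
A toy model may make the two forms concrete. This is only an illustration
under simplifying assumptions, not how the client is implemented: the
RENAMES table, HEAD value, and all function names are invented, and the
model knows about renames only.

```python
# Toy history: (revision, old_path, new_path) means old_path was renamed
# to new_path in that revision. Hypothetical data.
RENAMES = [
    (50, "/trunk/old.c", "/trunk/new.c"),
]
HEAD = 100


def trace_back(path, peg_rev, target_rev, renames=RENAMES):
    """Return the name that path@peg_rev had back in target_rev."""
    # Undo, newest first, every rename between target_rev and peg_rev.
    for rev, old, new in sorted(renames, reverse=True):
        if target_rev < rev <= peg_rev and path == new:
            path = old
    return path


def diff_targets_with_r(path, rev_a, rev_b):
    """Model of '-r A:B PATH': peg at HEAD, then trace to each revision."""
    return ((trace_back(path, HEAD, rev_a), rev_a),
            (trace_back(path, HEAD, rev_b), rev_b))


def diff_targets_explicit(path, rev_a, rev_b):
    """Model of 'PATH@A PATH@B': the coordinates are used as given."""
    return ((path, rev_a), (path, rev_b))
```

With the rename at r50, diff_targets_with_r("/trunk/new.c", 40, 60) ends up
comparing /trunk/old.c in r40 against /trunk/new.c in r60, while
diff_targets_explicit asks for /trunk/new.c in both revisions without
looking at history at all.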


> And if there is a difference, why does the much more commonly
> used case default to a behavior that results in terminally slow
> results on large repositories?

Consistency with other commands.  That's the party line, at least.  :-)

>
> Additionally, it seems awfully silly that we do history tracing at all
> when pointed at the repository root.  It's not like it's going to move
> on us if we go back through history...  Can we just special case that
> to avoid this pain in at least that case?

How do we know it's the repository root?

The logic which decides "do a history trace or not" is client-side,
not server-side.
