Posted to users@subversion.apache.org by Andreas Hoegger <An...@bsiag.com> on 2010/01/06 12:28:22 UTC

Subversion History question

Hi all,
We have been using SVN for years, and our repositories have grown in size over time. To make them easier to maintain and back up, we are looking for a way to get rid of unused old revisions.

E.g. out of a repository of 10'000 revisions, only the last 1'000 revisions are of interest. Is there a way to remove revisions 1-8'999 without losing the correct revision number 10'000?

My idea was to dump revisions 9'000 to 10'000 and load them into a new repository. Unfortunately, the load starts at revision 1 and ends with head revision 1'000, which ended up in a mess: several users had checked out revision 10'000 before the dump and are no longer able to commit against the new repository's head revision 1'000.

Is there a suitable solution for this issue? Is there a bug report concerning this problem?

BR 
Andreas Hoegger

______________________________________________________ 
BSI Business Systems Integration AG
Andreas Michael Hoegger
Täfernstrasse 16a
CH-5405 Baden

business (direct): +41 (0)56 484 16 87
mail:              andreas.hoegger@bsiag.com
web:               www.bsiag.com
______________________________________________________ 


Re: Subversion History question

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Jan 6, 2010, at 06:28, Andreas Hoegger wrote:

> We have been using SVN for years, and our repositories have grown in size over time. To make them easier to maintain and back up, we are looking for a way to get rid of unused old revisions.
> 
> E.g. out of a repository of 10'000 revisions, only the last 1'000 revisions are of interest. Is there a way to remove revisions 1-8'999 without losing the correct revision number 10'000?
> 
> My idea was to dump revisions 9'000 to 10'000 and load them into a new repository. Unfortunately, the load starts at revision 1 and ends with head revision 1'000, which ended up in a mess: several users had checked out revision 10'000 before the dump and are no longer able to commit against the new repository's head revision 1'000.
> 
> Is there a suitable solution for this issue?

You could commit 9000 empty revisions to the new empty repository before you load the dumpfile. But either way, the new repository is not the same as the old one and therefore has a new UUID, so all clients will need to check out new working copies (and will not be able to commit from their existing working copies), regardless of whether you renumber revisions or not.
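The padding idea above can be scripted. Below is a minimal sketch, assuming dump format version 2 and the standard empty-revision record (a 10-byte property section containing only "PROPS-END"); the svnadmin invocations in the trailing comment are illustrative only, and you should verify the record layout against your own 'svnadmin dump' output before relying on it:

```python
def padding_dump(n_revs):
    """Build an svn dumpfile containing only empty revisions 1..n_revs.

    Loading this into a fresh repository advances its youngest revision
    without adding any content, so that a partial dump of the old
    repository (e.g. revisions 9000-10000) keeps its original numbers
    when loaded afterwards.
    """
    parts = ["SVN-fs-dump-format-version: 2\n\n"]
    for rev in range(1, n_revs + 1):
        # Each empty revision record carries 10 bytes of properties,
        # which is just the "PROPS-END\n" terminator.
        parts.append(
            "Revision-number: %d\n"
            "Prop-content-length: 10\n"
            "Content-length: 10\n"
            "\n"
            "PROPS-END\n"
            "\n" % rev
        )
    return "".join(parts)

# Hypothetical usage (paths and repo names are placeholders):
#   svnadmin create newrepo
#   write padding_dump(8999) to a file, then: svnadmin load -q newrepo < padding.dump
#   svnadmin dump -r 9000:10000 --incremental oldrepo | svnadmin load newrepo
```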

Deleting old revisions will also not necessarily save you space. It may in fact make your repository larger. The standard advice is not to attempt this procedure. See also the various prior requests for an "svnadmin obliterate" function which may have more information.

> Is there a bug report concerning this problem?

I don't think it's considered a bug.


Re: Subversion History question

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Jan 8, 2010 at 3:50 PM, Bob Archer <Bo...@amsi.com> wrote:
>
>> I think the only way SVN can come close to acceptable performance for
>> this use case (large file, lots of revs) would be to precompute and
>> cache that data on the server side, so it has the data ready like CVS
>> does. If anyone would implement this, I for one would be very happy :D.
>
> TortoiseSVN caches log data on the client. Of course, that doesn't help you if you don't use a Windows client.

Thanks for the suggestion. Yes, that helps for "svn log" (however, we
try to do everything from our IDE, which is IntelliJ, and its svn
plugin doesn't have such a feature). Also, our build tools, which use
just the native command line client, suffer from the same problem of
slow log.

I don't think it helps for blame, as far as I can tell. I also tried
a web viewer like FishEye, which is supposed to cache line-change and
log data in a database. However, it also choked massively on this big
file with lots of changes (and it's not really practical to open such
a big file in a browser).

Regards,
Johan

RE: Subversion History question

Posted by Bob Archer <Bo...@amsi.com>.
> I think the only way SVN can come close to acceptable performance for
> this use case (large file, lots of revs) would be to precompute and
> cache that data on the server side, so it has the data ready like CVS
> does. If anyone would implement this, I for one would be very happy :D.

TortoiseSVN caches log data on the client. Of course, that doesn't help you if you don't use a Windows client.

Bob

Re: Subversion History question

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Jan 8, 2010 at 2:16 PM, Erik Huelsmann <eh...@gmail.com> wrote:
>> Anyway, I can understand Andreas' desire to clean out old history
>> that's no longer needed, and that's only slowing things down. But
>> sometimes, having that entire history is very useful, so we chose not
>> to do that. Instead, we learned to live with the current limitations
>> for now, and we hope they will be improved someday...
>
> Ok. I see your problem now. Did you know you can restrict blame to a
> subrange of the revisions? That will give you at least a way to limit
> the 4 hours.

Yes, I know, but that's not all that useful. If the line I'm looking at
was really last changed in revision 328, then I want to know that (and
see that revision's log message). But it might also have been edited
in revision 105000. I cannot easily distinguish those two cases unless
I blame the whole revision range.

Also interesting to know: if I use one of the "ignore-whitespace"
options, it's a lot faster, down to 1 hour (though that's still quite
unusable). My hypothesis is that some very large deltas become very
small because they only change whitespace throughout the entire file
(which happens from time to time, when someone accidentally re-indents
the file or changes tabs to spaces).

> Comparing with CVS is logical, but CVS has these values
> precomputed: ready for you to use. So, obviously, it'll be faster than
> any scheme which needs to do calculations to find the same
> information.

Yes, that's true. CVS just has that information readily available
because of the way it stores its versioned data.

I think the only way SVN can come close to acceptable performance for
this use case (large file, lots of revs) would be to precompute and
cache that data on the server side, so it has the data ready like CVS
does. If anyone would implement this, I for one would be very happy
:D.

Regards,
Johan

Re: Subversion History question

Posted by Erik Huelsmann <eh...@gmail.com>.
> Anyway, I can understand Andreas' desire to clean out old history
> that's no longer needed, and that's only slowing things down. But
> sometimes, having that entire history is very useful, so we chose not
> to do that. Instead, we learned to live with the current limitations
> for now, and we hope they will be improved someday...

OK, I see your problem now. Did you know you can restrict blame to a
subrange of the revisions? That will at least give you a way to limit
the 4 hours. Comparing with CVS is logical, but CVS has these values
precomputed, ready for you to use. So, obviously, it'll be faster than
any scheme which needs to do calculations to find the same
information.


Bye,


Erik

Re: Subversion History question

Posted by Johan Corveleyn <jc...@gmail.com>.
On Wed, Jan 6, 2010 at 9:07 PM, Ryan Schmidt
<su...@ryandesign.com> wrote:
>
> On Jan 6, 2010, at 12:31, Andreas Hoegger wrote:
>
>> Yes I do. You can imagine that if 150 developers have been working for 5 years, nobody will use the 'blame'/'annotate' commands unless they feel infinitely bored, at least with version 1.4 (it would take hours). Is there really nobody having the same problems?
>
> You're saying you're using Subversion 1.4 and "svn blame" takes hours to run? That doesn't sound right.
>
> I use "svn blame" probably daily in my work on the MacPorts project. It's not slow. We use Subversion 1.6 now, but I don't remember "blame" ever being slow; it returns in seconds. Our repository has over 62,000 revisions and is 7.5 years old. We have over 120 registered committers, but probably only a few dozen are particularly active at the moment. But "blame" is very helpful to me in trying to figure out why a file says what it says. Just a couple days ago I used "blame" to research the complete 2-year history of a particular line of code, to try to understand why it was there (and not because I was bored):
>

Just jumping in here to support Andreas' complaint about blame being
slow: yes, it's definitely slow. See
http://subversion.tigris.org/issues/show_bug.cgi?id=3074 - "Improve
performance of svn annotate".

However, the slowness only hits you when you blame a large file with a
lot of revisions, with "large file" meaning more than a couple hundred
KB and "a lot of revisions" meaning more than a thousand or so.
Apparently, this is not so common for "source" files (the kind you'd
want to blame), so not a lot of svn users actually experience it. But
in our repo we have a 2 MB XML file with 6000 revisions on which blame
is very useful. Blame on that file takes more than 4 hours, so we
don't do it anymore. Before SVN we were on CVS, where it took 15
seconds. So I can certainly feel Andreas' pain :(. I can only hope
that more and more users hit this problem, so that it gets some more
attention and the issue is addressed some day...

Also, log is slow (though not as bad as blame): running "svn log" on
that 6000-revision file takes about 4 minutes (back on CVS it was 5
seconds). Neither problem is network-related (everything is on a LAN
here), but they have different causes:
- blame is mainly client-side I/O-bound (the client fetches 6000
binary deltas and computes the line-based blame itself).
- log is server-side I/O-bound (the server crawls 6000 FSFS rev files
to get those 6000 log messages).
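To see why the client-side step scales badly, here is a toy model of how blame can be computed. This is not Subversion's actual implementation (the real client applies binary deltas and uses its own diff code); it is only a sketch showing that every revision's text must be reconstructed and diffed against its predecessor, so the work grows with the number of revisions:

```python
import difflib

def toy_blame(versions):
    """Attribute each line of the newest text to the revision that last
    changed it. `versions` is a list of (rev, full_text) pairs in
    chronological order; returns a list of (rev, line) pairs."""
    attributed = []   # (rev, line) pairs for the previous version
    prev_lines = []
    for rev, text in versions:
        lines = text.splitlines()
        # One full diff per revision: this is the O(revisions) cost.
        matcher = difflib.SequenceMatcher(a=prev_lines, b=lines)
        new_attr = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                new_attr.extend(attributed[i1:i2])  # unchanged: keep old rev
            else:
                new_attr.extend((rev, l) for l in lines[j1:j2])
        attributed, prev_lines = new_attr, lines
    return attributed
```

For example, toy_blame([(1, "a\nb\n"), (2, "a\nB\nc\n")]) attributes the unchanged line "a" to revision 1 and the changed and added lines "B" and "c" to revision 2.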

Our server is 1.5.4 on Solaris. Access is over https (I tried http and
svnserve; no significant difference). The back end is FSFS on a NAS
(over NFS).

Having the back end on a NAS over NFS might be a factor in the log
slowness (when we upgrade to 1.6.x, I hope that packing the repo will
help here). I've experimented with putting the repo on local disk, and
that roughly halved the log time (down to 2 minutes, woohoo :)). Also,
BDB might have better performance characteristics for this (I haven't
tested it, but it might be a reason why the svn devs have not hit the
problem themselves).

Anyway, I can understand Andreas' desire to clean out old history
that's no longer needed, and that's only slowing things down. But
sometimes, having that entire history is very useful, so we chose not
to do that. Instead, we learned to live with the current limitations
for now, and we hope they will be improved someday...

Regards,
Johan

Re: Subversion History question

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Jan 6, 2010, at 12:31, Andreas Hoegger wrote:

> Yes I do. You can imagine that if 150 developers have been working for 5 years, nobody will use the 'blame'/'annotate' commands unless they feel infinitely bored, at least with version 1.4 (it would take hours). Is there really nobody having the same problems?

You're saying you're using Subversion 1.4 and "svn blame" takes hours to run? That doesn't sound right.

I use "svn blame" probably daily in my work on the MacPorts project. It's not slow. We use Subversion 1.6 now, but I don't remember "blame" ever being slow; it returns in seconds. Our repository has over 62,000 revisions and is 7.5 years old. We have over 120 registered committers, but probably only a few dozen are particularly active at the moment. But "blame" is very helpful to me in trying to figure out why a file says what it says. Just a couple days ago I used "blame" to research the complete 2-year history of a particular line of code, to try to understand why it was there (and not because I was bored):

http://trac.macports.org/ticket/20586#comment:10


AW: Subversion History question

Posted by Andreas Hoegger <An...@bsiag.com>.
Hi Erik,

Yes I do. You can imagine that if 150 developers have been working for 5 years, nobody will use the 'blame'/'annotate' commands unless they feel infinitely bored, at least with version 1.4 (it would take hours). Is there really nobody having the same problems?

Bye Andy



-----Original Message-----
From: Erik Huelsmann [mailto:ehuels@gmail.com]
Sent: Wednesday, January 6, 2010 18:11
To: Andreas Hoegger
Cc: users@subversion.apache.org
Subject: Re: Subversion History question

Hi Andreas,

On Wed, Jan 6, 2010 at 1:28 PM, Andreas Hoegger
<An...@bsiag.com> wrote:
> Hi all,
> We have been using SVN for years, and our repositories have grown in size over time. To make them easier to maintain and back up, we are looking for a way to get rid of unused old revisions.

You're aware that - if you use the 'blame'/'annotate' commands -
you're using the old revisions, right?


bye,

Erik.

Re: Subversion History question

Posted by Erik Huelsmann <eh...@gmail.com>.
Hi Andreas,

On Wed, Jan 6, 2010 at 1:28 PM, Andreas Hoegger
<An...@bsiag.com> wrote:
> Hi all,
> We have been using SVN for years, and our repositories have grown in size over time. To make them easier to maintain and back up, we are looking for a way to get rid of unused old revisions.

You're aware that - if you use the 'blame'/'annotate' commands -
you're using the old revisions, right?


bye,

Erik.