Posted to users@subversion.apache.org by Trevor Schaffer <Tr...@smarttech.com> on 2011/09/28 15:57:35 UTC

revs files growing over time, relatively

Server: Fedora Core 13 64-bit
SVN: 1.6.16 running on Apache 2.2.16

We have a repository that is currently at about 128K revisions and 40+GB in size.  Over time it's been growing faster, and we noticed a trend on svn copies... the db/revs/* files are getting bigger over time.

After cracking one open, along with another random commit, I saw that the commit entry not only lists information about the current path and its siblings, but also enumerates all parents and their siblings as well... all the way to the root.  This is a big issue for us, as we do a lot of build tagging, which means that every build tag commit lists the other thousands of build tags as well, which is why the files are growing over time.  At the current size of 600KB/commit for a single svn copy into our /tags area, 20,000 commits would take up about 12GB, which I think is quite significant.  Truthfully, probably 30+GB of this repo is just svn copies worth of commits.

Anyways, I'm trying to look for a solution to this, as we're starting to run out of room on the server it's on (we have many other repositories, which have grown from 100GB 9 months ago to 160GB now).

There are two questions that I have:
1. Is there any way I can recover the space and trim down the repo?
   a) svnadmin dump + svndumpfilter.  I can filter out the tags, but there are legitimate tags we need to keep (release tags)
   b) start a new repository.  Like the first, but it means we have to keep the old one around read-only.
2. What is the best strategy for keeping the repository optimized?
   a) more layers in our tags/ folder structure to keep those commits a little smaller.

If anyone has any idea on what we can do with this, I would appreciate it.

In the meantime, I'm going to try to dump+load the repo in the hopes that there were some optimizations in svn 1.6 that will help.  We've had this repo running for nearly 5 years, since svn 1.4.

I will also try out svn 1.7, but I don't know how soon we can get that into production. We are relying on Subversion Edge, so it can switch when that becomes live.



Re: revs files growing over time, relatively

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
If those tags don't contain mods you might be able to simply note the
directory/revision that were tagged and lose the tags entirely.

Otherwise, rewriting the /tags tree's history such that it is sharded
(perhaps by quarter as per elsethread) is a non-lossy, but trickier,
option.

Trevor Schaffer wrote on Wed, Sep 28, 2011 at 10:08:15 -0600:
> Thanks.  I was expecting that the back end didn't change much either.  I'll look closer at filtering out the commits we don't care about to try to minimize the tags that we have.   
> 
> -----Original Message-----
> From: Daniel Shahaf [mailto:d.s@daniel.shahaf.name] 
> Sent: Wednesday, September 28, 2011 9:35 AM
> To: Trevor Schaffer
> Cc: users@subversion.apache.org
> Subject: Re: revs files growing over time, relatively
> 
> Trevor Schaffer wrote on Wed, Sep 28, 2011 at 07:57:35 -0600:
> > If anyone has any idea on what we can do with this, I would appreciate
> > it.
> > 
> > In the meantime, I'm going to try to dump+load the repo in the hopes
> > that there were some optimizations in svn 1.6 that will help.  We've
> > had this repo running for nearly 5 years, since svn 1.4.
> > 
> > I will also try out svn 1.7, but I don't know how soon we can get that
> > into production. We are relying on Subversion Edge, so it can switch
> > when that becomes live.
> 
> The storage of directories in FSFS has not changed recently, I expect
> you'll see the same issue in 1.7.
> 
> It seems what you'd like to do is rewrite your history such that
> directory sizes (max # of siblings) is small.  That sounds doable.
> 
> I'm not sure offhand whether the BDB backend has the same issue with
> storing large directories.
> 
> 

RE: revs files growing over time, relatively

Posted by Trevor Schaffer <Tr...@smarttech.com>.
Thanks.  I was expecting that the back end didn't change much either.  I'll look closer at filtering out the commits we don't care about to try to minimize the tags that we have.   

-----Original Message-----
From: Daniel Shahaf [mailto:d.s@daniel.shahaf.name] 
Sent: Wednesday, September 28, 2011 9:35 AM
To: Trevor Schaffer
Cc: users@subversion.apache.org
Subject: Re: revs files growing over time, relatively

Trevor Schaffer wrote on Wed, Sep 28, 2011 at 07:57:35 -0600:
> If anyone has any idea on what we can do with this, I would appreciate
> it.
> 
> In the meantime, I'm going to try to dump+load the repo in the hopes
> that there were some optimizations in svn 1.6 that will help.  We've
> had this repo running for nearly 5 years, since svn 1.4.
> 
> I will also try out svn 1.7, but I don't know how soon we can get that
> into production. We are relying on Subversion Edge, so it can switch
> when that becomes live.

The storage of directories in FSFS has not changed recently, I expect
you'll see the same issue in 1.7.

It seems what you'd like to do is rewrite your history such that
directory sizes (max # of siblings) is small.  That sounds doable.

I'm not sure offhand whether the BDB backend has the same issue with
storing large directories.



Re: revs files growing over time, relatively

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Trevor Schaffer wrote on Wed, Sep 28, 2011 at 07:57:35 -0600:
> If anyone has any idea on what we can do with this, I would appreciate
> it.
> 
> In the meantime, I'm going to try to dump+load the repo in the hopes
> that there were some optimizations in svn 1.6 that will help.  We've
> had this repo running for nearly 5 years, since svn 1.4.
> 
> I will also try out svn 1.7, but I don't know how soon we can get that
> into production. We are relying on Subversion Edge, so it can switch
> when that becomes live.

The storage of directories in FSFS has not changed recently, so I expect
you'll see the same issue in 1.7.

It seems what you'd like to do is rewrite your history such that
directory sizes (max # of siblings) are small.  That sounds doable.

I'm not sure offhand whether the BDB backend has the same issue with
storing large directories.

Re: revs files growing over time, relatively

Posted by Konstantin Kolinko <kn...@gmail.com>.
2011/9/28 Trevor Schaffer <Tr...@smarttech.com>:
> Yes, that basically will have to be our plan... going forward.  Now to try to deal with what we have right now.
>
> I am a bit puzzled why the commit requires any information about other folders seemingly not associated in any way with this particular commit though.  I'm sure the database has it for its reasons, so I won't question that too much.
>
> -----Original Message-----
> From: Andy Levy [mailto:andy.levy@gmail.com]
> Sent: Wednesday, September 28, 2011 9:16 AM
> To: Trevor Schaffer
> Cc: Stefan Sperling; users@subversion.apache.org
> Subject: Re: revs files growing over time, relatively
>
> On Wed, Sep 28, 2011 at 11:01, Trevor Schaffer
> <Tr...@smarttech.com> wrote:
>> We definitely use svn copy for revisions, but I think the issue is because our tags are too flat vs not flat enough.
>>
>> E.g. tags/builds/ is where we put all of our tags (all done with svn copy)
>> And over time, this folder has over 7000 tags in it.  So now, ever new tag we put into /tags/builds has a listing of every other tag from tags/builds in the revs/db/<rev> file.  Now when you have 4 lines in each rev file for each entry in the tags folder, that's 28000 lines of text, which gives us the roughly 600KB of data.  And you can see how this grows over time.  So obviously, in hindsight... we need to lessen the tags significantly, but current processes in our company will not allow us to do that quickly, or easily.
>
> Would it be possible to clean up your tags directory, maybe on a
> quarterly or annual basis? So instead of having a flat /tags/builds
> "dumping ground", you have;
>
> /tags/builds/
> /tags/builds/2011Q1
> /tags/builds/2011Q2
> /tags/builds/2011Q3
> /tags/builds/2011Q4
>
> Your tags go into /tags/builds as they do today, but then at the end
> of each quarter, you move that quarter's builds into the corresponding
> directory.
>

Huh. Top-posting is bad.

When you create a new tag "tags/FOO" it results in a new version of
the "tags" directory.

Subversion is not able to compare two directories and store a "delta"
between them. Instead it stores a new complete version of the "tags"
directory.

(I think that gives better performance when someone needs to read that
directory).

Thus, as you noted, 7000 tags become 600KB of data.
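To put rough numbers on that, here is a minimal Python sketch of the growth, assuming (as described above) that each tag commit writes a complete listing of the directory it lands in. The 88 bytes per entry is only an assumption backed out of the thread's figures (about 600KB for about 7000 entries), the function names are made up for illustration, and the sharded model assumes shard directories sit directly under the builds directory, so each commit rewrites one shard listing plus the short list of shards. Listings of higher parents (tags/ and the root) are ignored.

```python
# Rough model of bytes written for directory listings by tag commits.
# BYTES_PER_ENTRY is an assumption derived from ~600KB / ~7000 entries.
BYTES_PER_ENTRY = 88


def listing_bytes(entries, bytes_per_entry=BYTES_PER_ENTRY):
    """Size of one complete directory listing with `entries` children."""
    return entries * bytes_per_entry


def flat_total(num_tags, bytes_per_entry=BYTES_PER_ENTRY):
    """Total bytes when every tag goes into one flat directory.

    The k-th tag commit rewrites a listing of k entries, so the total is
    bytes_per_entry * (1 + 2 + ... + num_tags).
    """
    return bytes_per_entry * num_tags * (num_tags + 1) // 2


def sharded_total(num_tags, shard_size, bytes_per_entry=BYTES_PER_ENTRY):
    """Total bytes when tags are split into shard directories.

    Each commit rewrites the listing of the shard it lands in plus the
    listing of the parent directory, which only holds the shards.
    """
    total = 0
    for k in range(1, num_tags + 1):
        shards = (k + shard_size - 1) // shard_size   # shards existing so far
        in_shard = k - (shards - 1) * shard_size      # entries in the current shard
        total += (in_shard + shards) * bytes_per_entry
    return total


if __name__ == "__main__":
    print("one listing of 7000 tags:", listing_bytes(7000), "bytes")
    print("flat, 7000 tags in total:", flat_total(7000), "bytes")
    print("sharded by 500, in total:", sharded_total(7000, 500), "bytes")
```

The flat total grows with the square of the number of tags, while the sharded total grows roughly linearly once the shard size is fixed, which is why sharding helps going forward but does nothing for revisions already written.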

I agree with Andy's proposal to shard tags directory into several
sections, e.g. by date.


Though it would be easier if those quarterly directories were outside
of your "builds" directory. E.g.:
 /tags/builds/
 /tags/builds-archive-2011-01/

It is easy to do "svn mv ^/tags/builds ^/tags/builds-archive-2011-01/"
in one transaction.
Then you follow it with "svn mkdir ^/tags/builds"

Note:
1) svnmucc lets you perform both mv and mkdir in one commit
2) you can still access your old tags at the old place if you use peg revisions,
 /tags/builds/MYOLDTAG@oldrevnumber
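
As a sketch of note 1, the following Python snippet only builds and prints an svnmucc command line that does the move and the mkdir in one commit; it does not run anything. The repository URL, archive name and log message are placeholders, and the helper name is made up for illustration. Try it against a scratch repository first.

```python
import shlex


def archive_builds_command(repo_url, archive_name, message):
    """Build an svnmucc invocation that moves tags/builds aside and
    recreates an empty tags/builds, all in a single commit."""
    src = f"{repo_url}/tags/builds"
    dst = f"{repo_url}/tags/{archive_name}"
    # svnmucc takes a sequence of actions: here "mv SRC DST" then "mkdir URL".
    return ["svnmucc", "-m", message, "mv", src, dst, "mkdir", src]


if __name__ == "__main__":
    cmd = archive_builds_command(
        "http://svn.example.com/repo",      # placeholder URL
        "builds-archive-2011-01",
        "Archive build tags and start a fresh builds directory",
    )
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

svnmucc works on URLs, so the ^/ shorthand from the svn commands above is spelled out as a full repository URL here.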

Best regards,
Konstantin Kolinko

RE: revs files growing over time, relatively

Posted by Trevor Schaffer <Tr...@smarttech.com>.
Yes, that basically will have to be our plan... going forward.  Now to try to deal with what we have right now.

I am a bit puzzled why the commit requires any information about other folders seemingly not associated in any way with this particular commit though.  I'm sure the database has it for its reasons, so I won't question that too much.

-----Original Message-----
From: Andy Levy [mailto:andy.levy@gmail.com] 
Sent: Wednesday, September 28, 2011 9:16 AM
To: Trevor Schaffer
Cc: Stefan Sperling; users@subversion.apache.org
Subject: Re: revs files growing over time, relatively

On Wed, Sep 28, 2011 at 11:01, Trevor Schaffer
<Tr...@smarttech.com> wrote:
> We definitely use svn copy for revisions, but I think the issue is because our tags are too flat vs not flat enough.
>
> E.g. tags/builds/ is where we put all of our tags (all done with svn copy)
> And over time, this folder has over 7000 tags in it.  So now, ever new tag we put into /tags/builds has a listing of every other tag from tags/builds in the revs/db/<rev> file.  Now when you have 4 lines in each rev file for each entry in the tags folder, that's 28000 lines of text, which gives us the roughly 600KB of data.  And you can see how this grows over time.  So obviously, in hindsight... we need to lessen the tags significantly, but current processes in our company will not allow us to do that quickly, or easily.

Would it be possible to clean up your tags directory, maybe on a
quarterly or annual basis? So instead of having a flat /tags/builds
"dumping ground", you have;

/tags/builds/
/tags/builds/2011Q1
/tags/builds/2011Q2
/tags/builds/2011Q3
/tags/builds/2011Q4

Your tags go into /tags/builds as they do today, but then at the end
of each quarter, you move that quarter's builds into the corresponding
directory.

> -----Original Message-----
> From: Stefan Sperling [mailto:stsp@elego.de]
> Sent: Wednesday, September 28, 2011 8:23 AM
> To: Trevor Schaffer
> Cc: users@subversion.apache.org
> Subject: Re: revs files growing over time, relatively
>
> On Wed, Sep 28, 2011 at 07:57:35AM -0600, Trevor Schaffer wrote:
>> Server: Fedora Core 13 64-bit
>> SVN: 1.6.16 running on Apache 2.2.16
>>
>> We have a repository that currently is in the 128K revision count, and is now standing at 40+GB in size.  Over time it's been growing faster, and we noticed a trend on svn copies... the db/revs/* files are getting bigger over time.
>>
>> After cracking one open, and another random commit, I saw that the
>> commit entry not only lists information about the current path and
>> it's siblings, but it also enumerates all parents, and their siblings
>> as well... all the way to the root.
>
> Yes, this is how "cheap copies" in Subversion work.
> There is no way of changing this.
> It's called the "bubble up effect" which is described neatly at
> http://red-bean.com/kfogel/beautiful-code/bc-chapter-02.html
>
>> This is a big issue for us, as we
>> do a lot of build tagging, which means that every build tag commit
>> lists the other thousands of build tags as well, which is why they are
>> growing over time.  At the current size of 600KB/commit for a single
>> svn copy into our /tags area, performing 20,000 commits covers about
>> 12GB of size, which I think is quite significant.  Truthfully,
>> probably 30+GB of this repo is just svn copies worth of commits.
>
> I am surprised that the copy references take that much space.
> How deeply nested are the tag folders in the tree?
> You could try to flatten out the hierarchy of your repository.
>
>> In the meantime, I'm going to try to dump+load the repo in the hopes that there were some optimizations in svn 1.6 that will help.  We've had this repo running for nearly 5 years, since svn 1.4.
>
> Yes, try that. 1.6 and upwards support "representation-sharing" which
> might cut down the size of the repository significantly.
>
> You could also try packing your repositories with 'svnadmin pack'.
> This will cause the repository to use less inodes and free up space
> lost to disk blocks which aren't completely filled up with data.
>
> I am curious about how you create tags. Do you really run 'svn copy'
> or do you maybe use 'svn import' to create them (which would bloat the
> repository significantly if representation-sharing is disabled)?
>
>
>



Re: revs files growing over time, relatively

Posted by Nico Kadel-Garcia <nk...@gmail.com>.
On Wed, Sep 28, 2011 at 11:15 AM, Andy Levy <an...@gmail.com> wrote:
> On Wed, Sep 28, 2011 at 11:01, Trevor Schaffer
> <Tr...@smarttech.com> wrote:
>> We definitely use svn copy for revisions, but I think the issue is because our tags are too flat vs not flat enough.
>>
>> E.g. tags/builds/ is where we put all of our tags (all done with svn copy)
>> And over time, this folder has over 7000 tags in it.  So now, ever new tag we put into /tags/builds has a listing of every other tag from tags/builds in the revs/db/<rev> file.  Now when you have 4 lines in each rev file for each entry in the tags folder, that's 28000 lines of text, which gives us the roughly 600KB of data.  And you can see how this grows over time.  So obviously, in hindsight... we need to lessen the tags significantly, but current processes in our company will not allow us to do that quickly, or easily.
>
> Would it be possible to clean up your tags directory, maybe on a
> quarterly or annual basis? So instead of having a flat /tags/builds
> "dumping ground", you have;
>
> /tags/builds/
> /tags/builds/2011Q1
> /tags/builds/2011Q2
> /tags/builds/2011Q3
> /tags/builds/2011Q4
>
> Your tags go into /tags/builds as they do today, but then at the end
> of each quarter, you move that quarter's builds into the corresponding
> directory.

May I suggest "tag-builds/2011Q1", instead? This helps prevent some
poor beggar from checking out "tags" and getting their local disk
space annihilated.

Re: revs files growing over time, relatively

Posted by Andy Levy <an...@gmail.com>.
On Wed, Sep 28, 2011 at 11:01, Trevor Schaffer
<Tr...@smarttech.com> wrote:
> We definitely use svn copy for revisions, but I think the issue is because our tags are too flat vs not flat enough.
>
> E.g. tags/builds/ is where we put all of our tags (all done with svn copy)
> And over time, this folder has over 7000 tags in it.  So now, ever new tag we put into /tags/builds has a listing of every other tag from tags/builds in the revs/db/<rev> file.  Now when you have 4 lines in each rev file for each entry in the tags folder, that's 28000 lines of text, which gives us the roughly 600KB of data.  And you can see how this grows over time.  So obviously, in hindsight... we need to lessen the tags significantly, but current processes in our company will not allow us to do that quickly, or easily.

Would it be possible to clean up your tags directory, maybe on a
quarterly or annual basis? So instead of having a flat /tags/builds
"dumping ground", you have:

/tags/builds/
/tags/builds/2011Q1
/tags/builds/2011Q2
/tags/builds/2011Q3
/tags/builds/2011Q4

Your tags go into /tags/builds as they do today, but then at the end
of each quarter, you move that quarter's builds into the corresponding
directory.
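
If the end-of-quarter move gets scripted, the target directory can be derived from the tag's date. A minimal Python sketch, assuming the layout listed above; the function name and the default base path are just for illustration:

```python
from datetime import date


def quarter_dir(tag_date, base="/tags/builds"):
    """Return the quarterly directory a tag created on `tag_date` belongs in."""
    quarter = (tag_date.month - 1) // 3 + 1   # months 1-3 -> Q1, 4-6 -> Q2, ...
    return f"{base}/{tag_date.year}Q{quarter}"


if __name__ == "__main__":
    print(quarter_dir(date(2011, 9, 28)))
```

A tag made on the date of this thread would be filed under the 2011Q3 directory.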

> -----Original Message-----
> From: Stefan Sperling [mailto:stsp@elego.de]
> Sent: Wednesday, September 28, 2011 8:23 AM
> To: Trevor Schaffer
> Cc: users@subversion.apache.org
> Subject: Re: revs files growing over time, relatively
>
> On Wed, Sep 28, 2011 at 07:57:35AM -0600, Trevor Schaffer wrote:
>> Server: Fedora Core 13 64-bit
>> SVN: 1.6.16 running on Apache 2.2.16
>>
>> We have a repository that currently is in the 128K revision count, and is now standing at 40+GB in size.  Over time it's been growing faster, and we noticed a trend on svn copies... the db/revs/* files are getting bigger over time.
>>
>> After cracking one open, and another random commit, I saw that the
>> commit entry not only lists information about the current path and
>> it's siblings, but it also enumerates all parents, and their siblings
>> as well... all the way to the root.
>
> Yes, this is how "cheap copies" in Subversion work.
> There is no way of changing this.
> It's called the "bubble up effect" which is described neatly at
> http://red-bean.com/kfogel/beautiful-code/bc-chapter-02.html
>
>> This is a big issue for us, as we
>> do a lot of build tagging, which means that every build tag commit
>> lists the other thousands of build tags as well, which is why they are
>> growing over time.  At the current size of 600KB/commit for a single
>> svn copy into our /tags area, performing 20,000 commits covers about
>> 12GB of size, which I think is quite significant.  Truthfully,
>> probably 30+GB of this repo is just svn copies worth of commits.
>
> I am surprised that the copy references take that much space.
> How deeply nested are the tag folders in the tree?
> You could try to flatten out the hierarchy of your repository.
>
>> In the meantime, I'm going to try to dump+load the repo in the hopes that there were some optimizations in svn 1.6 that will help.  We've had this repo running for nearly 5 years, since svn 1.4.
>
> Yes, try that. 1.6 and upwards support "representation-sharing" which
> might cut down the size of the repository significantly.
>
> You could also try packing your repositories with 'svnadmin pack'.
> This will cause the repository to use less inodes and free up space
> lost to disk blocks which aren't completely filled up with data.
>
> I am curious about how you create tags. Do you really run 'svn copy'
> or do you maybe use 'svn import' to create them (which would bloat the
> repository significantly if representation-sharing is disabled)?
>
>
>

RE: revs files growing over time, relatively

Posted by Trevor Schaffer <Tr...@smarttech.com>.
We definitely use svn copy for revisions, but I think the issue is because our tags are too flat vs not flat enough.

E.g. tags/builds/ is where we put all of our tags (all done with svn copy)
Over time, this folder has accumulated over 7000 tags.  So now, every new tag we put into /tags/builds carries a listing of every other tag from tags/builds in the db/revs/<rev> file.  With 4 lines in each rev file for each entry in the tags folder, that's 28000 lines of text, which gives us the roughly 600KB of data.  And you can see how this grows over time.  So obviously, in hindsight... we need to reduce the number of tags significantly, but current processes in our company will not allow us to do that quickly or easily.

I will try out the svnadmin pack now, but I did a test repo with only 10 commits and it shows the same potential for growth.  If you want, I can give you the commands to run to build a test repo to see what I mean.


-----Original Message-----
From: Stefan Sperling [mailto:stsp@elego.de] 
Sent: Wednesday, September 28, 2011 8:23 AM
To: Trevor Schaffer
Cc: users@subversion.apache.org
Subject: Re: revs files growing over time, relatively

On Wed, Sep 28, 2011 at 07:57:35AM -0600, Trevor Schaffer wrote:
> Server: Fedora Core 13 64-bit
> SVN: 1.6.16 running on Apache 2.2.16
> 
> We have a repository that currently is in the 128K revision count, and is now standing at 40+GB in size.  Over time it's been growing faster, and we noticed a trend on svn copies... the db/revs/* files are getting bigger over time.
> 
> After cracking one open, and another random commit, I saw that the
> commit entry not only lists information about the current path and
> it's siblings, but it also enumerates all parents, and their siblings
> as well... all the way to the root.

Yes, this is how "cheap copies" in Subversion work.
There is no way of changing this.
It's called the "bubble up effect" which is described neatly at
http://red-bean.com/kfogel/beautiful-code/bc-chapter-02.html

> This is a big issue for us, as we
> do a lot of build tagging, which means that every build tag commit
> lists the other thousands of build tags as well, which is why they are
> growing over time.  At the current size of 600KB/commit for a single
> svn copy into our /tags area, performing 20,000 commits covers about
> 12GB of size, which I think is quite significant.  Truthfully,
> probably 30+GB of this repo is just svn copies worth of commits.

I am surprised that the copy references take that much space.
How deeply nested are the tag folders in the tree? 
You could try to flatten out the hierarchy of your repository.

> In the meantime, I'm going to try to dump+load the repo in the hopes that there were some optimizations in svn 1.6 that will help.  We've had this repo running for nearly 5 years, since svn 1.4.

Yes, try that. 1.6 and upwards support "representation-sharing" which
might cut down the size of the repository significantly.

You could also try packing your repositories with 'svnadmin pack'.
This will cause the repository to use less inodes and free up space
lost to disk blocks which aren't completely filled up with data.

I am curious about how you create tags. Do you really run 'svn copy'
or do you maybe use 'svn import' to create them (which would bloat the
repository significantly if representation-sharing is disabled)?



Re: revs files growing over time, relatively

Posted by Stefan Sperling <st...@elego.de>.
On Wed, Sep 28, 2011 at 07:57:35AM -0600, Trevor Schaffer wrote:
> Server: Fedora Core 13 64-bit
> SVN: 1.6.16 running on Apache 2.2.16
> 
> We have a repository that currently is in the 128K revision count, and is now standing at 40+GB in size.  Over time it's been growing faster, and we noticed a trend on svn copies... the db/revs/* files are getting bigger over time.
> 
> After cracking one open, and another random commit, I saw that the
> commit entry not only lists information about the current path and
> it's siblings, but it also enumerates all parents, and their siblings
> as well... all the way to the root.

Yes, this is how "cheap copies" in Subversion work.
There is no way of changing this.
It's called the "bubble up effect" which is described neatly at
http://red-bean.com/kfogel/beautiful-code/bc-chapter-02.html

> This is a big issue for us, as we
> do a lot of build tagging, which means that every build tag commit
> lists the other thousands of build tags as well, which is why they are
> growing over time.  At the current size of 600KB/commit for a single
> svn copy into our /tags area, performing 20,000 commits covers about
> 12GB of size, which I think is quite significant.  Truthfully,
> probably 30+GB of this repo is just svn copies worth of commits.

I am surprised that the copy references take that much space.
How deeply nested are the tag folders in the tree? 
You could try to flatten out the hierarchy of your repository.

> In the meantime, I'm going to try to dump+load the repo in the hopes that there were some optimizations in svn 1.6 that will help.  We've had this repo running for nearly 5 years, since svn 1.4.

Yes, try that. 1.6 and upwards support "representation-sharing" which
might cut down the size of the repository significantly.

You could also try packing your repositories with 'svnadmin pack'.
This will cause the repository to use fewer inodes and free up space
lost to disk blocks which aren't completely filled up with data.
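
To illustrate the disk-block point, here is a small Python sketch that estimates the slack space for a set of revision files and how much of it goes away if they are concatenated into one file, which is roughly what packing a shard does. The 4096-byte block size and the sample file sizes are assumptions; real file systems differ.

```python
def allocated_bytes(size, block_size=4096):
    """Bytes actually occupied on disk when space is allocated in whole blocks."""
    blocks = -(-size // block_size)   # ceiling division
    return blocks * block_size


def slack_bytes(sizes, block_size=4096):
    """Space lost to partially filled last blocks, summed over all files."""
    return sum(allocated_bytes(s, block_size) - s for s in sizes)


def slack_saved_by_packing(sizes, block_size=4096):
    """Slack recovered if all files are concatenated into a single file."""
    packed = [sum(sizes)]
    return slack_bytes(sizes, block_size) - slack_bytes(packed, block_size)


if __name__ == "__main__":
    rev_sizes = [100, 5000, 4096]     # made-up revision file sizes in bytes
    print("slack before packing:", slack_bytes(rev_sizes))
    print("slack recovered:     ", slack_saved_by_packing(rev_sizes))
```

The saving is at most one block per revision file, so it matters for many small revisions much more than for revision files that are already hundreds of kilobytes each.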

I am curious about how you create tags. Do you really run 'svn copy'
or do you maybe use 'svn import' to create them (which would bloat the
repository significantly if representation-sharing is disabled)?