Posted to users@subversion.apache.org by Andrew Hardy <an...@gl-group.com> on 2009/12/18 12:52:58 UTC

svn dump and load - half size repository?

Hi,

Performed a dump of one repository to create a new one (for testing/backup/etc.). No errors reported during either operation. Both repositories are BDS.

The new repository is around half the size of the original - should I be worried? I notice that the 'revs' directory in the new repository has a different format to the old - the old had all files in the base directory, the new one has a new directory for every hundred revisions.

How can I compare two repositories to ensure that they are consistent up to a certain revision?

-- 
Andy

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2431480

Please start new threads on the <us...@subversion.apache.org> mailing list.
To subscribe to the new list, send an empty e-mail to <us...@subversion.apache.org>.

RE: svn dump and load - half size repository?

Posted by Andrew Hardy <an...@gl-group.com>.
Ryan, 

>-----Original Message-----
>From: Ryan Schmidt [mailto:subversion-2009d@ryandesign.com] 
>Sent: 18 December 2009 13:08
>To: Hardy, Andrew
>Cc: users@subversion.tigris.org
>Subject: Re: svn dump and load - half size repository?
>
>On Dec 18, 2009, at 06:52, Andrew Hardy wrote:
>
>> Performed a dump of one repository to create a new one (for
>> testing/backup/etc.). No errors reported during either
>> operation. Both repositories are BDS.
>
>Do you mean BDB -- BerkeleyDB?

Erm... No, fat-fingers time - should have been FSFS.

>The behavior you are observing in the revs directory is called
>sharding and is new for Subversion 1.5, and was improved in
>1.6, and can lead to significant space savings:
>
>http://subversion.tigris.org/svn_1.6_releasenotes.html#fsfs-packing

Thanks, I'll have a look at that. I believe that both repositories were
created with 1.5.6, but it's possible that the older was originally from
an earlier release...

>> How can I compare two repositories to ensure that they are
>> consistent up to a certain revision?
>
>Since no errors were reported during dump and load, I assume 
>your repositories are fine. You can additionally run "svnadmin 
>verify" on them.

Many thanks, I'll give that a quick run to make everyone feel a little
more certain that the new repository is fit for purpose.

This e-mail, and any attachments are strictly confidential and intended for the addressee(s) only.  The content may also contain legally privileged information.  If you are not the intended recipient, please notify the sender immediately, by return of email, and then delete the e-mail and any attachments.  You should not disclose, copy or take any action in reliance on this transmission.

Please ensure you have adequate virus protection before you open or detach any documents from this transmission.

GL Industrial Services UK Ltd is a company registered in England and Wales with company No. 3294136 and registered office at Holywell Park, Ashby Road, Loughborough,  Leicestershire, LE11 3GR, United Kingdom.   

______________________________________________________________________
This email has been scanned by the MessageLabs Email Security System.
For more information please visit http://www.messagelabs.com/email 
______________________________________________________________________

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2431506


RE: svn dump and load - half size repository?

Posted by Andrew Hardy <an...@gl-group.com>.
Mark, 

>-----Original Message-----
>From: Mark Phippard [mailto:markphip@gmail.com] 
>Sent: 18 December 2009 15:01
>To: Ryan Schmidt
>Cc: Hardy, Andrew; users@subversion.tigris.org
>Subject: Re: svn dump and load - half size repository?
>
>On Fri, Dec 18, 2009 at 8:08 AM, Ryan Schmidt 
><su...@ryandesign.com> wrote:
>> On Dec 18, 2009, at 06:52, Andrew Hardy wrote:

>> The behavior you are observing in the revs directory is called
>> sharding and is new for Subversion 1.5, and was improved in
>> 1.6, and can lead to significant space savings:
>
>Sharding does not specifically bring disk space saving, it 
>just helps some filesystem tools that slow down when you get 
>too many files in a single folder, like backup scanning or 
>browsing tools like Windows Explorer.  It may bring some inode 
>savings, but I am not sure.
>
>>
>> http://subversion.tigris.org/svn_1.6_releasenotes.html#fsfs-packing
>

>Packing will save a lot of disk space by freeing up all the 
>wasted space in unused clusters, and will also greatly reduce 
>inode usage.
>However, packing is not something that happens automatically, 
>you have to run svnadmin pack.

I might give that a whirl during a quiet period!

>The most likely culprit is the new rep-sharing feature in 1.6.  This
>allows duplicate "representations" to be shared.   So imagine you
>committed 5 copies of the exact same ISO image.  With 1.6, 
>only one of those would be "stored" and the other 4 would all 
>have pointers to the first.  We typically see a 10-20% space 
>improvement from this, but it is entirely dependent on how 
>much common content there is.  I would guess this is where the 
>saving came from since this is on by default for a newly 
>created repository.

We're still on 1.5.6, but this looks interesting...

Thanks for your help,
-- 
Andy


------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2431508


Re: svn dump and load - half size repository?

Posted by Mark Phippard <ma...@gmail.com>.
On Fri, Dec 18, 2009 at 8:08 AM, Ryan Schmidt
<su...@ryandesign.com> wrote:
> On Dec 18, 2009, at 06:52, Andrew Hardy wrote:
> The presence of the revs directory indicates this is not a BDB repository but an FSFS one.
> FSFS is the default repository type since Subversion 1.2.
>
> The behavior you are observing in the revs directory is called sharding and is new for
> Subversion 1.5, and was improved in 1.6, and can lead to significant space savings:

Sharding does not specifically bring disk space saving, it just helps
some filesystem tools that slow down when you get too many files in a
single folder, like backup scanning or browsing tools like Windows
Explorer.  It may bring some inode savings, but I am not sure.
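
A minimal sketch of the layout difference described above, assuming the sharded layout places each revision file under revs/<shard>/<revision> with the shard number being the revision divided by the shard size. The function name rev_file_path and the default of 1000 files per shard are illustrative assumptions, not taken from this thread (the original post mentions one directory per hundred revisions), so the shard size is left as a parameter.

```python
from pathlib import PurePosixPath
from typing import Optional


def rev_file_path(rev: int, files_per_shard: Optional[int] = 1000) -> PurePosixPath:
    """Return a revision file's path relative to the repository's db/ directory.

    files_per_shard=None models the older layout with every revision file
    directly inside revs/. The default of 1000 is an assumption for
    illustration only.
    """
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    if files_per_shard is None:
        # Unsharded layout: all revision files share one directory.
        return PurePosixPath("revs") / str(rev)
    # Sharded layout: integer division picks the shard directory.
    shard = rev // files_per_shard
    return PurePosixPath("revs") / str(shard) / str(rev)
```

Under these assumptions, rev_file_path(1234) gives revs/1/1234, while rev_file_path(1234, files_per_shard=None) gives revs/1234. The same number of files exists either way, which is why sharding alone would not be expected to change the repository size much.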

>
> http://subversion.tigris.org/svn_1.6_releasenotes.html#fsfs-packing

Packing will save a lot of disk space by freeing up all the wasted
space in unused clusters, and will also greatly reduce inode usage.
However, packing is not something that happens automatically, you have
to run svnadmin pack.

The most likely culprit is the new rep-sharing feature in 1.6.  This
allows duplicate "representations" to be shared.   So imagine you
committed 5 copies of the exact same ISO image.  With 1.6, only one of
those would be "stored" and the other 4 would all have pointers to the
first.  We typically see a 10-20% space improvement from this, but it
is entirely dependent on how much common content there is.  I would
guess this is where the saving came from since this is on by default
for a newly created repository.
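
The idea of sharing duplicate representations can be illustrated with a toy content-addressed store. This is only a sketch of the general technique, not Subversion's implementation: the class ToyRepStore, its methods, and the choice of a SHA-1 key are all hypothetical and exist purely for illustration.

```python
import hashlib


class ToyRepStore:
    """Toy store in which identical contents are kept only once."""

    def __init__(self):
        self._blobs = {}  # checksum -> stored content
        self._refs = {}   # path -> checksum (the "pointer")

    def commit(self, path, content):
        """Record content under path; return True if new bytes were stored."""
        key = hashlib.sha1(content).hexdigest()
        is_new = key not in self._blobs
        if is_new:
            self._blobs[key] = content
        # Duplicates only add a pointer to the content already stored.
        self._refs[path] = key
        return is_new

    def read(self, path):
        """Return the content recorded for path."""
        return self._blobs[self._refs[path]]

    def stored_bytes(self):
        """Total size of the content actually stored."""
        return sum(len(blob) for blob in self._blobs.values())
```

Committing the same ISO image five times under different paths stores its bytes once; the other four commits only add pointers, so the saving depends entirely on how much identical content is committed.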

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2431501


Re: svn dump and load - half size repository?

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Dec 18, 2009, at 06:52, Andrew Hardy wrote:

> Performed a dump of one repository to create a new one (for testing/backup/etc.). No errors reported during either operation. Both repositories are BDS.

Do you mean BDB -- BerkeleyDB?


> The new repository is around half the size of the original - should I be worried? I notice that the 'revs' directory in the new repository has a different format to the old - the old had all files in the base directory, the new one has a new directory for every hundred revisions.

The presence of the revs directory indicates this is not a BDB repository but an FSFS one. FSFS is the default repository type since Subversion 1.2.

The behavior you are observing in the revs directory is called sharding and is new for Subversion 1.5, and was improved in 1.6, and can lead to significant space savings:

http://subversion.tigris.org/svn_1.6_releasenotes.html#fsfs-packing

There can also be other differences in size between BDB and FSFS representations of the same repository.

BDB was the default in Subversion 1.1 and earlier, i.e. a long time ago. BerkeleyDB uses log files, and in older versions of BDB, old log files were not automatically deleted. No such logs would be present in your newly loaded repository, which could further account for the space savings you're seeing.


> How can I compare two repositories to ensure that they are consistent up to a certain revision?

Since no errors were reported during dump and load, I assume your repositories are fine. You can additionally run "svnadmin verify" on them.
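
The check suggested above could be scripted roughly as follows. This is a sketch under stated assumptions: it relies on "svnadmin verify" exiting with a non-zero status on failure and on "svnlook youngest" printing the latest revision number; the function names and the repository paths in the usage note are hypothetical. Passing verification and matching youngest revisions is a sanity check, not proof that the contents are identical.

```python
import subprocess
from typing import Callable, List


def run_command(argv: List[str]) -> str:
    """Run a command, raise CalledProcessError on failure, return its stdout."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def check_repositories(old_repo: str, new_repo: str,
                       run: Callable[[List[str]], str] = run_command) -> bool:
    """Verify both repositories, then compare their youngest revisions."""
    for repo in (old_repo, new_repo):
        # Raises if svnadmin reports a problem with the repository.
        run(["svnadmin", "verify", repo])
    old_youngest = int(run(["svnlook", "youngest", old_repo]).strip())
    new_youngest = int(run(["svnlook", "youngest", new_repo]).strip())
    return old_youngest == new_youngest
```

A call such as check_repositories("/srv/svn/old", "/srv/svn/new") returns True when both repositories verify and report the same youngest revision. For a stronger comparison, the same revision could be exported from each repository and the two trees compared with a diff tool.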

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2431485
