Posted to dev@subversion.apache.org by Ben Collins-Sussman <su...@collab.net> on 2004/12/30 07:32:53 UTC

Re: Minimizing repository growth when large files change....

On Dec 28, 2004, at 3:27 PM, Peter Valdemar Mørch wrote:
>
> The fsfs repository uses 11% of the space the bdb repository does - for
> the exact same files! Hurray!
>
>                    bdb         fsfs
>                    raw files   raw files
>                    (MB)        (MB)
> First commit:      124.56      15.69
> Second commit:     270.44      29.49
> Rep Growth:        145.88      13.79
> Rep Growth Ratio:  117%        88%
>
> Repository size vs.
> Sum of file sizes
> after 2nd commit:  188%        20.4%

This sounds really weird to me.  I mean, we're all aware that fsfs uses 
*some* less space than bdb... like 20% less, I thought, was the rule of 
thumb.

But 90% less space?  Is something really fishy going on here?  If the 
script below really reproduces this, should we investigate?
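
For reference, the "11%" / "90% less" figure can be checked directly against the quoted table. This is only a minimal sketch: it assumes the sizes are in MB (the quoted script divides a byte count by 1024 twice), and the pct helper is a made-up name for illustration, not part of Subversion.

```shell
#!/bin/bash
# Repository sizes after the second commit, copied from the quoted table.
bdb_size=270.44
fsfs_size=29.49

# pct A B: print A as a percentage of B with one decimal place.
# awk does the floating-point math; LC_ALL=C keeps the decimal point a ".".
pct() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * a / b }'
}

echo "fsfs uses $(pct "$fsfs_size" "$bdb_size")% of the bdb space"
```

With the numbers above this prints 10.9%, so the table is at least internally consistent with the "11%" claim.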


>
> Peter
>
> -- 
> Peter Valdemar Mørch
> http://www.morch.com
>
>
> --
> Script to reproduce:
>
> #!/bin/bash
>
> file1=f1
> file2=f2
> # file1=F1.gz
> # file2=F2.gz
>
> rm -rf rep dir/
> # Pick the back end to measure by switching which line is commented out
> # svnadmin create --fs-type fsfs rep
> svnadmin create --fs-type bdb rep
> export r=file://`pwd`/rep
> svn mkdir -m "" $r/dir
> svn co $r/dir
>
> cp $file1 dir/file
>
> svn add dir/file
> svn ci -m "" dir
>
> # Remove unused Berkeley DB log files so they do not inflate the size
> svnadmin list-unused-dblogs rep/ | xargs rm -f
> echo
> echo "Repos size 1"
> calc.pl `du -s --block-size=1 rep | sed s/rep//` / 1024 / 1024
>
> svn ci -m "" dir
>
> cp $file2 dir/file
> svn ci -m "" dir
>
> svnadmin list-unused-dblogs rep/ | xargs rm -f
> echo
> echo "Repos size 2"
> calc.pl `du -s --block-size=1 rep | sed s/rep//` / 1024 / 1024
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: users-help@subversion.tigris.org
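
The quoted script relies on calc.pl, a helper that is not shown in the thread. For anyone trying to reproduce the measurement without it, here is a rough sketch of the same "size in MB" step using only du and awk. The function name repo_size_mb is hypothetical, and it uses du -sk (kilobyte units) rather than the GNU-only --block-size=1, so results may differ slightly from the original numbers.

```shell
#!/bin/bash
# repo_size_mb DIR: print the disk usage of DIR in MB with two decimals.
# du -sk reports kilobytes; dividing by 1024 gives MB.
# Note: this measures disk usage, not the sum of apparent file sizes.
repo_size_mb() {
    du -sk "$1" | LC_ALL=C awk '{ printf "%.2f\n", $1 / 1024 }'
}
```

Usage would be something like: echo "Repos size 1: $(repo_size_mb rep) MB"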


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org


Re: Minimizing repository growth when large files change....

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On Thursday, December 30, 2004 1:32 AM -0600 Ben Collins-Sussman 
<su...@collab.net> wrote:

> This sounds really weird to me.  I mean, we're all aware that fsfs uses
> *some* less space than bdb... like 20% less, I thought, was the rule of
> thumb.
>
> But 90% less space?  Is something really fishy going on here?  If the
> script below really reproduces this, should we investigate?

Well, BDB may not be as efficient.  From their docs:

<http://www.sleepycat.com/docs/ref/am_misc/diskspace.html>

"Space freed by deleting key/data pairs from a Btree or Hash database is 
never returned to the filesystem, although it is reused where possible. 
This means that the Btree and Hash databases are grow-only. If enough keys 
are deleted from a database that shrinking the underlying file is 
desirable, you should create a new database and copy the records from the 
old one into it."

Here's a data point from one repository whose dump has ~120k revisions:

BDB on a straight load:            6.3GB
FSFS on a straight load:           3.5GB
BDB after a db_dump/db_load cycle: 4.7GB

So, after a BDB dump/load, yes, it's within ~20% of FSFS.  However, I bet 
large temporary BDB transactions (such as the ones we do for a commit) cause 
a spike in size that is never really recouped...  (BDB 4.2.52, FWIW.)
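
To put the three sizes above side by side, here is a small sketch of the arithmetic. The smaller_by helper is a made-up name for illustration, and the inputs are simply the GB figures quoted above.

```shell
#!/bin/bash
# Sizes in GB from the data point above.
bdb_straight=6.3
fsfs_straight=3.5
bdb_reloaded=4.7

# smaller_by A B: print how much smaller A is than B, as a percentage of B.
smaller_by() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (b - a) / b }'
}

echo "FSFS vs. straight-load BDB: $(smaller_by "$fsfs_straight" "$bdb_straight")% smaller"
echo "FSFS vs. reloaded BDB:      $(smaller_by "$fsfs_straight" "$bdb_reloaded")% smaller"
echo "Reclaimed by dump/load:     $(smaller_by "$bdb_reloaded" "$bdb_straight")% of the BDB size"
```

This gives 44.4%, 25.5% and 25.4% respectively, so the dump/load cycle brings the gap down from nearly half to about a quarter, which is in the neighborhood of the 20% rule of thumb.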

HTH.  -- justin
