Posted to users@subversion.apache.org by J Kramer <kr...@gmail.com> on 2006/07/19 20:36:51 UTC

Repository Size Growing VERY Quickly

All,

At my workplace we use SVN to manage our software development.  We
commit rather frequently as that seems good policy, but the size of the
SVN repository (on the server) is growing quite rapidly--too rapidly for us.

Part of the problem may be that our project has a lot of binary files
to manage.  As I understand it, SVN does do binary diffs in order to
reduce the data storage, but is it really efficient at that?
Are there any tips that you guys have for reducing the size of the
repository?  There are certain files which are convenient to be under
version control but for which we don't really need many past versions.
Is there some setting that we can put on a file to have SVN keep only
the few most recent versions?  Also, is there a way to clean out the
repository?  For instance, if I have versions 1 ... 100, but I
determine that versions 2, 27, 43, 81, and 100 are the important
milestones, is there a way to get rid of the in between versions to
save space while still being able to revert to those particular
versions?

Thanks very much for any thoughts,

John

Re: Repository Size Growing VERY Quickly

Posted by Jared Hardy <ja...@gmail.com>.
J Kramer wrote:

> As I understand it, SVN does do binary diffs in order to
> reduce the data storage, but is it really efficient at that?
>
My company is running an SVN repository that is majority binary files.
Even at 10K+ revisions, and 20K+ files, the whole Repository store is
about the same size as the Working Copy (both about 18 GB total). Most
of the files are fairly raw art files, or small binary exports, with
small incremental changes committed after creation time. Vdelta
definitely can't deal with compressed, especially lossy compressed,
files very well. I recommend keeping only smallish, raw source binary
files. Any files that can be built whole have no direct need for being
versioned, though we do version some "interim build" type objects if
they have a strong dependency with current project state. Definitely
save any compression steps in your build process until after versioning.
We also version all internally developed build tool executables, since
our project has a strong build-version to art-data dependency.

> Are there any tips that you guys have for reducing the size of the
> repository?

I had similar questions before I converted our project to use
Subversion. I finally gave up on them, because there simply isn't
another SCM system out there that handles versioning our binary files so
well, with so little network and storage overhead. The closest at the
time was Avid AlienBrain, which had a nice "bucketing" system for
performing the features you describe, and lots of nice Artist-centric
interfaces. Unfortunately, the rest of the repository model was
bonehead-stupid, slow, disk-hungry, and the licensing costs were
ridiculous. Perforce was evaluated for a long time, but it simply
doesn't handle binary file versioning well at all (it just numbers them,
gzips them, and puts them in a big directory), and requires a lot more
maintenance to keep the database metadata storage fast. Also, Perforce
had far too little tolerance for our frequent-offline work style. CVS
also had obvious major issues dealing with binary files properly.

    I figure it's easier to buy more storage now than to worry about
it. There are tricks you can do with branches and dumpfilters. I've even
considered periodically dumping the whole repo, filtering all but the
most recent revisions into one new repo, filtering the rest into an
"archive" repo, and punting the archive repo off to a cheaper (slower
ATA disk) NAS storage volume, and keeping it around as a separate
read-only repo. It just turns out that I gave it plenty of disk space up
front -- way more than we needed -- so I haven't needed to do any of
that yet.
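A minimal sketch of the archive/recent split described above, assuming `svnadmin` is on PATH and the repository is not being committed to while it runs; the repository paths and split revision are hypothetical. Without `--incremental`, `svnadmin dump` writes the first revision of the range as a full snapshot, so the "recent" repository starts from a complete tree. Its revision numbers restart from r1, and loading may fail if recent revisions copy paths from revisions that went to the archive.

```python
import subprocess


def split_plan(repo: str, recent_repo: str, archive_repo: str,
               split_rev: int) -> list[tuple[list[str], str]]:
    """(dump command, target repository) pairs for an archive/recent split.

    Revisions 0..split_rev-1 go to the archive; split_rev..HEAD go to the
    new working repository.  Paths and split_rev are caller-supplied.
    """
    if split_rev < 1:
        raise ValueError("split_rev must be at least 1")
    return [
        (["svnadmin", "dump", repo, "-r", f"0:{split_rev - 1}"], archive_repo),
        (["svnadmin", "dump", repo, "-r", f"{split_rev}:HEAD"], recent_repo),
    ]


def run_split(repo: str, recent_repo: str, archive_repo: str, split_rev: int) -> None:
    for dump_cmd, target in split_plan(repo, recent_repo, archive_repo, split_rev):
        subprocess.run(["svnadmin", "create", target], check=True)
        # Stream the dump straight into the load; no dump file on disk.
        dump_p = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
        load_p = subprocess.Popen(["svnadmin", "load", target], stdin=dump_p.stdout)
        dump_p.stdout.close()  # only the load process should hold this pipe
        for proc in (dump_p, load_p):
            proc.wait()
        for proc in (dump_p, load_p):
            if proc.returncode != 0:
                raise subprocess.CalledProcessError(proc.returncode, proc.args)


if __name__ == "__main__":
    for cmd, target in split_plan("/srv/svn/main", "/srv/svn/recent",
                                  "/nas/svn/archive", 9000):
        print(" ".join(cmd), "->", target)
```

The archive repository can then be made read-only (for example through server access control) and moved to the slower volume.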

    It's a matter of costs, really. Which is more costly to you:
frequent maintenance time to only keep needed binary versions, periods
of down time to partition off the repo history, or just buying bigger
disks once in a great while, and moving the repo to those larger volumes
when needed? To me, disk is cheap, and just keeps getting bigger and
cheaper by the day. Definitely cheaper than my time.

:) Jared

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Repository Size Growing VERY Quickly

Posted by Les Mikesell <le...@gmail.com>.
On Thu, 2006-07-20 at 01:36, Ulrich Eckhardt wrote:

> Note that generally the result of compiling things (or any kind of generated 
> files) should not be under version control because you can always check out 
> the version and regenerate them. This way of thinking has one minor error 
> though and that is that you can't always retrieve the whole version including 
> the environment that was used to build it, e.g. because you upgraded a 
> compiler or changed the OS. In those cases, adding the generated files when 
> making a release is a good idea.

Back in the days when there were big differences in compiler
versions a group here would actually check the compiler and
any other tools needed to build a version into CVS along with
a script to extract the correct environment.  They could still
easily make a fix to anything they've released in the last
10 years or more (which just saving the compiled binaries
wouldn't permit).  By contrast, another group
always builds on the same machine and has trouble if anything
ever changes on it - but the nature of their product makes
it less likely that they would ever need to support an
older version.

-- 
  Les Mikesell
   lesmikesell@gmail.com


Re: Repository Size Growing VERY Quickly

Posted by Ulrich Eckhardt <ec...@satorlaser.com>.
On Thursday 20 July 2006 02:43, Ryan Schmidt wrote:
> On Jul 19, 2006, at 22:36, J Kramer wrote:
> > Also, is there a way to clean out the
> > repository?  For instance, if I have versions 1 ... 100, but I
> > determine that versions 2, 27, 43, 81, and 100 are the important
> > milestones, is there a way to get rid of the in between versions to
> > save space while still being able to revert to those particular
> > versions?
>
> Also no.
>
>
> The standard recommendation would be to not keep these items in
> Subversion then.

... and only add them to the tags you make for said milestones. This is a good 
compromise between having a complete history and not having too many 
uninteresting changes in the repository. 
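A minimal sketch of the tag-only approach, assuming the conventional trunk/ and tags/ layout; the repository URL, tag name, and build-output names are hypothetical. It only builds the `svn copy`, `svn checkout`, `svn add`, and `svn commit` command lines so they can be reviewed first; copying the generated files into the tag working copy is left to the caller.

```python
import shlex
from pathlib import PurePosixPath


def milestone_tag_commands(repo_url: str, tag: str, build_outputs: list[str],
                           wc_dir: str = "tag-wc") -> list[list[str]]:
    """Return svn command lines that tag trunk and commit generated files
    only on that tag.  Assumes a trunk/ + tags/ repository layout."""
    trunk_url = f"{repo_url}/trunk"
    tag_url = f"{repo_url}/tags/{tag}"
    cmds = [
        # Server-side copy: cheap in Subversion, no file data is duplicated.
        ["svn", "copy", trunk_url, tag_url, "-m", f"Tag {tag}"],
        # Working copy of the tag; the caller copies build outputs into it.
        ["svn", "checkout", tag_url, wc_dir],
    ]
    for output in build_outputs:
        # Schedule each generated file (by base name) for addition on the tag only.
        name = PurePosixPath(output).name
        cmds.append(["svn", "add", str(PurePosixPath(wc_dir) / name)])
    cmds.append(["svn", "commit", wc_dir, "-m", f"Add generated files for {tag}"])
    return cmds


if __name__ == "__main__":
    for cmd in milestone_tag_commands("https://svn.example.com/repo", "1.0",
                                      ["build/app.exe", "build/assets.pak"]):
        print(shlex.join(cmd))
```

Because trunk never sees the generated files, their history consists of one revision per milestone tag rather than one per build.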

Note that generally the result of compiling things (or any kind of generated 
files) should not be under version control because you can always check out 
the version and regenerate them. This way of thinking has one minor error 
though and that is that you can't always retrieve the whole version including 
the environment that was used to build it, e.g. because you upgraded a 
compiler or changed the OS. In those cases, adding the generated files when 
making a release is a good idea.

Uli


Re: Repository Size Growing VERY Quickly

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Jul 19, 2006, at 22:36, J Kramer wrote:

> At my workplace we use SVN to manage our software development.  We
> commit rather frequently as that seems good policy, but the size of the
> SVN repository (on the server) is growing quite rapidly--too rapidly for us.
> Part of the problem may be that our project has a lot of binary files
> to manage.  As I understand it, SVN does do binary diffs in order to
> reduce the data storage, but is it really efficient at that?
It depends on the way the binary file is made, and how much of the  
file really changes.

> Are there any tips that you guys have for reducing the size of the
> repository?  There are certain files which are convenient to be under
> version control but for which we don't really need many past versions.
> Is there some setting that we can put on a file to have SVN keep only
> the few most recent versions.

No.


> Also, is there a way to clean out the
> repository?  For instance, if I have versions 1 ... 100, but I
> determine that versions 2, 27, 43, 81, and 100 are the important
> milestones, is there a way to get rid of the in between versions to
> save space while still being able to revert to those particular
> versions?

Also no.


The standard recommendation would be to not keep these items in  
Subversion then.


Re: Repository Size Growing VERY Quickly

Posted by Nico Kadel-Garcia <nk...@comcast.net>.
  ----- Original Message ----- 
  From: J Kramer 
  To: Daniel Berlin 
  Cc: users@subversion.tigris.org 
  Sent: Thursday, July 20, 2006 7:02 PM
  Subject: Re: Repository Size Growing VERY Quickly

  On 7/20/06, Daniel Berlin <db...@dberlin.org> wrote:
    J Kramer wrote:
    > All,
    >
    > At my workplace we use SVN to manage our software development.  We
    > commit rather frequently as that seems good policy, but the size of the
    > SVN repository (on the server) is growing quite rapidly--too rapidly for 
    > us.

    You need to quantify these terms with numbers.

  In the past week the repository has grown from about 500 MB to 1.7 GB.  We should be able to take care of some of that by unversioning some things that are currently versioned, but most of our binary files have to be versioned. 



    It depends on the size of the binary files, and how they are changing.
    If we are talking gigabyte files, you'll probably want to up the window
    size a bunch (at the cost of memory and time usage) to get better delta
    performance.
    This requires modifying one of the include files.


  Could you explain more about that?  I'm not familiar with that parameter.

  Also, for files that currently exist in the repository that aren't needed, how can we expunge them from the repository?  If we simply delete them, then there will be no new versions, but SVN will continue to keep the old versions. 


The only graceful way to expunge old files currently is to use svnadmin dump, svndumpfilter, and svnadmin load in combination to skip the undesired files. Like many other systems, Subversion is designed to *NOT* throw away data. This can occasionally be burdensome, for example in your current case.
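A minimal sketch of that dump/filter/load combination, assuming `svnadmin` and `svndumpfilter` are on PATH, the old repository is not being committed to, and the new repository is swapped in afterwards; the paths and excluded prefixes are hypothetical. The three stages are streamed through pipes so no large dump file is written. `svndumpfilter` can fail if a kept path was copied from an excluded one, so a trial run against a copy of the repository is prudent.

```python
import subprocess
from typing import Sequence


def dumpfilter_commands(old_repo: str, new_repo: str,
                        exclude: Sequence[str]) -> list[list[str]]:
    """Command lines for: create new repo, dump old, drop excluded paths, load."""
    if not exclude:
        raise ValueError("need at least one path prefix to exclude")
    return [
        ["svnadmin", "create", new_repo],
        ["svnadmin", "dump", old_repo],
        ["svndumpfilter", "exclude", *exclude],
        ["svnadmin", "load", new_repo],
    ]


def run_dumpfilter(old_repo: str, new_repo: str, exclude: Sequence[str]) -> None:
    create, dump, filt, load = dumpfilter_commands(old_repo, new_repo, exclude)
    subprocess.run(create, check=True)
    # Equivalent of: svnadmin dump OLD | svndumpfilter exclude ... | svnadmin load NEW
    dump_p = subprocess.Popen(dump, stdout=subprocess.PIPE)
    filt_p = subprocess.Popen(filt, stdin=dump_p.stdout, stdout=subprocess.PIPE)
    dump_p.stdout.close()  # only the filter process should hold this pipe
    load_p = subprocess.Popen(load, stdin=filt_p.stdout)
    filt_p.stdout.close()  # only the load process should hold this pipe
    procs = (dump_p, filt_p, load_p)
    for proc in procs:
        proc.wait()
    for proc in procs:
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)


if __name__ == "__main__":
    for cmd in dumpfilter_commands("/srv/svn/old", "/srv/svn/new",
                                   ["trunk/renders", "trunk/old-builds"]):
        print(" ".join(cmd))
```

Calling `run_dumpfilter("/srv/svn/old", "/srv/svn/new", ["trunk/renders"])` would perform the actual rewrite; the old repository is left untouched.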

Re: Repository Size Growing VERY Quickly

Posted by Daniel Berlin <db...@dberlin.org>.
J Kramer wrote:
>     How big are these files each?
> 
> 
> Most are two to five MB.  One is Ten MB.
> 
> What would be a reasonable window size for files of that size?
> 

Well, if you can afford the svn process taking up 20 meg of memory to do
a delta, use 10 meg as the window size.

> Which include file does the window size parameter live in?

subversion/libsvn_delta/delta.h


Re: Repository Size Growing VERY Quickly

Posted by J Kramer <kr...@gmail.com>.
> 
> How big are these files each?


Most are two to five MB.  One is Ten MB.

What would be a reasonable window size for files of that size?

Which include file does the window size parameter live in?

Re: Repository Size Growing VERY Quickly

Posted by Daniel Berlin <db...@dberlin.org>.
J Kramer wrote:
> 
> 
> On 7/20/06, Daniel Berlin <dberlin@dberlin.org> wrote:
> 
>     J Kramer wrote:
>     > All,
>     >
>     > At my workplace we use SVN to manage our software development.  We
>     > commit rather frequently as that seems good policy, but the size of the
>     > SVN repository (on the server) is growing quite rapidly--too rapidly for
>     > us.
> 
>     You need to quantify these terms with numbers.
> 
> 
> In the past week the repository has grown from about 500 MB to 1.7 GB. 
> We should be able to take care of some of that by unversioning some
> things that are currently versioned, but most of our binary files have
> to be versioned.

How big are these files each?

> 
> 
> 
>     It depends on the size of the binary files, and how they are changing.
>     If we are talking gigabyte files, you'll probably want to up the window
>     size a bunch (at the cost of memory and time usage) to get better delta
>     performance.
> 
>     This requires modifying one of the include files.
> 
> 
> 
> Could you explain more about that? 

In order to be able to deal with unlimited size files, without keeping
them all in memory at once to be able to perform deltas, we break the
streams we are performing deltas on into separate "windows" of a certain
size.  The default window size is 102400 bytes.  That means any file you
store in subversion that gets stored as a set of deltas will have
N/102400 (rounded up) delta windows, where N is the file size.  The
downside to this scheme is that stream alignment ends up mattering a lot.
If you end up with more than a window's worth of new data towards the very
beginning, you are going to throw off the entire rest of the delta stream,
and it will never see anything it thinks can be a copy (unless there is
other redundancy in the stream).
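The alignment effect can be seen in a deliberately simplified model of the windowing described above; this is a toy, not Subversion's actual delta code. Target window i may only copy 16-byte blocks from source window i at the same offsets, and everything it cannot match counts as new data. The block size, file sizes, and random test data are assumptions chosen for illustration.

```python
import random


def toy_new_bytes(source: bytes, target: bytes, window: int, block: int = 16) -> int:
    """Count target bytes a toy windowed delta would store as new data.

    Target window i may only copy from source window i (same offsets), in
    blocks of `block` bytes; anything it cannot match is stored literally.
    """
    new_bytes = 0
    for start in range(0, len(target), window):
        src_win = source[start:start + window]
        tgt_win = target[start:start + window]
        # Every block-sized substring of the matching source window.
        index = {src_win[i:i + block] for i in range(len(src_win) - block + 1)}
        i = 0
        while i < len(tgt_win):
            if len(tgt_win) - i >= block and tgt_win[i:i + block] in index:
                i += block        # a "copy from source" instruction
            else:
                new_bytes += 1    # one literal byte of new data
                i += 1
    return new_bytes


def random_bytes(n: int, seed: int) -> bytes:
    """Deterministic pseudo-random bytes standing in for binary file content."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


if __name__ == "__main__":
    old = random_bytes(4096, seed=0)          # a 4 KB "binary file"
    new = random_bytes(2048, seed=1) + old    # 2 KB of new data prepended
    print(toy_new_bytes(old, new, window=1024))  # 6144: insert exceeds the window, nothing lines up
    print(toy_new_bytes(old, new, window=8192))  # 2048: one window covers the file, only the insert is new
```

With the small window, the unchanged 4 KB is stored again in full because every window is shifted relative to its source window; with a window covering the whole file, only the inserted bytes cost anything.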

For source code files, 102,400 is a perfectly reasonable window size.

For something like video files, it's not.

You can up the window size, but in general, we are going to use (2 *
window_size) temporary memory to perform a delta.
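The arithmetic above, applied to the file sizes mentioned in this thread (2 to 5 MB, one of 10 MB), as a small sketch. The function names are hypothetical, "MB" is assumed to mean 2**20 bytes, and 2 * window_size is the rule of thumb stated here rather than an exact measurement; it is also where the "20 meg of memory for a 10 meg window" suggestion comes from.

```python
DEFAULT_WINDOW = 102_400  # bytes; the default window size quoted above


def delta_windows(file_size: int, window: int = DEFAULT_WINDOW) -> int:
    """Number of delta windows for a file of `file_size` bytes (N / window, rounded up)."""
    return (file_size + window - 1) // window


def delta_scratch_memory(window: int = DEFAULT_WINDOW) -> int:
    """Rough temporary memory for one delta, using the 2 * window_size rule of thumb."""
    return 2 * window


if __name__ == "__main__":
    MB = 1024 * 1024  # assumption: "MB" taken as 2**20 bytes
    for size in (2 * MB, 5 * MB, 10 * MB):
        print(f"{size // MB:>2} MB file: {delta_windows(size):>3} windows at the default size, "
              f"{delta_windows(size, 10 * MB)} with a 10 MB window")
    print(delta_scratch_memory())         # 204800 bytes with the default window
    print(delta_scratch_memory(10 * MB))  # 20971520 bytes, roughly the "20 meg" figure
```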


> I'm not familiar with that parameter.
> 
> Also, for files that currently exist in the repository that aren't
> needed, how can we expunge them from the repository?

You can't without dumping/filtering/loading.
We don't design the system to allow you to obliterate old data.


Re: Repository Size Growing VERY Quickly

Posted by J Kramer <kr...@gmail.com>.
On 7/20/06, Daniel Berlin <db...@dberlin.org> wrote:
>
> J Kramer wrote:
> > All,
> >
> > At my workplace we use SVN to manage our software development.  We
> > commit rather frequently as that seems good policy, but the size of the
> > SVN repository (on the server) is growing quite rapidly--too rapidly for
> > us.
>
> You need to quantify these terms with numbers.


In the past week the repository has grown from about 500 MB to 1.7 GB.  We
should be able to take care of some of that by unversioning some things that
are currently versioned, but most of our binary files have to be versioned.

>
>
> It depends on the size of the binary files, and how they are changing.
> If we are talking gigabyte files, you'll probably want to up the window
> size a bunch (at the cost of memory and time usage) to get better delta
> performance.

> This requires modifying one of the include files.



Could you explain more about that?  I'm not familiar with that parameter.

Also, for files that currently exist in the repository that aren't needed,
how can we expunge them from the repository?  If we simply delete them, then
there will be no new versions, but SVN will continue to keep the old
versions.

Thanks for your help.

John

Re: Repository Size Growing VERY Quickly

Posted by Daniel Berlin <db...@dberlin.org>.
J Kramer wrote:
> All,
> 
> At my workplace we use SVN to manage our software development.  We
> commit rather frequently as that seems good policy, but the size of the
> SVN repository (on the server) is growing quite rapidly--too rapidly for
> us.

You need to quantify these terms with numbers.

> 
> Part of the problem may be that our project has a lot of binary files
> to manage.  As I understand it, SVN does do binary diffs in order to
> reduce the data storage, but is it really efficient at that?

It depends on the size of the binary files, and how they are changing.
If we are talking gigabyte files, you'll probably want to up the window
size a bunch (at the cost of memory and time usage) to get better delta
performance.

This requires modifying one of the include files.

--Dan