Posted to users@subversion.apache.org by Mike Burr <me...@gmail.com> on 2005/03/22 14:43:36 UTC

Large repositories, binary files, using W2K3 server

I'm deploying Subversion in an environment where all work will be done
within a single repository and nearly all files will be binary files
10s of MB in size (graphics files mostly).

I'm wondering if there are any potential problems with having a very
large repository, say 10s or possibly even 100s of GB. I'm thinking of
things like performance and database corruption. The OS should handle
files up to 16TB. I've noticed that Berkeley DB doesn't split out the
database files even when they get to be very large. Is it possible
that things will slow with time or that the database files will become
"fragmented", requiring a lot of seeking?

Is there any problem that I might be overlooking by using Subversion
to control binary-only data? Obviously, I don't plan to do any merging
of files. Rather I just want to be able to version project folders and
generally have more structure, collaboration and control.

This will all be on a Windows 2003 Server machine (no choice!). Given
that I'm using the latest stable version of everything (svn,
apache2+mod_auth_sspi), does anyone foresee problems using this OS in a
production environment?

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Large repositories, binary files, using W2K3 server

Posted by Mark Phippard <Ma...@softlanding.com>.
Mike Burr <me...@gmail.com> wrote on 03/22/2005 09:43:36 AM:

> I'm deploying Subversion in an environment where all work will be done
> within a single repository and nearly all files will be binary files
> 10s of MB in size (graphics files mostly).
> 
> I'm wondering if there are any potential problems with having a very
> large repository, say 10s or possibly even 100s of GB. I'm thinking of
> things like performance and database corruption. The OS should handle
> files up to 16TB. I've noticed that Berkeley DB doesn't split out the
> database files even when they get to be very large. Is it possible
> that things will slow with time or that the database files will become
> "fragmented", requiring a lot of seeking?
> 
> Is there any problem that I might be overlooking by using Subversion
> to control binary-only data? Obviously, I don't plan to do any merging
> of files. Rather I just want to be able to version project folders and
> generally have more structure, collaboration and control.
> 
> This will all be on a Windows 2003 Server machine (no choice!). Given
> that I'm using the latest stable version of everything (svn,
> apache2+mod_auth_sspi), does anyone foresee problems using this OS in a
> production environment?

The biggest problem in this sort of environment is typically the working 
copy (WC).  Are you aware that your WC will hold two copies of each of 
these files: the file you edit plus a pristine copy kept in the .svn 
administrative area?
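
As a rough illustration of that doubling, here is a minimal Python sketch 
that estimates working-copy disk usage.  The function name and the example 
figures (500 files of 40 MB each) are hypothetical, not taken from any real 
project.  It assumes each versioned file is stored exactly twice at full 
size and ignores metadata overhead.

```python
def working_copy_bytes(file_sizes, copies_per_file=2):
    """Estimate working-copy disk usage in bytes.

    Assumption: every versioned file is stored copies_per_file times
    at full size (the working file plus one pristine copy).
    """
    return copies_per_file * sum(file_sizes)


MB = 1024 * 1024
GB = 1024 * MB

# Hypothetical project: 500 graphics files of 40 MB each.
sizes = [40 * MB] * 500
estimate = working_copy_bytes(sizes)
print(f"Estimated working copy size: {estimate / GB:.1f} GB")
```

Every user who checks out the whole tree pays this cost, so checking out 
only the project folders you need keeps the figure down.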

I would recommend using the FSFS repository format, as it is a bit smaller 
than BDB and doesn't have the problem you mention of storing everything in 
a single file.

Finally, you might find that performance is not great when committing 
these files, even the first time, as Subversion will always compute a 
complete binary delta of the file before it sends it over the wire.  With 
really big files that do not compress well, this is often a big waste of 
time, as it would be better to just send the file as-is.
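
If you want a quick feel for whether your files fall into the "do not 
compress well" category, something like the following Python sketch could 
be part of a prototype.  It uses zlib from the Python standard library 
purely as a stand-in; it does not reproduce Subversion's own delta 
encoding, and the function names and the 0.9 threshold are arbitrary 
assumptions for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size divided by original size, using zlib.

    A value near 1.0 (or above) means the data barely compresses.
    """
    if not data:
        return 1.0
    return len(zlib.compress(data, level)) / len(data)


def compresses_well(data: bytes, threshold: float = 0.9) -> bool:
    """Arbitrary rule of thumb: 'well' means shrinking by more than 10%."""
    return compression_ratio(data) < threshold


# Highly repetitive data shrinks a lot; random bytes, which behave much
# like already-compressed image formats, do not.
repetitive = b"ABCD" * 250_000
random_like = os.urandom(1_000_000)

print("repetitive compresses well:", compresses_well(repetitive))
print("random-like compresses well:", compresses_well(random_like))
```

To try it on a real file, read the file's bytes and pass them to 
compression_ratio().  Timing an actual commit of a few representative 
files is still the more reliable test.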

I do not wish to discourage you; I would just recommend you prototype this 
before you commit to it.  I do not recall the name of the product, and it 
probably isn't free, but I recall people mentioning another version 
control tool that was created for your specific type of environment.

Mark

PS - One thing you might try once Subversion 1.2 is released is the DAV 
autoversioning feature.  This might actually work better for this type of 
environment, as you could avoid the use of a WC and some of the issues it 
brings.  The disadvantage is that your commits are more generic (no log 
message and only one file per commit).  Another plus is that your users 
are just saving a file to a folder; they do not even need to know it is 
being versioned.
