Posted to users@subversion.apache.org by Roger Ashby <ro...@gmail.com> on 2006/02/28 14:03:55 UTC

Subversion and Very,Very Large Repositories

I administer a repository of images (about 300,000 tifs, pdfs, pngs,
jpgs, and some zips) that currently totals about 700GB of data. No
single file is larger than 700MB; only 1/3 of the files are more than
1MB in size, and most of those range between 1 and 40MB. I've been
tasked with coming up with a revision control/digital asset management
solution on a very limited budget. I ultimately settled upon using svk
(because of the lightweight checkout copy management) with a subversion
repository in the background. I decided to incrementally capacity test
the subversion repository a few gigs at a time. I have, however, run
into an issue: it seems that the repository is being saved in one big
file, and I've reached the max file size limit for my machine.

 My first question is: are there options for svk or subversion that
dictate how the repository/depot stores its files, so I don't run into
this file size issue?

 My second question is: does it make sense to have this much data (it
could grow to several TB over the next 5 years) under revision control?
Can subversion handle it, or should I be using some other solution?

Re: Subversion and Very,Very Large Repositories

Posted by Russ Brown <rb...@ebuyer.com>.
On Tue, 2006-02-28 at 09:03 -0500, Roger Ashby wrote:
> I administer a repository of images (about 300,000
> tifs,pdfs,pngs,jpgs, and some zips with no single file larger than
> 700MB, however only 1/3 of those are more than a MB in size and of
> those that are they mostly range between 1-40MB) that currently totals
> about 700GB data. I've been tasked with coming up with a revision
> control/digital asset management solution on a very limited budget. I
> ultimately settled upon using svk (because of the lightweight checkout
> copy management) with a subversion repository in the background. I
> decided to incrementally capacity test the subversion repository a few
> Gigs at a time. I have however run into an issue, it seems that the
> repository is being saved in one big file, and I've reached the max file
> limit for my machine.
> 

What backend are you using? For fsfs each revision is stored in a
separate file. I can't comment on how BDB works from this perspective,
but fsfs definitely does not store everything in one file.
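
For anyone unsure which backend an existing repository uses, here is a
minimal Python sketch. It assumes the repository records its backend
name in a db/fs-type file, as repositories created by recent Subversion
releases do; the function name repository_backend is made up for this
example.

```python
from pathlib import Path


def repository_backend(repo_path):
    """Return the backend name stored in <repo>/db/fs-type, or None if absent."""
    marker = Path(repo_path) / "db" / "fs-type"
    if not marker.is_file():
        # Assumption: no marker file means the backend cannot be determined here.
        return None
    # The file is expected to hold a single word such as "fsfs" or "bdb".
    return marker.read_text().strip()
```

Calling repository_backend("/path/to/repo") on the repository in
question should show whether it is "bdb" or "fsfs".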

>  My first question is, are there options for svk or subversion that
> dictate how the repository/depot stores its files so I don't run into
> this file size issue.
> 

Make sure you're using fsfs. Unless one revision (most likely the first
one) is greater than your filesystem's max file size limit, you will be
fine.
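
A rough way to check that condition is to look for the largest revision
file and compare it with the filesystem's limit. The sketch below
assumes the fsfs revision files live under db/revs inside the
repository; the function names and the 2GB figure in the usage note are
only illustrative, so substitute whatever limit your filesystem really
has.

```python
import os


def largest_revision_file(repo_path):
    """Return (file_name, size_in_bytes) of the biggest file under db/revs."""
    revs_dir = os.path.join(repo_path, "db", "revs")
    largest = (None, 0)
    # os.walk also covers the case where revision files sit in subdirectories.
    for root, _dirs, files in os.walk(revs_dir):
        for name in files:
            size = os.path.getsize(os.path.join(root, name))
            if size > largest[1]:
                largest = (name, size)
    return largest


def exceeds_limit(repo_path, max_file_bytes):
    """True if any single revision file is larger than max_file_bytes."""
    _name, size = largest_revision_file(repo_path)
    return size > max_file_bytes
```

For example, exceeds_limit("/path/to/repo", 2**31 - 1) would flag a
repository whose biggest revision no longer fits under a (hypothetical)
2GB per-file limit.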

>  My second question is, does it make sense to have this much data (it
> could grow to several TB over the next 5 years) under revision control
> can subversion handle it or should I be using some other solutions.

I can only comment as far as my experience has taken me so far. My svk
depot (mirroring my work's repository) is 2.1GB in size and is now at
21324 revisions. No problems doing day to day stuff whatsoever. Now
several TB I do not know about... :)

-- 

Russ


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Subversion and Very,Very Large Repositories

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Feb 28, 2006, at 15:03, Roger Ashby wrote:

> I administer a repository of images (about 300,000
> tifs,pdfs,pngs,jpgs, and some zips with no single file larger than
> 700MB, however only 1/3 of those are more than a MB in size and of
> those that are they mostly range between 1-40MB) that currently totals
> about 700GB data. I've been tasked with coming up with a revision
> control/digital asset management solution on a very limited budget. I
> ultimately settled upon using svk (because of the lightweight checkout
> copy management) with a subversion repository in the background. I
> decided to incrementally capacity test the subversion repository a few
> Gigs at a time. I have however run into an issue, it seems that the
> repository is being saved in one big file, and I've reached the max file
> limit for my machine.

The FSFS backend stores one file per revision. I don't know how the  
BDB backend stores things. Most people recommend using FSFS, not BDB.  
You should use the current version of Subversion, 1.3.0. To properly  
support large files, you should use APR 1.2 (and, if you want Apache,  
then Apache 2.2.x), not APR 0.9 (which came with Apache 2.0.x).

As to whether it makes sense to store this much information in  
Subversion, others may have an opinion on that.




Re: Subversion and Very,Very Large Repositories

Posted by Ryan Schmidt <su...@ryandesign.com>.
On Mar 1, 2006, at 02:28, Vincent Starre wrote:

> May be worth noting: I don't think svn supports holding a repository  
> across multiple filesystems. While it's likely anyone dealing in  
> TBs has a nice array, you could be in trouble if you don't.

I remember reading somewhere that it is perfectly fine to move old  
revisions to a different volume and symlink these back in place.

Ah yes, here it is:

http://svn.collab.net/repos/svn/trunk/notes/fsfs

"If an FSFS repository is outgrowing the filesystem it lives on, you  
can symlink old revisions off to another filesystem."
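
As a sketch of what that could look like in practice, the Python below
moves revision files older than a chosen revision to another directory
and leaves symlinks behind. It assumes a flat db/revs directory with one
numerically named file per revision and a repository that nobody is
writing to while it runs; the function name and parameters are
hypothetical, so try it on a copy first.

```python
import os
import shutil


def offload_old_revisions(repo_path, archive_dir, keep_from_rev):
    """Move revision files numbered below keep_from_rev and symlink them back.

    Returns the list of revision numbers that were moved.
    """
    revs_dir = os.path.join(repo_path, "db", "revs")
    archive_dir = os.path.abspath(archive_dir)
    os.makedirs(archive_dir, exist_ok=True)

    moved = []
    names = [n for n in os.listdir(revs_dir) if n.isdecimal()]
    for name in sorted(names, key=int):
        src = os.path.join(revs_dir, name)
        # Skip revisions that were already offloaded, and anything not a plain file.
        if os.path.islink(src) or not os.path.isfile(src):
            continue
        if int(name) >= keep_from_rev:
            continue
        dst = os.path.join(archive_dir, name)
        shutil.move(src, dst)  # copies across filesystems if needed
        os.symlink(dst, src)   # absolute link back into db/revs
        moved.append(int(name))
    return moved
```

Running it a second time with the same arguments moves nothing, since
the already-linked revisions are skipped.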




Re: Subversion and Very,Very Large Repositories

Posted by Vincent Starre <vs...@comcast.net>.
Phil Endecott wrote:

> Russ Brown wrote:
>
>>> In general, Subversion can handle very large repositories.  The one 
>>> scalability issue I am aware of is if you create a flat repository, 
>>> that does not scale well.
>>
>>
>> I'm not sure how this would make a difference. FSFS doesn't store the
>> files in the repository in physical folders on the disk: only as a
>> series of revisions. From what I remember, filesystem lookup performance
>> becomes a potential issue for fsfs only when you have a very large
>> number of revisions: not when you have a flat file structure in the
>> repository.
>>
>> Or does the Subversion filesystem have a similar lookup performance
>> issue?
>
>
> IIRC, the FSFS revision file contains a small amount of data for every 
> file in a directory where some files have changed, even for files that 
> have not changed.  If you have a directory with many thousands of 
> files, this can become an issue.
>
> --Phil.


May be worth noting: I don't think svn supports holding a repository
across multiple filesystems. While it's likely anyone dealing in TBs has
a nice array, you could be in trouble if you don't.


Re: Subversion and Very,Very Large Repositories

Posted by Phil Endecott <sp...@chezphil.org>.
Russ Brown wrote:
>>In general, Subversion can handle very large repositories.  The one 
>>scalability issue I am aware of is if you create a flat repository, that 
>>does not scale well.
> 
> I'm not sure how this would make a difference. FSFS doesn't store the
> files in the repository in physical folders on the disk: only as a
> series of revisions. From what I remember, filesystem lookup performance
> becomes a potential issue for fsfs only when you have a very large
> number of revisions: not when you have a flat file structure in the
> repository.
> 
> Or does the Subversion filesystem have a similar lookup performance
> issue?

IIRC, the FSFS revision file contains a small amount of data for every 
file in a directory where some files have changed, even for files that 
have not changed.  If you have a directory with many thousands of files, 
this can become an issue.
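
To give a feel for the scale, here is a back-of-the-envelope estimate in
Python. The model is deliberately crude: it only counts the directory
that directly contains the changed file, and the 50 bytes per entry is a
made-up figure, not a measured one.

```python
def directory_overhead_bytes(files_in_dir, commits, bytes_per_entry=50):
    """Estimate bytes spent re-listing a directory's entries over many commits.

    Assumes every commit that changes a file in the directory writes one
    entry of bytes_per_entry for each file in that directory.
    """
    return files_in_dir * bytes_per_entry * commits


# One flat directory holding all 300,000 files, one commit:
flat = directory_overhead_bytes(300_000, 1)
# The same commit landing in a directory that holds only 1,000 files:
nested = directory_overhead_bytes(1_000, 1)
print(flat, nested)  # prints "15000000 50000"
```

Under these assumptions the flat layout costs about 15MB of listing data
per commit, against about 50KB for a directory of a thousand files.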

--Phil.



Re: Subversion and Very,Very Large Repositories

Posted by Russ Brown <rb...@ebuyer.com>.
On Tue, 2006-02-28 at 09:22 -0500, Mark Phippard wrote:
> "Roger Ashby" <ro...@gmail.com> wrote on 02/28/2006 09:03:55 AM:
> 
> > I have however run into an issue, it seems that the
> > repository is being saved in one big file, and I've reached the max file
> > limit for my machine.
> 
> It sounds like you created a BDB repository.  The default format as of 
> Subversion 1.2 is "fsfs".  With this format there is one file per revision 
> of the repository (commit). 
> 
> >  My second question is, does it make sense to have this much data (it
> > could grow to several TB over the next 5 years) under revision control
> > can subversion handle it or should I be using some other solutions.
> 
> In general, Subversion can handle very large repositories.  The one 
> scalability issue I am aware of is if you create a flat repository, that 
> does not scale well.  In other words, avoid having one folder with 
> thousands of files in it.  If you structure your images so that there are 
> a lot of folders the repository should scale fine as long as the 
> underlying disks and filesystem can handle it.
> 

I'm not sure how this would make a difference. FSFS doesn't store the
files in the repository in physical folders on the disk: only as a
series of revisions. From what I remember, filesystem lookup performance
becomes a potential issue for fsfs only when you have a very large
number of revisions: not when you have a flat file structure in the
repository.

Or does the Subversion filesystem have a similar lookup performance
issue?

> Mark
> 
> 
-- 

Russ



Re: Subversion and Very,Very Large Repositories

Posted by Mark Phippard <ma...@softlanding.com>.
"Roger Ashby" <ro...@gmail.com> wrote on 02/28/2006 09:03:55 AM:

> I have however run into an issue, it seems that the
> repository is being saved in one big file, and I've reached the max file
> limit for my machine.

It sounds like you created a BDB repository.  The default format as of 
Subversion 1.2 is "fsfs".  With this format there is one file per revision 
of the repository (commit). 

>  My second question is, does it make sense to have this much data (it
> could grow to several TB over the next 5 years) under revision control
> can subversion handle it or should I be using some other solutions.

In general, Subversion can handle very large repositories.  The one 
scalability issue I am aware of is if you create a flat repository, that 
does not scale well.  In other words, avoid having one folder with 
thousands of files in it.  If you structure your images so that there are 
a lot of folders the repository should scale fine as long as the 
underlying disks and filesystem can handle it.
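
One possible way to spread a flat set of images over many folders is to
derive the folder from a hash of the file name. This is only a sketch of
one scheme (a hierarchy by project or date would serve just as well),
and the function name, the use of SHA-1, and the two-level layout are
all assumptions made for the example.

```python
import hashlib
import posixpath


def bucketed_path(filename, levels=2, width=2):
    """Return a repository path that places filename under hashed subfolders.

    With the defaults, the first four hex digits of the SHA-1 of the name
    give two nested folders, e.g. "<xx>/<yy>/<filename>".
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return posixpath.join(*parts, filename)
```

The same file name always maps to the same folder, so the path can be
recomputed later instead of being stored anywhere.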

Mark


_____________________________________________________________________________
Scanned for SoftLanding Systems, Inc. and SoftLanding Europe Plc by IBM Email Security Management Services powered by MessageLabs. 
_____________________________________________________________________________
