Posted to users@subversion.apache.org by we...@tigris.org on 2009/04/29 23:33:28 UTC

What are the potential performance & limit considerations for high number of files in repository?

Hi,

I would appreciate any advice regarding the following problem. 


Context: 
--------

I'm considering using subversion as a data repository. 

The usage patterns on the data are very similar to source code development. The data is textual in nature. Currently, there are about one million very small files (typically < 1KB each), and the collection is growing at a relatively slow rate (perhaps about a million files per year).

The usage flow will include branching and merging of sub-directories as well as the entire repository. There are about 20 users in total, and they are all on the same LAN.

The nature of the data is such that each user will typically sync up or check out a folder with 500 files, work on it for 1-3 days and check it back in.

It is all in a Windows-based environment (Win2K, Apache, svn 1.4.x).


Questions:
----------

1) What are the limitations on the number of files in the repository (assuming, of course, that I have sufficient hard-disk space and stay within NTFS limits)?

2) Are there any known performance bottlenecks or issues in such a data repository organization (i.e. where are the potential slowdowns or performance concerns)?

3) My understanding from previous threads is that in terms of total size I'm well within the limits of the system (1-2 GB of data), so this is not a concern. Please correct me if I'm wrong.

4) Generally, is this a valid use of Subversion (in terms of number of files & size, assuming a development-like usage pattern), and has anyone had experience with such repositories? In other words, is it a totally trivial & simple repository layout for Subversion that's done everywhere?


I'd greatly appreciate any discussion or advice.


Thanks,


Eyal


(N.B. I will have to follow the thread on the web, since I am not subscribed to the mailing list.)

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1987179

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: What are the potential performance & limit considerations for high number of files in repository?

Posted by Andy Levy <an...@gmail.com>.
On Wed, Apr 29, 2009 at 19:33,  <we...@tigris.org> wrote:
> Hi,
>
> I would appreciate any advice regarding the following problem.
>
>
> Context:
> --------
>
> I'm considering using subversion as a data repository.
>
> The usage patterns on the data are very similar to source code development. The data is textual in nature. Currently, there are about one million very small files (typically < 1KB each), and the collection is growing at a relatively slow rate (perhaps about a million files per year).
>
> The usage flow will include branching and merging of sub-directories as well as the entire repository. There are about 20 users in total, and they are all on the same LAN.
>
> The nature of the data is such that each user will typically sync up or check out a folder with 500 files, work on it for 1-3 days and check it back in.
>
> It is all in a Windows-based environment (Win2K, Apache, svn 1.4.x).
>
>
> Questions:
> ----------
>
> 1) What are the limitations on the number of files in the repository (assuming, of course, that I have sufficient hard-disk space and stay within NTFS limits)?

Your files aren't stored as individual files in the repository; each
revision is stored as its own file. So if you commit 1000 files in a
single revision, the repository gains just one new revision file, the
same as if you had committed a single file.

I would recommend upgrading to a newer version of Subversion, at
least 1.5, to get repository sharding. That way you won't have
thousands upon thousands of revision files in a single directory (NTFS
doesn't handle that case well). With 1.5's sharded layout, the
repository is split into directories of 1000 revision files each to
keep the per-directory file count down.
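As a concrete sketch of what sharding means (the arithmetic below is a simplified illustration, and the repository paths in the comments are hypothetical, not from the original post):

```shell
# With 1.5's sharded FSFS layout, revision files are grouped into
# directories of 1000, so revision N lives in shard N/1000
# (e.g. db/revs/12/12345 instead of db/revs/12345).
rev=12345
shard=$((rev / 1000))
echo "revision $rev lives in shard $shard"

# Note: upgrading a repository in place does not re-shard existing
# revisions; to move old history into the sharded layout, dump and
# load into a freshly created 1.5+ repository (paths hypothetical):
#   svnadmin dump /svn/old-repo > repo.dump
#   svnadmin create /svn/new-repo
#   svnadmin load /svn/new-repo < repo.dump
```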

If you're going to be doing merging, 1.5's merge tracking will also be
a huge benefit.

With 1.6, you can "pack" each completed shard into a single file to
gain some performance and reduce your file count further.
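The packing step is a single administrative command on the server. In this sketch the repository path is a made-up placeholder, and the block only echoes the command when `svnadmin` isn't available:

```shell
# "svnadmin pack" (Subversion 1.6+, FSFS) rewrites each full shard of
# 1000 revision files into one pack file. REPO is a placeholder path.
REPO=/svn/data-repo
if command -v svnadmin >/dev/null 2>&1 && [ -d "$REPO" ]; then
  svnadmin pack "$REPO"
else
  echo "sketch only: would run: svnadmin pack $REPO"
fi
```

Packing is safe to run repeatedly; shards that are already packed, or not yet full, are simply skipped.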

Moving up from Win2K also wouldn't be a bad idea.

> 2) Are there any known performance bottlenecks or issues in such a data repository organization (i.e. where are the potential slowdowns or performance concerns)?

Will you have users checking out/working on large sections of the
repository at once? Will you be doing a lot of large merges?

> 3) My understanding from previous threads is that in terms of total size I'm well within the limits of the system (1-2 GB of data), so this is not a concern. Please correct me if I'm wrong.

Only 1-2GB of data is a drop in the bucket.

> 4) Generally, is this a valid use of Subversion (in terms of number of files & size, assuming a development-like usage pattern), and has anyone had experience with such repositories? In other words, is it a totally trivial & simple repository layout for Subversion that's done everywhere?

There are some very, very large open source projects which have been
quite successful with Subversion. Apache & KDE to name two. If SVN can
handle those, you should be fine.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1995236



RE: Re: What are the potential performance & limit considerations for high number of files in repository?

Posted by we...@tigris.org.
Thank you for the responses. This is exactly what I needed to hear.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2083272


Re: What are the potential performance & limit considerations for high number of files in repository?

Posted by Mark Phippard <ma...@gmail.com>.
On Wed, Apr 29, 2009 at 7:33 PM,  <we...@tigris.org> wrote:

> I would appreciate any advice regarding the following problem.
>
>
> Context:
> --------
>
> I'm considering using subversion as a data repository.
>
> The usage patterns on the data are very similar to source code development. The data is textual in nature. Currently, there are about one million very small files (typically < 1KB each), and the collection is growing at a relatively slow rate (perhaps about a million files per year).
>
> The usage flow will include branching and merging of sub-directories as well as the entire repository. There are about 20 users in total, and they are all on the same LAN.
>
> The nature of the data is such that each user will typically sync up or check out a folder with 500 files, work on it for 1-3 days and check it back in.
>
> It is all in a Windows-based environment (Win2K, Apache, svn 1.4.x).
>
>
> Questions:
> ----------
>
> 1) What are the limitations on the number of files in the repository (assuming, of course, that I have sufficient hard-disk space and stay within NTFS limits)?
>
> 2) Are there any known performance bottlenecks or issues in such a data repository organization (i.e. where are the potential slowdowns or performance concerns)?
>
> 3) My understanding from previous threads is that in terms of total size I'm well within the limits of the system (1-2 GB of data), so this is not a concern. Please correct me if I'm wrong.
>
> 4) Generally, is this a valid use of Subversion (in terms of number of files & size, assuming a development-like usage pattern), and has anyone had experience with such repositories? In other words, is it a totally trivial & simple repository layout for Subversion that's done everywhere?
>
>
> I'd greatly appreciate any discussion or advice.
>
> Eyal

I do not think you have to worry about anything based on this pattern.
Just think how many files there must be in the entire Apache repository.

The one issue that has been observed is that you can start running
into problems when you have a lot of files in a single folder.  The
more you "shard" your files into sub-directories, the less likely you
are to have problems.  If you start putting 10K+ files in one folder,
some negative consequences start to appear.
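This kind of "shard your own files" layout can be sketched in plain shell: bucket file names into sub-directories so that no single folder accumulates 10K+ entries. Everything below (file names, bucket count) is an invented demo, not a prescription:

```shell
# Demo: spread a flat set of files across numbered sub-directories,
# bucketing by a checksum of the file name so placement is stable.
mkdir -p shard-demo && cd shard-demo
for i in 1 2 3 4; do touch "record$i.txt"; done

for f in record*.txt; do
  # 10 buckets for the demo; a real layout might use 100 or 1000
  bucket=$(( $(printf '%s' "$f" | cksum | cut -d' ' -f1) % 10 ))
  mkdir -p "$bucket"
  mv "$f" "$bucket/"
done

find . -name 'record*.txt'   # each file now sits inside a bucket directory
```

Because the bucket is derived from the file name, any tool can recompute where a given file lives without a lookup table.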

-- 
Thanks

Mark Phippard
http://markphip.blogspot.com/

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=1995271
