Posted to users@subversion.apache.org by Jon Smirl <jo...@gmail.com> on 2006/06/12 17:24:53 UTC

Huge number of rev files in svn repository

I've been experimenting with importing the Mozilla CVS
repository into svn and git. I am having terrible performance
problems with this process. Many of the tasks take days to complete.

After importing Mozilla into svn using cvs2svn, a big problem is that
my svn repository has 450,000 files in it. Two directories have
220,000 files each in them. ext3 collapses under that size of
directory since it does sequential lookups for file names.

Is there some way to pack the repository down to fewer files?

svn could be changed to get around this problem. The files are
sequentially numbered from 1 to 220,000. It would be easy to put the
first 1000 in one directory and so on to spread the files over 220
directories.  git uses this technique.
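
A minimal sketch of the layout proposed above, in Python. The function
name sharded_path, the shard size of 1000, and the "revs" directory name
are assumptions for illustration only; this is not how the Subversion
version discussed in this thread lays out its files.

```python
import os

SHARD_SIZE = 1000  # assumed number of files per directory, as suggested above


def sharded_path(root: str, rev: int, shard_size: int = SHARD_SIZE) -> str:
    """Return root/<shard>/<rev>, where shard = rev // shard_size."""
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    shard = rev // shard_size
    return os.path.join(root, str(shard), str(rev))


if __name__ == "__main__":
    for rev in (1, 999, 1000, 220000):
        print(rev, "->", sharded_path("revs", rev))
```

With integer division, revisions 0 to 999 share shard 0, so revisions 1
to 220,000 end up in 221 directories rather than exactly 220.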

-- 
Jon Smirl
jonsmirl@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Huge number of rev files in svn repository

Posted by Andreas Pakulat <ap...@gmx.de>.
On 12.06.06 13:24:53, Jon Smirl wrote:
> I've been doing some experiments with importing the Mozilla CVS
> repository in to svn and git. I am having terrible performance
> problems with this process. Many of the tasks take days to complete.
> 
> After importing Mozilla in to svn using cvs2svn, a big problem is that
> my svn repository has 450,000 files in it. Two directories have
> 220,000 files each in them. ext3 collapses under that size of
> directory since it does sequential looks ups for file names.

- use the dir_index option, this speeds up the filename lookup through
  the use of trees
- use another file system.
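
For reference, the first suggestion usually comes down to running
"tune2fs -O dir_index <device>" and then "e2fsck -fD <device>" on the
unmounted filesystem so that existing directories get re-indexed. The
Python sketch below only checks whether the feature is already enabled,
by parsing the "Filesystem features:" line that "tune2fs -l" prints.
The function name and the sample text are illustrative assumptions.

```python
def has_dir_index(tune2fs_listing: str) -> bool:
    """Return True if 'dir_index' appears in the 'Filesystem features:' line."""
    for line in tune2fs_listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return "dir_index" in value.split()
    return False


if __name__ == "__main__":
    # Abbreviated, made-up sample in the shape of `tune2fs -l` output.
    sample = (
        "Filesystem volume name:   <none>\n"
        "Filesystem features:      has_journal ext_attr dir_index filetype\n"
    )
    print(has_dir_index(sample))
```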

Andreas

-- 
Hope that the day after you die is a nice day.

Re: Huge number of rev files in svn repository

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 6/12/06, Jon Smirl <jo...@gmail.com> wrote:
> I've been doing some experiments with importing the Mozilla CVS
> repository in to svn and git. I am having terrible performance
> problems with this process. Many of the tasks take days to complete.
>
> After importing Mozilla in to svn using cvs2svn, a big problem is that
> my svn repository has 450,000 files in it. Two directories have
> 220,000 files each in them. ext3 collapses under that size of
> directory since it does sequential looks ups for file names.
>
> It there some way to pack the repository down to fewer files?
>
> svn could be changed to get around this problem. The files are
> sequentially numbered from 1 to 220,000. It would be easy to put the
> first 1000 in one directory and so on to spread the files over 220
> directories.  git uses this technique.

For what it's worth, I know of repositories that use fsfs and that are
way beyond that number of revisions.  svn.apache.org/repos/asf is at
413k at the moment, and is doing fine.  As others have suggested,
there may be kernel or fs parameters that you can tweak to avoid the
linear search problem, or switching to a non-ext3 filesystem may be an
option.

If you really feel that the number of files is a problem, feel free to
come to the dev@ list with a proposal on how to fix it.  There are
various hashing schemes that could be tried, it's just that nobody's
been sufficiently motivated to try it since it's usually easier to
just tweak the filesystem to make it fast enough.

-garrett

Re: Huge number of rev files in svn repository

Posted by Bob Proulx <bo...@proulx.com>.
Jon Smirl wrote:
> I lost power this morning while the city decided to move the telephone
> pole in front of my house. That killed the import that had been
> running for five days.

If an import was taking five days then almost assuredly the result
would not be something that you would want to use for real.
There is probably something undesirable about it, such as the
duplicated branch content discussed elsewhere in this thread.  You
would almost certainly have needed to convert it again after
detecting the problem and dealing with it.
Basically what I am saying is that after five days I would have killed
it whether the power was running or not.  You did not lose anything
that was not already lost.

As one example, when the GCC project converted from CVS to SVN the
conversion was started and stopped and started and stopped a number of
times until the conversion time got down to something reasonable.  You
would not want to bring gcc offline for weeks waiting for a
conversion.  It needs to happen in a short enough time to be
reasonable.

Daniel Berlin wrote on the gcc list way back on 6-Mar-2005:
> Due to some massive speedups i've implemented in cvs2svn, the full gcc
> repo, including all non-broken tags (more in a moment), is now
> available.  It would have taken ~7 days before, and now it takes less
> than 2 (it's almost completely disk bound now, and my 7200rpm disks just
> aren't fast enough apparently :P)

I read that as meaning that with large projects it will probably take
a few attempts to get a good result out of the tools.  With small
projects I have not had any trouble with cvs2svn.  But large projects
may need special handling.

Bob 

Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
On 6/13/06, Daniel Berlin <db...@dberlin.org> wrote:
> Jon Smirl wrote:
> > On 6/13/06, Daniel Berlin <db...@dberlin.org> wrote:
> >> Jon Smirl wrote:
> >>> Another interesting tidbit.
> >>>
> >>> Mozilla CVS is 3.0GB
> >>> Same repository in svn is 8.2GB
> >>> Same repository in git is 680MB
> >>>
> >>> git does deltas and then compresses the deltas all into one giant
> >>> 680MB file. New commits are stored in collections of single files
> >>> until the repository is packed again.
> >>
> >> cvs2svn has significant problems with copy identification.
> >> If mozilla has a lot of branches, tons of them will end up with entire
> >
> > Is 1,800 a lot?
> >
>
> Yes.
>
> It is highly likely that almost all of your repo space is being taken
> up by identical new files that should have been copies, but cvs2svn
> borked it :)
>
> It is likely that git properly notices they are all the same data, and
> just shares them when you do a git to svn import.
>
> If you stare at the dumpfile that cvs2svn generates, and look at the
> commits that create your branches, you can see if it is properly making
> them copies or not.
>
> This is being attacked two fold, by working on fixing cvs2svn, and by
> working on issue 2286.

I've given up on this for a while. I'll try again when there is a new
cvs2svn release.

I lost power this morning while the city decided to move the telephone
pole in front of my house. That killed the import that had been
running for five days.

-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Daniel Berlin <db...@dberlin.org>.
Jon Smirl wrote:
> On 6/13/06, Daniel Berlin <db...@dberlin.org> wrote:
>> Jon Smirl wrote:
>>> Another interesting tidbit.
>>>
>>> Mozilla CVS is 3.0GB
>>> Same repository in svn is 8.2GB
>>> Same repository in git is 680MB
>>>
>>> git does deltas and then compresses the deltas all into one giant
>>> 680MB file. New commits are stored in collections of single files
>>> until the repository is packed again.
>>
>> cvs2svn has significant problems with copy identification.
>> If mozilla has a lot of branches, tons of them will end up with entire
> 
> Is 1,800 a lot?
> 

Yes.

It is highly likely that almost all of your repo space is being taken
up by identical new files that should have been copies, but cvs2svn
borked it :)

It is likely that git properly notices they are all the same data, and
just shares them when you do a git to svn import.

If you stare at the dumpfile that cvs2svn generates, and look at the
commits that create your branches, you can see if it is properly making
them copies or not.
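
One way to do that check without reading the dumpfile by eye is
sketched below in Python. It relies on the Node-action and
Node-copyfrom-path headers of the Subversion dumpfile format; the
function name and the tiny in-memory sample are assumptions for
illustration. It only counts adds with and without copy information;
deciding whether a plain add should have been a copy still needs a
human look at the branch-creating commits.

```python
import io
from collections import Counter


def count_adds(dump) -> Counter:
    """Count 'add' nodes in a binary svn dump stream, split by copy or not."""
    counts = Counter()
    while True:
        line = dump.readline()
        if not line:
            break  # end of stream
        if not line.strip():
            continue  # blank separator between records
        # Read one header block: "Key: value" lines up to a blank line.
        headers = {}
        while line.strip():
            key, _, value = line.decode("utf-8", "replace").partition(":")
            headers[key.strip()] = value.strip()
            line = dump.readline()
        # Skip the property/text body so its bytes are never parsed as headers.
        dump.read(int(headers.get("Content-length", 0)))
        if headers.get("Node-action") == "add":
            kind = "copy" if "Node-copyfrom-path" in headers else "plain add"
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Made-up two-node sample: one branch created as a copy, one file added
    # as full text. Replace with open("<your dumpfile>", "rb") for real use.
    sample = io.BytesIO(
        b"SVN-fs-dump-format-version: 2\n\n"
        b"Revision-number: 1\nProp-content-length: 10\n"
        b"Content-length: 10\n\nPROPS-END\n\n"
        b"Node-path: branches/B\nNode-kind: dir\nNode-action: add\n"
        b"Node-copyfrom-rev: 1\nNode-copyfrom-path: trunk\n\n"
        b"Node-path: branches/C/f.txt\nNode-kind: file\nNode-action: add\n"
        b"Text-content-length: 16\nContent-length: 16\n\nNode-action: add\n\n"
    )
    print(count_adds(sample))
```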

This is being attacked on two fronts: by fixing cvs2svn, and by
working on issue 2286.


HTH,
Dan


Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
On 6/13/06, Daniel Berlin <db...@dberlin.org> wrote:
> Jon Smirl wrote:
> > Another interesting tidbit.
> >
> > Mozilla CVS is 3.0GB
> > Same repository in svn is 8.2GB
> > Same repository in git is 680MB
> >
> > git does deltas and then compresses the deltas all into one giant
> > 680MB file. New commits are stored in collections of single files
> > until the repository is packed again.
>
>
> cvs2svn has significant problems with copy identification.
> If mozilla has a lot of branches, tons of them will end up with entire

Is 1,800 a lot?

> adds of the full text of the files as the first revision on the branch,
> instead of copies from the approriate sources.
>
> This happened to gcc, and by numbers, about 4 gig of our repository is
> wasted space from this.
>
>
> HTH,
> Dan
>


-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Daniel Berlin <db...@dberlin.org>.
Jon Smirl wrote:
> Another interesting tidbit.
> 
> Mozilla CVS is 3.0GB
> Same repository in svn is 8.2GB
> Same repository in git is 680MB
> 
> git does deltas and then compresses the deltas all into one giant
> 680MB file. New commits are stored in collections of single files
> until the repository is packed again.


cvs2svn has significant problems with copy identification.
If mozilla has a lot of branches, tons of them will end up as
adds of the full text of the files as the first revision on the branch,
instead of copies from the appropriate sources.

This happened to gcc, and by numbers, about 4 gig of our repository is
wasted space from this.


HTH,
Dan

Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
On 6/14/06, Bob Proulx <bo...@proulx.com> wrote:
> Jon Smirl wrote:
> > Another interesting tidbit.
>
> Interesting data points indeed.  But how was the conversion accomplished?
>
> > Mozilla CVS is 3.0GB
> > Same repository in svn is 8.2GB

Using cvs2svn. Conversion took about three days, phase 1-7 one day and
phase 8 about 2.5 days.

>
> How did you complete the cvs to svn conversion?  I am reading that it
> is not completing for you using cvs2svn.  So I am curious how you
> arrived at the above data point.

cvs2svn finished without problem. What won't finish is git-svnimport.
I wanted to compare the results of git-cvsimport with the results from
taking a trip through subversion.

>
> > Same repository in git is 680MB
>
> How was this conversion done?  Using git-cvsimport?

Yes, git-cvsimport. Another person ran that conversion, although we
were both working on it at the same time. cvsimport needs 2GB+ of
physical RAM or it swaps to death. The extra RAM I ordered just came
today so I can try it for myself.

It took about three days for cvsimport to finish too.

>
> Bob
>


-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Bob Proulx <bo...@proulx.com>.
Jon Smirl wrote:
> Another interesting tidbit.

Interesting data points indeed.  But how was the conversion accomplished?

> Mozilla CVS is 3.0GB
> Same repository in svn is 8.2GB

How did you complete the cvs to svn conversion?  I am reading that it
is not completing for you using cvs2svn.  So I am curious how you
arrived at the above data point.

> Same repository in git is 680MB

How was this conversion done?  Using git-cvsimport?

Bob

Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
Another interesting tidbit.

Mozilla CVS is 3.0GB
Same repository in svn is 8.2GB
Same repository in git is 680MB

git does deltas and then compresses the deltas all into one giant
680MB file. New commits are stored in collections of single files
until the repository is packed again.

I suspect svn is large because it is using 450K files each of which
has a minimum allocation size in ext3. CVS only uses 117K files.
Packed git uses less than 5,000.
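
A rough way to bound that effect, sketched in Python. It assumes a
4 KiB filesystem block size, which the thread does not state, and the
helper name is illustrative. Each file wastes less than one block when
its size is rounded up to whole blocks, so file count times block size
is an upper bound on the slack; inode and directory overhead are
ignored here.

```python
def max_slack_bytes(file_count: int, block_size: int = 4096) -> int:
    """Upper bound on space lost to rounding each file up to whole blocks."""
    return file_count * block_size


if __name__ == "__main__":
    # File counts are the ones quoted in the message above.
    for name, files in (("svn", 450_000), ("cvs", 117_000), ("git packed", 5_000)):
        bound = max_slack_bytes(files)
        print(f"{name}: at most {bound / 2**30:.2f} GiB of block slack")
```

Under that assumption the bound for 450,000 files is about 1.7 GiB,
which is less than the 5.2GB gap between the CVS and svn sizes, so
block slack alone would not seem to account for the whole difference.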

-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
On 6/12/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> On 6/12/06, Jon Smirl <jo...@gmail.com> wrote:
> > On 6/12/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> > > Nobody has implemented this because nobody has needed to yet.  There
> > > are numerous repositories as large or larger than the one you're
> > > talking about that operate at perfectly reasonable speeds.  On the
> > > other hand, it's not like the standard use case is "extract every
> > > single revision", so you may be hitting a bit of functionality that
> > > isn't quite in the usual use case.  If you have a need to speed this
> > > up, perhaps you can work on a patch to add this functionality to
> > > libsvn_fs_fs.
> >
> > It may be more deep than just the number of files in the repository.
> > My Mozilla SVN has changes from 1998 to 2006. My export of the early
> > years went fairly quickly. In 2002 it started getting slower, and I
> > have been stuck in 2003 for over 24hrs. It is making forward progress
> > but the progress is getting slower and slower.
>
> Interesting.  Some profiling data on that would be quite useful...

Linus wants profile data too but unfortunately I don't have profiling
turned on in the kernel I am running. I am planning on rebooting
tomorrow and I will switch kernels then.

>
> > I have converted the ext3 rev and revprop directories to use the
> > dir_index format. Doing that did speed things up but overall it is
> > still very slow. The process is completely CPU bound, disk activity is
> > light.
>
> What version of Subversion are you using?  If you can try (from the
> beginning, unfortunately, since the changes involved include a new
> delta format in the filesystem, so need to start from the beginning)
> the 1.4.x branch or the trunk it would be interesting to see if it
> improves matters...

subversion-1.3.1-2.1

If I end up restarting the export I can try switching versions.

>
> -garrett
>


-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 6/12/06, Jon Smirl <jo...@gmail.com> wrote:
> On 6/12/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> > Nobody has implemented this because nobody has needed to yet.  There
> > are numerous repositories as large or larger than the one you're
> > talking about that operate at perfectly reasonable speeds.  On the
> > other hand, it's not like the standard use case is "extract every
> > single revision", so you may be hitting a bit of functionality that
> > isn't quite in the usual use case.  If you have a need to speed this
> > up, perhaps you can work on a patch to add this functionality to
> > libsvn_fs_fs.
>
> It may be more deep than just the number of files in the repository.
> My Mozilla SVN has changes from 1998 to 2006. My export of the early
> years went fairly quickly. In 2002 it started getting slower, and I
> have been stuck in 2003 for over 24hrs. It is making forward progress
> but the progress is getting slower and slower.

Interesting.  Some profiling data on that would be quite useful...

> I have converted the ext3 rev and revprop directories to use the
> dir_index format. Doing that did speed things up but overall it is
> still very slow. The process is completely CPU bound, disk activity is
> light.

What version of Subversion are you using?  If you can try the 1.4.x
branch or the trunk, it would be interesting to see if it improves
matters.  Unfortunately you would need to start from the beginning,
since the changes involved include a new delta format in the
filesystem.

-garrett

Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
On 6/12/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> Nobody has implemented this because nobody has needed to yet.  There
> are numerous repositories as large or larger than the one you're
> talking about that operate at perfectly reasonable speeds.  On the
> other hand, it's not like the standard use case is "extract every
> single revision", so you may be hitting a bit of functionality that
> isn't quite in the usual use case.  If you have a need to speed this
> up, perhaps you can work on a patch to add this functionality to
> libsvn_fs_fs.

It may go deeper than just the number of files in the repository.
My Mozilla SVN has changes from 1998 to 2006. My export of the early
years went fairly quickly. In 2002 it started getting slower, and I
have been stuck in 2003 for over 24hrs. It is making forward progress
but the progress is getting slower and slower.

I have converted the ext3 rev and revprop directories to use the
dir_index format. Doing that did speed things up but overall it is
still very slow. The process is completely CPU bound, disk activity is
light.

-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 6/12/06, Jon Smirl <jo...@gmail.com> wrote:
> On 6/12/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> > On 6/12/06, Jon Smirl <jo...@gmail.com> wrote:
> >
> > > I suspect using file system extended attributes would be a lot more
> > > efficient than separate files in revprops.
> >
> > Sure, it might be, but it would also be totally unportable and thus a
> > non-starter.
>
> You could simply test if they were available when the repository was
> created and use them when possible. Provide a tool to copy from the EA
> to a revprops file if you are going to copy the repository someplace
> that doesn't support them.
>
> Something needs to be done to help performance. It shouldn't be taking
> four+ days to extract all of the change sets from a repostory. It only
> took cvs2svn one day to get them into the same repository.

Something can be done to help performance.  If it's really a
problem of large numbers of files in a single directory, you can put
revision and revprop files into subdirectories via some hashing scheme
to reduce the total size of those directories.  There's no reason to
resort to nonportable APIs to solve this problem.  Honestly, I find
it incredibly unlikely that it's the revprops that are causing your
problems, at least not such that reimplementing them with EAs would
make it fast enough for your needs.  You'd still be hanging them off
of a file in a giant directory, which seems likely to be just as slow
to access as the current giant directory of files scheme.

Nobody has implemented this because nobody has needed to yet.  There
are numerous repositories as large or larger than the one you're
talking about that operate at perfectly reasonable speeds.  On the
other hand, it's not like the standard use case is "extract every
single revision", so you may be hitting a bit of functionality that
isn't quite in the usual use case.  If you have a need to speed this
up, perhaps you can work on a patch to add this functionality to
libsvn_fs_fs.

> Also, my repository is 8GB not 4GB, the du command wasn't finished. Is
> it useful for me to push a copy out onto a server?  Not sure how big
> the tar.bz will be when finished.

Not really.  It's not exactly news that large numbers of revisions
result in large numbers of revision and revprop files, and if we need
to test, the developers already have access to some large repositories.

-garrett

Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
On 6/12/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> On 6/12/06, Jon Smirl <jo...@gmail.com> wrote:
>
> > I suspect using file system extended attributes would be a lot more
> > efficient than separate files in revprops.
>
> Sure, it might be, but it would also be totally unportable and thus a
> non-starter.

You could simply test if they were available when the repository was
created and use them when possible. Provide a tool to copy from the EA
to a revprops file if you are going to copy the repository someplace
that doesn't support them.
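
A minimal Python sketch of that probe-and-fall-back idea, not anything
Subversion implements. The function names, the "user." attribute
namespace, and the ".prop" sidecar file naming are all assumptions for
illustration. os.setxattr and os.getxattr are only available on Linux,
so the probe reports False elsewhere.

```python
import os
import tempfile


def xattrs_supported(directory: str) -> bool:
    """Probe whether user extended attributes can be set on files here."""
    if not hasattr(os, "setxattr"):
        return False  # os.setxattr only exists on Linux builds of Python
    fd, probe = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        os.setxattr(probe, "user.probe", b"1")
        return True
    except OSError:
        return False  # filesystem or mount options do not allow user xattrs
    finally:
        os.remove(probe)


def write_revprop(rev_file: str, name: str, value: bytes, use_xattr: bool) -> None:
    """Store a property as an xattr on rev_file, or in a sidecar file."""
    if use_xattr:
        os.setxattr(rev_file, "user." + name, value)
    else:
        with open(f"{rev_file}.{name}.prop", "wb") as f:
            f.write(value)


def read_revprop(rev_file: str, name: str, use_xattr: bool) -> bytes:
    """Read a property back from wherever write_revprop stored it."""
    if use_xattr:
        return os.getxattr(rev_file, "user." + name)
    with open(f"{rev_file}.{name}.prop", "rb") as f:
        return f.read()
```

The probe result would be recorded once at repository creation time;
the copy tool mentioned above would then amount to reading each
property with use_xattr=True and writing it back with use_xattr=False.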

Something needs to be done to help performance. It shouldn't be taking
four+ days to extract all of the change sets from a repository. It only
took cvs2svn one day to get them into the same repository.

Also, my repository is 8GB not 4GB, the du command wasn't finished. Is
it useful for me to push a copy out onto a server?  Not sure how big
the tar.bz will be when finished.

Also, I have been doing everything locally with a file URL. I am not
using a svn server.

>
> -garrett
>


-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 6/12/06, Jon Smirl <jo...@gmail.com> wrote:

> I suspect using file system extended attributes would be a lot more
> efficient than separate files in revprops.

Sure, it might be, but it would also be totally unportable and thus a
non-starter.

-garrett

Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
On 6/12/06, Garrett Rooney <ro...@electricjellyfish.net> wrote:
> On 6/12/06, Nico Kadel-Garcia <nk...@comcast.net> wrote:
> > Garrett Rooney wrote:
> > > On 6/12/06, Nico Kadel-Garcia <nk...@comcast.net> wrote:
> > >
> > >> Also, anytime you have 200,000 files in a single project directory,
> > >> something sounds like it's not really configured for performance.
> > >> Why do you have so many files in single directories?
> > >
> > > That's just the way fsfs repositories work.  Not that it couldn't be
> > > changed, but that's how it works now.
> >
> > I thought he was referring to the CVS source directories?
>
> As far as I can tell he's talking about speed of svn operations with
> really large numbers of revisions...

I have 210,000 files each in my rev and revprops directories.  The svn
repository is around 4GB in size.

I am making a tar of the svn repository right now. I will push it up
on a server when it is finished. These things take hours for 4GB
files.

I suspect using file system extended attributes would be a lot more
efficient than separate files in revprops.


>
> -garrett
>


-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 6/12/06, Nico Kadel-Garcia <nk...@comcast.net> wrote:
> Garrett Rooney wrote:
> > On 6/12/06, Nico Kadel-Garcia <nk...@comcast.net> wrote:
> >
> >> Also, anytime you have 200,000 files in a single project directory,
> >> something sounds like it's not really configured for performance.
> >> Why do you have so many files in single directories?
> >
> > That's just the way fsfs repositories work.  Not that it couldn't be
> > changed, but that's how it works now.
>
> I thought he was referring to the CVS source directories?

As far as I can tell he's talking about speed of svn operations with
really large numbers of revisions...

-garrett

Re: Huge number of rev files in svn repository

Posted by Nico Kadel-Garcia <nk...@comcast.net>.
Garrett Rooney wrote:
> On 6/12/06, Nico Kadel-Garcia <nk...@comcast.net> wrote:
> 
>> Also, anytime you have 200,000 files in a single project directory,
>> something sounds like it's not really configured for performance.
>> Why do you have so many files in single directories?
> 
> That's just the way fsfs repositories work.  Not that it couldn't be
> changed, but that's how it works now.

I thought he was referring to the CVS source directories?

Re: Huge number of rev files in svn repository

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
On 6/12/06, Nico Kadel-Garcia <nk...@comcast.net> wrote:

> Also, anytime you have 200,000 files in a single project directory,
> something sounds like it's not really configured for performance. Why do you
> have so many files in single directories?

That's just the way fsfs repositories work.  Not that it couldn't be
changed, but that's how it works now.

-garrett

Re: Huge number of rev files in svn repository

Posted by Nico Kadel-Garcia <nk...@comcast.net>.
Jon Smirl wrote:
> On 6/12/06, Les Mikesell <le...@gmail.com> wrote:
>> On Mon, 2006-06-12 at 13:24 -0400, Jon Smirl wrote:
>>
>>> After importing Mozilla in to svn using cvs2svn, a big problem is
>>> that my svn repository has 450,000 files in it. Two directories have
>>> 220,000 files each in them. ext3 collapses under that size of
>>> directory since it does sequential looks ups for file names.
>>
>> Directory indexing is optional in ext3.  See the  dir_index
>> option in tune2fs.
>
> I am running git-svnimport over my svn repository. This creates a huge
> load on svn since it is reading every changeset on every branch. I
> suspended my import job and converted the directory over to one with
> dir_index enabled.
>
> After the change, I'm still pretty much totally CPU bound with 20% in
> user space and 80% in the kernel. The svn process is using all of the
> CPU extracting the changesets. Disk activity is light and no swapping
> is happening. The svn import process is 400MB and about 400MB of disk
> is in the cache.
>
> Tomorrow I'll reboot and install oprofile so that I can try and track
> down why so much time is being spent in the kernel.

Can you turn off "atime" for wherever the repository lives? Fascinating
system software may have significant dependencies on it, so I wouldn't
recommend it for /, but for /var/www/svn I think it would be fine.

Also, anytime you have 200,000 files in a single project directory, 
something sounds like it's not really configured for performance. Why do you 
have so many files in single directories? 

Re: Huge number of rev files in svn repository

Posted by Jon Smirl <jo...@gmail.com>.
On 6/12/06, Les Mikesell <le...@gmail.com> wrote:
> On Mon, 2006-06-12 at 13:24 -0400, Jon Smirl wrote:
>
> > After importing Mozilla in to svn using cvs2svn, a big problem is that
> > my svn repository has 450,000 files in it. Two directories have
> > 220,000 files each in them. ext3 collapses under that size of
> > directory since it does sequential looks ups for file names.
>
> Directory indexing is optional in ext3.  See the  dir_index
> option in tune2fs.

I am running git-svnimport over my svn repository. This creates a huge
load on svn since it is reading every changeset on every branch. I
suspended my import job and converted the directory over to one with
dir_index enabled.

After the change, I'm still pretty much totally CPU bound with 20% in
user space and 80% in the kernel. The svn process is using all of the
CPU extracting the changesets. Disk activity is light and no swapping
is happening. The svn import process is 400MB and about 400MB of disk
is in the cache.

Tomorrow I'll reboot and install oprofile so that I can try and track
down why so much time is being spent in the kernel.
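
Before reaching for a profiler, the user/kernel split for one suspect
operation can be measured coarsely from Python. The sketch below times
repeated name lookups in a single directory with os.times(); the
function names and the choice of os.stat as the workload are
assumptions for illustration, and this is no substitute for the
oprofile run mentioned above.

```python
import os


def cpu_split(workload) -> dict:
    """Run workload() and report user vs. system CPU seconds for this process."""
    before = os.times()
    workload()
    after = os.times()
    user = after.user - before.user
    system = after.system - before.system
    total = user + system
    return {
        "user_s": user,
        "system_s": system,
        "system_share": (system / total) if total > 0 else 0.0,
    }


def stat_many(directory: str, names) -> None:
    """Look up many (possibly missing) names in one directory."""
    for name in names:
        try:
            os.stat(os.path.join(directory, name))
        except OSError:
            pass  # a missing name still costs a directory lookup


if __name__ == "__main__":
    # "." stands in for a large rev directory; names mimic revision numbers.
    names = [str(i) for i in range(1, 100_001)]
    print(cpu_split(lambda: stat_many(".", names)))
```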

-- 
Jon Smirl
jonsmirl@gmail.com

Re: Huge number of rev files in svn repository

Posted by Les Mikesell <le...@gmail.com>.
On Mon, 2006-06-12 at 13:24 -0400, Jon Smirl wrote:

> After importing Mozilla in to svn using cvs2svn, a big problem is that
> my svn repository has 450,000 files in it. Two directories have
> 220,000 files each in them. ext3 collapses under that size of
> directory since it does sequential looks ups for file names.

Directory indexing is optional in ext3.  See the  dir_index
option in tune2fs.

-- 
  Les Mikesell
   lesmikesell@gmail.com

