Posted to users@subversion.apache.org by Jan Keirse <ja...@tvh.be> on 2011/01/10 09:36:39 UTC

Betr.: Re: "svnadmin load" a huge file

> On 1/7/2011 7:57 AM, Victor Sudakov wrote:
> > It would be fine if the project in question did not contain almost all
> > the files in one directory. You may call the layout silly, but CVS does
> > not seem to mind. OTOH, I would have distributed the files over
> > several subdirectories, but CVS does not handle moving files well.
> >
> > I wonder if cvs2svn is to blame that it produces a dump svnadmin
> > cannot load. Or I am always risking that "svnadmin dump" may one day
> > produce a dump a subsequent "svnadmin load" will be unable to swallow?
> >
> > I mean, if by hook or by crook, by using third party utilities like
> > svndumptool, I will eventually be able to convert this project from
> > CVS to SVN. Is there a chance that a subsequent dump will be again
> > unloadable?
> 

Have you tried the following:
- Copy your CVS repository (say /myrepository to /myrepositoryconv).
- In the copy, move the ,v files into several subdirectories (using the
  operating system, not CVS commands).
- Convert the directories one at a time and load them into svn.
- Once loaded into svn, you can move everything back into one folder
  (using svn commands) if desired.

Manually moving ,v files around in a CVS repository is generally not
advised, primarily because the move is unversioned and will annoy users
with checked-out working copies. Those working copies won't be of any
use anyway once the server has been migrated to Subversion, so I don't
think it will cause problems. Still, keep the original repository around
just in case...
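
The "move the ,v files into several subdirectories" step could be sketched
roughly as below. This is only an illustration: the function name
bucket_rcs_files and the rule of bucketing by the first character of the
file name are assumptions of mine, and any other scheme would do. It only
touches ,v files directly in the given directory (not the Attic), and it
should only ever be run on the copy.

```shell
#!/bin/sh
# Sketch: spread the RCS ",v" files of one flat directory in a COPIED CVS
# repository over subdirectories named after the first character of each
# file name (lower-cased). bucket_rcs_files is a hypothetical helper.

bucket_rcs_files() {
    dir=$1
    for f in "$dir"/*,v; do
        [ -f "$f" ] || continue      # glob matched nothing, or not a file
        name=${f##*/}                # strip the leading path
        bucket=$(printf '%s' "$name" | cut -c1 | tr '[:upper:]' '[:lower:]')
        mkdir -p "$dir/$bucket"
        mv "$f" "$dir/$bucket/"
    done
}

# Example call (path taken from the steps above):
#   bucket_rcs_files /myrepositoryconv/module
```

Each resulting subdirectory can then be converted and loaded on its own.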

Kind Regards,

JAN KEIRSE
ICT-DEPARTMENT
Software quality & Systems: Software Engineer



RE: Betr.: Re: "svnadmin load" a huge file

Posted by "Cooke, Mark" <ma...@siemens.com>.
> -----Original Message-----
> From: Victor Sudakov [mailto:sudakov@sibptus.tomsk.ru] 
> Sent: 20 January 2011 08:18
> Subject: Re: Betr.: Re: "svnadmin load" a huge file
> 
> Colleagues,
> 
> I have finally completed a test cvs2svn conversion on an amd64 system.
> The peak memory requirement of svnadmin during the conversion was
> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
> 
> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
> 2161M RES of memory at its peak.  Such memory requirements make this
> repo completely unusable on i386 systems.
> 
> The original CVS repo is 59M on disk with 17859 files (including those
> in the Attic) and total 23911 revisions (in SVN terms). All files are
> strictly text.
> 
> Something seems to be very suboptimal either about SVN itself or about
> the cvs2svn utility. I am especially surprised by the 8.5G size of the
> resulting SVN repository (though the result of "svnadmin dump --deltas"
> is 44M).
> 
> > - Copy your CVS repository (say /myrepository to /myrepositoryconv)
> > - In the copy move the ,v files into several subdirectories (using the
> > operating system, not using CVS commands.)
> > - Convert the directories one at a time and load them into svn.
> > - Once loaded into svn you can move everything back into one folder
> >   (using svn commands) if desired.
> 
> Even if I do this, after moving everything back I will not be able to
> do "svnadmin dump" on an i386 system, perhaps unless I write some
> script which will iterate and keep track of dumped revision numbers.
> 
Did you also notice the --incremental option?  Is that what you mean by
'keeping track of revision numbers'?

http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.dump.html

This allows you to dump the repo in sections (by specifying a revision
range)
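
A minimal sketch of such a chunked dump follows. The helper names
print_ranges and dump_in_chunks, the chunk size and the output file naming
are all made up for illustration; "svnlook youngest" and "svnadmin dump -r
LOWER:UPPER --incremental" are the only Subversion commands used. Whether
this actually keeps peak memory low enough on i386 for this repository is
untested.

```shell
#!/bin/sh
# Sketch: dump a repository in fixed-size revision ranges so that no single
# svnadmin run has to walk the whole history.

# print_ranges YOUNGEST CHUNK: print "START:END" pairs covering 0..YOUNGEST.
print_ranges() {
    youngest=$1
    chunk=$2
    start=0
    while [ "$start" -le "$youngest" ]; do
        end=$((start + chunk - 1))
        if [ "$end" -gt "$youngest" ]; then
            end=$youngest
        fi
        echo "$start:$end"
        start=$((end + 1))
    done
}

# dump_in_chunks REPO CHUNK OUTDIR: write one incremental dump file per range.
dump_in_chunks() {
    repo=$1
    chunk=$2
    outdir=$3
    youngest=$(svnlook youngest "$repo") || return 1
    print_ranges "$youngest" "$chunk" | while IFS=: read -r s e; do
        svnadmin dump "$repo" -r "$s:$e" --incremental \
            > "$outdir/dump-$s-$e.svndump" || exit 1
    done
}

# Example call (hypothetical paths):
#   dump_in_chunks /var/svn/myrepo 1000 /var/backups/myrepo
```

The resulting files would have to be loaded back in ascending revision
order, one "svnadmin load" per file.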

You do not mention what version of svn you are using, but newer versions
allow the repository to be packed. Would this help your storage issues?

http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace.fsfspacking

~ mark c

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Johan Corveleyn <jc...@gmail.com>.
On Thu, Jan 20, 2011 at 9:18 AM, Victor Sudakov
<su...@sibptus.tomsk.ru> wrote:
> Colleagues,
>
> I have finally completed a test cvs2svn conversion on an amd64 system.
> The peak memory requirement of svnadmin during the conversion was
> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
>
> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
> 2161M RES of memory at its peak.  Such memory requirements make this
> repo completely unusable on i386 systems.
>
> The original CVS repo is 59M on disk with 17859 files (including those
> in the Attic) and total 23911 revisions (in SVN terms). All files are
> strictly text.
>
> Something seems to be very suboptimal either about SVN itself or about
> the cvs2svn utility. I am especially surprised by the 8.5G size of the
> resulting SVN repository (though the result of "svnadmin dump --deltas"
> is 44M).

Do you have a lot of files in the same directory? (are all those 17859
files in one single directory?)

I don't know the details, but I know that svn rev files (and probably
also some memory structures, explaining the huge memory usage) become
very big for commits in a directory that has many files.

It has something to do with the way SVN tracks directories (all
directory entries are always listed in full in those rev files).

At least, that's what I remember vaguely from some past discussions.
Maybe there is even an issue in the issue tracker for this (or
previous discussions on users- or dev-mailinglist), but I don't have
time to search now ...

If this is the case, a possible workaround could be that you
restructure the project in CVS, or in a copy of your CVS repository
(creating some subdirs, and moving ,v files into them). Of course, I
understand this may be an unworkable solution (depends on the amount
of flexibility you have in moving things around).

Cheers,
-- 
Johan

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Les Mikesell wrote:
> >On Tue, Feb 08, 2011 at 11:32:47PM +0600, Victor Sudakov wrote:
> >>After the 15000th commit, the size of the repository on disk is 5.5G
> >>with the working directory size being 120M. Besides, after several
> >>thousand commits to this directory SVN slows down considerably.  This
> >>must be some design flaw (or peculiarity if you like) of SVN.
> >
> >Probably related to the way directories are represented in the repository.
> >See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
> >and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
> >to how this currently works.
> 
> I'd expect even local operations like the compare against the pristine 
> versions to decide what to commit to become slow when you put many 
> thousands of files in one directory because most filesystems aren't good 
> at that either (although they may fake it with caching).  It's one of 
> those "if it hurts, don't do it" things.

I did not know it would hurt until I tried to migrate this particular
repository from CVS to SVN.

FreeBSD by itself handles large directories very well due to its
dirhash feature.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Les Mikesell <le...@gmail.com>.
On 2/8/2011 1:34 PM, Stefan Sperling wrote:
> On Tue, Feb 08, 2011 at 11:32:47PM +0600, Victor Sudakov wrote:
>> After the 15000th commit, the size of the repository on disk is 5.5G
>> with the working directory size being 120M. Besides, after several
>> thousand commits to this directory SVN slows down considerably.  This
>> must be some design flaw (or peculiarity if you like) of SVN.
>
> Probably related to the way directories are represented in the repository.
> See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
> and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
> to how this currently works.

I'd expect even local operations like the compare against the pristine 
versions to decide what to commit to become slow when you put many 
thousands of files in one directory because most filesystems aren't good 
at that either (although they may fake it with caching).  It's one of 
those "if it hurts, don't do it" things.

-- 
   Les Mikesell
    lesmikesell@gmail.com


Re: Betr.: Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Stefan Sperling wrote:
> > After the 15000th commit, the size of the repository on disk is 5.5G
> > with the working directory size being 120M. Besides, after several
> > thousand commits to this directory SVN slows down considerably.  This
> > must be some design flaw (or peculiarity if you like) of SVN.
> 
> Probably related to the way directories are represented in the repository.
> See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
> and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
> to how this currently works.

BTW I use the FSFS backend if it makes any difference.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Feb 08, 2011 at 11:32:47PM +0600, Victor Sudakov wrote:
> After the 15000th commit, the size of the repository on disk is 5.5G
> with the working directory size being 120M. Besides, after several
> thousand commits to this directory SVN slows down considerably.  This
> must be some design flaw (or peculiarity if you like) of SVN.

Probably related to the way directories are represented in the repository.
See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
to how this currently works.

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Johan Corveleyn wrote:

[dd]

> But that doesn't explain why the resulting repository is so large
> (compared to the original CVS repository). Sure, there might be memory
> usage problems in dump/load (it uses more memory than the resulting
> repository uses diskspace), but I think there is more going on.
> 
> That's why I'm guessing on rev files being large (and the
> corresponding memory structures) because of the amount of dir entries
> in each revision. I'm not that intimately familiar with how this is
> all represented, and how the rev files are structured and all that, so
> I'm just guessing ... I seem to remember something like this from
> another discussion in the past.

I have created a small testcase script:


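#!/bin/sh
# Run inside a checked-out working copy of an empty test repository.
# Adds one small file per commit, all in the same directory.
# jot(1) is a BSD utility; "seq 15000" is the usual equivalent elsewhere.

for i in $(jot 15000)
do
    cat > "Testfile_${i}.txt" << __END__
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
__END__

    svn add "Testfile_${i}.txt"
    svn commit -m "Iteration $i"
done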

After the 15000th commit, the size of the repository on disk is 5.5G
with the working directory size being 120M. Besides, after several
thousand commits to this directory SVN slows down considerably.  This
must be some design flaw (or peculiarity if you like) of SVN.
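
A back-of-envelope reading of these numbers, purely as a sketch: suppose
(as guessed in the quoted text above, not verified) that every commit
stores the full entry list of the directory again. Then commit number i
writes about i entries, and N commits write 1 + 2 + ... + N = N(N+1)/2
entries in total. The function names below are invented for the
illustration, 5.5G is read as 5.5 GiB, and file contents and all other
overhead are ignored.

```shell
#!/bin/sh
# Sketch of a quadratic-growth estimate under the assumption that each
# commit rewrites the whole directory listing.

# entries_written N: total entries written over N commits (1 + 2 + ... + N).
entries_written() {
    echo $(( $1 * ($1 + 1) / 2 ))
}

# bytes_per_entry REPO_BYTES N: average bytes per written entry in this model.
bytes_per_entry() {
    echo $(( $1 / $(entries_written "$2") ))
}

commits=15000
repo_bytes=$((5632 * 1024 * 1024))   # 5.5G read as 5.5 GiB (assumption)

echo "entries written: $(entries_written "$commits")"
echo "bytes per entry: $(bytes_per_entry "$repo_bytes" "$commits")"
```

Under this model the 5.5G works out to roughly 50 bytes per directory
entry per commit, which is at least a plausible size for one entry, so
quadratic growth of the directory data alone could account for the size.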

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Johan Corveleyn <jc...@gmail.com>.
On Thu, Jan 20, 2011 at 6:11 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Victor Sudakov wrote on Thu, Jan 20, 2011 at 14:18:00 +0600:
>> Colleagues,
>>
>> I have finally completed a test cvs2svn conversion on an amd64 system.
>> The peak memory requirement of svnadmin during the conversion was
>> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
>>
>> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
>> 2161M RES of memory at its peak.  Such memory requirements make this
>> repo completely unusable on i386 systems.
>>
>> The original CVS repo is 59M on disk with 17859 files (including those
>> in the Attic) and total 23911 revisions (in SVN terms). All files are
>> strictly text.
>>
>> Something seems to be very suboptimal either about SVN itself or about
>> the cvs2svn utility. I am especially surprised by the 8.5G size of the
>> resulting SVN repository (though the result of "svnadmin dump --deltas"
>> is 44M).
>>
>> > - Copy your CVS repository (say /myrepository to /myrepositoryconv)
>> > - In the copy move the ,v files into several subdirectories (using the
>> > operating system, not using CVS commands.)
>> > - Convert the directories one at a time and load them into svn.
>> > - Once loaded into svn you can move everything back into one folder
>> >   (using svn commands) if desired.
>>
>> Even if I do this, after moving everything back I will not be able to
>> do "svnadmin dump" on an i386 system, perhaps unless I write some
>> script which will iterate and keep track of dumped revision numbers.
>>
> That's not a nice result, but I think I said somewhere in this thread
> that there are known memory-usage bugs in svnadmin dump/load.  Which
> means the fix (as opposed to 'workaround') to this issue is to have
> someone (possibly you or someone you hire) look into those bugs.
>
> With a bit of luck, this will boil down to looking for some place where
> allocations should be done in a scratch_pool or iterpool instead of some
> long-lived result_pool (which may be called 'pool').  One can compile
> with APR pool debugging enabled to get information about what's allocated
> from which pool.
>
> Paul Burba's work on the recent fixed-in-1.6.15
> DoS-via-memory-consumption CVE can serve as an example.
>
> Daniel
> (workarounds are plenty --- svnsync, incremental dump, whatnot --- they
> are discussed elsethread)

But that doesn't explain why the resulting repository is so large
(compared to the original CVS repository). Sure, there might be memory
usage problems in dump/load (it uses more memory than the resulting
repository uses diskspace), but I think there is more going on.

That's why I'm guessing on rev files being large (and the
corresponding memory structures) because of the amount of dir entries
in each revision. I'm not that intimately familiar with how this is
all represented, and how the rev files are structured and all that, so
I'm just guessing ... I seem to remember something like this from
another discussion in the past.

Cheers,
-- 
Johan

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
That's not a nice result, but I think I said somewhere in this thread
that there are known memory-usage bugs in svnadmin dump/load.  Which
means the fix (as opposed to 'workaround') to this issue is to have
someone (possibly you or someone you hire) look into those bugs.

With a bit of luck, this will boil down to looking for some place where
allocations should be done in a scratch_pool or iterpool instead of some
long-lived result_pool (which may be called 'pool').  One can compile
with APR pool debugging enabled to get information about what's allocated
from which pool.

Paul Burba's work on the recent fixed-in-1.6.15
DoS-via-memory-consumption CVE can serve as an example.

Daniel
(workarounds are plenty --- svnsync, incremental dump, whatnot --- they
are discussed elsethread)


Victor Sudakov wrote on Thu, Jan 20, 2011 at 14:18:00 +0600:
> Colleagues,
> 
> I have finally completed a test cvs2svn conversion on an amd64 system.
> The peak memory requirement of svnadmin during the conversion was
> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
> 
> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
> 2161M RES of memory at its peak.  Such memory requirements make this
> repo completely unusable on i386 systems.
> 
> The original CVS repo is 59M on disk with 17859 files (including those
> in the Attic) and total 23911 revisions (in SVN terms). All files are
> strictly text.
> 
> Something seems to be very suboptimal either about SVN itself or about
> the cvs2svn utility. I am especially surprised by the 8.5G size of the
> resulting SVN repository (though the result of "svnadmin dump --deltas" 
> is 44M).
> 
> > - Copy your CVS repository (say /myrepository to /myrepositoryconv)
> > - In the copy move the ,v files into several subdirectories (using the
> > operating system, not using CVS commands.)
> > - Convert the directories one at a time and load them into svn.
> > - Once loaded into svn you can move everything back into one folder
> >   (using svn commands) if desired.
> 
> Even if I do this, after moving everything back I will not be able to
> do "svnadmin dump" on an i386 system, perhaps unless I write some
> script which will iterate and keep track of dumped revision numbers.
> 
> -- 
> Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
> sip:sudakov@sibptus.tomsk.ru

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Colleagues,

I have finally completed a test cvs2svn conversion on an amd64 system.
The peak memory requirement of svnadmin during the conversion was
9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.

"svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
2161M RES of memory at its peak.  Such memory requirements make this
repo completely unusable on i386 systems.

The original CVS repo is 59M on disk with 17859 files (including those
in the Attic) and total 23911 revisions (in SVN terms). All files are
strictly text.

Something seems to be very suboptimal either about SVN itself or about
the cvs2svn utility. I am especially surprised by the 8.5G size of the
resulting SVN repository (though the result of "svnadmin dump --deltas" 
is 44M).

> - Copy your CVS repository (say /myrepository to /myrepositoryconv)
> - In the copy move the ,v files into several subdirectories (using the
> operating system, not using CVS commands.)
> - Convert the directories one at a time and load them into svn.
> - Once loaded into svn you can move everything back into one folder
>   (using svn commands) if desired.

Even if I do this, after moving everything back I will not be able to
do "svnadmin dump" on an i386 system, perhaps unless I write some
script which will iterate and keep track of dumped revision numbers.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru