Posted to users@subversion.apache.org by Victor Sudakov <su...@sibptus.tomsk.ru> on 2010/12/31 03:07:32 UTC

"svnadmin load" a huge file

Colleagues, 

I have a CVS repository sized 54M with 17751 files. 

"cvs2svn --dumpfile" produces a dump sized 13G. svnadmin cannot load
this dump aborting with an out of memory condition on a FreeBSD
8.1-RELEASE box with 1G of RAM and 2.5G of swap.

I really need to convert this repository to SVN. What should I do? Any
advice is appreciated.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Kevin Grover wrote:

[dd]

> 2) Don't use '--dumpfile' on cvs2svn; let cvs2svn load it into a subversion
> repo directly.

It did not make any difference.  Frankly speaking, I would be
surprised if it did.

Starting Subversion r10773 / 23520
Starting Subversion r10774 / 23520
Starting Subversion r10775 / 23520
ERROR: svnadmin failed with the following output while loading the dumpfile:

$


-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Kevin Grover <ke...@kevingrover.net>.
On Thu, Dec 30, 2010 at 19:07, Victor Sudakov <su...@sibptus.tomsk.ru> wrote:

> Colleagues,
>
> I have a CVS repository sized 54M with 17751 files.
>
> "cvs2svn --dumpfile" produces a dump sized 13G. svnadmin cannot load
> this dump aborting with an out of memory condition on a FreeBSD
> 8.1-RELEASE box with 1G of RAM and 2.5G of swap.
>
> I really need to convert this repository to SVN. What should I do? Any
> advice is appreciated.
>
> --
> Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
> sip:sudakov@sibptus.tomsk.ru
>


Two suggestions:
1) svndumptool split (already suggested)
2) Don't use '--dumpfile' on cvs2svn; let cvs2svn load it into a subversion
repo directly.

RE: "svnadmin load" a huge file

Posted by Ramesh Nadupalli <na...@gmail.com>.
Recently I was doing a migration from CVS to SVN and experienced a strange
issue: it was ignoring tabs and special characters (^M) inside the text files.
Did anyone experience similar issues?

Thanks
Ramesh

-----Original Message-----
From: Brian Brophy [mailto:brianmbrophy@gmail.com] 
Sent: Friday, December 31, 2010 10:36 AM
To: Victor Sudakov
Cc: users@subversion.apache.org
Subject: Re: "svnadmin load" a huge file

I migrated a large CVS repository (25-50 GB) to SVN years ago on SVN 1.3.
Our repo had many sections (projects) within it.  We had to migrate each
project independently so that its team could coordinate when they migrated
to SVN.  As such, I dumped each project when ready and then svnadmin loaded
each dump into its own path/root (so as not to overwrite anything
previously loaded and unrelated to this project's import).

So, you can do it by controlling which path/portion of CVS you use cvs2svn to
create the dump file from.

Brian


Victor Sudakov wrote:
> Daniel Shahaf wrote:
>   
>> Split the dumpfile to smaller dumpfiles
>>     
>
> How do I do that? I have not found such an option in cvs2svn. 
I don't mind writing a script if I only knew how to split the dump.
> I haven't found any "svnadmin load" option to import part of a dump 
> either. man what?
>
>   
>> or try a newer version of svnadmin.
>>     
>
> I am using subversion-1.6.15, which seems to be the latest version
> ported to FreeBSD.
>
>   
>> (or dive into the source and help us plug that memory leak --- 
>> compile with APR pool debugging enabled)
>>     
>
> I will try to do that but unfortunately I need some immediate 
> workaround :(
>
>   


RE: Betr.: Re: "svnadmin load" a huge file

Posted by "Cooke, Mark" <ma...@siemens.com>.
> -----Original Message-----
> From: Victor Sudakov [mailto:sudakov@sibptus.tomsk.ru] 
> Sent: 20 January 2011 08:18
> Subject: Re: Betr.: Re: "svnadmin load" a huge file
> 
> Colleagues,
> 
> I have finally completed a test cvs2svn conversion on an amd64 system.
> The peak memory requirement of svnadmin during the conversion was
> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
> 
> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
> 2161M RES of memory at its peak.  Such memory requirements make this
> repo completely unusable on i386 systems.
> 
> The original CVS repo is 59M on disk with 17859 files (including those
> in the Attic) and total 23911 revisions (in SVN terms). All files are
> strictly text.
> 
> Something seems to be very suboptimal either about SVN itself or about
> the cvs2svn utility. I am especially surprised by the 8.5G size of the
> resulting SVN repository (though the result of "svnadmin dump 
> --deltas" 
> is 44M).
> 
> > - Copy your CVS repository (say /myrepository to /myrepositoryconv)
> > - In the copy move the ,v files into several subdirectories 
> (using the
> > operating system, not using CVS commands.)
> > - Convert the directories one at a time and load them into svn.
> > - Once loaded into svn you can move everything back into one folder
> >   (using svn commands) if desired.
> 
> Even if I do this, after moving everything back I will not be able to
> do "svnadmin dump" on an i386 system, perhaps unless I write some
> script which will iterate and keep track of dumped revision numbers.
> 
Did you also notice the --incremental option?  Is that what you mean by
'keeping track of revision numbers'?

http://svnbook.red-bean.com/nightly/en/svn.ref.svnadmin.c.dump.html

This allows you to dump the repo in sections (by specifying a revision
range).
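
For instance, a rough sketch of such an incremental dump loop (the
repository path and the 1000-revision chunk size are assumptions for
illustration, not a tested recipe):

[[[
#!/bin/sh
# Dump the repository in 1000-revision slices so that no single
# svnadmin run has to process the whole history at once.
REPO=/path/to/repo
HEAD=`svnlook youngest $REPO`
START=0
CHUNK=1000
while [ $START -le $HEAD ]; do
    END=$(($START + $CHUNK - 1))
    [ $END -gt $HEAD ] && END=$HEAD
    svnadmin dump --incremental -r $START:$END $REPO > dump.$START-$END
    START=$(($END + 1))
done
]]]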

You do not mention what version of svn you are using, but newer versions
allow the repository to be packed; would this help your storage issues?

http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace.fsfspacking

~ mark c

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Johan Corveleyn <jc...@gmail.com>.
On Thu, Jan 20, 2011 at 9:18 AM, Victor Sudakov
<su...@sibptus.tomsk.ru> wrote:
> Colleagues,
>
> I have finally completed a test cvs2svn conversion on an amd64 system.
> The peak memory requirement of svnadmin during the conversion was
> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
>
> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
> 2161M RES of memory at its peak.  Such memory requirements make this
> repo completely unusable on i386 systems.
>
> The original CVS repo is 59M on disk with 17859 files (including those
> in the Attic) and total 23911 revisions (in SVN terms). All files are
> strictly text.
>
> Something seems to be very suboptimal either about SVN itself or about
> the cvs2svn utility. I am especially surprised by the 8.5G size of the
> resulting SVN repository (though the result of "svnadmin dump --deltas"
> is 44M).

Do you have a lot of files in the same directory? (are all those 17859
files in one single directory?)

I don't know the details, but I know that svn rev files (and probably
also some memory structures, explaining the huge memory usage) become
very big for commits in a directory that has many files.

It has something to do with the way SVN tracks directories: all
directory entries are always listed in full in those rev files, so a
commit into a directory of N files rewrites an N-entry listing, and N
such commits cost roughly O(N^2) space in total.

At least, that's what I remember vaguely from some past discussions.
Maybe there is even an issue in the issue tracker for this (or
previous discussions on the users or dev mailing lists), but I don't have
time to search now ...

If this is the case, a possible workaround could be that you
restructure the project in CVS, or in a copy of your CVS repository
(creating some subdirs, and moving ,v files into them). Of course, I
understand this may be an unworkable solution (depends on the amount
of flexibility you have in moving things around).

Cheers,
-- 
Johan

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Les Mikesell wrote:
> >On Tue, Feb 08, 2011 at 11:32:47PM +0600, Victor Sudakov wrote:
> >>After the 15000th commit, the size of the repository on disk is 5.5G
> >>with the working directory size being 120M. Besides, after several
> >>thousand commits to this directory SVN slows down considerably.  This
> >>must be some design flaw (or peculiarity if you like) of SVN.
> >
> >Probably related to the way directories are represented in the repository.
> >See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
> >and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
> >to how this currently works.
> 
> I'd expect even local operations like the compare against the pristine 
> versions to decide what to commit to become slow when you put many 
> thousands of files in one directory because most filesystems aren't good 
> at that either (although they may fake it with caching).  It's one of 
> those "if it hurts, don't do it" things.

I did not know it would hurt until I tried to migrate this particular
repository from CVS to SVN.

FreeBSD by itself handles large directories very well due to its
dirhash feature.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Les Mikesell <le...@gmail.com>.
On 2/8/2011 1:34 PM, Stefan Sperling wrote:
> On Tue, Feb 08, 2011 at 11:32:47PM +0600, Victor Sudakov wrote:
>> After the 15000th commit, the size of the repository on disk is 5.5G
>> with the working directory size being 120M. Besides, after several
>> thousand commits to this directory SVN slows down considerably.  This
>> must be some design flaw (or peculiarity if you like) of SVN.
>
> Probably related to the way directories are represented in the repository.
> See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
> and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
> to how this currently works.

I'd expect even local operations like the compare against the pristine 
versions to decide what to commit to become slow when you put many 
thousands of files in one directory because most filesystems aren't good 
at that either (although they may fake it with caching).  It's one of 
those "if it hurts, don't do it" things.

-- 
   Les Mikesell
    lesmikesell@gmail.com


Re: Betr.: Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Stefan Sperling wrote:
> > After the 15000th commit, the size of the repository on disk is 5.5G
> > with the working directory size being 120M. Besides, after several
> > thousand commits to this directory SVN slows down considerably.  This
> > must be some design flaw (or peculiarity if you like) of SVN.
> 
> Probably related to the way directories are represented in the repository.
> See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
> and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
> to how this currently works.

BTW I use the FSFS backend if it makes any difference.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Stefan Sperling <st...@elego.de>.
On Tue, Feb 08, 2011 at 11:32:47PM +0600, Victor Sudakov wrote:
> After the 15000th commit, the size of the repository on disk is 5.5G
> with the working directory size being 120M. Besides, after several
> thousand commits to this directory SVN slows down considerably.  This
> must be some design flaw (or peculiarity if you like) of SVN.

Probably related to the way directories are represented in the repository.
See http://svn.haxx.se/dev/archive-2011-02/0007.shtml
and also http://svn.haxx.se/dev/archive-2011-02/0014.shtml for some hints
to how this currently works.

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Johan Corveleyn wrote:

[dd]

> But that doesn't explain why the resulting repository is so large
> (compared to the original CVS repository). Sure, there might be memory
> usage problems in dump/load (it uses more memory than the resulting
> repository uses diskspace), but I think there is more going on.
> 
> That's why I'm guessing that the rev files get so large (along with the
> corresponding memory structures) because of the number of dir entries
> in each revision. I'm not that intimately familiar with how this is
> all represented, and how the rev files are structured and all that, so
> I'm just guessing ... I seem to remember something like this from
> another discussion in the past.

I have created a small testcase script:


#!/bin/sh
# Testcase: create and commit 15000 small files, one revision each, in a
# single working-copy directory ("jot 15000" prints 1..15000 on FreeBSD).

for i in `jot 15000`
do
cat > Testfile_${i}.txt << __END__
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
This is a small test file.
__END__

svn add Testfile_${i}.txt
svn commit -m "Iteration $i"
done

After the 15000th commit, the size of the repository on disk is 5.5G
with the working directory size being 120M. Besides, after several
thousand commits to this directory SVN slows down considerably.  This
must be some design flaw (or peculiarity if you like) of SVN.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Johan Corveleyn <jc...@gmail.com>.
On Thu, Jan 20, 2011 at 6:11 PM, Daniel Shahaf <d....@daniel.shahaf.name> wrote:
> Victor Sudakov wrote on Thu, Jan 20, 2011 at 14:18:00 +0600:
>> Colleagues,
>>
>> I have finally completed a test cvs2svn conversion on an amd64 system.
>> The peak memory requirement of svnadmin during the conversion was
>> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
>>
>> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
>> 2161M RES of memory at its peak.  Such memory requirements make this
>> repo completely unusable on i386 systems.
>>
>> The original CVS repo is 59M on disk with 17859 files (including those
>> in the Attic) and total 23911 revisions (in SVN terms). All files are
>> strictly text.
>>
>> Something seems to be very suboptimal either about SVN itself or about
>> the cvs2svn utility. I am especially surprised by the 8.5G size of the
>> resulting SVN repository (though the result of "svnadmin dump --deltas"
>> is 44M).
>>
>> > - Copy your CVS repository (say /myrepository to /myrepositoryconv)
>> > - In the copy move the ,v files into several subdirectories (using the
>> > operating system, not using CVS commands.)
>> > - Convert the directories one at a time and load them into svn.
>> > - Once loaded into svn you can move everything back into one folder
>> >   (using svn commands) if desired.
>>
>> Even if I do this, after moving everything back I will not be able to
>> do "svnadmin dump" on an i386 system, perhaps unless I write some
>> script which will iterate and keep track of dumped revision numbers.
>>
> That's not a nice result, but I think I said somewhere in this thread
> that there are known memory-usage bugs in svnadmin dump/load.  Which
> means the fix (as opposed to 'workaround') to this issue is to have
> someone (possibly you or someone you hire) look into those bugs.
>
> With a bit of luck, this will boil down to looking for some place where
> allocations should be done in a scratch_pool or iterpool instead of some
> long-lived result_pool (which may be called 'pool').  One can compile
> with APR pool debugging enabled to get information about what's allocated
> from which pool.
>
> Paul Burba's work on the recent fixed-in-1.6.15
> DoS-via-memory-consumption CVE can serve as an example.
>
> Daniel
> (workarounds are plentiful --- svnsync, incremental dump, whatnot --- they
> are discussed elsethread)

But that doesn't explain why the resulting repository is so large
(compared to the original CVS repository). Sure, there might be memory
usage problems in dump/load (it uses more memory than the resulting
repository uses diskspace), but I think there is more going on.

That's why I'm guessing that the rev files get so large (along with the
corresponding memory structures) because of the number of dir entries
in each revision. I'm not that intimately familiar with how this is
all represented, and how the rev files are structured and all that, so
I'm just guessing ... I seem to remember something like this from
another discussion in the past.

Cheers,
-- 
Johan

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
That's not a nice result, but I think I said somewhere in this thread
that there are known memory-usage bugs in svnadmin dump/load.  Which
means the fix (as opposed to 'workaround') to this issue is to have
someone (possibly you or someone you hire) look into those bugs.

With a bit of luck, this will boil down to looking for some place where
allocations should be done in a scratch_pool or iterpool instead of some
long-lived result_pool (which may be called 'pool').  One can compile
with APR pool debugging enabled to get information about what's
allocated from which pool.
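
A hedged sketch of such a build (the source layout and install prefix
are assumptions; HACKING has the authoritative instructions):

[[[
# Rebuild APR with pool debugging, then build Subversion against it,
# so that allocations can be traced back to the pool they came from.
cd apr
CPPFLAGS="-DAPR_POOL_DEBUG=19" ./configure --prefix=/opt/apr-pooldebug
make && make install
cd ../subversion
CPPFLAGS="-DAPR_POOL_DEBUG=19" ./configure --with-apr=/opt/apr-pooldebug
make
]]]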

Paul Burba's work on the recent fixed-in-1.6.15
DoS-via-memory-consumption CVE can serve as an example.

Daniel
(workarounds are plentiful --- svnsync, incremental dump, whatnot --- they
are discussed elsethread)


Victor Sudakov wrote on Thu, Jan 20, 2011 at 14:18:00 +0600:
> Colleagues,
> 
> I have finally completed a test cvs2svn conversion on an amd64 system.
> The peak memory requirement of svnadmin during the conversion was
> 9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.
> 
> "svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
> 2161M RES of memory at its peak.  Such memory requirements make this
> repo completely unusable on i386 systems.
> 
> The original CVS repo is 59M on disk with 17859 files (including those
> in the Attic) and total 23911 revisions (in SVN terms). All files are
> strictly text.
> 
> Something seems to be very suboptimal either about SVN itself or about
> the cvs2svn utility. I am especially surprised by the 8.5G size of the
> resulting SVN repository (though the result of "svnadmin dump --deltas" 
> is 44M).
> 
> > - Copy your CVS repository (say /myrepository to /myrepositoryconv)
> > - In the copy move the ,v files into several subdirectories (using the
> > operating system, not using CVS commands.)
> > - Convert the directories one at a time and load them into svn.
> > - Once loaded into svn you can move everything back into one folder
> >   (using svn commands) if desired.
> 
> Even if I do this, after moving everything back I will not be able to
> do "svnadmin dump" on an i386 system, perhaps unless I write some
> script which will iterate and keep track of dumped revision numbers.
> 
> -- 
> Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
> sip:sudakov@sibptus.tomsk.ru

Re: Betr.: Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Colleagues,

I have finally completed a test cvs2svn conversion on an amd64 system.
The peak memory requirement of svnadmin during the conversion was
9796M SIZE, 1880M RES. The resulting SVN repo size is 8.5G on disk.

"svnadmin dump --deltas" of this new SVN repo required 6692M SIZE,
2161M RES of memory at its peak.  Such memory requirements make this
repo completely unusable on i386 systems.

The original CVS repo is 59M on disk with 17859 files (including those
in the Attic) and total 23911 revisions (in SVN terms). All files are
strictly text.

Something seems to be very suboptimal either about SVN itself or about
the cvs2svn utility. I am especially surprised by the 8.5G size of the
resulting SVN repository (though the result of "svnadmin dump --deltas" 
is 44M).

> - Copy your CVS repository (say /myrepository to /myrepositoryconv)
> - In the copy move the ,v files into several subdirectories (using the
> operating system, not using CVS commands.)
> - Convert the directories one at a time and load them into svn.
> - Once loaded into svn you can move everything back into one folder
>   (using svn commands) if desired.

Even if I do this, after moving everything back I will not be able to
do "svnadmin dump" on an i386 system, perhaps unless I write some
script which will iterate and keep track of dumped revision numbers.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Betr.: Re: "svnadmin load" a huge file

Posted by Jan Keirse <ja...@tvh.be>.
> On 1/7/2011 7:57 AM, Victor Sudakov wrote:
> > It would be fine if the project in question did not contain almost all
> > the files in one directory. You may call the layout silly, but CVS does
> > not seem to mind. OTOH, I would have distributed the files over
> > several subdirectories, but CVS does not handle moving files well.
> >
> > I wonder if cvs2svn is to blame that it produces a dump svnadmin
> > cannot load. Or am I always risking that "svnadmin dump" may one day
> > produce a dump a subsequent "svnadmin load" will be unable to swallow?
> >
> > I mean, if by hook or by crook, by using third party utilities like
> > svndumptool, I will eventually be able to convert this project from
> > CVS to SVN. Is there a chance that a subsequent dump will be again
> > unloadable?
> 

Have you tried the following (a rough shell sketch follows the list):
- Copy your CVS repository (say /myrepository to /myrepositoryconv)
- In the copy move the ,v files into several subdirectories (using the 
operating system, not using CVS commands.) 
- Convert the directories one at a time and load them into svn. 
- Once loaded into svn you can move everything back into one folder (using 
svn commands) if desired. 
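
A rough shell sketch of that procedure; every path, batch name and
option below is an assumption for illustration, not a tested recipe:

[[[
#!/bin/sh
# Work on a copy of the CVS repository, never on the original.
cp -Rp /myrepository /myrepositoryconv
cd /myrepositoryconv/project

# Spread the ,v files over a few batches using plain OS commands.
mkdir batch1 batch2
mv [a-m]*,v batch1/
mv [n-z]*,v batch2/

# Create the batch roots, then convert and load one batch at a time
# under its own parent directory.
svn mkdir -m "Create batch roots" file:///path/to/svnrepo/batch1 \
    file:///path/to/svnrepo/batch2
cvs2svn --dumpfile=batch1.dump /myrepositoryconv/project/batch1
svnadmin load --parent-dir batch1 /path/to/svnrepo < batch1.dump
cvs2svn --dumpfile=batch2.dump /myrepositoryconv/project/batch2
svnadmin load --parent-dir batch2 /path/to/svnrepo < batch2.dump
]]]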

Manually moving ,v files around in a CVS repository is generally not 
advised, primarily because it annoys users with checked-out working 
copies (and the move itself is unversioned). But those working copies 
won't be of any use anyway once the server has been migrated to 
Subversion, so I don't think it should cause problems; still, keep the 
original repository around just in case...

Kind Regards,

JAN KEIRSE
ICT-DEPARTMENT
Software quality & Systems: Software Engineer



Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Les Mikesell wrote:

[dd]

> > Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
> > cycle will eventually fail as the repository grows beyond a certain
> > size?
> 
> A 'real' svnadmin dump would let you specify revision ranges so you
> could do it incrementally, but cvs2svn doesn't have an equivalent
> option other than splitting out directories. Perhaps someone could do
> the load on a larger 64-bit machine and dump it back in smaller ranges
> if you can't find a better way to split it.

I have also noticed that the --deltas option dramatically decreases
the dump size (it becomes megabytes instead of gigabytes).
Unfortunately cvs2svn cannot produce delta dumps.  I will try loading
the dump on a 64-bit machine and dumping it back with the --deltas option.
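
Presumably something like this (paths are placeholders):

[[[
# On the 64-bit machine: load the full-text dump once, then dump the
# result back with --deltas so each revision is stored as a delta.
svnadmin create /data/svnrepo
svnadmin load /data/svnrepo < cvs2svn.dump
svnadmin dump --deltas /data/svnrepo > repo-deltas.dump
]]]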

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Johan Corveleyn wrote:
> 
> Like Stephen Connolly suggested a week ago: I think you should take a
> look at svndumptool: http://svn.borg.ch/svndumptool/
> 
> I've never used it myself, but in the README.txt file, there is
> mention of a subcommand "split":

I am already trying it, but it is turning out to be not as easy as it
seems. I will share what comes of it.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Johan Corveleyn <jc...@gmail.com>.
On Fri, Jan 7, 2011 at 8:47 PM, Les Mikesell <le...@gmail.com> wrote:
> On 1/7/2011 1:31 PM, Victor Sudakov wrote:
>
>>>
>>> I don't think you are hitting some absolute limit in the software here,
>>> just running out of RAM on your particular machine.  Can you do the
>>> conversion on a machine with more RAM?
>>
>> I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
>> so much swap specially for the occasion). svnadmin crashed after
>> reaching a SIZE of about 2.5 GB.
>>
>> Is 1 GB RAM and 25 GB swap not enough?
>
> If it is a 32bit OS, you'll most likely hit a per-process limit at 2 or 4
> gigs.  Or maybe some quota setting before that.

Like Stephen Connolly suggested a week ago: I think you should take a
look at svndumptool: http://svn.borg.ch/svndumptool/

I've never used it myself, but in the README.txt file, there is
mention of a subcommand "split":

[[[
Split
-----

Splits a dump file into multiple smaller dump files.

svndumptool.py split inputfile [startrev endrev filename]...

options:
  --version   show program's version number and exit
  -h, --help  show this help message and exit

Known bugs:
 * None
]]]
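
For example, a hypothetical invocation (the filenames and split points
are made up; 23520 is the head revision seen earlier in this thread),
with the pieces then loaded in order:

[[[
svndumptool.py split cvs2svn.dump \
    0 7999 part1.dump 8000 15999 part2.dump 16000 23520 part3.dump
svnadmin load /path/to/repo < part1.dump
svnadmin load /path/to/repo < part2.dump
svnadmin load /path/to/repo < part3.dump
]]]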

HTH
-- 
Johan

Re: "svnadmin load" a huge file

Posted by Les Mikesell <le...@gmail.com>.
On Sat, Jan 8, 2011 at 4:33 AM, Victor Sudakov <su...@sibptus.tomsk.ru> wrote:

>> >I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
>> >so much swap specially for the occasion). svnadmin crashed after
>> >reaching a SIZE of about 2.5 GB.
>> >
>> >Is 1 GB RAM and 25 GB swap not enough?
>>
>> If it is a 32bit OS, you'll most likely hit a per-process limit at 2 or
>> 4 gigs.  Or maybe some quota setting before that.
>
> The more I think about it, the more likely it seems.
>
> Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
> cycle will eventually fail as the repository grows beyond a certain
> size?

A 'real' svnadmin dump would let you specify revision ranges so you
could do it incrementally, but cvs2svn doesn't have an equivalent
option other than splitting out directories. Perhaps someone could do
the load on a larger 64-bit machine and dump it back in smaller ranges
if you can't find a better way to split it.

-- 
   Les Mikesell
    lesmikesell@gmail.com

Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Les Mikesell wrote:
> 
> >>I don't think you are hitting some absolute limit in the software here,
> >>just running out of RAM on your particular machine.  Can you do the
> >>conversion on a machine with more RAM?
> >
> >I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
> >so much swap specially for the occasion). svnadmin crashed after
> >reaching a SIZE of about 2.5 GB.
> >
> >Is 1 GB RAM and 25 GB swap not enough?
> 
> If it is a 32bit OS, you'll most likely hit a per-process limit at 2 or 
> 4 gigs.  Or maybe some quota setting before that.

The more I think about it, the more likely it seems.

Does it mean that on a 32bit OS I am stuck hopelessly? A dump/load
cycle will eventually fail as the repository grows beyond a certain
size?

BTW here are the limits for the svn user:

$ whoami 
svn
$ limits
Resource limits (current):
  cputime              infinity secs
  filesize             infinity kB
  datasize               524288 kB
  stacksize               65536 kB
  coredumpsize         infinity kB
  memoryuse            infinity kB
  memorylocked         infinity kB
  maxprocesses             5547
  openfiles               11095
  sbsize               infinity bytes
  vmemoryuse           infinity kB
  pseudo-terminals     infinity
  swapuse              infinity kB
$ uname -srm
FreeBSD 8.1-RELEASE-p2 i386


-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Les Mikesell <le...@gmail.com>.
On 1/7/2011 1:31 PM, Victor Sudakov wrote:

>>
>> I don't think you are hitting some absolute limit in the software here,
>> just running out of RAM on your particular machine.  Can you do the
>> conversion on a machine with more RAM?
>
> I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
> so much swap specially for the occasion). svnadmin crashed after
> reaching a SIZE of about 2.5 GB.
>
> Is 1 GB RAM and 25 GB swap not enough?

If it is a 32bit OS, you'll most likely hit a per-process limit at 2 or 
4 gigs.  Or maybe some quota setting before that.

-- 
   Les Mikesell
    lesmikesell@gmail.com

Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Les Mikesell wrote:
> >
> >>>>I migrated a large CVS repository (25-50 GB) to SVN years ago on SVN
> >>>>1.3.  Our repo had many sections (projects) within it.  We had to
> >>>>migrate each project independently so that its team could coordinate
> >>>>when they migrated to SVN.  As such, I dumped each project when ready
> >>>>and then svnadmin loaded each dump into its own path/root (so as not to
> >>>>overwrite anything previously loaded and unrelated to this project's
> >>>>import).
> >>>>
> 
> >
> >It would be fine if the project in question did not contain almost all
> >the files in one directory. You may call the layout silly, but CVS does
> >not seem to mind. OTOH, I would have distributed the files over
> >several subdirectories, but CVS does not handle moving files well.
> >
> >I wonder if cvs2svn is to blame that it produces a dump svnadmin
> >cannot load. Or am I always risking that "svnadmin dump" may one day
> >produce a dump a subsequent "svnadmin load" will be unable to swallow?
> >
> >I mean, if by hook or by crook, by using third party utilities like
> >svndumptool, I will eventually be able to convert this project from
> >CVS to SVN. Is there a chance that a subsequent dump will be again
> >unloadable?
> 
> I don't think you are hitting some absolute limit in the software here, 
> just running out of RAM on your particular machine.  Can you do the 
> conversion on a machine with more RAM?

I ran "svnadmin load" on a machine with 1 GB RAM and 25 GB swap (added
so much swap specially for the occasion). svnadmin crashed after
reaching a SIZE of about 2.5 GB.

Is 1 GB RAM and 25 GB swap not enough?

I don't know if this gdb output will be useful:

(gdb) where
#0  0x2841e117 in kill () from /lib/libc.so.7
#1  0x2841e076 in raise () from /lib/libc.so.7
#2  0x2841cc4a in abort () from /lib/libc.so.7
#3  0x28116ec5 in abort_on_pool_failure (retcode=Could not find the frame base
    for "abort_on_pool_failure".)
    at subversion/libsvn_subr/pool.c:49
#4  0x283095fb in apr_palloc (pool=0xba46d018, in_size=204800)
    at memory/unix/apr_pools.c:663
#5  0x280ebad3 in svn_txdelta_target_push (
    handler=0x280eb140 <window_handler>, handler_baton=0x5b26c058,
    source=0xba470738, pool=0xba46d018)
    at subversion/libsvn_delta/text_delta.c:528
#6  0x280d4d62 in svn_fs_fs__set_contents (stream=0xbfbfe7e4, fs=0x28512020,
    noderev=0x27dea528, pool=0x2852a018)
    at subversion/libsvn_fs_fs/fs_fs.c:5066
#7  0x280c9d52 in svn_fs_fs__dag_get_edit_stream (contents=0x2852a138,
    file=0x5b2e61a0, pool=0x2852a018) at subversion/libsvn_fs_fs/dag.c:997
#8  0x280de42e in fs_apply_text (contents_p=0xbfbfe904, root=0x5aef8058,
    path=0x2852a080 "ns2/trunk/tomsk.ru/SOA", result_checksum=0x2852a110,
    pool=0x2852a018) at subversion/libsvn_fs_fs/tree.c:2615
#9  0x280bf44e in svn_fs_apply_text (contents_p=0xbfbfe904, root=0x5aef8058,
    path=0x2852a080 "ns2/trunk/tomsk.ru/SOA",
    result_checksum=0x2852a0e8 "b03cbddfbc11be113cbf675862eb971e",
    pool=0x2852a018) at subversion/libsvn_fs/fs-loader.c:1096
---Type <return> to continue, or q <return> to quit---

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Stephen Connolly <st...@gmail.com>.
On 10 January 2011 08:30, Michael Haggerty <mh...@alum.mit.edu> wrote:
> On 01/07/2011 08:38 PM, Victor Sudakov wrote:
>> Daniel Shahaf wrote:
>>> I don't know cvs2svn, but it could have a --sharded-output option, so eg
>>> it would produce a dumpfile per 1000 revisions, rather than one huge
>>> dumpfile.
>>
>> cvs2svn-2.3.0_2 does not seem to have such an option:
>> "cvs2svn: error: no such option: --sharded-output"
>
> cvs2svn doesn't have such an option, but it wouldn't be very difficult
> to implement.  Let me know if you would like some pointers to help you
> get started implementing it.
>
> Michael

Hello... I said this a week ago

Split
-----

Splits a dump file into multiple smaller dump files.

svndumptool.py split inputfile [startrev endrev filename]...

options:
  --version   show program's version number and exit
  -h, --help  show this help message and exit

Known bugs:
 * None


No need to go implementing anything

-Stephen
>
> --
> Michael Haggerty
> mhagger@alum.mit.edu
> http://softwareswirl.blogspot.com/
>

Re: "svnadmin load" a huge file

Posted by Michael Haggerty <mh...@alum.mit.edu>.
On 01/07/2011 08:38 PM, Victor Sudakov wrote:
> Daniel Shahaf wrote:
>> I don't know cvs2svn, but it could have a --sharded-output option, so eg
>> it would produce a dumpfile per 1000 revisions, rather than one huge
>> dumpfile.
> 
> cvs2svn-2.3.0_2 does not seem to have such an option:
> "cvs2svn: error: no such option: --sharded-output"

cvs2svn doesn't have such an option, but it wouldn't be very difficult
to implement.  Let me know if you would like some pointers to help you
get started implementing it.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

Re: "svnadmin load" a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Victor Sudakov wrote on Sat, Jan 08, 2011 at 01:38:00 +0600:
> Daniel Shahaf wrote:
> 
> [dd]
> 
> > 
> > I believe there are known issues with memory usage in svnadmin.  See the
> > issue tracker.
> 
> Namely?
> 

Search for 'svnadmin' and you should find it.

> > 
> > I don't know cvs2svn, but it could have a --sharded-output option, so eg
> > it would produce a dumpfile per 1000 revisions, rather than one huge
> > dumpfile.
> 
> cvs2svn-2.3.0_2 does not seem to have such an option:
> "cvs2svn: error: no such option: --sharded-output"
> 

I said "could have", which means I don't know whether or not such an
option already exists and if so how it's called.

> 
> -- 
> Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
> sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Daniel Shahaf wrote:

[dd]

> 
> I believe there are known issues with memory usage in svnadmin.  See the
> issue tracker.

Namely?

> 
> I don't know cvs2svn, but it could have a --sharded-output option, so eg
> it would produce a dumpfile per 1000 revisions, rather than one huge
> dumpfile.

cvs2svn-2.3.0_2 does not seem to have such an option:
"cvs2svn: error: no such option: --sharded-output"


-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Les Mikesell wrote on Fri, Jan 07, 2011 at 10:37:12 -0600:
> On 1/7/2011 7:57 AM, Victor Sudakov wrote:
>>
>>>>> I migrated a large CVS repository (25-50 GB) to SVN years ago on SVN
>>>>> 1.3.  Our repo had many sections (projects) within it.  We had to
>>>>> migrate each project independently so that its team could coordinate
>>>>> when they migrated to SVN.  As such, I dumped each project when ready
>>>>> and then svnadmin loaded each dump into its own path/root (so as not to
>>>>> overwrite anything previously loaded and unrelated to this project's
>>>>> import).
>>>>>
>
>>
>> It would be fine if the project in question did not contain almost all
>> the files in one directory. You may call the layout silly, but CVS does
>> not seem to mind. OTOH, I would have distributed the files over
>> several subdirectories, but CVS does not handle moving files well.
>>
>> I wonder if cvs2svn is to blame that it produces a dump svnadmin
>> cannot load. Or am I always risking that "svnadmin dump" may one day
>> produce a dump a subsequent "svnadmin load" will be unable to swallow?
>>
>> I mean, if by hook or by crook, by using third party utilities like
>> svndumptool, I will eventually be able to convert this project from
>> CVS to SVN. Is there a chance that a subsequent dump will be again
>> unloadable?
>
> I don't think you are hitting some absolute limit in the software here,  
> just running out of RAM on your particular machine.  Can you do the  
> conversion on a machine with more RAM?
>

I believe there are known issues with memory usage in svnadmin.  See the
issue tracker.

I don't know cvs2svn, but it could have a --sharded-output option, so eg
it would produce a dumpfile per 1000 revisions, rather than one huge
dumpfile.



> -- 
>   Les Mikesell
>    lesmikesell@gmail.com

Re: "svnadmin load" a huge file

Posted by Les Mikesell <le...@gmail.com>.
On 1/7/2011 7:57 AM, Victor Sudakov wrote:
>
>>>> I migrated a large CVS repository (25-50 GB) to SVN years ago on SVN
>>>> 1.3.  Our repo had many sections (projects) within it.  We had to
>>>> migrate each project independently so that its team could coordinate
>>>> when they migrated to SVN.  As such, I dumped each project when ready
>>>> and then svnadmin loaded each dump into its own path/root (so as not to
>>>> overwrite anything previously loaded and unrelated to this project's
>>>> import).
>>>>

>
> It would be fine if the project in question did not contain almost all
> the files in one directory. You may call the layout silly, but CVS does
> not seem to mind. OTOH, I would have distributed the files over
> several subdirectories, but CVS does not handle moving files well.
>
> I wonder if cvs2svn is to blame that it produces a dump svnadmin
> cannot load. Or am I always risking that "svnadmin dump" may one day
> produce a dump a subsequent "svnadmin load" will be unable to swallow?
>
> I mean, if by hook or by crook, by using third party utilities like
> svndumptool, I will eventually be able to convert this project from
> CVS to SVN. Is there a chance that a subsequent dump will be again
> unloadable?

I don't think you are hitting some absolute limit in the software here, 
just running out of RAM on your particular machine.  Can you do the 
conversion on a machine with more RAM?

-- 
   Les Mikesell
    lesmikesell@gmail.com

Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Brian Brophy wrote:
> >  
> >>I migrated a large CVS repository (25-50 GB) to SVN years ago on SVN 
> >>1.3.  Our repo had many sections (projects) within it.  We had to 
> >>migrate each project independently so that its team could coordinate 
> >>when they migrated to SVN.  As such, I dumped each project when ready 
> >>and then svnadmin loaded each dump into its own path/root (so as not to 
> >>overwrite anything previously loaded and unrelated to this project's 
> >>import).
> >>
> >>So, you can do it by controlling which path/portion of CVS you use 
> >>cvs2svn to create the dump file from.
> >>    
> >
> >The CVS repository in question (with the size 54M with 17751 files) is
> >exactly one project. It's the history of a geographical DNS zone for
> >more than 10 years.

> Fair enough, the same pattern is still applicable.  For example, in our 
> CVS repo what separated one "project" from another was basically a 
> root-level folder.
> 
> In kind, you could similarly use cvs2svn to "chunk/dump" subdirectories 
> at a time.
> 
> For example, if in CVS you have something like:
> /Folder1
> /Folder2
> /Folder3
> 
> ... you run cvs2svn three times, once for each subdirectory, producing 
> folder1.dump, folder2.dump, and folder3.dump respectively.
> 
> Then, svnadmin load each individually:

It would be fine if the project in question did not contain almost all
the files in one directory. You may call the layout silly, but CVS does
not seem to mind. OTOH, I would have distributed the files over
several subdirectories, but CVS does not handle moving files well.

I wonder if cvs2svn is to blame that it produces a dump svnadmin
cannot load. Or am I always risking that "svnadmin dump" may one day
produce a dump a subsequent "svnadmin load" will be unable to swallow?

I mean, if by hook or by crook, by using third party utilities like
svndumptool, I will eventually be able to convert this project from
CVS to SVN. Is there a chance that a subsequent dump will be again
unloadable?

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Brian Brophy <br...@gmail.com>.
Fair enough, the same pattern is still applicable.  For example, in our 
CVS repo what separated one "project" from another was basically a 
root-level folder.

In kind, you could similarly use cvs2svn to "chunk/dump" subdirectories 
at a time.

For example, if in CVS you have something like:
/Folder1
/Folder2
/Folder3

... you run cvs2svn three times, once for each subdirectory, producing 
folder1.dump, folder2.dump, and folder3.dump respectively.

Then, svnadmin load each individually:
 - manually create the root folders: Folder1, Folder2, Folder3
 - svnadmin load --parent-dir Folder1 /path/to/svn/repo < folder1.dump
 - svnadmin load --parent-dir Folder2 /path/to/svn/repo < folder2.dump
 - svnadmin load --parent-dir Folder3 /path/to/svn/repo < folder3.dump
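
A hypothetical way to create those root folders up front (the repository
URL is a placeholder; file:// access is assumed):

[[[
svn mkdir -m "Create project roots" file:///path/to/svn/repo/Folder1 \
    file:///path/to/svn/repo/Folder2 file:///path/to/svn/repo/Folder3
]]]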



Victor Sudakov wrote:
> Brian Brophy wrote:
>   
>> I migrated a large CVS repository (25-50 GB) to SVN years ago on SVN 
>> 1.3.  Our repo had many sections (projects) within it.  We had to 
>> migrate each project independently so that its team could coordinate 
>> when they migrated to SVN.  As such, I dumped each project when ready 
>> and then svnadmin loaded each dump into its own path/root (so as not to 
>> overwrite anything previously loaded and unrelated to this project's 
>> import).
>>
>> So, you can do it by controlling which path/portion of CVS you use 
>> cvs2svn to create the dump file from.
>>     
>
> The CVS repository in question (with the size 54M with 17751 files) is
> exactly one project. It's the history of a geographical DNS zone for
> more than 10 years.
>
>   

Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Brian Brophy wrote:
> I migrated a large CVS repository (25-50 GB) to SVN years ago on SVN 
> 1.3.  Our repo had many sections (projects) within it.  We had to 
> migrate each project independently so that its team could coordinate 
> when they migrated to SVN.  As such, I dumped each project when ready 
> and then svnadmin loaded each dump into its own path/root (so as not to 
> overwrite anything previously loaded and unrelated to this project's 
> import).
> 
> So, you can do it by controlling which path/portion of CVS you use 
> cvs2svn to create the dump file from.

The CVS repository in question (with the size 54M with 17751 files) is
exactly one project. It's the history of a geographical DNS zone for
more than 10 years.

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Brian Brophy <br...@gmail.com>.
I migrated a large CVS repository (25-50 GB) to SVN years ago on SVN 
1.3.  Our repo had many sections (projects) within it.  We had to 
migrate each project independently so that its team could coordinate 
when they migrated to SVN.  As such, I dumped each project when ready 
and then svnadmin loaded each dump into its own path/root (so as not to 
overwrite anything previously loaded and unrelated to this project's 
import).

So, you can do it by controlling which path/portion of CVS you use 
cvs2svn to create the dump file from.

Brian


Victor Sudakov wrote:
> Daniel Shahaf wrote:
>   
>> Split the dumpfile to smaller dumpfiles 
>>     
>
> How do I do that? I have not found such an option in cvs2svn. 
> I don't mind writing a script if I only knew how to split the dump.
> I haven't found any "svnadmin load" option to import part of a dump
> either. man what?
>
>   
>> or try a newer version of svnadmin.
>>     
>
> I am using subversion-1.6.15, which seems to be the latest version
> ported to FreeBSD.
>
>   
>> (or dive into the source and help us plug that memory leak --- compile
>> with APR pool debugging enabled)
>>     
>
> I will try to do that but unfortunately I need some immediate
> workaround :(
>
>   

Re: "svnadmin load" a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
> On 31 Dec 2010 15:10, "Victor Sudakov" <su...@sibptus.tomsk.ru> wrote:
> > Daniel Shahaf wrote:
> >> (or dive into the source and help us plug that memory leak --- compile
> >> with APR pool debugging enabled)
> >
> > I will try to do that but unfortunately I need some immediate
> > workaround :(
> >

Thanks.  I was referring specifically to -DAPR_POOL_DEBUG=19; there's
more information in HACKING:
http://subversion.apache.org/docs/community-guide/


Stephen Connolly wrote on Fri, Dec 31, 2010 at 16:22:56 +0000:
> Google is your friend: svndumptool
> 

That's a better suggestion than I would have made.

Re: "svnadmin load" a huge file

Posted by Stephen Connolly <st...@gmail.com>.
Google is your friend: svndumptool

You might need to append a .py

Also, if this is a _top post_, it's the phone what done it... I haven't
figured out how to control where it puts the reply.

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 31 Dec 2010 15:10, "Victor Sudakov" <su...@sibptus.tomsk.ru> wrote:
> Daniel Shahaf wrote:
>> Split the dumpfile to smaller dumpfiles
>
> How do I do that? I have not found such an option in cvs2svn.
> I don't mind writing a script if I only knew how to split the dump.
> I haven't found any "svnadmin load" option to import part of a dump
> either. man what?
>
>> or try a newer version of svnadmin.
>
> I am using subversion-1.6.15, which seems to be the latest version
> ported to FreeBSD.
>
>>
>> (or dive into the source and help us plug that memory leak --- compile
>> with APR pool debugging enabled)
>
> I will try to do that but unfortunately I need some immediate
> workaround :(
>
> --
> Victor Sudakov, VAS4-RIPE, VAS47-RIPN
> sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Victor Sudakov <su...@sibptus.tomsk.ru>.
Daniel Shahaf wrote:
> Split the dumpfile to smaller dumpfiles 

How do I do that? I have not found such an option in cvs2svn. 
I don't mind writing a script if I only knew how to split the dump.
I haven't found any "svnadmin load" option to import part of a dump
either. man what?

> or try a newer version of svnadmin.

I am using subversion-1.6.15, which seems to be the latest version
ported to FreeBSD.

> 
> (or dive into the source and help us plug that memory leak --- compile
> with APR pool debugging enabled)

I will try to do that but unfortunately I need some immediate
workaround :(

-- 
Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
sip:sudakov@sibptus.tomsk.ru

Re: "svnadmin load" a huge file

Posted by Daniel Shahaf <d....@daniel.shahaf.name>.
Split the dumpfile to smaller dumpfiles or try a newer version of svnadmin.

(or dive into the source and help us plug that memory leak --- compile
with APR pool debugging enabled)

Victor Sudakov wrote on Fri, Dec 31, 2010 at 09:07:32 +0600:
> Colleagues, 
> 
> I have a CVS repository sized 54M with 17751 files. 
> 
> "cvs2svn --dumpfile" produces a dump sized 13G. svnadmin cannot load
> this dump aborting with an out of memory condition on a FreeBSD
> 8.1-RELEASE box with 1G of RAM and 2.5G of swap.
> 
> I really need to convert this repository to SVN. What should I do? Any
> advice is appreciated.
> 
> -- 
> Victor Sudakov,  VAS4-RIPE, VAS47-RIPN
> sip:sudakov@sibptus.tomsk.ru