Posted to dev@subversion.apache.org by Nathan Sharp <sp...@phoenix-int.com> on 2002/12/18 14:02:07 UTC

Large repository crashing cvs2svn.py and possible memory leak?

I'm currently testing out subversion on a test box here (RedHat 8.0) and 
have been very impressed so far.  I'm quite excited to employ it on a 
real project, but am having problems importing our large CVS repository 
into it.  Our CVS repository goes back to 1996 and is 2.2GB on disk. 
The cvs2svn.py script successfully runs through the first three passes 
but fails with a segmentation fault in the fourth pass.  The repository 
it generates is valid up until the point it crashes (somewhere in 1999) 
and it doesn't seem to crash at the same spot if I re-run it.  One thing 
I noticed was that as the script runs, it takes more and more memory; 
it was up to almost 400MB last I checked before it crashed.  I suspect 
that it simply runs the computer out of memory (it isn't a very powerful 
box, it is just for testing), but I don't have any real evidence of 
that.

I'm running:
svn HEAD as of a couple of days ago
python 2.2.1-17 RPM
swig 1.3.16 from tarball
viewcvs HEAD as of a couple of days ago
Berkeley DB 4.0.14-14 RPM


Any advice for debugging this?  I ran the script with -v and the reports 
it generates seem O.K. up until it crashes, at which point the output 
ends abruptly mid-line with no other errors.  The shell reports the 
segmentation fault.  No core file is generated.  
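
One possible reason for the missing core file, offered as an assumption rather than something established in this thread, is a soft core-size limit of 0 in the shell that launched the script. The sketch below uses Python's standard `resource` module (Unix only, written for a current Python rather than the 2.2 interpreter mentioned above) to raise the soft limit to the hard limit. The function name `enable_core_dumps` is made up for illustration.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) pair in effect after the change.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

The limit is inherited by child processes, so it would need to be set in the process that runs the conversion or in a parent of it. In a shell, `ulimit -c unlimited` has a similar effect where the hard limit allows it.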

Thanks again!
  Nathan


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [PATCH] Re: Large repository crashing cvs2svn.py and possible memory leak?

Posted by Nathan Sharp <sp...@phoenix-int.com>.
Sorry for the long delay getting back.  I finally got around to 
compiling in your patch and it doesn't appear to help, although I'm now 
getting "obstructed updates" even after doing a clean co, so I haven't 
been able to run a job to completion.  It is still using nearly 100MB 
of memory and growing fast before it stops.  It seems to me that the 
leak is related to reading the local working copy files, not to 
accessing the repository, and that whether I'm using a local file:/// or 
http:// URL for the repository makes no difference.  Win32 vs Unix also 
seems not to matter.  These operations have given me trouble:

cvs2svn.py using local repository
pset whether fs or http (but that doesn't access the repos...)
commit whether fs or http.

These operations may have a very small leak, but one small enough not 
to matter even for my 2GB repository.

revert
cleanup
co
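
A comparison like the one above can be put on firmer footing by recording the peak memory of each command. The following is a minimal sketch, assuming a Unix system and a current Python. `peak_rss_of` is a hypothetical helper, not part of Subversion or cvs2svn. It relies on `resource.getrusage(resource.RUSAGE_CHILDREN)`, whose `ru_maxrss` field reports the largest peak resident set size among the children waited for so far (kilobytes on Linux), so each command is best measured in a fresh run of the script.

```python
import resource
import subprocess
import sys


def peak_rss_of(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss is ru_maxrss for waited-for children: the largest peak seen
    among them so far, in kilobytes on Linux.
    """
    exit_code = subprocess.call(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return exit_code, usage.ru_maxrss


# Example: measure a trivial child process (the current interpreter).
code, peak = peak_rss_of([sys.executable, "-c", "pass"])
print("exit code %d, peak RSS %d" % (code, peak))
```

On POSIX systems `subprocess.call` returns a negative number when the child is killed by a signal, so a segmentation fault would show up as an exit code of -11. To compare the operations listed above, the command list would be something like `["svn", "cleanup", "wc"]`, where `wc` stands in for a working copy path.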

I haven't had a chance yet to try the mmacek branch.  I'll see if I can 
get to it this weekend (as well as trying the latest head).

Thanks for the reply!  Hopefully we can resolve this soon.

  Nathan

P.S. Regarding the obstructed updates:  Yes, I've tried svnadmin 
recover as well as svn cleanup.  Neither helps.  I think the problem is 
related to the memory leak, though, because if I rerun the failing 
command on just the file it failed on, it works.  It only fails when 
doing a large recursive call to set a lot of stuff.






[PATCH] Re: Large repository crashing cvs2svn.py and possible memory leak?

Posted by Marko Macek <Ma...@gmx.net>.
Nathan Sharp wrote:

> I am about 2 days old to subversion, I wasn't even aware that there 
> was a branch with anything I might be interested in.  As Donald said, 
> yes, I am using the trunk.
> I experimented further (thanks to some help I got on the IRC channel) 
> and think I have a workaround now.  After running passes 1-3 (and 
> forcibly preventing 4 from running), I took the cvs2svn-data.s-revs 
> file and chopped it into files with 20,000 lines each.  By running the 
> script on pass 4 on each file in order, I seem to be able to run 
> successfully, which proves that the problem a) is a memory leak and b) 
> was failing because it ran my system out of memory.  The only negative 
> effect of what I did is that right where I split the files (since I 
> just did exactly 20k line files and didn't manually split the files up 
> at a natural commit break) I will end up with a commit which is split 
> in 2, which is minor enough for me not to worry about it.
> The general belief I heard on the IRC channel is that the memory leak 
> is probably in the swig bindings and not in the cvs2svn.py script 
> itself.  I'd be happy to help in any way possible if someone wants to 
> try and fix it.
>
Please try the following patch to subversion.

Index: subversion/libsvn_fs/bdb/nodes-table.c
===================================================================
--- subversion/libsvn_fs/bdb/nodes-table.c      (revision 4167)
+++ subversion/libsvn_fs/bdb/nodes-table.c      (working copy)
@@ -146,6 +146,7 @@
          "successor id `%s' (for `%s') already exists in filesystem %s", 
          new_id_str->data, id_str->data, fs->path);
     }
+  if (err) svn_error_clear(err);
 
   /* Return the new node revision ID. */
   *successor_p = new_id;


If you can, please also test /branches/cvs2svn-mmacek from the Subversion 
repository (it has some bugfixes in addition to basic branch and tag 
conversion support).

Mark



Re: Large repository crashing cvs2svn.py and possible memory leak?

Posted by Nathan Sharp <ns...@phoenix-int.com>.
I am about 2 days old to subversion, I wasn't even aware that there was 
a branch with anything I might be interested in.  As Donald said, yes, I 
am using the trunk.
I experimented further (thanks to some help I got on the IRC channel) 
and think I have a workaround now.  After running passes 1-3 (and 
forcibly preventing 4 from running), I took the cvs2svn-data.s-revs file 
and chopped it into files with 20,000 lines each.  By running the script 
on pass 4 on each file in order, I seem to be able to run successfully, 
which proves that the problem a) is a memory leak and b) was failing 
because it ran my system out of memory.  The only negative effect of 
what I did is that right where I split the files (since I just did 
exactly 20k line files and didn't manually split the files up at a 
natural commit break) I will end up with a commit which is split in 2, 
which is minor enough for me not to worry about it.
The general belief I heard on the IRC channel is that the memory leak is 
probably in the swig bindings and not in the cvs2svn.py script itself.  
I'd be happy to help in any way possible if someone wants to try and fix it.
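
The splitting step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function names and the `.000`, `.001`, ... output naming are assumptions, and the thread does not show how pass 4 was pointed at each chunk, so that part is left out. It is written for a current Python.

```python
from itertools import islice


def split_into_chunks(lines, chunk_size=20000):
    """Yield successive lists of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def write_chunks(src_path, chunk_size=20000):
    """Write src_path.000, src_path.001, ... and return the new paths."""
    paths = []
    with open(src_path) as src:
        for i, chunk in enumerate(split_into_chunks(src, chunk_size)):
            out_path = "%s.%03d" % (src_path, i)
            with open(out_path, "w") as out:
                out.writelines(chunk)
            paths.append(out_path)
    return paths
```

Calling `write_chunks("cvs2svn-data.s-revs")` would produce the 20,000-line pieces in order. Because the cut points are purely line-based, a commit that straddles a boundary ends up split in two, which is the side effect noted above.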

  Nathan

Branko Čibej wrote:

>Which cvs2svn.py script are you using? The one from the trunk, or the
>one from /branches/cvs2svn-mmacek?
>





Re: Large repository crashing cvs2svn.py and possible memory leak?

Posted by Branko Čibej <br...@xbc.nu>.
Which cvs2svn.py script are you using? The one from the trunk, or the
one from /branches/cvs2svn-mmacek?


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/

