Posted to users@subversion.apache.org by Matt Pounsett <ma...@cira.ca> on 2004/09/07 23:21:29 UTC

Lockups on large initial import (was: Re: Failures importing binaries)

Okay.. I finally got the time to look at this again, and here's what  
I've got.

Running Joe Orton's subversion-1.0.6 and mod_dav_svn-1.0.6 RPMs on  
RedHat Enterprise Server 3.x (fully updated).  The repository in question is  
accessed via https using Apache (RedHat's httpd-2.0.46-38.ent).  The  
https certificate is self-signed (probably irrelevant, but what the  
hay).  I've changed the default LANG environment variable from  
en_US.UTF-8 to just en_US, and have removed the "AddDefaultCharset  
UTF-8" which was set by default in the httpd.conf ... I did this to  
remove problems we were having with ISO-8859-1 encoded files which our  
web developer's tools output.

The web site in question is approximately 1.4GB of data spread over  
~17000 files.  It's a mixture of HTML, a small number of images  
(decoration and buttons, nothing big), Word docs, PDFs and raw text.   
The largest file is 18M, and the smallest is 0 bytes.

I've tried doing my initial import using two basic methods.  Both  
methods fail consistently, but exactly how they fail seems to vary.

First:
   svnadmin create
   svn import

This method fails in one of two ways:
   1) svn locks up.  An strace I ran on one iteration showed it was  
waiting for a write() call to complete.  Apache logs a PUT and  
PROPPATCH for the second-to-last file displayed by the svn client.  In  
this state, svn doesn't appear to respond to any keyboard input, and so  
far after 20 minutes it hasn't timed out and died on its own.  strace  
shows it attempting to handle a ^C, but the sighandler doesn't cause it  
to exit.  So far I've only been able to get svn to exit by sending it a  
SIGKILL.

   2) svn dies with an error like the following:
svn: PUT of  
/web-www/!svn/wrk/9384d977-87e3-0310-9c18-e8a479a54b6e/www/fr/webcast/ 
2002/2002.05.28/fcir020528-avs/msh-jm.htm: Could not read status line:  
Connection reset by peer (https://svn.cira.ca)

Regardless of which way it dies, this method leaves a large number of  
log.NNNN files in the db directory (anywhere from dozens to thousands,  
depending on where in the import it died), and causes the repository to  
be unusable.  Subsequent attempts to access the repository for any type  
of transaction result in a timeout after several minutes.. it must be  
deleted and re-created with svnadmin.
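For reference, here's the cleanup sequence I've been running between attempts (the repository path is made up for illustration; note that Apache has to be fully stopped first, since an httpd child that still has the DB environment open will block recovery):

```shell
#!/bin/sh
# Hypothetical repository path -- substitute your own.
REPO=/var/svn/web-www

# Stop Apache first so no httpd child holds the DB environment open:
#   apachectl stop   (or "service httpd stop" on RedHat)

# Run recovery, then make sure the recreated __db.* region files
# end up owned by the user Apache runs as.
if [ -d "$REPO" ] && command -v svnadmin >/dev/null 2>&1; then
    svnadmin recover "$REPO"
    chown -R apache:apache "$REPO/db"
else
    echo "svnadmin not found or $REPO missing; skipping recovery" >&2
fi
```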


Second:
   svnadmin create
   svn checkout
   cp <orig_source> <workdir>
   svn add *
   svn commit

The commit makes it past all the "Adding.." lines and into Transmitting  
file data.  This runs for some period of time, then appears to hang for  
a while (one strace showed it waiting on a select() call), and  
eventually dies with an error like the following:
svn: Commit failed (details follow):
svn: At least one property change failed; repository is unchanged

The last entry logged by Apache in this case is a successful PUT.

I've also received a timeout message here.. though my tests today  
haven't produced one so I haven't got a capture of the exact text.

This also appears to leave the database in an unusable state.   
Sometimes this leaves only a single log.NNN file, sometimes a couple  
dozen.


Of note in all this is that I can regularly import this web site using  
a file:// URL... however, putting the resulting repository behind  
Apache and trying to do a checkout results in a similar set of  
failures.
When the db gets locked up, I've tried running svnadmin recover on it,  
and that locks up on an fcntl64() call on locks/db.lock

Because this works fine using only svn with a file:// URL, and because  
this web server allows me to read and write many large files using DAV,  
I'm inclined to think the problem is in mod_dav_svn somewhere.

I'm at a bit of an impasse now.  I can't think of anything else to try  
or test.  Is there any other information I can provide, or specific  
troubleshooting tasks I can perform to help one of the developers track  
this down?
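One capture I can volunteer the next time it wedges is the client side of the HTTP conversation; unless I'm misremembering the option name, the ~/.subversion/servers file takes a neon-debug-mask setting that makes the svn client log its neon/HTTP traffic:

```ini
[global]
# 130 is the bitmask suggested in the commented sample servers file;
# it dumps request/response activity without the raw body data.
neon-debug-mask = 130
```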

Thanks (especially for reading all the way to here)..
    Matt

Matt Pounsett                 CIRA - Canadian Internet Registration Authority
Technical Support Programmer               350 Sparks Street, Suite 1110
matt.pounsett@cira.ca                            Ottawa, Ontario, Canada
613.237.5335 ext. 231                                 http://www.cira.ca


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@subversion.tigris.org
For additional commands, e-mail: users-help@subversion.tigris.org

Re: Lockups on large initial import

Posted by Matt Pounsett <ma...@cira.ca>.
Another self-followup:

On Sep 10, 2004, at 21:00, Matt Pounsett wrote:

> I re-ran my tests today using svnserve instead of Apache, and had no 
> problem.  I moved back to Apache and got this during the Transferring 
> Data phase:
> svn: Commit failed (details follow):
> svn: At least one property change failed; repository is unchanged
>
> ... and the db is now locked up.
>
> I ran your script over an http connection and got a failure as well.
>
> Are any of the developers watching this thread?  This seems pretty 
> consistently repeatable.. is there any further troubleshooting you'd 
> like us to do to help track down the problem?

More info on this particular issue.

I managed to catch another "At least one property change failed" error 
this morning, and here's what I found.  It appears to me as if the 
problem has something to do with mod_dav_svn failing to remove db locks 
as Apache expires its children and HTTP requests get passed off to new 
children.

After getting the error, I checked on my httpd processes, and found a 
few of them waiting for futex calls to return.  Now here's where it 
gets interesting... (note for completeness that I'm the only one 
accessing this httpd at the moment, and I'm only running one svn client 
at a time):

% lsof __db*
COMMAND   PID   USER  FD   TYPE DEVICE   SIZE    NODE NAME
httpd   12883 apache mem    REG    9,0  16384 3597498 __db.005
httpd   12883 apache mem    REG    9,0 917504 3597497 __db.004
httpd   12883 apache mem    REG    9,0 327680 3597496 __db.003
httpd   12883 apache mem    REG    9,0 278528 3597495 __db.002
httpd   12883 apache mem    REG    9,0  16384 3597494 __db.001
httpd   12886 apache mem    REG    9,0  16384 3597498 __db.005
httpd   12886 apache mem    REG    9,0 917504 3597497 __db.004
httpd   12886 apache mem    REG    9,0 327680 3597496 __db.003
httpd   12886 apache mem    REG    9,0 278528 3597495 __db.002
httpd   12886 apache mem    REG    9,0  16384 3597494 __db.001
httpd   12889 apache mem    REG    9,0  16384 3597498 __db.005
httpd   12889 apache mem    REG    9,0 917504 3597497 __db.004
httpd   12889 apache mem    REG    9,0 327680 3597496 __db.003
httpd   12889 apache mem    REG    9,0 278528 3597495 __db.002
httpd   12889 apache mem    REG    9,0  16384 3597494 __db.001

% strace -p 12883 -p 12886 -p 12889
Process 12883 attached - interrupt to quit
Process 12886 attached - interrupt to quit
Process 12889 attached - interrupt to quit
[pid 12883] futex(0xb6eca2b0, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid 12886] futex(0xb6f0d5c8, FUTEX_WAIT, 2, NULL <unfinished ...>
[pid 12889] futex(0xb6f0d5c8, FUTEX_WAIT, 2, NULL <unfinished ...>
Process 12883 detached
Process 12886 detached
Process 12889 detached

So the only httpd processes accessing the database are all waiting for 
some other process to release a lock.
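If it helps, Berkeley DB's db_stat utility can dump the lock region while the processes are wedged like this; something along these lines (repository path made up, and I'm going from memory on the flags, so check the db_stat man page) should show the outstanding lockers:

```shell
#!/bin/sh
# Hypothetical repository location -- substitute your own.
DB_DIR=/var/svn/web-www/db

# 'db_stat -c' prints lock-region statistics for the environment:
# current lockers, locks requested/released, and the number of
# conflicts and deadlocks detected so far.
if [ -d "$DB_DIR" ] && command -v db_stat >/dev/null 2>&1; then
    db_stat -c -h "$DB_DIR"
else
    echo "db_stat not available or $DB_DIR missing; nothing to report" >&2
fi
```

The interesting number here should be the deadlock count: if it climbs while the futex waits pile up, that would point at lost or leaked locks rather than just a slow transaction.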


Matt Pounsett                   Canadian Internet Registration Authority
Technical Support Programmer               350 Sparks Street, Suite 1110
matt.pounsett@cira.ca                            Ottawa, Ontario, Canada
613.237.5335 ext. 231                                 http://www.cira.ca



Re: Lockups on large initial import

Posted by Matt Pounsett <ma...@cira.ca>.
On Sep 08, 2004, at 04:27, Alex R. Mosteo wrote:

> Matt Pounsett wrote:
>
> [big and interesting snip]
>
>> Because this works fine using only svn with a file:// URL, and 
>> because  this web server allows me to read and write many large files 
>> using DAV,  I'm inclined to think the problem is in mod_dav_svn 
>> somewhere.
>> I'm at a bit of an impasse now.  I can't think of anything else to 
>> try  or test.  Is there any other information I can provide, or 
>> specific  troubleshooting tasks I can perform to help one of the 
>> developers track  this down?
>
> I agree with your conclusions, but it must be noted that someone else 
> posted a description the other day that could match this problem using 
> svnserve. So maybe it is a race condition inside BDB, or something 
> else.

I re-ran my tests today using svnserve instead of Apache, and had no 
problem.  I moved back to Apache and got this during the Transferring 
Data phase:
svn: Commit failed (details follow):
svn: At least one property change failed; repository is unchanged

... and the db is now locked up.

I ran your script over an http connection and got a failure as well.

Are any of the developers watching this thread?  This seems pretty 
consistently repeatable.. is there any further troubleshooting you'd 
like us to do to help track down the problem?

Matt Pounsett                   Canadian Internet Registration Authority
Technical Support Programmer               350 Sparks Street, Suite 1110
matt.pounsett@cira.ca                            Ottawa, Ontario, Canada
613.237.5335 ext. 231                                 http://www.cira.ca



Re: Lockups on large initial import

Posted by "Alex R. Mosteo" <al...@mosteo.com>.
Alex R. Mosteo wrote:
> Lee Merrill wrote:
> 
>> Just FYI, I am using RedHat 7.3, neon 0.24-7-1, Apache 2.0.50, and 
>> Berkeley DB 4.2.52.NC.
> 
> 
> Thanks, Lee. We have several differences:
> 
> svn 1.0.6 and BDB 4.1.25 on Mandrake 10
> 
> These are the highest versions prepackaged for M10. It seems the culprit 
> is one of these.
> 
> If I get the time I will upgrade manually to BDB 4.2 (after a dump ;) 
> and retest. I'll report back if so.

Could this patch on the Berkeley DB homepage have something to do with it?

"Long-running applications can hang in the Berkeley DB cache.
Apply the following patch to the db-4.2.52 release."

I'll try with and without the patch.



Re: Lockups on large initial import

Posted by "Alex R. Mosteo" <al...@mosteo.com>.
Lee Merrill wrote:
> Just FYI, I am using RedHat 7.3, neon 0.24-7-1, Apache 2.0.50, and 
> Berkeley DB 4.2.52.NC.

Thanks, Lee. We have several differences:

svn 1.0.6 and BDB 4.1.25 on Mandrake 10

These are the highest versions prepackaged for M10. It seems the culprit 
is one of these.

If I get the time I will upgrade manually to BDB 4.2 (after a dump ;) and 
retest. I'll report back if so.

Kind regards,

A. Mosteo.



Re: Lockups on large initial import

Posted by Lee Merrill <Le...@bustech.com>.
Hi Alex,

    I tried your script with Subversion 1.1.0, and it worked fine with 
both Berkeley DB and the new fsfs filesystem, using http access. So 
maybe this has been fixed in Subversion, or in the version of 
Apache/Berkeley/Linux/etc. that I am using.

Just FYI, I am using RedHat 7.3, neon 0.24-7-1, Apache 2.0.50, and 
Berkeley DB 4.2.52.NC.

Lee

> My experience has been the same, with both failure types, also using 
> Apache.
>
> I agree with your conclusions, but it must be noted that someone else 
> posted a description the other day that could match this problem using 
> svnserve. So maybe it is a race condition inside BDB, or something else.
>
> I wanted to do a test but was unable, maybe you want to try it: to use 
> a fsfs repository.
>
>


-- 
+=========================================================
+ Lee Merrill    lee@bustech.com    919-866-2008
+=========================================================

Unless otherwise stated, any views presented in this email are solely those of the author and do not necessarily represent those of the company.




Re: Lockups on large initial import

Posted by "Alex R. Mosteo" <al...@mosteo.com>.
Matt Pounsett wrote:

[big and interesting snip]

> Because this works fine using only svn with a file:// URL, and because  
> this web server allows me to read and write many large files using DAV,  
> I'm inclined to think the problem is in mod_dav_svn somewhere.
> 
> I'm at a bit of an impasse now.  I can't think of anything else to try  
> or test.  Is there any other information I can provide, or specific  
> troubleshooting tasks I can perform to help one of the developers track  
> this down?

Thanks Matt for putting it together so clearly. My experience has been 
the same, with both failure types, also using Apache.

I agree with your conclusions, but it must be noted that someone else 
posted a description the other day that could match this problem using 
svnserve. So maybe it is a race condition inside BDB, or something else.

I wanted to run a test but was unable to; maybe you want to try it: use 
an fsfs repository. I've been unable to compile the latest RC, so I'm 
stuck on 1.0.6 until an .rpm is available. If you do this test, please 
report on it.

I'll repost my test script just in case someone else wants to stress-test 
his configuration. I suppose that with minor changes it could be used 
with svnserve, or directly via file:// URLs.

(Note that it needs plenty of disk space available.)

----8<-----------

#!/bin/bash
# bash, not plain sh: the ((x)) arithmetic tests below are bashisms

# test_data is a folder filled with randomly created files
rm -rf test_data
mkdir test_data

filesize=10000  # Adjust this filesize for different experiments
x=$((100000000 / filesize))

echo "Creating binaries..."
while ((x)); do
    head -c $filesize /dev/urandom > test_data/random-$x.bin
    x=$((x - 1))
    echo "Remaining to be created: $x..."
done

# svn-test is the folder with the repository
sudo rm -rf svn-test
sudo mkdir svn-test
sudo svnadmin create svn-test
sudo chown nobody:nobody -R svn-test  # nobody is my apache user
echo "Beginning import..."
svn import test_data http://your.server.here/svn-test/test_data -m ""
echo "Verifying..."
sudo svnadmin verify svn-test
echo "Done."

