Posted to dev@subversion.apache.org by "C.A.T.Magic" <c....@gmx.at> on 2004/06/08 13:45:16 UTC

fsfs still very unstable with simultaneous commits

Hi

I just read your 1.1 release proposal and now try to hurry
to inform you that there are -still- several severe issues
with the new fsfs when doing multiple commits at the same time
on win32  svn, version 1.0.4 (r9844)  windows xp sp1/latest patches.

the problem is the same as discussed a while ago:
use 1 repos, 2 or more working dirs,
simultaneously loop over a modify-commit
from both working dirs resulting in weird errors like
the following:


Sending        Version2.h
Transmitting file data .svn: Commit failed (details follow):
svn: Invalid diff stream: insn 2 has non-positive length


Sending        Version2.h
svn: Commit failed (details follow):
svn: Reference to non-existent node '0.0.t9-1' in filesystem 
'X:/SVNSandbox/TestFS/Repos/db'


============== (again tried with another repos)

Sending        Version.h
Transmitting file data .svn: Commit failed (details follow):
svn: Checksum mismatch for resulting fulltext
(Version.h):
    expected checksum:  8ddd214ec052d5ee4c652996e1c0a123
    actual checksum:    00997804605b81104cc9a1b18c783031


Sending        Version.h
Transmitting file data .svn: Commit failed (details follow):
svn: Can't remove 'x:/svnsandbox/testfs/repos/db/transactions/92-1.txn': 
The directory is not empty.


Sending        Version.h
Transmitting file data .svn: Commit failed (details follow):
svn: Can't remove 'x:/svnsandbox/testfs/repos/db/transactions/92-1.txn': 
The directory is not empty.
Sending        Version.h
svn: Commit failed (details follow):
svn: Out of date: 'Version.h' in transaction '93-1.txn-2'


Sending        Version2.h
svn: Commit failed (details follow):
svn: Found malformed header in revision file


==================================


so all this doesn't sound very good.
especially after the last message happens, the repository is unusable.

also the recovery procedure doesn't really work:


svnadmin verify reports:
* Verified revision 27.
svn: Found malformed header in revision file

svnadmin recover reports:
Recovery completed.
The latest repos revision is 93.


======
c.a.t.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: fsfs still very unstable with simultaneous commits

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
C.A.T.Magic wrote:

> 
> Hi
> 
> I just read your 1.1 release proposal and now try to hurry
> to inform you that there are -still- several severe issues
> with the new fsfs when doing multiple commits at the same time
> on win32  svn, version 1.0.4 (r9844)  windows xp sp1/latest patches.
> 
> the problem is the same as discussed a while ago:
> use 1 repos, 2 or more working dirs,
> simultaneously loop over a modify-commit
> from both working dirs resulting in weird errors like
> the following:
> 
> 
> Sending        Version2.h
> Transmitting file data .svn: Commit failed (details follow):
> svn: Invalid diff stream: insn 2 has non-positive length
> 
> 
> Sending        Version2.h
> svn: Commit failed (details follow):
> svn: Reference to non-existent node '0.0.t9-1' in filesystem 
> 'X:/SVNSandbox/TestFS/Repos/db'
> 
> 
> ============== (again tried with another repos)
> 
> Sending        Version.h
> Transmitting file data .svn: Commit failed (details follow):
> svn: Checksum mismatch for resulting fulltext
> (Version.h):
>    expected checksum:  8ddd214ec052d5ee4c652996e1c0a123
>    actual checksum:    00997804605b81104cc9a1b18c783031
> 
> 
> Sending        Version.h
> Transmitting file data .svn: Commit failed (details follow):
> svn: Can't remove 'x:/svnsandbox/testfs/repos/db/transactions/92-1.txn': 
> The directory is not empty.
> 
> 
> Sending        Version.h
> Transmitting file data .svn: Commit failed (details follow):
> svn: Can't remove 'x:/svnsandbox/testfs/repos/db/transactions/92-1.txn': 
> The directory is not empty.
> Sending        Version.h
> svn: Commit failed (details follow):
> svn: Out of date: 'Version.h' in transaction '93-1.txn-2'
> 
> 
> Sending        Version2.h
> svn: Commit failed (details follow):
> svn: Found malformed header in revision file
> 
> 
> ==================================
> 
> 
> so all this doesn't sound very good.
> especially after the last message happens, the repository is unusable.

Note that this is /not/ win32 specific.  I was able to reproduce on 
FreeBSD with stress.pl.  I haven't seen the 'malformed header' stuff 
yet, but I did get the checksum mismatch part, so something is wrong.

> also the recovery procedure doesn't really work:
> 
> 
> svnadmin verify reports:
> * Verified revision 27.
> svn: Found malformed header in revision file
> 
> svnadmin recover reports:
> Recovery completed.
> The latest repos revision is 93.

Well, "not work" isn't really the right word for it.  svnadmin recover 
doesn't really do anything at all for an fsfs repository.  In theory it 
isn't needed.  Of course theory and practice appear to be slightly 
different in this case...

-garrett


Re: fsfs still very unstable with simultaneous commits

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2004-06-09 at 19:41, C.A.T.Magic wrote:
> one question on performance, arising from my test-scripts:
> a commit of a single 1 kb file on an almost empty repository
> takes about 2 seconds to complete --
>    what is causing this slowness?
>    is it the buffer-flush to disk?
> is there anything (conf option) to speed up the test procedures,
> e.g. to disable the flush?

I don't know what's causing this slowness; you'd have to profile.  But
one possible culprit for some of it is svn_sleep_for_timestamps().



Re: fsfs still very unstable with simultaneous commits

Posted by "C.A.T.Magic" <c....@gmx.at>.
Tobias Ringström wrote:

> C.A.T.Magic wrote:
> 
>> I just read your 1.1 release proposal and now try to hurry
>> to inform you that there are -still- several severe issues
>> with the new fsfs when doing multiple commits at the same time
>> on win32  svn, version 1.0.4 (r9844)  windows xp sp1/latest patches.
> 
> 
> I believe this was fixed in r9939. It turned out to be independent of 
> the platform, but it would be great if you or someone else could retest 
> on Win32 just to be sure that there aren't other issues as well. I guess 
> nobody has been running stress.pl on fs_fs a lot before.

yes 9939 fixed it (again).
i tried 9938 (reproduces), 9939 (ok) and head 9943 (ok).


(but the repository broken by 9938 remains unusable
with various errors - also for 9939)


========


one question on performance, arising from my test-scripts:
a commit of a single 1 kb file on an almost empty repository
takes about 2 seconds to complete --
   what is causing this slowness?
   is it the buffer-flush to disk?
is there anything (conf option) to speed up the test procedures,
e.g. to disable the flush?


======
c.a.t.



Re: fsfs still very unstable with simultaneous commits

Posted by Tobias Ringström <to...@ringstrom.mine.nu>.
Greg Hudson wrote:

> On Wed, 2004-06-09 at 12:29, Tobias Ringström wrote:
>
>> I believe this was fixed in r9939. It turned out to be independent of 
>> the platform, but it would be great if you or someone else could retest 
>> on Win32 just to be sure that there aren't other issues as well. I guess 
>> nobody has been running stress.pl on fs_fs a lot before.
>
> Josh has done some stress.pl tests in the past, but I broke that code
> (the thing you fixed in r9939) more recently than that.

Ahh, thanks for clarifying that. The quality of the work on fsfs (and 
the resulting code) has been (and is) truly outstanding, so I could not 
really believe that there had been no stress testing.

/Tobias



Re: fsfs still very unstable with simultaneous commits

Posted by Greg Hudson <gh...@MIT.EDU>.
On Wed, 2004-06-09 at 12:29, Tobias Ringström wrote:
> I believe this was fixed in r9939. It turned out to be independent of 
> the platform, but it would be great if you or someone else could retest 
> on Win32 just to be sure that there aren't other issues as well. I guess 
> nobody has been running stress.pl on fs_fs a lot before.

Josh has done some stress.pl tests in the past, but I broke that code
(the thing you fixed in r9939) more recently than that.




Re: fsfs still very unstable with simultaneous commits

Posted by Tobias Ringström <to...@ringstrom.mine.nu>.
C.A.T.Magic wrote:

> I just read your 1.1 release proposal and now try to hurry
> to inform you that there are -still- several severe issues
> with the new fsfs when doing multiple commits at the same time
> on win32  svn, version 1.0.4 (r9844)  windows xp sp1/latest patches.

I believe this was fixed in r9939. It turned out to be independent of 
the platform, but it would be great if you or someone else could retest 
on Win32 just to be sure that there aren't other issues as well. I guess 
nobody has been running stress.pl on fs_fs a lot before.

/Tobias



Re: fsfs still very unstable with simultaneous commits

Posted by Garrett Rooney <ro...@electricjellyfish.net>.
Greg Hudson wrote:

> On Tue, 2004-06-08 at 17:04, Greg Hudson wrote:
> 
>>Please try compiling the attached source against APR, if you know how. 
> 
> 
> I made it especially challenging by not attaching the source.  Here it
> is.

I was able to reproduce the checksum difference error on Mac OS X in 
addition to FreeBSD...  The attached program does seem to work though, 
so as far as I can tell apr_file_lock is not the culprit.

I have not yet been able to reproduce any error that makes the 
repository unusable, though...

-garrett



Re: fsfs still very unstable with simultaneous commits

Posted by "D.J. Heap" <dj...@shadyvale.net>.
Greg Hudson wrote:
> On Tue, 2004-06-08 at 17:04, Greg Hudson wrote:
> 
>>Please try compiling the attached source against APR, if you know how. 
> 

I haven't experienced these problems (I haven't tested simultaneous 
committing much) but I tested this program on these datapoints and all 
worked fine (2nd instance waited until 1st released lock):

1. WinXP Local harddisk (with NTFS filesystem)

2. WinXP client to remote XFS filesystem (with Samba on Fedora Core 2)

3. WinXP client to remote NTFS filesystem (with WinXP server)

DJ


Re: fsfs still very unstable with simultaneous commits

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-06-08 at 17:04, Greg Hudson wrote:
> Please try compiling the attached source against APR, if you know how. 

I made it especially challenging by not attaching the source.  Here it
is.



Re: fsfs still very unstable with simultaneous commits

Posted by "C.A.T.Magic" <c....@gmx.at>.
Greg Hudson wrote:
> On Tue, 2004-06-08 at 09:45, C.A.T.Magic wrote:
> 
> Please try compiling the attached source against APR, if you know how. 
> Run it with the pathname of a file on the same storage facility as the
> repositories you're testing with.  (And let me know what kind of storage
> facility this is, while you're at it.)  The program simply opens a file,
> grabs a lock using apr_file_lock(), waits for a keypress, and exits. 
> Try running two instances of it with the same filename, and let me know
> if they both succeed in grabbing the lock at the same time.  If they do,
> that explains the problem, though not the solution.

the small test program works,
the second instance correctly waits for the first to release the lock.
i tried with 3 instances, and even different filename cases, and it's ok.

c.a.t.


Re: fsfs still very unstable with simultaneous commits

Posted by Greg Hudson <gh...@MIT.EDU>.
On Tue, 2004-06-08 at 09:45, C.A.T.Magic wrote:
> the problem is the same as discussed a while ago:
> use 1 repos, 2 or more working dirs,
> simultaneously loop over a modify-commit
> from both working dirs resulting in weird errors like
> the following:

> svn: Invalid diff stream: insn 2 has non-positive length
[etc.]

It looks like maybe two committers are trying to write to the same rev
file, and wacky antics ensue.  (The "Invalid diff stream" error probably
comes from trying to reconstruct the base for the delta being
constructed.)

Please try compiling the attached source against APR, if you know how. 
Run it with the pathname of a file on the same storage facility as the
repositories you're testing with.  (And let me know what kind of storage
facility this is, while you're at it.)  The program simply opens a file,
grabs a lock using apr_file_lock(), waits for a keypress, and exits. 
Try running two instances of it with the same filename, and let me know
if they both succeed in grabbing the lock at the same time.  If they do,
that explains the problem, though not the solution.
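
The attached source does not appear in this plain text version. A minimal sketch of the same kind of test, assuming a POSIX system and using Python's fcntl.flock() as a stand-in for apr_file_lock() (so it checks the storage's advisory locking, not APR itself); the function name try_lock is made up for illustration:

```python
import fcntl
import sys


def try_lock(path, blocking=True):
    """Open `path` and take an exclusive advisory lock on it.

    Returns the open file object (the lock is held until it is closed),
    or None if `blocking` is False and someone else holds the lock.
    NOTE: flock() is a stand-in here; the original test used apr_file_lock().
    """
    f = open(path, "a+")
    flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
    try:
        fcntl.flock(f, flags)
    except BlockingIOError:
        f.close()
        return None
    return f


if __name__ == "__main__" and len(sys.argv) == 2:
    held = try_lock(sys.argv[1])  # blocks until the lock is free
    print("lock acquired; press Enter to release")
    sys.stdin.readline()
    held.close()  # closing the file releases the lock
```

Run it from two consoles with the same filename; the second instance should print nothing until Enter is pressed in the first. If both print "lock acquired" at once, locking on that storage is not working.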



Re: fsfs still very unstable with simultaneous commits

Posted by Tobias Ringström <to...@ringstrom.mine.nu>.
Josh Pieper wrote:

> Greg,
>
> I am traveling now and not exactly able to debug this, but could it be
> that the unique directory creation code is failing somehow on Win32?
> i.e. two different processes creating a new transaction both get the
> same directory?

It's been said before, but this is not a Win32 specific bug. I just 
reproduced on a Fedora Core 2 box (SMP kernel with one Intel P4/HT). 
With three instances of stress.pl it just takes a few seconds to get the 
checksum error. I'm debugging now.

/Tobias



Re: fsfs still very unstable with simultaneous commits

Posted by Josh Pieper <jj...@pobox.com>.
D.J. Heap wrote:
> I can confirm that this quickly produces various errors on my local NTFS 
> harddisk on WinXP if two are run simultaneously, where one runs fine by 
> itself:
> 
> [stuff]

Greg,

I am traveling now and not exactly able to debug this, but could it be
that the unique directory creation code is failing somehow on Win32?
i.e. two different processes creating a new transaction both get the
same directory?

-Josh


Re: fsfs still very unstable with simultaneous commits

Posted by "D.J. Heap" <dj...@shadyvale.net>.
C.A.T.Magic wrote:
> 
> 
> i created a few batch files to reproduce the effect.
> the test is using file:/// access.
> i think it doesn't reproduce on remote repositories,
> because it's essential that both commands run at the
> same time and same speed. random network delay may
> reduce the occurrence.
> i test this on a hyperthreading cpu, and the effect is
> most noticeable when both commits run at the same frequency,
> but almost not reproducible if they run at different speeds.
[snip]
> 
> i hope this helps to reproduce the errors
> 
> =====
> c.a.t.

I can confirm that this quickly produces various errors on my local NTFS 
harddisk on WinXP if two are run simultaneously, where one runs fine by 
itself:


Sending        Version.h
Transmitting file data .svn: Commit failed (details follow):
svn: Invalid diff stream: insn 3 overflows the target view


Sending        Version.h
svn: Commit failed (details follow):
svn: Reference to non-existent node '0.0.t27-1' in filesystem 
'/Temp/TestFS/Repos/db'


Sending        Version2.h
Transmitting file data .svn: Commit failed (details follow):
svn: Checksum mismatch for resulting fulltext
(Version2.h):
    expected checksum:  375c8b0d2a12aaf855e0adfc3b722ed7
    actual checksum:    86e8fcb21febb7df17089dc11c32ede1



DJ


Re: fsfs still very unstable with simultaneous commits

Posted by "C.A.T.Magic" <c....@gmx.at>.

i created a few batch files to reproduce the effect.
the test is using file:/// access.
i think it doesn't reproduce on remote repositories,
because it's essential that both commands run at the
same time and same speed. random network delay may
reduce the occurrence.
i test this on a hyperthreading cpu, and the effect is
most noticeable when both commits run at the same frequency,
but almost not reproducible if they run at different speeds.


unzip the 3 files, adjust the file:///X:/Repos path
then start makefsfs.bat once.
this should create "Repos", "Work1", "Work2"
then open two consoles and start
   fill10000.bat from Work1
and
   fill20000.bat from Work2


note that there is a "svn cleanup" in the
fill10000/fill20000 files, this is to keep the loop
going even after the svn client crashes
(which it does quite often in my tests)
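
The batch files themselves do not appear in this plain text version. A rough Python equivalent of the procedure above, as a sketch only: the function names, the counter ranges, the file contents, and the assumption that each working copy already has its file added and committed are illustrative guesses, not the contents of the original scripts.

```python
import subprocess
from pathlib import Path


def setup_commands(repos, work_dirs):
    """Commands that create an fsfs repository and check out each working copy."""
    url = Path(repos).resolve().as_uri()  # file:/// URL for local access
    cmds = [["svnadmin", "create", "--fs-type", "fsfs", str(repos)]]
    cmds += [["svn", "checkout", url, str(wc)] for wc in work_dirs]
    return cmds


def commit_loop(wc, filename, start, count):
    """Rewrite one already-versioned file and commit it, `count` times.

    Assumes `filename` was added and committed in `wc` beforehand.
    """
    target = Path(wc) / filename
    for n in range(start, start + count):
        target.write_text(f"#define VERSION {n}\n")
        # check=False: failed commits are exactly what is being observed
        subprocess.run(["svn", "commit", "-m", f"version {n}", str(wc)],
                       check=False)
        # cleanup keeps the loop going after a failed or crashed commit
        subprocess.run(["svn", "cleanup", str(wc)], check=False)
```

Run the setup commands once, then start commit_loop("Work1", "Version.h", 10000, 1000) in one console and commit_loop("Work2", "Version2.h", 20000, 1000) in another, so that both loops commit to the same repository at the same time.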


i hope this helps to reproduce the errors

=====
c.a.t.


p.s: luckily the effect does not reproduce for a bdb repository.


Re: fsfs still very unstable with simultaneous commits

Posted by Mark Benedetto King <mb...@lowlatency.com>.
On Tue, Jun 08, 2004 at 03:45:16PM +0200, C.A.T.Magic wrote:
> 
> Hi
> 
> I just read your 1.1 release proposal and now try to hurry
> to inform you that there are -still- several severe issues
> with the new fsfs when doing multiple commits at the same time
> on win32  svn, version 1.0.4 (r9844)  windows xp sp1/latest patches.
> 

Just wondering, what options are you passing to svnserve?

--ben

