You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by Greg Hudson <gh...@MIT.EDU> on 2004/05/08 23:57:46 UTC

Re: [bug] vtable=0x00000000 for the new FSFS when doing simultaneous commits

On Sat, 2004-05-08 at 19:24, C.A.T.Magic wrote:
>      svn_error_t *
>      svn_fs_abort_txn (svn_fs_txn_t *txn, apr_pool_t *pool)
>      {
>  >      return txn->vtable->abort (txn, pool);
>      }
> 
> 
>      txn->vtable = 0x00000000

Let me know if the following patch helps:

Index: subversion/libsvn_repos/commit.c
===================================================================
--- subversion/libsvn_repos/commit.c    (revision 9652)
+++ subversion/libsvn_repos/commit.c    (working copy)
@@ -129,6 +129,7 @@
   struct dir_baton *dirb;
   struct edit_baton *eb = edit_baton;
   svn_revnum_t youngest;
+  svn_fs_txn_t *txn;
 
   /* Ignore BASE_REVISION.  We always build our transaction against
      HEAD.  However, we will keep it in our dir baton for out of
@@ -137,12 +138,13 @@
 
   /* Begin a subversion transaction, cache its name, and get its
      root object. */
-  SVN_ERR (svn_repos_fs_begin_txn_for_commit (&(eb->txn), 
+  SVN_ERR (svn_repos_fs_begin_txn_for_commit (&txn,
                                               eb->repos, 
                                               youngest,
                                               eb->user, 
                                               eb->log_msg,
                                               eb->pool));
+  eb->txn = txn;
   SVN_ERR (svn_fs_txn_root (&(eb->txn_root), eb->txn, eb->pool));
   SVN_ERR (svn_fs_txn_name (&(eb->txn_name), eb->txn, eb->pool));
   


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] FSFS errors when doing simultaneous commits - win32 apr_file_mktemp is broken

Posted by rb...@rkbloom.net.
I have forwarded this to the APR dev list.

Ryan

On Sun, 9 May 2004, C.A.T.Magic wrote:

> C.A.T.Magic wrote:
> > Josh Pieper wrote:
> >
> >> C.A.T.Magic wrote:
> [...]
> >>> which means that apr_file_mktemp is failing -
> >>> maybe a bug in apr for win32?
> [...]
> > fails with errorcode
> >   720080 -- The file exists
> > which is probably the most senseless thing a generate unique
> > file function could ever return :)
> [...]
>
> maybe a simple minded aproach will do a better job:
> (pseudocode)
>    int r = head_revision;
>    int t = 1;
>    while ( mkdir("$r-$t")==DIR_EXISTS ) t++;
>
> :)
> =======
> c.a.t.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
> For additional commands, e-mail: dev-help@subversion.tigris.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] FSFS errors when doing simultaneous commits - win32 apr_file_mktemp is broken

Posted by "C.A.T.Magic" <c....@gmx.at>.
Josh Pieper wrote:
> C.A.T.Magic wrote:
> 
>>maybe a simple minded aproach will do a better job:
>>(pseudocode)
>>  int r = head_revision;
>>  int t = 1;
>>  while ( mkdir("$r-$t")==DIR_EXISTS ) t++;

> This is essentially what I did by using svn_io_open_unique_file.  Try
> r9662 and see how it goes.

great, that finally solved all the commit problems :)


but noticed 2 more things:
its always starting with an "_2" suffix (never _0 or _1)
and it would be safer to test for the 3243_2.txn _directory_
directly instead of testing for a _file_ first.
because if things go bad, then there is an existing
   3242_2.txn directory but no 3242_2 temp-file, which would
lead to permanent commit errors.  svn_io_open_unique_dir ?  :)


thanks
:-)
c.a.t.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] FSFS errors when doing simultaneous commits - win32 apr_file_mktemp is broken

Posted by Josh Pieper <jj...@pobox.com>.
C.A.T.Magic wrote:
> maybe a simple minded aproach will do a better job:
> (pseudocode)
>   int r = head_revision;
>   int t = 1;
>   while ( mkdir("$r-$t")==DIR_EXISTS ) t++;
> 
> :)

This is essentially what I did by using svn_io_open_unique_file.  Try
r9662 and see how it goes.

-Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] FSFS errors when doing simultaneous commits - win32 apr_file_mktemp is broken

Posted by "C.A.T.Magic" <c....@gmx.at>.
C.A.T.Magic wrote:
> Josh Pieper wrote:
> 
>> C.A.T.Magic wrote:
[...]
>>> which means that apr_file_mktemp is failing -
>>> maybe a bug in apr for win32?
[...]
> fails with errorcode
>   720080 -- The file exists
> which is probably the most senseless thing a generate unique
> file function could ever return :)
[...]

maybe a simple minded aproach will do a better job:
(pseudocode)
   int r = head_revision;
   int t = 1;
   while ( mkdir("$r-$t")==DIR_EXISTS ) t++;

:)
=======
c.a.t.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] FSFS errors when doing simultaneous commits - win32 apr_file_mktemp is broken

Posted by "C.A.T.Magic" <c....@gmx.at>.
Josh Pieper wrote:
> C.A.T.Magic wrote:
> 
>>which means that apr_file_mktemp is failing -
>>maybe a bug in apr for win32?
> 
> C.A.T.:  Do you think you could figure out why apr_file_mktemp is
> failing?  Perhaps find the exact error code it is returning?

please change the flags for apr_file_mktemp in fsfs.c to
    APR_CREATE | APR_READ | APR_WRITE | APR_EXCL
because otherways apr always fails with EACCES on win32.


i debugged into the APR code and from what i see, i can tell
that apr's "gettemp" implementation is doint it quite
ineffective: it takes the "current time" and used that as a
seed for its random files --- which then collides quite often
if two processes (e.g HyperThread CPU) generate a temp file at 
approximately the same time. its also generating "Foo" and fOO"
which are identical for win32.
it then tries to create the same filename in both threads and
fails with errorcode
   720080 -- The file exists
which is probably the most senseless thing a generate unique
file function could ever return :)

since apr is broken this way, you should probably remove
all uses of apr_file_mktemp from svn, since you can't expect
all users to have the newest apr-fixes.
can someone point that behaviour out on the apr mailing list?


=======
c.a.t.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] vtable=0x00000000 for the new FSFS when doing simultaneous commits

Posted by Josh Pieper <jj...@pobox.com>.
C.A.T.Magic wrote:
> i tried it the quick way by deleting
>   /* SVN_ERR (svn_io_file_close (txn_file, pool)); */
> from svn_fs_fs__create_txn,
> and this fixes the txn collisions.
> (the file gets closed anyway, because of the pool).
> but for some strange reasons i still get

I have fixed this temporary file so that it works the way I originally
intended it to in r9660.

> svn: Unable to create new transaction.
> 
> which means that apr_file_mktemp is failing -
> maybe a bug in apr for win32?

However, I am not sure why file_mktemp is still failing.  I looked
through the APR code for this, and from what I can tell it can only
fail if the parent directory doesn't exist, creating the file returns
an error other than APR_EEXIST, or counting the random portion of the
string completely wraps around.

On a side note, it probably shouldn't fail on the first wrap-around,
since the start could be arbitrarily close to the wrap-around point,
but I doubt this condition is being hit here.

C.A.T.:  Do you think you could figure out why apr_file_mktemp is
failing?  Perhaps find the exact error code it is returning?

Thanks,
Josh

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] vtable=0x00000000 for the new FSFS when doing simultaneous commits

Posted by "C.A.T.Magic" <c....@gmx.at>.
C.A.T.Magic wrote:

> you create a unique filename through apr, but then DELETE
> that unique file. and then you try to create a new folder
> with the 'deleted' file's name. but since you delete the
> unique file first, the name is no longer unique and the
> second process is taking the same name.....

i tried it the quick way by deleting
   /* SVN_ERR (svn_io_file_close (txn_file, pool)); */
from svn_fs_fs__create_txn,
and this fixes the txn collisions.
(the file gets closed anyway, because of the pool).
but for some strange reasons i still get

svn: Unable to create new transaction.

which means that apr_file_mktemp is failing -
maybe a bug in apr for win32?

a better solution for Win32 meantime could be

   UINT GetTempFileName(
     LPCTSTR lpPathName,
     LPCTSTR lpPrefixString,
     UINT uUnique,
     LPTSTR lpTempFileName
   );

it seems to me that the apr-function for tempfiles on win32
is pretty new:
   APR-CHANGELOG 0.9.0:
   *) Fixed apr_file_mktemp on systems without mkstemp (Win32, etc).


(i'm using apr 0.9.5 dlls)


:-)
c.a.t.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] vtable=0x00000000 for the new FSFS when doing simultaneous commits

Posted by "C.A.T.Magic" <c....@gmx.at>.
Greg Hudson wrote:

>>Transmitting file data .svn: Commit failed (details follow):
>>svn: Transaction out of date.
> 
> I also fixed this bug, in r9657.

its still there (or maybe another one with the same effect)


but this may be of interrest to you:

console #1:
   svn: Commit failed (details follow):
   svn: Can't open file   'X:/SVNSandbox/TestFSFS/RepoFS/db/ \
              transactions/KBQQdn.txn/rev': The file exists.

console #2:
   Sending        Version.h
   svn: Commit failed (details follow):
   svn: Can't read file   'X:/SVNSandbox/TestFSFS/RepoFS/db/ \
              transactions/KBQQdn.txn/0.0': End of file found

obviously, the two processes access the SAME txn folder
and offcourse this results in strange collisions.
(this effect can be reproduced).

i think the bug is here:

   if (apr_file_mktemp (&txn_file, txn_filename, 0, pool) != APR_SUCCESS)
     return svn_error_create (SVN_ERR_FS_CORRUPT, NULL,
                              "Unable to create new transaction.");

   /* Create the transaction directory based on this temporary file. */
   txn_dirname = apr_pstrcat (pool, txn_filename, SVN_FS_FS__TXNS_EXT, 
NULL);

   SVN_ERR (svn_io_make_dir_recursively (txn_dirname, pool));

   SVN_ERR (svn_io_file_close (txn_file, pool));


you create a unique filename through apr, but then DELETE
that unique file. and then you try to create a new folder
with the 'deleted' file's name. but since you delete the
unique file first, the name is no longer unique and the
second process is taking the same name.....

i think you have several options:
- use a simple increasing number for the txn folders
   with that number stored similar to db/uuid or db/current
   (+1)
- directly use an uuid
   (+0)
- create the unique file, but then keep it alive as long as
   you also have to keep the 'unique'.txn folder.
   (-0)
- check if svn_io_make_dir_recursively fails and if it does,
   simply try another unique name. (-1, this could loop forever).


=====
:-)
c.a.t.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] vtable=0x00000000 for the new FSFS when doing simultaneous commits

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sat, 2004-05-08 at 23:31, C.A.T.Magic wrote:
> > Let me know if the following patch helps:

> yes it did

Excellent.  I already committed that in r9658.

> Transmitting file data .svn: Commit failed (details follow):
> svn: Transaction out of date.

I also fixed this bug, in r9657.

> svn: Unable to create new transaction.
[...]
> svn: Can't read file
'X:/SVNSandbox/TestFSFS/RepoFS/db/transactions/2hZiLB.txn/0.0': End of
file found
[...]
> *crash*

I have no clue about these.

> it seems that either both processes try to access
> the same folder and damage each others transaction,
> or a process creates a transaction folder and
> fails to write into it.

To the best of my knowledge, each transaction should only be of interest
to one process, over ra_local or ra_svn.  (Over ra_dav, several
processes may access a transaction in sequence, but not concurrently.)


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: [bug] vtable=0x00000000 for the new FSFS when doing simultaneous commits

Posted by "C.A.T.Magic" <c....@gmx.at>.
Greg Hudson wrote:

> Let me know if the following patch helps:


yes it did, it doesn't crash any more on my
previous test and also the 'working dir locked'
effect has gone away.

but i still get

 >>

Sending        Version.h
Transmitting file data .svn: Commit failed (details follow):
svn: Transaction out of date.
10030008
Sending        Version.h

<<

is this because both test files, Version.h and Version2.h
are located in the same directory? (repository root)
i can pause one console, then the other console succeeds,
then unpause it and pause the other and it succeeds.
but if they both run there is always one that succeeds
every time and the other that fails every time. kind of a
racing condition.


ok, I tried it with one file repos/Version.h and another
in repos/foo/Version2.h -- and got another crash now.
trying to reproduce it...


sometimes now i get:

=====1===================
svn: Commit failed (details follow):
svn: Can't open file 
'X:/SVNSandbox/TestFSFS/RepoFS/db/transactions/ttXiPf.txn/rev': The file 
exists.

=====2====================
svn: Commit failed (details follow):
svn: Unable to create new transaction.

=====3=====================
Transmitting file data .svn: Commit failed (details follow):
svn: Transaction out of date.

=====4=====================
Sending        Version.h
svn: Commit failed (details follow):
svn: Can't read file 
'X:/SVNSandbox/TestFSFS/RepoFS/db/transactions/2hZiLB.txn/0.0': End of 
file found

=====5=====================
*crash*
unfortunately i could not debug it this time,
because the crash only occours with the release build
and not with a debug build. (could mean access to
uninitialized or dead/freed memory?)
  -- is there a switch to the configure-script
to enable debug information for (msvc) release builds?


and the transaction dir contains several 'lost'
and -empty- transaction folders
('IdrIsD.txn', 'np5jKb.txn', etc)
it seems that either both processes try to access
the same folder and damage each others transaction,
or a process creates a transaction folder and
fails to write into it.


=======
goodnight for now.
i can probably try further experiments tomorrow, if needed.
:-)
c.a.t.



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org