Posted to dev@subversion.apache.org by Bob Kerns <Bo...@positscience.com> on 2007/05/28 17:52:09 UTC

svnadmin load errors on Windows

Google turns up a few past discussions of this symptom, but their
analysis, which blames a virus scanner or some other interfering
process, is incorrect.

Symptom:  Doing svnadmin load gives:

svnadmin: Can't remove 'D:\repos\bfc\db\transactions\1768-1.txn': The directory is not empty.

When you look, the directory is not only empty; it has usually been
deleted already!

I strongly suspect it occurs in other contexts as well, but svnadmin
load, where we do thousands of commits in a short time, is where it
shows up...

This is a problem that Subversion shares with many, many other tools -
ClearCase, Eclipse, ant, and even Windows Explorer. Eclipse has
recently taken steps in some (but not all) places to deal with it,
prompted by a report that it happens more often on Vista. This, after
years of beating my head against the wall looking for interfering
processes, finally led me to the light...

On Windows, DeleteFile is NOT SYNCHRONOUS; it marks the file for
deletion after all IO ceases and all handles are closed. This means that
if you delete all the files in a directory, and then delete the
directory, it will SOMETIMES FAIL.

Add to this that processes scanning directories for indexing, virus
protection, backup, and other purposes are part of the basic ecosystem,
and it is clear: the code should be prepared for deletion of the
directory to fail, and to retry a short time later.
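
A minimal sketch of what "retry a short time later" could look like in portable C. All names are hypothetical (none of this is a Subversion or APR API): the caller supplies the operation, a predicate saying which errors are worth retrying, and the sleep function. The delay doubles after each failure up to a cap, and the number of attempts is bounded so that a genuine error is still reported.

```c
/* Hypothetical callback types; these are not Subversion or APR APIs. */
typedef int (*retry_op_fn)(void *ctx);       /* 0 on success, else an error code */
typedef int (*retry_transient_fn)(int err);  /* nonzero if err is worth retrying */
typedef void (*retry_sleep_fn)(unsigned ms); /* waits roughly ms milliseconds */

/* Calls op until it succeeds, fails with a non-transient error, or
   max_attempts calls have been made.  Between attempts it sleeps,
   starting at initial_ms and doubling up to max_delay_ms.
   Returns the last error code (0 on success). */
int retry_with_backoff(retry_op_fn op, void *ctx,
                       retry_transient_fn is_transient,
                       retry_sleep_fn sleep_ms,
                       unsigned initial_ms, unsigned max_delay_ms,
                       int max_attempts)
{
    unsigned delay = initial_ms < max_delay_ms ? initial_ms : max_delay_ms;
    int err = op(ctx);
    int attempt;

    for (attempt = 1;
         err != 0 && attempt < max_attempts && is_transient(err);
         ++attempt) {
        sleep_ms(delay);
        delay = (delay > max_delay_ms / 2) ? max_delay_ms : delay * 2;
        err = op(ctx);
    }
    return err;
}

/* --- Illustration only: a delete that keeps failing for a while. --- */
enum { DEMO_ERR_DIR_NOT_EMPTY = 1, DEMO_ERR_FATAL = 2 };  /* made-up codes */

struct flaky_delete {
    int failures_left;  /* how many more calls should fail */
    int error;          /* code returned while failing */
    int calls;          /* how many times the delete was attempted */
};

static int flaky_delete_op(void *ctx)
{
    struct flaky_delete *f = ctx;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return f->error;
    }
    return 0;
}

static int demo_is_transient(int err) { return err == DEMO_ERR_DIR_NOT_EMPTY; }

static unsigned demo_slept_ms;  /* total "sleep" requested so far */

/* Records the requested delay instead of actually sleeping. */
static void demo_sleep(unsigned ms) { demo_slept_ms += ms; }
```

Injecting the sleep keeps the loop testable without real delays; real code might wrap Sleep() on Windows or nanosleep() on POSIX systems. Retrying only the errors the predicate marks as transient (such as "directory not empty" right after deleting the directory's contents) avoids stalling on errors that will never clear.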

This is a real pain. In my case, a repository move that should have
taken a few hours overnight in a single step is turning into a manual
nightmare, with a greatly increased risk I'll make a typo and drop
revisions or something.

I really think this needs to be tracked as a Subversion bug, tested for,
and fixed.

But I'm sending email first, since you've got that Big Yellow Buddy
System warning, and maybe it's in there some way I didn't manage to turn
up in my searches.


Re: svnadmin load errors on Windows (and .svn/tmp errors on svn cleanup)

Posted by "D.J. Heap" <dj...@gmail.com>.
On 6/27/07, Bob Kerns <Bo...@positscience.com> wrote:
[snip]
> Second observation:
>
> The comment on WIN32_RETRY_LOOP doesn't really explain the situation
> properly.
> /*
>  Windows is 'aided' by a number of types of applications that
>  follow other applications around and open up files they have
>  changed for various reasons (the most intrusive are virus
>  scanners).  So, if one of these other apps has glommed onto
>  our file we may get an 'access denied' error.
>
>  This retry loop does not completely solve the problem (who
>  knows how long the other app is going to hold onto it for), but
>  goes a long way towards minimizing it.  It is not an infinite
>  loop because there might really be an error.
> */
>
> This is only one of the scenarios that can cause it to fail. The main
> problem is that deleting is asynchronous. No virus scanners or other
> applications are necessary to provoke the problem.



True -- I'll clarify the comment.  Erik Huelsmann found a couple of
places where we weren't using our wrapper functions, and I think I've
found another spot where the async nature of deletes could be causing
us grief despite the retry loop.



>
> Third observation:
>
> I think the number of iterations in WIN32_RETRY_LOOP should be
> substantially increased, probably by a factor of 10. (That shouldn't be
> necessary for handling the asynchronous deletes, but if it's virus
> scanner, etc, it should really give it more time before concluding it's
> not going to work).


Possibly...but it is already retrying for over 10 seconds and that has
alleviated all known occurrences to my knowledge (I believe the issue
you are hitting is caused by another problem and a longer loop
wouldn't help).  Making the user wait minutes in order to discover
what could be a real problem seems excessive to me.
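
For a sense of scale, here is the worst-case time a capped, doubling retry schedule spends sleeping. The parameters are assumptions, chosen only so that the worst case lands near the "over 10 seconds" mentioned above (1 ms initial delay, 128 ms cap, 101 attempts); they are not taken from WIN32_RETRY_LOOP itself.

```c
/* Worst-case total sleep, in milliseconds, for a retry loop that makes
   `attempts` calls and sleeps before each retry: the first sleep is
   initial_ms, each later one doubles, and none exceeds cap_ms.
   The parameters are hypothetical, not Subversion's actual values. */
unsigned long total_backoff_ms(unsigned long initial_ms, unsigned long cap_ms,
                               int attempts)
{
    unsigned long delay = initial_ms < cap_ms ? initial_ms : cap_ms;
    unsigned long total = 0;
    int retry;

    for (retry = 1; retry < attempts; ++retry) {  /* attempts - 1 sleeps */
        total += delay;
        delay = (delay > cap_ms / 2) ? cap_ms : delay * 2;
    }
    return total;
}
```

Under these assumptions, total_backoff_ms(1, 128, 101) is 12031 ms (about 12 seconds), while ten times as many attempts, total_backoff_ms(1, 128, 1001), is 127231 ms (just over 2 minutes). Once the delay reaches its cap, each extra attempt adds a fixed 128 ms, so a tenfold increase in iterations turns seconds into minutes.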


>
> Final, MAIN observation:
>
> In this case, the problem (I believe) is in dir_make, which needs to use
> WIN32_RETRY_LOOP:
>  status = apr_dir_make(path_apr, perm, pool);
>  WIN32_RETRY_LOOP(status, apr_dir_make(path_apr, perm, pool));
>
> Probably also svn_io_make_dir_recursive:
>  apr_err = apr_dir_make_recursive(path_apr, APR_OS_DEFAULT, pool);
>  WIN32_RETRY_LOOP(apr_err, apr_dir_make_recursive(path_apr,
>                   APR_OS_DEFAULT, pool));
>
> What I believe is happening here, is that the delete of the tmp
> directory returns before the directory tmp has been removed, and the
> attempt to recreate it is failing when it happens before the actual
> removal.



Ah, yes. I'll look at adding the retry loop around that code.



> Sorry I can't actually make this change and verify it. If someone could
> make the change and ship me a svn.exe, I'd be happy to verify the fix.
> (And if you include the .pdb file, I can try to debug if it doesn't
> work).
>
> I don't think this is the same problem as I was seeing on svnadmin load,
> as the transaction directory names are incremented, so there shouldn't
> be a problem creating them. (Though perhaps the problem there is just
> not enough time for retry under heavy load?). I had to do another
> transfer, but since this was Windows to Windows (Server 2003) I just
> ROBOCOPY'd the repository instead of doing a dump/reload.


Right, I think it is a different issue as well -- our directory
deletion code was changed a while ago to 'rewind' and loop over any
remaining directory entries in order to work around a problem in OSX
and FreeBSD (I think).  I think Windows' asynchronous deletes are working
against us in that situation, but I haven't completely tested it yet.

Once I have finished testing and it works for me, I'll send you the
binaries zip to test out, if you would.

DJ

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

RE: svnadmin load errors on Windows (and .svn/tmp errors on svn cleanup)

Posted by Bob Kerns <Bo...@positscience.com>.
Well, the compiler I have (Visual Studio 9, the Orcas Beta 1) doesn't
seem to be compatible.

But I have a different, but related problem, and what I *think* is a
solution. Unfortunately, since I can't build, I can't test....

Every time I try to do 'svn cleanup' on a working copy of our full tree
under Vista, it destroys my working copy. First it gets an "access
denied" trying to create .svn/tmp, and then it gets an error that the
.svn/tmp directory isn't there (since it couldn't create it).

The directory where it fails is non-deterministic -- a different one
every time. But over the size of my full tree, the odds of success for
all directories approach zero.
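
The arithmetic behind that last sentence, as a small sketch: if each directory fails independently with some small probability p (an assumption; real failures may well be correlated, for example under load), a tree of N directories gets through cleanup with probability (1 - p)^N, which shrinks geometrically as N grows. The failure rate used in the note below is hypothetical.

```c
/* Probability that every one of n directories is cleaned up without a
   failure, assuming each fails independently with probability p. */
double all_succeed(double p, unsigned n)
{
    double prob = 1.0;
    unsigned i;

    for (i = 0; i < n; ++i)
        prob *= 1.0 - p;  /* one more independent success */
    return prob;
}
```

With an assumed failure rate of 1 in 1000 per directory, all_succeed(0.001, 5000) is about 0.0067, so a 5000-directory cleanup would fail somewhere more than 99% of the time.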

I have four observations, the fourth of which I believe is the cause of
this difficulty.

First observation: Revision 23370 claims to fix the error on missing
.svn/tmp, but doesn't seem to handle this case.
Revision 23370, modified Thu Feb 8 09:13:39 2007 UTC (4 months, 2 weeks ago) by lundblad:

Make 'svn cleanup' not fail on a missing .svn/tmp directory.  It is
removed and rebuilt anyway so failing if it does not exist is
counterproductive.

Patch by: Henner Zeller <h....@acm.org>

* subversion/include/svn_io.h
  (svn_io_remove_dir2): New function.  Add ignore_enoent flag.
  (svn_io_remove_dir): Deprecate.  All callers updated to use
  svn_io_remove_dir2.

* subversion/libsvn_subr/io.c
  (svn_io_remove_dir2): Copy from svn_io_remove_dir, adding
  ignore_enoent parameter.
  (svn_io_remove_dir): Wrap svn_io_remove_dir2.

* subversion/libsvn_wc/adm_files.c (svn_wc__adm_cleanup_tmp_area): Don't
  fail if the tmp directory doesn't exist.

* subversion/tests/cmdline/basic_tests.py
  (basic_cleanup): Add regression: missing tmp-dir in cleanup.

* subversion/tests/cmdline/svntest/actions.py
  (remove_admin_tmp_dir): New function.

Second observation:

The comment on WIN32_RETRY_LOOP doesn't really explain the situation
properly.
/*
  Windows is 'aided' by a number of types of applications that
  follow other applications around and open up files they have
  changed for various reasons (the most intrusive are virus
  scanners).  So, if one of these other apps has glommed onto
  our file we may get an 'access denied' error.

  This retry loop does not completely solve the problem (who
  knows how long the other app is going to hold onto it for), but
  goes a long way towards minimizing it.  It is not an infinite
  loop because there might really be an error.
*/

This is only one of the scenarios that can cause it to fail. The main
problem is that deleting is asynchronous. No virus scanners or other
applications are necessary to provoke the problem.

Third observation:

I think the number of iterations in WIN32_RETRY_LOOP should be
substantially increased, probably by a factor of 10. (That shouldn't be
necessary for handling the asynchronous deletes, but if it's virus
scanner, etc, it should really give it more time before concluding it's
not going to work).

Final, MAIN observation:

In this case, the problem (I believe) is in dir_make, which needs to use
WIN32_RETRY_LOOP:
  status = apr_dir_make(path_apr, perm, pool);
  WIN32_RETRY_LOOP(status, apr_dir_make(path_apr, perm, pool));

Probably also svn_io_make_dir_recursive:
  apr_err = apr_dir_make_recursive(path_apr, APR_OS_DEFAULT, pool);
  WIN32_RETRY_LOOP(apr_err, apr_dir_make_recursive(path_apr,
                   APR_OS_DEFAULT, pool));

What I believe is happening here, is that the delete of the tmp
directory returns before the directory tmp has been removed, and the
attempt to recreate it is failing when it happens before the actual
removal.
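
The race described above can be sketched as a small, self-contained simulation. Everything in it is hypothetical: the "directory" is a struct, a delete stays pending for a fixed number of simulated ticks, and RETRY_LOOP only imitates the general shape of WIN32_RETRY_LOOP(err, expr) (re-evaluate expr a bounded number of times while err looks transient). It is not Subversion's macro; real code would sleep and inspect OS error codes rather than advance a tick.

```c
/* Hypothetical status codes for this simulation only. */
enum { SIM_OK = 0, SIM_ACCESS_DENIED = 1, SIM_EXISTS = 2 };

/* Simulated state of the .svn/tmp directory. */
struct sim_dir {
    int exists;         /* the directory entry is still visible */
    int pending_ticks;  /* > 0: delete requested but not yet finished */
};

/* Asynchronous-style delete: reports success immediately, but the
   entry only disappears after `lag` more ticks. */
static int sim_rmdir(struct sim_dir *d, int lag)
{
    d->pending_ticks = lag;
    if (lag == 0)
        d->exists = 0;
    return SIM_OK;
}

/* One unit of simulated time passes; a pending delete may complete. */
static void sim_tick(struct sim_dir *d)
{
    if (d->pending_ticks > 0 && --d->pending_ticks == 0)
        d->exists = 0;
}

/* Creating the directory while its delete is still pending is refused
   with "access denied", as in the report above. */
static int sim_mkdir(struct sim_dir *d)
{
    if (d->pending_ticks > 0)
        return SIM_ACCESS_DENIED;
    if (d->exists)
        return SIM_EXISTS;
    d->exists = 1;
    return SIM_OK;
}

/* Re-evaluate expr up to max_tries times while err is
   SIM_ACCESS_DENIED, running wait_stmt before each retry. */
#define RETRY_LOOP(err, expr, wait_stmt, max_tries)                \
    do {                                                           \
        int retry_i_;                                              \
        for (retry_i_ = 0;                                         \
             retry_i_ < (max_tries) && (err) == SIM_ACCESS_DENIED; \
             ++retry_i_) {                                         \
            wait_stmt;                                             \
            (err) = (expr);                                        \
        }                                                          \
    } while (0)

/* Delete tmp and immediately re-create it, as cleanup does.
   Returns the final status of the re-create. */
static int recreate_tmp(struct sim_dir *d, int lag, int use_retry)
{
    int status;

    sim_rmdir(d, lag);
    status = sim_mkdir(d);
    if (use_retry)
        RETRY_LOOP(status, sim_mkdir(d), sim_tick(d), 10);
    return status;
}
```

With a 3-tick lag, recreate_tmp(&dir, 3, 0) returns SIM_ACCESS_DENIED, the analogue of the "access denied" on .svn/tmp, while recreate_tmp(&dir, 3, 1) succeeds on the third retry once the pending delete completes. With a 50-tick lag and only 10 retries it still returns SIM_ACCESS_DENIED: the loop is bounded, so a delete that never finishes is reported rather than waited on forever.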

Sorry I can't actually make this change and verify it. If someone could
make the change and ship me a svn.exe, I'd be happy to verify the fix.
(And if you include the .pdb file, I can try to debug if it doesn't
work).

I don't think this is the same problem as I was seeing on svnadmin load,
as the transaction directory names are incremented, so there shouldn't
be a problem creating them. (Though perhaps the problem there is just
not enough time for retry under heavy load?). I had to do another
transfer, but since this was Windows to Windows (Server 2003) I just
ROBOCOPY'd the repository instead of doing a dump/reload.

-----Original Message-----
From: Bob Kerns 
Sent: Saturday, June 02, 2007 06:59
To: 'D.J. Heap'
Cc: dev@subversion.tigris.org
Subject: RE: svnadmin load errors on Windows

Oh, great!  It makes me very happy to hear that you understand the
issue. Given that Windows Explorer, even on Vista, often leaves empty
folders behind when deleting large trees, you get credits for being way
ahead of the pack!

It usually just gets blamed on antivirus software...

Unfortunately, due to health, family, and work, I can't make any
promises, though I'd love to help. The place to look would be where it's
cleaning up completed transactions in the repository.

I've flagged this to follow up in 3 weeks; we'll see if I have more time
(and a C compiler!) by then.

My next migration (in a couple weeks) will be back from Windows to a new
Linux box, so I don't expect to do any more mass series of commits on
Windows, personally... but I've gotten enough value out of Subversion
I'd be pleased to pay some back.



-----Original Message-----
From: D.J. Heap [mailto:djheap@gmail.com] 
Sent: Saturday, June 02, 2007 06:06
To: Bob Kerns
Cc: dev@subversion.tigris.org
Subject: Re: svnadmin load errors on Windows

On 5/28/07, Bob Kerns <Bo...@positscience.com> wrote:
[snip]
>
> On Windows, DeleteFile is NOT SYNCHRONOUS; it marks the file for deletion
> after all IO ceases and all handles are closed. This means that if you
> delete all the files in a directory, and then delete the directory, it will
> SOMETIMES FAIL.


Yes, Subversion attempts to deal with this already.  See references to
the WIN32_RETRY_LOOP macro in subversion\libsvn_subr\io.c.

It is, of course, possible that code has crept in that does not use
the Subversion wrappers for removing files or dirs and so could
exhibit the problem.  Do you mind scanning the source for that and
creating a patch if you find any?  If you could even scan the source
and file an issue with the details if you find somewhere the
Subversion wrappers aren't being used, that would be great too.

Thanks!

DJ

