Posted to dev@geronimo.apache.org by "Ted Kirby (JIRA)" <ji...@apache.org> on 2007/09/27 01:46:50 UTC

[jira] Created: (GERONIMO-3489) Deployment problems caused by file deletion failures

Deployment problems caused by file deletion failures
----------------------------------------------------

                 Key: GERONIMO-3489
                 URL: https://issues.apache.org/jira/browse/GERONIMO-3489
             Project: Geronimo
          Issue Type: Bug
      Security Level: public (Regular issues)
          Components: deployment
    Affects Versions: 2.0.1
            Reporter: Ted Kirby
             Fix For: 2.0.2, 2.0.x, 2.1


File.delete() failures in IOUtil.recursiveDelete() are causing various deployment problems.  I am opening this JIRA to discuss them and to see how the server might handle them better.  In all but one case, delete failures are not even noted with a log record!  Deletion problems are seen in many environments and platforms, but they are persistently fatal when using an NFS file system for the repository.

In investigating the problem, I added code to recursiveDelete to retry the delete a few times if it fails.  I also added code to list the directory contents when a directory delete failed, and saw a file named .nfs000000002bc43500000053e in the directory.  My first attempt at a bypass was to retry a failed delete 5 times, sleeping a second before each try.  This did not work.  I then added a call to System.gc() before each sleep, and this got me past the problem.  Interestingly, two retries were required to get this to work.  In another version, each retry slept a second longer than the previous one, and I printed all file names in a directory before trying the delete.  This worked in most cases, but required the full 5 retries, so I suspect System.gc() would save time.  System.runFinalization() would be something else to try.
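
For illustration only, the retry approach described above might look roughly like the sketch below. This is not the code from any attached patch: the class name RetryingDelete, the method names, and the retry count and sleep parameters are all hypothetical, and messages go to System.err rather than through the server's logging.

```java
import java.io.File;

/** Hypothetical helper illustrating delete-with-retry; not the attached patch. */
final class RetryingDelete {
    private RetryingDelete() {}

    /** Deletes root and everything under it, retrying each failed delete. */
    static boolean recursiveDelete(File root, int maxRetries, long sleepMillis) {
        File[] children = root.listFiles(); // null when root is not a directory
        if (children != null) {
            for (File child : children) {
                if (!recursiveDelete(child, maxRetries, sleepMillis)) {
                    return false;
                }
            }
        }
        return deleteWithRetry(root, maxRetries, sleepMillis);
    }

    /** Tries File.delete(); on failure, calls System.gc(), sleeps, and tries again. */
    static boolean deleteWithRetry(File file, int maxRetries, long sleepMillis) {
        if (file.delete() || !file.exists()) {
            return true;
        }
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            // Assumption: a GC may finalize unreferenced streams that still hold the file open.
            System.gc();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (file.delete() || !file.exists()) {
                return true;
            }
            System.err.println("DEBUG: delete of " + file + " failed, retry " + attempt);
        }
        System.err.println("WARN: could not delete " + file + " after " + maxRetries + " retries");
        return false;
    }
}
```

A call such as RetryingDelete.recursiveDelete(dir, 5, 1000L) would correspond to five retries with a one-second sleep. Relying on System.gc() is a workaround at best, since the JVM is free to ignore the request.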

RepositoryConfigurationStore.createNewConfigurationDir(Artifact) shows the failing end of the deletion problem, with the dreaded ConfigurationAlreadyExistsException("Configuration already exists: " + configId) exception.  I think this message is not good.  It should really say that the directory already exists.  If the file is not deleted on undeploy, this failure occurs on a subsequent deploy.  What is really bad is if the user invokes a redeploy operation, and the file delete fails on the undeploy.  It is important that undeploy not complete until the file goes away.

From other environments, I am not convinced that all file handles and references, particularly open streams, are being closed on some artifacts.  This will cause the delete to fail.  It may be that the gc() calls are cleaning these up and allowing the deletes to work in my case above.
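
As a minimal sketch of the point about open streams (hypothetical class and method names, and modern try-with-resources syntax rather than the try/finally a 2007 code base would have used): closing the stream deterministically before the delete removes any dependence on garbage collection. On NFS, removing a file that a process still holds open typically leaves a .nfsXXXX placeholder until the handle is closed, which would keep the parent directory non-empty.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/** Hypothetical example: close the stream before attempting the delete. */
final class CloseBeforeDelete {
    private CloseBeforeDelete() {}

    /** Writes data to file, closes the stream, then deletes the file. */
    static boolean writeThenDelete(File file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);
        } // the stream is closed here, even if write() threw
        // No handle from this code remains open, so the delete does not wait on a finalizer.
        return file.delete();
    }
}
```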

Another option is for RepositoryConfigurationStore.createNewConfigurationDir(Artifact) not to throw a ConfigurationAlreadyExistsException if the only problem is that an empty directory structure exists.  The next line creates the directory structure anyway.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (GERONIMO-3489) Deployment problems caused by file deletion failures

Posted by "Ted Kirby (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Kirby updated GERONIMO-3489:
--------------------------------

    Attachment: G3489-3.patch

G3489-3.patch:

If the directory being created at deploy time already exists and is empty, then proceed rather than throwing a ConfigurationAlreadyExistsException.
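
A rough sketch of the check described above, for readers without the patch at hand. It is not the patch itself: ConfigDirs and prepareConfigurationDir are hypothetical names, and IllegalStateException stands in for ConfigurationAlreadyExistsException so that the example is self-contained.

```java
import java.io.File;

/** Hypothetical illustration of tolerating a leftover empty directory at deploy time. */
final class ConfigDirs {
    private ConfigDirs() {}

    /** Returns location as a usable directory, reusing it if it already exists and is empty. */
    static File prepareConfigurationDir(File location) {
        if (location.exists()) {
            // list() returns null when location is not a directory
            String[] contents = location.isDirectory() ? location.list() : null;
            if (contents == null || contents.length > 0) {
                // Stand-in for ConfigurationAlreadyExistsException
                throw new IllegalStateException(
                        "Directory already exists and is not empty: " + location);
            }
            return location; // empty leftover from a failed undeploy: proceed
        }
        if (!location.mkdirs() && !location.isDirectory()) {
            throw new IllegalStateException("Could not create directory: " + location);
        }
        return location;
    }
}
```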



[jira] Updated: (GERONIMO-3489) Deployment problems caused by file deletion failures

Posted by "Ted Kirby (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Kirby updated GERONIMO-3489:
--------------------------------

    Attachment: G3489-1.patch

G3489-1.patch is a patch for IOUtil.recursiveDelete.

System.runFinalization() didn't really seem to help out much, so I went with System.gc().
This patch logs a warning if the delete still fails after 5 retries, and logs debug info on each failure and retry.



[jira] Commented: (GERONIMO-3489) Deployment problems caused by file deletion failures

Posted by "Shiva Kumar H R (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530643 ] 

Shiva Kumar H R commented on GERONIMO-3489:
-------------------------------------------

Thanks, Ted, for raising this. I have hit this problem many times when deploying from within the Eclipse plug-in. I use Windows XP + NTFS + Sun JDK 1.5.0_11.



[jira] Commented: (GERONIMO-3489) Deployment problems caused by file deletion failures

Posted by "Ted Kirby (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12532435 ] 

Ted Kirby commented on GERONIMO-3489:
-------------------------------------

I tested the G3489-3 patch, and it worked.

I tested with a sample app whose deployment location was:

repository\org\apache\geronimo\applications\examples\geronimo-jsp-examples\2.0\geronimo-jsp-examples-2.0.war

Both before and after the fix, deploy would succeed if only part of the location path (any leading subset of its directories) existed.

The fix works when the entire location exists, it is a directory, and the directory is empty.

I log a debug message to show how many and which files are in the location directory, if it exists.

In my testing, I did not see the message in my server log.  I attributed this to the server currently having logging severely locked down to WARN logging, and my not being able to find all the places I needed to tweak to enable DEBUG logging.  Is there any doc on how to do this?



[jira] Commented: (GERONIMO-3489) Deployment problems caused by file deletion failures

Posted by "Ted Kirby (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GERONIMO-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12530765 ] 

Ted Kirby commented on GERONIMO-3489:
-------------------------------------

I note also that starting the server with these two system properties significantly reduced the occurrence of the problem:

Xorg.apache.geronimo.JarFileClassLoader=true

Xorg.apache.geronimo.kernel.config.MPCLSearchOption=safe

The bad news is that server startup time was increased.  :-(




[jira] Updated: (GERONIMO-3489) Deployment problems caused by file deletion failures

Posted by "Ted Kirby (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Kirby updated GERONIMO-3489:
--------------------------------

    Attachment: G3489-2.patch

Patch G3489-2: Adds a System.runFinalization() call followed by a sleep in the retry loop, before the existing System.gc() and sleep sequence.



[jira] Closed: (GERONIMO-3489) Deployment problems caused by file deletion failures

Posted by "Donald Woods (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Donald Woods closed GERONIMO-3489.
----------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 2.0.x)

Committed revision 581974 to trunk (2.1-SNAPSHOT)
Committed revision 581975 to branches/2.0 (2.0.2-SNAPSHOT)
Ted, thanks for the patches.




[jira] Assigned: (GERONIMO-3489) Deployment problems caused by file deletion failures

Posted by "Donald Woods (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GERONIMO-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Donald Woods reassigned GERONIMO-3489:
--------------------------------------

    Assignee: Donald Woods
