Posted to common-issues@hadoop.apache.org by "Danny Leshem (JIRA)" <ji...@apache.org> on 2010/04/07 09:59:33 UTC

[jira] Created: (HADOOP-6688) FileSystem.delete(...) implementations should not throw FileNotFoundException

FileSystem.delete(...) implementations should not throw FileNotFoundException
-----------------------------------------------------------------------------

                 Key: HADOOP-6688
                 URL: https://issues.apache.org/jira/browse/HADOOP-6688
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs, fs/s3
    Affects Versions: 0.20.2
         Environment: Amazon EC2/S3
            Reporter: Danny Leshem
            Priority: Blocker
             Fix For: 0.20.3, 0.21.0, 0.22.0


S3FileSystem.delete(Path path, boolean recursive) may fail and throw a FileNotFoundException if a directory is being deleted while some of its files are concurrently deleted in the background.

This is definitely not the expected behavior of a delete method. If one of the to-be-deleted files turns out to be missing, the method should not fail; it should simply continue. This holds both for the general contract of FileSystem.delete and for its various implementations: RawLocalFileSystem (and specifically FileUtil.fullyDelete) exhibits the same problem.

The fix is to silently catch and ignore FileNotFoundExceptions in delete loops. This can very easily be unit-tested, at least for RawLocalFileSystem.
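
As a rough sketch only (a hypothetical helper, not the actual Hadoop code), the delete loop would look something like this:
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the suggested fix: a child that disappears between listStatus()
// and delete() is already in the desired state, so it is skipped, not fatal.
public class DeleteLoopSketch {
  static void deleteContents(FileSystem fs, Path dir) throws IOException {
    for (FileStatus child : fs.listStatus(dir)) {
      try {
        fs.delete(child.getPath(), true);
      } catch (FileNotFoundException e) {
        // The file was deleted concurrently; ignore it and continue the loop.
      }
    }
  }
}
{code}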


The reason this issue bothers me is that the cleanup phase of a long (Mahout) MR job intermittently fails for me, and I believe this is the root cause. The log shows:
{code}
java.io.FileNotFoundException: s3://S3-BUCKET/tmp/0008E25BF7554CA9/2521362836721872/DistributedMatrix.times.outputVector/_temporary/_attempt_201004061215_0092_r_000002_0/part-00002: No such file or directory.
	at org.apache.hadoop.fs.s3.S3FileSystem.getFileStatus(S3FileSystem.java:334)
	at org.apache.hadoop.fs.s3.S3FileSystem.listStatus(S3FileSystem.java:193)
	at org.apache.hadoop.fs.s3.S3FileSystem.delete(S3FileSystem.java:303)
	at org.apache.hadoop.fs.s3.S3FileSystem.delete(S3FileSystem.java:312)
	at org.apache.hadoop.mapred.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:64)
	at org.apache.hadoop.mapred.OutputCommitter.cleanupJob(OutputCommitter.java:135)
	at org.apache.hadoop.mapred.Task.runJobCleanupTask(Task.java:826)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:292)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)
{code}
(similar errors are displayed for ReduceTask.run)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-6688) FileSystem.delete(...) implementations should not throw FileNotFoundException

Posted by "Danny Leshem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856145#action_12856145 ] 

Danny Leshem commented on HADOOP-6688:
--------------------------------------

Tom, this is a classic issue of coding "for the programmer" versus "against the programmer".

What is the expected effect of calling FileSystem.delete(someFile)? A reasonable answer would be "someFile no longer exists", and the method should throw an exception only if that expected effect somehow did not happen, i.e. the file still exists.

For those unlikely users who care whether the file was actually deleted or simply wasn't there to begin with, FileSystem.delete() can return a boolean to convey this information. This is common practice in many file system APIs.
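
For example (hypothetical caller code, where fs and somePath are whatever FileSystem and Path the caller already has; this assumes the existing FileSystem.delete(Path, boolean) signature):
{code}
// The boolean result, rather than an exception, tells the rare caller that
// cares whether the file actually existed before the call.
boolean existed = fs.delete(somePath, false);
if (!existed) {
  // somePath was already absent; for most callers this is not an error.
}
{code}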

I have no idea what the root cause of the missing file is. It might be a consistency issue similar to the one you mentioned - I have not looked into it. Since fixing FileSystem.delete is rather easy (and the change is very contained), I think it is a better solution than working around the issue.


[jira] Updated: (HADOOP-6688) FileSystem.delete(...) implementations should not throw FileNotFoundException

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-6688:
------------------------------

    Fix Version/s:     (was: 0.21.0)



[jira] Updated: (HADOOP-6688) FileSystem.delete(...) implementations should not throw FileNotFoundException

Posted by "Danny Leshem (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Leshem updated HADOOP-6688:
---------------------------------

    Priority: Minor  (was: Blocker)



[jira] Commented: (HADOOP-6688) FileSystem.delete(...) implementations should not throw FileNotFoundException

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856131#action_12856131 ] 

Tom White commented on HADOOP-6688:
-----------------------------------

Could this be a consistency issue, like HADOOP-6208? 

I'm not sure a blanket ignore of FileNotFoundExceptions is quite right: if you call delete with a path that doesn't exist, shouldn't it throw FileNotFoundException? I can see that if a file in a subdirectory of the path being deleted doesn't exist, then that should not result in a FileNotFoundException. You might also want to check whether the new FileContext API exhibits the same problem, so it can be considered for the contract there too.

I don't think this is a blocker. As a workaround you could create a wrapped FileOutputCommitter that catches and ignores FileNotFoundException.
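
Something along these lines, perhaps (untested sketch against the old mapred API; LenientFileOutputCommitter is a made-up name):
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.mapred.FileOutputCommitter;
import org.apache.hadoop.mapred.JobContext;

// Untested workaround sketch: swallow FileNotFoundException during job cleanup
// so a concurrently removed temporary file does not fail the whole job.
public class LenientFileOutputCommitter extends FileOutputCommitter {
  @Override
  public void cleanupJob(JobContext context) throws IOException {
    try {
      super.cleanupJob(context);
    } catch (FileNotFoundException e) {
      // The temporary output is already (partly) gone; treat cleanup as done.
    }
  }
}
{code}
The committer would then be registered via JobConf.setOutputCommitter(LenientFileOutputCommitter.class), if I remember the old API correctly.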


[jira] Commented: (HADOOP-6688) FileSystem.delete(...) implementations should not throw FileNotFoundException

Posted by "Danny Leshem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859341#action_12859341 ] 

Danny Leshem commented on HADOOP-6688:
--------------------------------------

While going over the code to suggest a patch, I found that this was partially fixed as part of HADOOP-6201: S3FileSystem.delete now catches FileNotFoundException and simply returns false, instead of propagating the exception as described in this issue.

The reason I'm calling this a partial fix is that such a recursive directory delete will stop (returning false) the moment the above issue occurs. The proper behavior, in my opinion, is to 1) continue deleting the remaining files and 2) return false only if the top-most directory could not be found.
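
To make the intended semantics concrete, a sketch (again hypothetical code, not a patch against the actual implementation) could be:
{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the proposed behavior: keep deleting children even if some of
// them have vanished, and return false only when the top-most path is gone.
public class RecursiveDeleteSketch {
  static boolean delete(FileSystem fs, Path path) throws IOException {
    try {
      if (fs.getFileStatus(path).isDir()) {
        for (FileStatus child : fs.listStatus(path)) {
          // A child that vanished just returns false here; we keep going.
          delete(fs, child.getPath());
        }
      }
      fs.delete(path, false); // path should be an empty directory or a file by now
      return true;
    } catch (FileNotFoundException e) {
      // This path is already gone. Intermediate recursive calls ignore the
      // result, so only the top-most call reports false to the user.
      return false;
    }
  }
}
{code}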



[jira] Commented: (HADOOP-6688) FileSystem.delete(...) implementations should not throw FileNotFoundException

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856158#action_12856158 ] 

Tom White commented on HADOOP-6688:
-----------------------------------

I think this is related to the discussion in HADOOP-6631 about having a 'force' option for delete. I didn't mean to suggest that this issue shouldn't be fixed; I was merely pointing out a workaround that might solve your problem in the meantime, while the filesystem semantics are discussed in more detail.
