Posted to common-dev@hadoop.apache.org by "Ian Nowland (JIRA)" <ji...@apache.org> on 2009/05/14 20:16:45 UTC

[jira] Commented: (HADOOP-5836) Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs to fail

    [ https://issues.apache.org/jira/browse/HADOOP-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709509#action_12709509 ] 

Ian Nowland commented on HADOOP-5836:
-------------------------------------

The main fix here is to check for this empty file and simply not return it from listStatus(). Along with this, however, I broadened the handling in all S3N methods of the different ways of designating directories in S3, as follows:
 
 * A note about directories. S3 of course has no "native" support for them.
 * The idiom we choose, then, is: for any directory created by this class,
 * we use an empty object "#{dirpath}_$folder$" as a marker.
 * Further, to interoperate with other S3 tools, we also accept the following:
 *  - an object "#{dirpath}/" denoting a directory marker
 *  - if there exist any objects with the prefix "#{dirpath}/", then the
 *    directory is said to exist
 *  - if both a file with the name of a directory and a marker for that
 *    directory exist, then the *file masks the directory*, and the directory
 *    is never returned.
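The rules in the comment can be sketched as one lookup over a sorted key set. This is a minimal illustration, assuming the bucket's keys are held in an in-memory `TreeSet`; the class `S3DirRules`, the `Kind` enum and `classify()` are hypothetical names, not part of Hadoop.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch of the directory rules; not the S3N implementation. */
public class S3DirRules {
    static final String FOLDER_SUFFIX = "_$folder$";

    public enum Kind { FILE, DIRECTORY, NONE }

    public static Kind classify(NavigableSet<String> keys, String path) {
        // An object stored at exactly this key is a file, and it masks any directory.
        if (keys.contains(path)) {
            return Kind.FILE;
        }
        // The marker this class writes for directories it creates.
        if (keys.contains(path + FOLDER_SUFFIX)) {
            return Kind.DIRECTORY;
        }
        // A "path/" marker from another tool, or any object under the "path/" prefix.
        // Keys sharing a prefix sort contiguously, so checking the first key >= prefix suffices.
        String prefix = path + "/";
        String first = keys.ceiling(prefix);
        if (first != null && first.startsWith(prefix)) {
            return Kind.DIRECTORY;
        }
        return Kind.NONE;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>();
        keys.add("a_$folder$");   // created by this class
        keys.add("b/");           // trailing-slash marker from another tool
        keys.add("c/part-00000"); // implied by a child object
        keys.add("d");            // a file ...
        keys.add("d_$folder$");   // ... that masks this marker
        for (String p : new String[] {"a", "b", "c", "d", "e"}) {
            System.out.println(p + " -> " + classify(keys, p));
        }
        // a, b and c print DIRECTORY; d prints FILE; e prints NONE
    }
}
```

Against real S3, each check would be a metadata or prefix-listing request rather than a set lookup.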
 
In particular, this meant fixing delete() and rename() to handle all three possible meanings of a directory without failing.
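For delete(), handling all three meanings amounts to removing the "_$folder$" marker, any "dirpath/" marker object, and every object under the "dirpath/" prefix. A minimal sketch over a hypothetical in-memory sorted key set; `S3DirDelete` and `deleteDirectory()` are illustrative names, not the patch's code.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch; not the S3N implementation. */
public class S3DirDelete {
    static final String FOLDER_SUFFIX = "_$folder$";

    /**
     * Removes a directory however it is represented: the "_$folder$" marker,
     * a "dirpath/" marker object, and every object under the "dirpath/" prefix
     * (which includes nested markers). Returns true if any key was removed.
     */
    public static boolean deleteDirectory(NavigableSet<String> keys, String dirpath) {
        boolean removed = keys.remove(dirpath + FOLDER_SUFFIX);
        String prefix = dirpath + "/";
        // Keys sharing the prefix are contiguous in sorted order, so stop at the first miss.
        Iterator<String> it = keys.tailSet(prefix, true).iterator();
        while (it.hasNext()) {
            if (!it.next().startsWith(prefix)) {
                break;
            }
            it.remove(); // removing through the view updates the backing set
            removed = true;
        }
        return removed;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>();
        keys.add("mydir_$folder$");
        keys.add("mydir/");
        keys.add("mydir/part-00000");
        keys.add("mydir/sub_$folder$");
        keys.add("mydir2/x");
        keys.add("other");
        deleteDirectory(keys, "mydir");
        System.out.println(keys); // [mydir2/x, other]
    }
}
```

Note that "mydir2/x" survives: it shares the characters "mydir" but not the "mydir/" prefix.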
 
This patch also includes the following:
- Add logging whenever a file in S3 is accessed for reading or writing, so that when accessing or using a file fails, its name appears in the task log.
- When opening a nonexistent file for reading, immediately throw a FileNotFoundException, rather than failing later with a hard-to-debug NPE when the file is closed.
- Rewrite rename so that it deletes the source files only after every destination file has been written, so you never end up with half the files in each location.
- Set up a retryer so that rename automatically retries on S3 errors.
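The rename ordering, fail-fast open and retry behaviour in the list above can be sketched as a two-phase copy-then-delete with a simple retry wrapper. This assumes a hypothetical in-memory `store` map in place of S3 and treats any `IOException` other than `FileNotFoundException` as a transient S3 error; `S3RenameSketch`, `open()`, `withRetries()` and `renamePrefix()` are illustrative names, not the S3N API or Hadoop's own retry machinery.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;

/** Hypothetical sketch; not the S3N implementation. */
public class S3RenameSketch {
    /** In-memory stand-in for a bucket: key -> object contents. */
    public final NavigableMap<String, String> store = new TreeMap<>();

    /** Fail fast: a missing key raises FileNotFoundException at open time. */
    public String open(String key) throws FileNotFoundException {
        String data = store.get(key);
        if (data == null) {
            throw new FileNotFoundException("No such file: " + key);
        }
        System.err.println("Opening " + key + " for read"); // log the key being accessed
        return data;
    }

    /** Runs call up to maxAttempts times, retrying on IOException. */
    public static <T> T withRetries(int maxAttempts, Callable<T> call) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (FileNotFoundException e) {
                throw e; // a missing file will not appear on retry
            } catch (IOException e) {
                last = e; // assume transient and try again
            }
        }
        throw last;
    }

    /** Copies every object under srcPrefix; deletes sources only after all copies succeed. */
    public void renamePrefix(String srcPrefix, String dstPrefix, int maxAttempts) throws Exception {
        List<String> sources = new ArrayList<>();
        for (String key : store.tailMap(srcPrefix, true).keySet()) {
            if (!key.startsWith(srcPrefix)) {
                break;
            }
            sources.add(key);
        }
        // Phase 1: copy. If a copy ultimately fails, no source has been deleted yet.
        for (String src : sources) {
            String dst = dstPrefix + src.substring(srcPrefix.length());
            withRetries(maxAttempts, () -> store.put(dst, open(src)));
        }
        // Phase 2: only now remove the sources.
        for (String src : sources) {
            withRetries(maxAttempts, () -> store.remove(src));
        }
    }

    public static void main(String[] args) throws Exception {
        S3RenameSketch fs = new S3RenameSketch();
        fs.store.put("in/a", "1");
        fs.store.put("in/b", "2");
        fs.renamePrefix("in/", "out/", 3);
        System.out.println(fs.store); // {out/a=1, out/b=2}
    }
}
```

If phase 1 fails partway, the destination may hold some copies, but every source is still in place, so the rename can simply be attempted again.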


> Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs to fail
> ------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5836
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5836
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 0.18.3
>            Reporter: Ian Nowland
>
> Some tools which upload to S3 use an object terminated with a "/" as a directory marker, for instance "s3n://mybucket/mydir/". If asked to iterate over that "directory" via listStatus(), the current code returns an empty file "", which the InputFormat happily assigns to a split and which later causes a task, and probably the job, to fail.
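The core listStatus() fix described at the top of this comment amounts to skipping the marker key itself when enumerating a prefix, so it never surfaces as a file named "". A minimal sketch, assuming keys held in an in-memory sorted set; `S3ListStatusSketch` and `listChildren()` are hypothetical names, and the sketch also folds "_$folder$" markers into plain directory names per the convention quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch; not the S3N implementation. */
public class S3ListStatusSketch {
    static final String FOLDER_SUFFIX = "_$folder$";

    /** Returns the names of the immediate children of dirpath, sorted and de-duplicated. */
    public static List<String> listChildren(NavigableSet<String> keys, String dirpath) {
        String prefix = dirpath.endsWith("/") ? dirpath : dirpath + "/";
        SortedSet<String> children = new TreeSet<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // past the end of this prefix in sorted order
            }
            String rest = key.substring(prefix.length());
            if (rest.isEmpty()) {
                continue; // the "dirpath/" marker itself; without this skip it appears as file ""
            }
            int slash = rest.indexOf('/');
            String child = slash < 0 ? rest : rest.substring(0, slash);
            if (child.endsWith(FOLDER_SUFFIX)) {
                child = child.substring(0, child.length() - FOLDER_SUFFIX.length());
            }
            children.add(child);
        }
        return new ArrayList<>(children);
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>();
        keys.add("mydir/");
        keys.add("mydir/part-00000");
        keys.add("mydir/part-00001");
        keys.add("mydir/sub_$folder$");
        keys.add("mydir2/other");
        System.out.println(listChildren(keys, "mydir")); // [part-00000, part-00001, sub]
    }
}
```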
