Posted to common-dev@hadoop.apache.org by "Ian Nowland (JIRA)" <ji...@apache.org> on 2009/05/14 20:16:45 UTC
[jira] Commented: (HADOOP-5836) Bug in S3N handling of directory
markers using an object with a trailing "/" causes jobs to fail
[ https://issues.apache.org/jira/browse/HADOOP-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709509#action_12709509 ]
Ian Nowland commented on HADOOP-5836:
-------------------------------------
The main fix here is for listStatus() to check for this empty file and not return it. Along with this, I broadened the handling in all S3N methods to cover the different ways of designating directories in S3, as follows:
* A note about directories. S3 of course has no "native" support for them.
* The idiom we choose then is: for any directory created by this class,
* we use an empty object "#{dirpath}_$folder$" as a marker.
* Further, to interoperate with other S3 tools, we also accept the following:
* - an object "#{dirpath}/" denoting a directory marker
* - if there exist any objects with the prefix "#{dirpath}/", then the
* directory is said to exist
* - if both a file with the name of a directory and a marker for that
* directory exist, then the *file masks the directory*, and the directory
* is never returned.
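As a rough illustration of the rules above, here is a minimal Java sketch that runs over an in-memory, sorted set of object keys. It is not the patch code: the class name DirMarkerSketch, its methods, and the plain-string key model are all hypothetical. Applying the masking rule to directories that exist only by prefix is also an assumption, since the comment above only mentions markers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

/** Hypothetical sketch of the directory conventions; keys are plain strings with no leading "/". */
final class DirMarkerSketch {
    static final String FOLDER_SUFFIX = "_$folder$";

    /** True if an object with exactly this key exists and is not itself a marker. */
    static boolean isFile(SortedSet<String> keys, String path) {
        return keys.contains(path) && !path.endsWith("/") && !path.endsWith(FOLDER_SUFFIX);
    }

    /** Applies the three directory conventions, with a same-named file masking the directory. */
    static boolean isDirectory(SortedSet<String> keys, String path) {
        if (isFile(keys, path)) {
            return false; // the file masks the directory
        }
        if (keys.contains(path + FOLDER_SUFFIX)) {
            return true; // marker written by this class
        }
        // Covers both the "path/" marker object and any object under the "path/" prefix:
        // in a sorted set, keys sharing a prefix are contiguous and start at the prefix itself.
        String prefix = path + "/";
        SortedSet<String> tail = keys.tailSet(prefix);
        return !tail.isEmpty() && tail.first().startsWith(prefix);
    }

    /** Lists the immediate child files of dir, never returning the "dir/" marker as an empty-named file. */
    static List<String> listFiles(SortedSet<String> keys, String dir) {
        String prefix = dir + "/";
        List<String> out = new ArrayList<>();
        for (String key : keys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break; // left the contiguous range of keys under this prefix
            }
            String name = key.substring(prefix.length());
            // Skip the "dir/" marker (empty name), nested levels, and "_$folder$" markers.
            if (name.isEmpty() || name.contains("/") || name.endsWith(FOLDER_SUFFIX)) {
                continue;
            }
            out.add(key);
        }
        return out;
    }
}
```

With keys such as "mydir/" and "mydir/part-0000", isDirectory reports "mydir" as a directory and listFiles returns only "mydir/part-0000", so the marker never reaches split assignment as an empty file.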
In particular this meant fixing delete() and rename() to handle all three possible meanings of directory without failing.
This patch also includes the following:
- Add logging any time a file in S3 is accessed for read or write, so that when accessing or using a file fails, its name will be in the task log
- Fix opening a nonexistent file for reading: it now throws a FileNotFoundException immediately, rather than producing a hard-to-debug NPE later when the file is closed.
- Rewrite rename so that it only deletes the source files after every destination file has been written, so you never end up with half the files in each location
- Set up a retryer so rename automatically retries on S3 errors.
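
The rename ordering and retry behavior in the last two items can be sketched as follows. This is only an illustration under stated assumptions, not the patch itself: RenameSketch, FakeStore, IoAction, withRetries, and renameDirectory are hypothetical names, the store is an in-memory map that can simulate transient errors, and the retry loop has no backoff.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names come from the S3N patch. */
final class RenameSketch {

    /** Minimal in-memory stand-in for a bucket that can simulate transient errors. */
    static final class FakeStore {
        final TreeMap<String, String> objects = new TreeMap<>();
        int failuresToInject = 0; // the next N copy() calls throw IOException

        List<String> list(String prefix) {
            List<String> out = new ArrayList<>();
            for (String key : objects.keySet()) {
                if (key.startsWith(prefix)) {
                    out.add(key);
                }
            }
            return out;
        }

        void copy(String from, String to) throws IOException {
            if (failuresToInject > 0) {
                failuresToInject--;
                throw new IOException("simulated transient S3 error");
            }
            objects.put(to, objects.get(from));
        }

        void delete(String key) {
            objects.remove(key);
        }
    }

    interface IoAction {
        void run() throws IOException;
    }

    /** Runs the action, retrying on IOException until maxAttempts is exhausted. */
    static void withRetries(int maxAttempts, IoAction action) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return;
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last error
                }
            }
        }
    }

    /** Copies every object under src to dst first; deletes sources only after all copies succeed. */
    static void renameDirectory(FakeStore store, String src, String dst, int maxAttempts)
            throws IOException {
        String srcPrefix = src + "/";
        String dstPrefix = dst + "/";
        List<String> sources = store.list(srcPrefix);
        // Phase 1: write every destination object. A failure here leaves the source untouched.
        for (String key : sources) {
            String target = dstPrefix + key.substring(srcPrefix.length());
            withRetries(maxAttempts, () -> store.copy(key, target));
        }
        // Phase 2: only now remove the source objects.
        for (String key : sources) {
            withRetries(maxAttempts, () -> store.delete(key));
        }
    }
}
```

If a copy still fails after the last retry, the exception propagates before any delete runs, so every source file remains in place. The sketch does not handle a destination nested inside the source.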
> Bug in S3N handling of directory markers using an object with a trailing "/" causes jobs to fail
> ------------------------------------------------------------------------------------------------
>
> Key: HADOOP-5836
> URL: https://issues.apache.org/jira/browse/HADOOP-5836
> Project: Hadoop Core
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 0.18.3
> Reporter: Ian Nowland
>
> Some tools which upload to S3 use an object terminated with a "/" as a directory marker, for instance "s3n://mybucket/mydir/". If asked to iterate that "directory" via listStatus(), the current code returns an empty file "", which the InputFormatter happily assigns to a split, and which later causes a task to fail, and probably the job to fail.
--
This message is automatically generated by JIRA.