Posted to common-dev@hadoop.apache.org by "Jordan Mendelson (JIRA)" <ji...@apache.org> on 2014/03/11 02:59:42 UTC

[jira] [Created] (HADOOP-10400) Incorporate new S3A FileSystem implementation

Jordan Mendelson created HADOOP-10400:
-----------------------------------------

             Summary: Incorporate new S3A FileSystem implementation
                 Key: HADOOP-10400
                 URL: https://issues.apache.org/jira/browse/HADOOP-10400
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
            Reporter: Jordan Mendelson


The s3native filesystem has a number of limitations (some of which were recently fixed by HADOOP-9454). This patch adds an s3a filesystem which uses the aws-sdk instead of the jets3t library. There are a number of improvements over s3native, including:

- Parallel copy (rename) support (dramatically speeds up commits on large files)
- Empty directory marker files "xyz/" compatible with the AWS S3 explorer, instead of "xyz_$folder$" (reduces littering)
- Ignores "_$folder$" files created by s3native and other S3 browsing utilities
- Supports multiple output buffer dirs to even out IO when uploading files
- Supports IAM role-based authentication
- Allows setting a default canned ACL for uploads (public, private, etc.)
- Better error recovery handling
- Should handle input seeks without downloading the whole file (seeks are used heavily for splits)
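The seek improvement relies on S3 supporting HTTP byte-range GETs ("Range: bytes=<start>-"), so a task reading its split can start at the split offset instead of at byte 0. The minimal sketch below, assuming an in-memory FakeObjectStore standing in for S3 and a hypothetical LazySeekInputStream class (neither is the S3A code), shows the idea: seek() only records a position, and the next read() opens a ranged request there, reopening only when the caller has moved away from where the open stream currently is.

```python
import io


class FakeObjectStore:
    """In-memory stand-in for S3 that logs every ranged GET it serves."""

    def __init__(self, objects):
        self.objects = objects  # key -> bytes
        self.requests = []      # (key, start offset) of each GET issued

    def get_range(self, key, start):
        # Mimics a GET with "Range: bytes=<start>-": only bytes from start onward are returned.
        self.requests.append((key, start))
        return self.objects[key][start:]


class LazySeekInputStream:
    """Hypothetical stream: seek() only records a position; read() opens a ranged GET there."""

    def __init__(self, store, key):
        self.store = store
        self.key = key
        self.pos = 0             # position the caller will read from next
        self._stream = None      # body of the currently open ranged GET
        self._stream_pos = None  # object offset the open stream will yield next

    def seek(self, pos):
        if pos < 0:
            raise ValueError("cannot seek to a negative offset")
        self.pos = pos  # no network I/O until the next read()

    def read(self, n):
        if self._stream is None or self._stream_pos != self.pos:
            # Open (or reopen) the object at the caller's position instead of at byte 0.
            self._stream = io.BytesIO(self.store.get_range(self.key, self.pos))
            self._stream_pos = self.pos
        data = self._stream.read(n)
        self.pos += len(data)
        self._stream_pos += len(data)
        return data
```

For example, with an object holding b"0123456789", seek(7) followed by read(2) returns b"78" after a single GET starting at offset 7; a further read continues on the same open stream, while seeking back to 0 triggers a new GET from offset 0.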

This code is a copy of https://github.com/Aloisius/hadoop-s3a with patches to various pom files to get it to build against trunk. I've been using 0.0.1 in production with CDH 4 for several months and CDH 5 for a few days. The version here is 0.0.2, which renames some configuration keys to hopefully bring the key name style more in line with the rest of Hadoop 2.x.

It should be largely compatible with s3native, except that it won't recognize s3native's empty directory marker files ("*_$folder$"), since it uses "folder/", like Amazon's S3 explorer, to denote empty directories.
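To make the two marker conventions concrete, here is a minimal sketch that models a bucket as a flat list of object keys. The helper names (dir_marker_key, is_directory, list_children) are hypothetical and not taken from the S3A patch; they only illustrate the behavior described above: an empty directory is an object whose key ends in "/", and s3native's "*_$folder$" keys are neither treated as directories nor shown in listings.

```python
S3NATIVE_MARKER_SUFFIX = "_$folder$"  # suffix s3native uses for empty-directory markers


def dir_marker_key(path):
    """Key of the S3 explorer-style marker for directory `path` ('/logs' -> 'logs/')."""
    key = path.strip("/")
    return key + "/" if key else ""  # the bucket root has no marker key


def is_directory(keys, path):
    """True if the "dir/" marker or any object under it exists.
    s3native '*_$folder$' keys are deliberately not treated as directories."""
    prefix = dir_marker_key(path)
    if not prefix:
        return True  # the bucket root is always a directory
    return any(k.startswith(prefix) for k in keys)


def list_children(keys, path):
    """Immediate child names of a directory, hiding its own marker and s3native markers."""
    prefix = dir_marker_key(path)
    children = set()
    for k in keys:
        if not k.startswith(prefix) or k == prefix:
            continue  # outside this directory, or the directory's own marker
        if k.endswith(S3NATIVE_MARKER_SUFFIX):
            continue  # "*_$folder$" litter from s3native or other S3 browsing tools
        children.add(k[len(prefix):].split("/", 1)[0])
    return sorted(children)
```

For example, with keys ["logs/", "logs/2014/03/part-0", "logs/old_$folder$", "tmp_$folder$", "data/file.txt"], /logs is a directory whose only listed child is "2014", while /tmp is not recognized as a directory because only s3native's tmp_$folder$ marker exists.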



--
This message was sent by Atlassian JIRA
(v6.2#6252)