Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2015/04/21 14:09:32 UTC

[Hadoop Wiki] Update of "AmazonS3" by SteveLoughran

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "AmazonS3" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/AmazonS3?action=diff&rev1=18&rev2=19

Comment:
remove all content on configuring the S3 filesystems; point to the markdown docs on GitHub instead.

  = History =
   * The S3 block filesystem was introduced in Hadoop 0.10.0 ([[http://issues.apache.org/jira/browse/HADOOP-574|HADOOP-574]]).
   * The S3 native filesystem was introduced in Hadoop 0.18.0 ([[http://issues.apache.org/jira/browse/HADOOP-930|HADOOP-930]]) and rename support was added in Hadoop 0.19.0 ([[https://issues.apache.org/jira/browse/HADOOP-3361|HADOOP-3361]]).
-  * The S3A filesystem was introduced in Hadoop 2.6.0. Some issues were found and fixed for later Hadoop versions[[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]], so Hadoop-2.6.0's support of s3a must be considered an incomplete replacement for the s3n FS.
+  * The S3A filesystem was introduced in Hadoop 2.6.0. Some issues were found and fixed in later Hadoop versions ([[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]]).
  
- = Why you cannot use S3 as a replacement for HDFS =
  
+ = Configuring and using the S3 filesystem support =
+ 
+ Consult the [[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md|latest Hadoop documentation]] for the specifics of configuring and using any of the S3 clients.
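+ 
+ As a minimal sketch of what those documents cover (assuming Hadoop 2.6+ with the hadoop-aws module on the classpath; check the documentation above for the exact property names in your version), the s3a client reads its credentials from `core-site.xml`:
+ 
+ {{{
+ <!-- core-site.xml: credentials for the s3a filesystem client -->
+ <property>
+   <name>fs.s3a.access.key</name>
+   <value>YOUR_ACCESS_KEY_ID</value>
+ </property>
+ 
+ <property>
+   <name>fs.s3a.secret.key</name>
+   <value>YOUR_SECRET_ACCESS_KEY</value>
+ </property>
+ }}}
+ 
+ Buckets, and the directories and files inside them, can then be addressed with `s3a://BUCKET/path` URLs.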
+ 
+ 
+ = Important: you cannot use S3 as a replacement for HDFS =
+ 
- You cannot use either of the S3 filesystem clients as a drop-in replacement for HDFS. Amazon S3 is an "object store" with
+ You cannot use any of the S3 filesystem clients as a drop-in replacement for HDFS. Amazon S3 is an "object store" with
   * eventual consistency: changes made by one application (creation, updates and deletions) may not be visible to other applications until some undefined later time.
   * s3n and s3a: non-atomic rename and delete operations. Renaming or deleting large directories takes time proportional to the number of entries, and the partial state is visible to other processes while the operation is in progress, and indeed until the eventual consistency has been resolved.
  
  S3 is not a filesystem. The Hadoop S3 filesystem bindings make it pretend to be one, but it is not. It can
  act as a source of data, and as a destination, though in the latter case you must remember that the output may not be immediately visible.
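+ 
+ To make the visibility caveat concrete, here is a hypothetical sketch against the Hadoop `FileSystem` API (the bucket name is a placeholder, and credentials are assumed to be configured as described in the documentation linked above). A listing taken immediately after a write may not yet include the new object:
+ 
+ {{{
+ import java.net.URI;
+ import org.apache.hadoop.conf.Configuration;
+ import org.apache.hadoop.fs.FileStatus;
+ import org.apache.hadoop.fs.FileSystem;
+ import org.apache.hadoop.fs.Path;
+ 
+ public class S3VisibilitySketch {
+   public static void main(String[] args) throws Exception {
+     Configuration conf = new Configuration();
+     // MYBUCKET is a placeholder; s3a credentials come from core-site.xml
+     FileSystem fs = FileSystem.get(URI.create("s3a://MYBUCKET/"), conf);
+ 
+     Path dir = new Path("/demo");
+     Path file = new Path(dir, "part-0000");
+ 
+     // Write and close a small (empty) object.
+     fs.create(file).close();
+ 
+     // Another process listing the "directory" at this point may not see
+     // the new object yet: S3 listings are eventually consistent.
+     FileStatus[] entries = fs.listStatus(dir);
+     System.out.println("entries visible now: " + entries.length);
+   }
+ }
+ }}}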
- 
- == Configuring to use s3/ s3n filesystems ==
- 
- Edit your `core-site.xml` file to include your S3 keys
- 
- {{{
- 
- <property>
-   <name>fs.s3.awsAccessKeyId</name>
-   <value>ID</value>
- </property>
- 
- <property>
-   <name>fs.s3.awsSecretAccessKey</name>
-   <value>SECRET</value>
- </property>
- }}}
- 
- You can then use URLs to your bucket : ``s3n://MYBUCKET/``, or directories and files inside it.
- 
- {{{
- 
- s3n://BUCKET/
- s3n://BUCKET/dir
- s3n://BUCKET/dir/files.csv.tar.gz
- s3n://BUCKET/dir/*.gz
- 
- }}}
- 
- Alternatively, you can put the access key ID and the secret access key into a ''s3n'' (or ''s3'') URI as the user info:
- 
- {{{
-   s3n://ID:SECRET@BUCKET
- }}}
- 
- Note that since the secret
- access key can contain slashes, you must remember to escape them by replacing each slash `/` with the string `%2F`.
- Keys specified in the URI take precedence over any specified using the properties `fs.s3.awsAccessKeyId` and
- `fs.s3.awsSecretAccessKey`.
- 
- This option is less secure as the URLs are likely to appear in output logs and error messages, so being exposed to remote users.
  
  = Security =