Posted to common-commits@hadoop.apache.org by ka...@apache.org on 2016/10/26 18:30:50 UTC

[24/50] [abbrv] hadoop git commit: HADOOP-13309. Document S3A known limitations in file ownership and permission model. Contributed by Chris Nauroth.

HADOOP-13309. Document S3A known limitations in file ownership and permission model. Contributed by Chris Nauroth.


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/309a4392
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/309a4392
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/309a4392

Branch: refs/heads/YARN-4752
Commit: 309a43925c078ff51cdb6bd1273e6f91f43311cb
Parents: dbd2057
Author: Chris Nauroth <cn...@apache.org>
Authored: Tue Oct 25 09:03:03 2016 -0700
Committer: Chris Nauroth <cn...@apache.org>
Committed: Tue Oct 25 09:03:03 2016 -0700

----------------------------------------------------------------------
 .../site/markdown/filesystem/introduction.md    | 15 +++++++++
 .../src/site/markdown/tools/hadoop-aws/index.md | 34 +++++++++++++++++---
 2 files changed, 44 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/309a4392/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md
index 22da54c..194fa15 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md
@@ -373,6 +373,21 @@ a time proportional to the quantity of data to upload, and inversely proportiona
 to the network bandwidth. It may also fail &mdash;a failure that is better
 escalated than ignored.
 
+1. **Authorization**. Hadoop uses the `FileStatus` class to
+represent core metadata of files and directories, including the owner, group and
+permissions.  Object stores might not have a viable way to persist this
+metadata, so they might need to populate `FileStatus` with stub values.  Even if
+the object store persists this metadata, it still might not be feasible for the
+object store to enforce file authorization in the same way as a traditional file
+system.  If the object store cannot persist this metadata, then the recommended
+convention is:
+    * File owner is reported as the current user.
+    * File group also is reported as the current user.
+    * Directory permissions are reported as 777.
+    * File permissions are reported as 666.
+    * File system APIs that set ownership and permissions execute successfully
+      without error, but they are no-ops.
+
 Object stores with these characteristics, can not be used as a direct replacement
 for HDFS. In terms of this specification, their implementations of the
 specified operations do not match those required. They are considered supported
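
Note on the convention added above: the list of stub values can be sketched as a small, dependency-free Java model. This is only an illustration under stated assumptions. `StubStatusSketch` and its members are hypothetical names and are not Hadoop's `FileStatus` API; the sketch only mirrors the rules listed in the hunk (owner and group are the current user, 777 for directories, 666 for files, setters succeed as no-ops).

```java
import java.util.Objects;

/**
 * Hypothetical, dependency-free model of the stub-metadata convention.
 * This is NOT Hadoop's FileStatus class; every name here is illustrative.
 */
final class StubStatusSketch {
    final String path;
    final boolean directory;
    final String owner;
    final String group;
    final int permission; // mode bits, written in octal (e.g. 0777)

    StubStatusSketch(String path, boolean directory, String currentUser) {
        this.path = Objects.requireNonNull(path);
        this.directory = directory;
        // Owner and group are both reported as the current user.
        this.owner = currentUser;
        this.group = currentUser;
        // Directories report 777, files report 666.
        this.permission = directory ? 0777 : 0666;
    }

    /** Accepts the call without error but changes nothing (a no-op). */
    void setOwner(String newOwner, String newGroup) { }

    /** Accepts the call without error but changes nothing (a no-op). */
    void setPermission(int newPermission) { }

    /** Renders type flag, owner, group, octal mode and path on one line. */
    String describe() {
        return String.format("%s %s %s %03o %s",
            directory ? "d" : "-", owner, group, permission, path);
    }

    public static void main(String[] args) {
        String user = System.getProperty("user.name");
        StubStatusSketch file = new StubStatusSketch("/data/part-0000", false, user);
        file.setOwner("someone-else", "some-group"); // succeeds, no effect
        file.setPermission(0600);                    // succeeds, no effect
        System.out.println(file.describe());
        System.out.println(new StubStatusSketch("/data", true, user).describe());
    }
}
```

The point of the no-op setters is that callers written against a traditional file system keep running; they just cannot rely on the reported values changing afterwards.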

http://git-wip-us.apache.org/repos/asf/hadoop/blob/309a4392/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
index c0d9157..0eb36ef 100644
--- a/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
+++ b/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
@@ -39,7 +39,7 @@ higher performance.
 
 The specifics of using these filesystems are documented below.
 
-### Warning #1: Object Stores are not filesystems.
+### Warning #1: Object Stores are not filesystems
 
 Amazon S3 is an example of "an object store". In order to achieve scalability
 and especially high availability, S3 has &mdash;as many other cloud object stores have
@@ -56,14 +56,38 @@ recursive file-by-file operations. They take time at least proportional to
 the number of files, during which time partial updates may be visible. If
 the operations are interrupted, the filesystem is left in an intermediate state.
 
-### Warning #2: Because Object stores don't track modification times of directories,
-features of Hadoop relying on this can have unexpected behaviour. E.g. the
+### Warning #2: Object stores don't track modification times of directories
+
+Features of Hadoop relying on this can have unexpected behaviour. E.g. the
 AggregatedLogDeletionService of YARN will not remove the appropriate logfiles.
 
 For further discussion on these topics, please consult
 [The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).
 
-### Warning #3: your AWS credentials are valuable
+### Warning #3: Object stores have different authorization models
+
+The object authorization model of S3 is much different from the file
+authorization model of HDFS and traditional file systems.  It is not feasible to
+persist file ownership and permissions in S3, so S3A reports stub information
+from APIs that would query this metadata:
+
+* File owner is reported as the current user.
+* File group also is reported as the current user.  Prior to Apache Hadoop
+2.8.0, file group was reported as empty (no group associated), which is a
+potential incompatibility problem for scripts that perform positional parsing of
+shell output and other clients that expect to find a well-defined group.
+* Directory permissions are reported as 777.
+* File permissions are reported as 666.
+
+S3A does not really enforce any authorization checks on these stub permissions.
+Users authenticate to an S3 bucket using AWS credentials.  It's possible that
+object ACLs have been defined to enforce authorization at the S3 side, but this
+happens entirely within the S3 service, not within the S3A implementation.
+
+For further discussion on these topics, please consult
+[The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).
+
+### Warning #4: Your AWS credentials are valuable
 
 Your AWS credentials not only pay for services, they offer read and write
 access to the data. Anyone with the credentials can not only read your datasets
@@ -78,7 +102,7 @@ Do not inadvertently share these credentials through means such as
 
 If you do any of these: change your credentials immediately!
 
-### Warning #4: the S3 client provided by Amazon EMR are not from the Apache
+### Warning #5: The S3 clients provided by Amazon EMR are not from the Apache
 Software foundation, and are only supported by Amazon.
 
 Specifically: on Amazon EMR, s3a is not supported, and amazon recommend
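
Note on the Warning #3 text above: the incompatibility it mentions for scripts that parse shell output by position can be sketched as follows. This is a minimal illustration, not captured output. The two sample lines are assumptions modelled on the usual `hadoop fs -ls` column layout (permissions, replication, owner, group, size, date, time, path), and `LsColumnsSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper showing why an empty group column breaks positional parsing. */
final class LsColumnsSketch {
    /** Splits one listing line on runs of whitespace. */
    static List<String> columns(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    /** Naive positional lookup: assumes the path is always the 8th column. */
    static String pathByPosition(String line) {
        List<String> cols = columns(line);
        return cols.size() > 7 ? cols.get(7) : null;
    }

    /** More tolerant lookup: takes the last column (assumes no spaces in the path). */
    static String pathByLastColumn(String line) {
        List<String> cols = columns(line);
        return cols.get(cols.size() - 1);
    }

    public static void main(String[] args) {
        // Illustrative lines only: group reported vs. group reported as empty.
        String withGroup  = "-rw-rw-rw-   1 alice alice       1024 2016-10-25 09:03 s3a://bucket/data.csv";
        String emptyGroup = "-rw-rw-rw-   1 alice             1024 2016-10-25 09:03 s3a://bucket/data.csv";

        System.out.println(columns(withGroup).size());    // number of columns with a group
        System.out.println(columns(emptyGroup).size());   // one fewer when the group is empty
        System.out.println(pathByPosition(emptyGroup));   // positional lookup finds nothing
        System.out.println(pathByLastColumn(emptyGroup)); // last-column lookup still finds the path
    }
}
```

With the group populated, both lookups agree; with an empty group, every column after the owner shifts left by one, which is the kind of breakage the new documentation describes.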

