You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-commits@hadoop.apache.org by st...@apache.org on 2017/02/24 13:26:08 UTC

hadoop git commit: HADOOP-14113. Review ADL Docs. Contributed by Steve Loughran

Repository: hadoop
Updated Branches:
  refs/heads/trunk 9c22a9166 -> e60c6543d


HADOOP-14113. Review ADL Docs. Contributed by Steve Loughran


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/e60c6543
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/e60c6543
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/e60c6543

Branch: refs/heads/trunk
Commit: e60c6543d57611039b0438d5dcb4cb19ee239bb6
Parents: 9c22a91
Author: Steve Loughran <st...@apache.org>
Authored: Fri Feb 24 13:24:59 2017 +0000
Committer: Steve Loughran <st...@apache.org>
Committed: Fri Feb 24 13:24:59 2017 +0000

----------------------------------------------------------------------
 .../src/site/markdown/index.md                  | 237 ++++++++++---------
 1 file changed, 124 insertions(+), 113 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/e60c6543/hadoop-tools/hadoop-azure-datalake/src/site/markdown/index.md
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-azure-datalake/src/site/markdown/index.md b/hadoop-tools/hadoop-azure-datalake/src/site/markdown/index.md
index 6d9e173..9355241 100644
--- a/hadoop-tools/hadoop-azure-datalake/src/site/markdown/index.md
+++ b/hadoop-tools/hadoop-azure-datalake/src/site/markdown/index.md
@@ -20,20 +20,20 @@
 * [Usage](#Usage)
     * [Concepts](#Concepts)
         * [OAuth2 Support](#OAuth2_Support)
-    * [Configuring Credentials & FileSystem](#Configuring_Credentials)
+    * [Configuring Credentials and FileSystem](#Configuring_Credentials)
         * [Using Refresh Token](#Refresh_Token)
         * [Using Client Keys](#Client_Credential_Token)
         * [Protecting the Credentials with Credential Providers](#Credential_Provider)
     * [Enabling ADL Filesystem](#Enabling_ADL)
-    * [Accessing adl URLs](#Accessing_adl_URLs)
+    * [Accessing `adl` URLs](#Accessing_adl_URLs)
     * [User/Group Representation](#OIDtoUPNConfiguration)
-* [Testing the hadoop-azure Module](#Testing_the_hadoop-azure_Module)
+* [Testing the `hadoop-azure` Module](#Testing_the_hadoop-azure_Module)
 
 ## <a name="Introduction" />Introduction
 
-The hadoop-azure-datalake module provides support for integration with
-[Azure Data Lake Store]( https://azure.microsoft.com/en-in/documentation/services/data-lake-store/).
-The jar file is named azure-datalake-store.jar.
+The `hadoop-azure-datalake` module provides support for integration with the
+[Azure Data Lake Store](https://azure.microsoft.com/en-in/documentation/services/data-lake-store/).
+This support comes via the JAR file `azure-datalake-store.jar`.
 
 ## <a name="Features" />Features
 
@@ -43,13 +43,14 @@ The jar file is named azure-datalake-store.jar.
 * Can act as a source of data in a MapReduce job, or a sink.
 * Tested on both Linux and Windows.
 * Tested for scale.
-* API setOwner/setAcl/removeAclEntries/modifyAclEntries accepts UPN or OID
-  (Object ID) as user and group name.
+* API `setOwner()`, `setAcl`, `removeAclEntries()`, `modifyAclEntries()` accepts UPN or OID
+  (Object ID) as user and group names.
 
 ## <a name="Limitations" />Limitations
+
 Partial or no support for the following operations :
 
-* Operation on Symbolic Link
+* Operation on Symbolic Links
 * Proxy Users
 * File Truncate
 * File Checksum
@@ -58,55 +59,71 @@ Partial or no support for the following operations :
 * Extended Attributes(XAttrs) Operations
 * Snapshot Operations
 * Delegation Token Operations
-* User and group information returned as ListStatus and GetFileStatus is in form of GUID associated in Azure Active Directory.
+* User and group information returned as `listStatus()` and `getFileStatus()` is
+in the form of the GUID associated in Azure Active Directory.
 
 ## <a name="Usage" />Usage
 
 ### <a name="Concepts" />Concepts
-Azure Data Lake Storage access path syntax is
+Azure Data Lake Storage access path syntax is:
+
+```
+adl://<Account Name>.azuredatalakestore.net/
+```
 
-    adl://<Account Name>.azuredatalakestore.net/
+For details on using the store, see
+[**Get started with Azure Data Lake Store using the Azure Portal**](https://azure.microsoft.com/en-in/documentation/articles/data-lake-store-get-started-portal/)
 
-Get started with azure data lake account with [https://azure.microsoft.com/en-in/documentation/articles/data-lake-store-get-started-portal/](https://azure.microsoft.com/en-in/documentation/articles/data-lake-store-get-started-portal/)
+### <a name="#OAuth2_Support" />OAuth2 Support
 
-#### <a name="#OAuth2_Support" />OAuth2 Support
-Usage of Azure Data Lake Storage requires OAuth2 bearer token to be present as part of the HTTPS header as per OAuth2 specification. Valid OAuth2 bearer token should be obtained from Azure Active Directory for valid users who have  access to Azure Data Lake Storage Account.
+Usage of Azure Data Lake Storage requires an OAuth2 bearer token to be present as
+part of the HTTPS header as per the OAuth2 specification.
+A valid OAuth2 bearer token must be obtained from the Azure Active Directory service
+for those valid users who have access to Azure Data Lake Storage Account.
 
-Azure Active Directory (Azure AD) is Microsoft's multi-tenant cloud based directory and identity management service. See [https://azure.microsoft.com/en-in/documentation/articles/active-directory-whatis/](https://azure.microsoft.com/en-in/documentation/articles/active-directory-whatis/)
+Azure Active Directory (Azure AD) is Microsoft's multi-tenant cloud based directory
+and identity management service. See [*What is ActiveDirectory*](https://azure.microsoft.com/en-in/documentation/articles/active-directory-whatis/).
 
-Following sections describes on OAuth2 configuration in core-site.xml.
+Following sections describes theOAuth2 configuration in `core-site.xml`.
 
-## <a name="Configuring_Credentials" />Configuring Credentials & FileSystem
-Credentials can be configured using either a refresh token (associated with a user) or a client credential (analogous to a service principal).
+#### <a name="Configuring_Credentials" />Configuring Credentials & FileSystem
+Credentials can be configured using either a refresh token (associated with a user),
+or a client credential (analogous to a service principal).
 
-### <a name="Refresh_Token" />Using Refresh Token
+#### <a name="Refresh_Token" />Using Refresh Tokens
 
-Add the following properties to your core-site.xml
+Add the following properties to the cluster's `core-site.xml`
 
-        <property>
-            <name>dfs.adls.oauth2.access.token.provider.type</name>
-            <value>RefreshToken</value>
-        </property>
+```xml
+<property>
+  <name>dfs.adls.oauth2.access.token.provider.type</name>
+  <value>RefreshToken</value>
+</property>
+```
 
-Application require to set Client id and OAuth2 refresh token from Azure Active Directory associated with client id. See [https://github.com/AzureAD/azure-activedirectory-library-for-java](https://github.com/AzureAD/azure-activedirectory-library-for-java).
+Applications must set the Client id and OAuth2 refresh token from the Azure Active Directory
+service associated with the client id. See [*Active Directory Library For Java*](https://github.com/AzureAD/azure-activedirectory-library-for-java).
 
 **Do not share client id and refresh token, it must be kept secret.**
 
-        <property>
-            <name>dfs.adls.oauth2.client.id</name>
-            <value></value>
-        </property>
+```xml
+<property>
+  <name>dfs.adls.oauth2.client.id</name>
+  <value></value>
+</property>
 
-        <property>
-            <name>dfs.adls.oauth2.refresh.token</name>
-            <value></value>
-        </property>
+<property>
+  <name>dfs.adls.oauth2.refresh.token</name>
+  <value></value>
+</property>
+```
 
 
 ### <a name="Client_Credential_Token" />Using Client Keys
 
 #### Generating the Service Principal
-1.  Go to the portal (https://portal.azure.com)
+
+1.  Go to [the portal](https://portal.azure.com)
 2.  Under "Browse", look for Active Directory and click on it.
 3.  Create "Web Application". Remember the name you create here - that is what you will add to your ADL account as authorized user.
 4.  Go through the wizard
@@ -124,31 +141,31 @@ Application require to set Client id and OAuth2 refresh token from Azure Active
 3.  Add your user name you created in Step 6 above (note that it does not show up in the list, but will be found if you searched for the name)
 4.  Add "Owner" role
 
-#### Configure core-site.xml
-Add the following properties to your core-site.xml
-
-    <property>
-      <name>dfs.adls.oauth2.refresh.url</name>
-      <value>TOKEN ENDPOINT FROM STEP 7 ABOVE</value>
-    </property>
+### Configure core-site.xml
+Add the following properties to your `core-site.xml`
 
-    <property>
-      <name>dfs.adls.oauth2.client.id</name>
-      <value>CLIENT ID FROM STEP 7 ABOVE</value>
-    </property>
+```xml
+<property>
+  <name>dfs.adls.oauth2.refresh.url</name>
+  <value>TOKEN ENDPOINT FROM STEP 7 ABOVE</value>
+</property>
 
-    <property>
-      <name>dfs.adls.oauth2.credential</name>
-      <value>PASSWORD FROM STEP 7 ABOVE</value>
-    </property>
+<property>
+  <name>dfs.adls.oauth2.client.id</name>
+  <value>CLIENT ID FROM STEP 7 ABOVE</value>
+</property>
 
+<property>
+  <name>dfs.adls.oauth2.credential</name>
+  <value>PASSWORD FROM STEP 7 ABOVE</value>
+</property>
+```
 
 ### <a name="Credential_Provider" />Protecting the Credentials with Credential Providers
 
-In many Hadoop clusters, the core-site.xml file is world-readable. To protect
-these credentials from prying eyes, it is recommended that you use the
-credential provider framework to securely store them and access them through
-configuration.
+In many Hadoop clusters, the `core-site.xml` file is world-readable. To protect
+these credentials, it is recommended that you use the
+credential provider framework to securely store them and access them.
 
 All ADLS credential properties can be protected by credential providers.
 For additional reading on the credential provider API, see
@@ -156,16 +173,16 @@ For additional reading on the credential provider API, see
 
 #### Provisioning
 
-```
-% hadoop credential create dfs.adls.oauth2.refresh.token -value 123
+```bash
+hadoop credential create dfs.adls.oauth2.refresh.token -value 123
     -provider localjceks://file/home/foo/adls.jceks
-% hadoop credential create dfs.adls.oauth2.credential -value 123
+hadoop credential create dfs.adls.oauth2.credential -value 123
     -provider localjceks://file/home/foo/adls.jceks
 ```
 
 #### Configuring core-site.xml or command line property
 
-```
+```xml
 <property>
   <name>hadoop.security.credential.provider.path</name>
   <value>localjceks://file/home/foo/adls.jceks</value>
@@ -175,42 +192,28 @@ For additional reading on the credential provider API, see
 
 #### Running DistCp
 
-```
-% hadoop distcp
+```bash
+hadoop distcp
     [-D hadoop.security.credential.provider.path=localjceks://file/home/user/adls.jceks]
     hdfs://<NameNode Hostname>:9001/user/foo/007020615
     adl://<Account Name>.azuredatalakestore.net/testDir/
 ```
 
-NOTE: You may optionally add the provider path property to the distcp command
-line instead of added job specific configuration to a generic core-site.xml.
-The square brackets above illustrate this capability.
-
-
-## <a name="Enabling_ADL" />Enabling ADL Filesystem
-
-For ADL FileSystem to take effect. Update core-site.xml with
-
-        <property>
-            <name>fs.adl.impl</name>
-            <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
-        </property>
-
-        <property>
-            <name>fs.AbstractFileSystem.adl.impl</name>
-            <value>org.apache.hadoop.fs.adl.Adl</value>
-        </property>
-
+NOTE: You may optionally add the provider path property to the `distcp` command
+line instead of added job specific configuration to a generic `core-site.xml`.
+The square brackets above illustrate this capability.`
 
 ### <a name="Accessing_adl_URLs" />Accessing adl URLs
 
-After credentials are configured in core-site.xml, any Hadoop component may
+After credentials are configured in `core-site.xml`, any Hadoop component may
 reference files in that Azure Data Lake Storage account by using URLs of the following
 format:
 
-    adl://<Account Name>.azuredatalakestore.net/<path>
+```
+adl://<Account Name>.azuredatalakestore.net/<path>
+```
 
-The schemes `adl` identify a URL on a file system backed by Azure
+The schemes `adl` identifies a URL on a Hadoop-compatible file system backed by Azure
 Data Lake Storage.  `adl` utilizes encrypted HTTPS access for all interaction with
 the Azure Data Lake Storage API.
 
@@ -218,48 +221,56 @@ For example, the following
 [FileSystem Shell](../hadoop-project-dist/hadoop-common/FileSystemShell.html)
 commands demonstrate access to a storage account named `youraccount`.
 
-    > hadoop fs -mkdir adl://yourcontainer.azuredatalakestore.net/testDir
 
-    > hadoop fs -put testFile adl://yourcontainer.azuredatalakestore.net/testDir/testFile
+```bash
+hadoop fs -mkdir adl://yourcontainer.azuredatalakestore.net/testDir
 
-    > hadoop fs -cat adl://yourcontainer.azuredatalakestore.net/testDir/testFile
-    test file content
+hadoop fs -put testFile adl://yourcontainer.azuredatalakestore.net/testDir/testFile
 
+hadoop fs -cat adl://yourcontainer.azuredatalakestore.net/testDir/testFile
+test file content
+```
 ### <a name="OIDtoUPNConfiguration" />User/Group Representation
-The hadoop-azure-datalake module provides support for configuring how
-User/Group information is represented during
-getFileStatus/listStatus/getAclStatus.
 
-Add the following properties to your core-site.xml
-
-        <property>
-          <name>adl.feature.ownerandgroup.enableupn</name>
-          <value>true</value>
-          <description>
-            When true : User and Group in FileStatus/AclStatus response is
-            represented as user friendly name as per Azure AD profile.
-
-            When false (default) : User and Group in FileStatus/AclStatus
-            response is represented by the unique identifier from Azure AD
-            profile (Object ID as GUID).
+The `hadoop-azure-datalake` module provides support for configuring how
+User/Group information is represented during
+`getFileStatus()`, `listStatus()`,  and `getAclStatus()` calls..
 
-            For performance optimization, Recommended default value.
-          </description>
-        </property>
+Add the following properties to `core-site.xml`
 
+```xml
+<property>
+  <name>adl.feature.ownerandgroup.enableupn</name>
+  <value>true</value>
+  <description>
+    When true : User and Group in FileStatus/AclStatus response is
+    represented as user friendly name as per Azure AD profile.
+
+    When false (default) : User and Group in FileStatus/AclStatus
+    response is represented by the unique identifier from Azure AD
+    profile (Object ID as GUID).
+
+    For performance optimization, Recommended default value.
+  </description>
+</property>
+```
 ## <a name="Testing_the_hadoop-azure_Module" />Testing the azure-datalake-store Module
-The hadoop-azure module includes a full suite of unit tests. Most of the tests will run without additional configuration by running mvn test. This includes tests against mocked storage, which is an in-memory emulation of Azure Data Lake Storage.
+The `hadoop-azure` module includes a full suite of unit tests.
+Most of the tests will run without additional configuration by running mvn test.
+This includes tests against mocked storage, which is an in-memory emulation of Azure Data Lake Storage.
 
 A selection of tests can run against the Azure Data Lake Storage. To run these
 tests, please create `src/test/resources/auth-keys.xml` with Adl account
 information mentioned in the above sections and the following properties.
 
-        <property>
-            <name>dfs.adl.test.contract.enable</name>
-            <value>true</value>
-        </property>
+```xml
+<property>
+    <name>dfs.adl.test.contract.enable</name>
+    <value>true</value>
+</property>
 
-        <property>
-            <name>test.fs.adl.name</name>
-            <value>adl://yourcontainer.azuredatalakestore.net</value>
-        </property>
+<property>
+    <name>test.fs.adl.name</name>
+    <value>adl://yourcontainer.azuredatalakestore.net</value>
+</property>
+```


---------------------------------------------------------------------
To unsubscribe, e-mail: common-commits-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-commits-help@hadoop.apache.org