Posted to common-commits@hadoop.apache.org by da...@apache.org on 2020/05/19 03:46:06 UTC

[hadoop] branch trunk updated: HADOOP-17004. ABFS: Improve the ABFS driver documentation

This is an automated email from the ASF dual-hosted git repository.

dazhou pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/hadoop.git


The following commit(s) were added to refs/heads/trunk by this push:
     new bdbd59c  HADOOP-17004. ABFS: Improve the ABFS driver documentation
bdbd59c is described below

commit bdbd59cfa0904860fc4ce7a2afef1e84f35b8b82
Author: bilaharith <52...@users.noreply.github.com>
AuthorDate: Tue May 19 09:15:54 2020 +0530

    HADOOP-17004. ABFS: Improve the ABFS driver documentation
    
    Contributed by Bilahari T H.
---
 .../hadoop-azure/src/site/markdown/abfs.md         | 133 ++++++++++++++++++++-
 1 file changed, 130 insertions(+), 3 deletions(-)

diff --git a/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md b/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
index 89f52e7..93141f1 100644
--- a/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
+++ b/hadoop-tools/hadoop-azure/src/site/markdown/abfs.md
@@ -257,7 +257,8 @@ will have the URL `abfs://container1@abfswales1.dfs.core.windows.net/`
 
 
 You can create a new container through the ABFS connector, by setting the option
- `fs.azure.createRemoteFileSystemDuringInitialization` to `true`.
+ `fs.azure.createRemoteFileSystemDuringInitialization` to `true`. This option
+ is not supported when the authentication type is SAS.
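+
+For example, the following `core-site.xml` entry (an illustrative sketch) asks
+the connector to create the container when the filesystem is initialized:
+
+```xml
+<property>
+  <name>fs.azure.createRemoteFileSystemDuringInitialization</name>
+  <value>true</value>
+</property>
+```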
 
 If the container does not exist, an attempt to list it with `hadoop fs -ls`
 will fail
@@ -317,8 +318,13 @@ driven by them.
 
 What can be changed is what secrets/credentials are used to authenticate the caller.
 
-The authentication mechanism is set in `fs.azure.account.auth.type` (or the account specific variant),
-and, for the various OAuth options `fs.azure.account.oauth.provider.type`
+The authentication mechanism is set in `fs.azure.account.auth.type` (or the
+account-specific variant). The possible values are SharedKey, OAuth, Custom
+and SAS. For the various OAuth options, use the config
+`fs.azure.account.oauth.provider.type`. The supported implementations are
+ClientCredsTokenProvider, UserPasswordTokenProvider, MsiTokenProvider and
+RefreshTokenBasedTokenProvider. An IllegalArgumentException is thrown if
+the specified provider type is not one of these.
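+
+As an illustrative sketch, selecting OAuth with the client credentials provider
+could look as follows. The provider is given by its fully qualified class name;
+the additional settings each provider needs are described in the OAuth
+sections of this document.
+
+```xml
+<property>
+  <name>fs.azure.account.auth.type</name>
+  <value>OAuth</value>
+</property>
+<property>
+  <name>fs.azure.account.oauth.provider.type</name>
+  <value>org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider</value>
+</property>
+```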
 
 All secrets can be stored in JCEKS files. These are encrypted and password
 protected —use them or a compatible Hadoop Key Management Store wherever
@@ -350,6 +356,15 @@ the password, "key", retrieved from the XML/JCECKs configuration files.
 *Note*: The source of the account key can be changed through a custom key provider;
 one exists to execute a shell script to retrieve it.
 
+A custom key provider class can be specified with the config
+`fs.azure.account.keyprovider`. If a key provider class is specified, it is
+used to get the account key. Otherwise, the SimpleKeyProvider is used, which
+returns the key specified in the config `fs.azure.account.key`.
+
+To retrieve the key with a shell script, specify the path to the script in the
+config `fs.azure.shellkeyprovider.script`. The ShellDecryptionKeyProvider class
+uses the specified script to retrieve the key.
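+
+A sketch of the shell-script setup follows. It assumes the
+ShellDecryptionKeyProvider class lives in the
+`org.apache.hadoop.fs.azurebfs.services` package; the script path is a
+placeholder.
+
+```xml
+<property>
+  <name>fs.azure.account.keyprovider</name>
+  <value>org.apache.hadoop.fs.azurebfs.services.ShellDecryptionKeyProvider</value>
+</property>
+<property>
+  <name>fs.azure.shellkeyprovider.script</name>
+  <!-- placeholder path: point this at your own key-retrieval script -->
+  <value>/path/to/key-retrieval-script</value>
+</property>
+```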
+
 ### <a name="oauth-client-credentials"></a> OAuth 2.0 Client Credentials
 
 OAuth 2.0 credentials of (client id, client secret, endpoint) are provided in the configuration/JCEKS file.
@@ -466,6 +481,13 @@ With an existing Oauth 2.0 token, make a request of the Active Directory endpoin
   </description>
 </property>
 <property>
+  <name>fs.azure.account.oauth2.refresh.endpoint</name>
+  <value></value>
+  <description>
+  Refresh token endpoint
+  </description>
+</property>
+<property>
   <name>fs.azure.account.oauth2.client.id</name>
   <value></value>
   <description>
@@ -507,6 +529,13 @@ The Azure Portal/CLI is used to create the service identity.
   </description>
 </property>
 <property>
+  <name>fs.azure.account.oauth2.msi.endpoint</name>
+  <value></value>
+  <description>
+  MSI endpoint
+  </description>
+</property>
+<property>
   <name>fs.azure.account.oauth2.client.id</name>
   <value></value>
   <description>
@@ -542,6 +571,26 @@ and optionally `org.apache.hadoop.fs.azurebfs.extensions.BoundDTExtension`.
 
 The declared class also holds responsibility to implement retry logic while fetching access tokens.
 
+### <a name="delegationtokensupportconfigoptions"></a> Delegation Token Provider
+
+A delegation token provider supplies the ABFS connector with delegation tokens
+and helps renew and cancel them by implementing the
+CustomDelegationTokenManager interface.
+
+```xml
+<property>
+  <name>fs.azure.enable.delegation.token</name>
+  <value>true</value>
+  <description>Set to true to use the delegation token provider</description>
+</property>
+<property>
+  <name>fs.azure.delegation.token.provider.type</name>
+  <value>{fully-qualified-class-name-for-implementation-of-CustomDelegationTokenManager-interface}</value>
+</property>
+```
+If delegation tokens are enabled and the config
+`fs.azure.delegation.token.provider.type` is not set, an
+IllegalArgumentException is thrown.
+
 ### Shared Access Signature (SAS) Token Provider
 
 A Shared Access Signature (SAS) token provider supplies the ABFS connector with SAS
@@ -691,6 +740,84 @@ Config `fs.azure.account.hns.enabled` provides an option to specify whether
 Config `fs.azure.enable.check.access` needs to be set true to enable
  the AzureBlobFileSystem.access().
 
+### <a name="featureconfigoptions"></a> Primary User Group Options
+The group name that is part of FileStatus and AclStatus is set to the same
+value as the username if the following config is set to true:
+`fs.azure.skipUserGroupMetadataDuringInitialization`.
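+
+For example (illustrative), to have the group name default to the username:
+
+```xml
+<property>
+  <name>fs.azure.skipUserGroupMetadataDuringInitialization</name>
+  <value>true</value>
+</property>
+```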
+
+### <a name="ioconfigoptions"></a> IO Options
+The following configs are related to read and write operations.
+
+`fs.azure.io.retry.max.retries`: Sets the number of retries for IO operations.
+Currently, this is used only by the server call retry logic, as part of the
+ExponentialRetryPolicy within the AbfsClient class. The value should be
+greater than or equal to 0.
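+
+An illustrative entry; the value 10 is an arbitrary example, not a
+recommendation:
+
+```xml
+<property>
+  <name>fs.azure.io.retry.max.retries</name>
+  <value>10</value>
+</property>
+```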
+
+`fs.azure.write.request.size`: Sets the write buffer size, in bytes. The value
+should be between 16384 and 104857600, both inclusive (16 KB to 100 MB). The
+default value is 8388608 (8 MB).
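+
+For example, a 16 MB write buffer is 16 × 1024 × 1024 = 16777216 bytes, which
+lies within the allowed range (the size is illustrative):
+
+```xml
+<property>
+  <name>fs.azure.write.request.size</name>
+  <value>16777216</value>
+</property>
+```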
+
+`fs.azure.read.request.size`: Sets the read buffer size, in bytes. The value
+should be between 16384 and 104857600, both inclusive (16 KB to 100 MB). The
+default value is 4194304 (4 MB).
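+
+For example, an 8 MB read buffer is 8 × 1024 × 1024 = 8388608 bytes (the size
+is illustrative):
+
+```xml
+<property>
+  <name>fs.azure.read.request.size</name>
+  <value>8388608</value>
+</property>
+```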
+
+`fs.azure.readaheadqueue.depth`: Sets the read-ahead queue depth in
+AbfsInputStream. If the value is negative, the read-ahead queue depth is set
+to `Runtime.getRuntime().availableProcessors()`. The default value is -1.
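+
+An illustrative entry that fixes the queue depth at 4 instead of deriving it
+from the number of available processors, as the default of -1 does:
+
+```xml
+<property>
+  <name>fs.azure.readaheadqueue.depth</name>
+  <value>4</value>
+</property>
+```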
+
+### <a name="securityconfigoptions"></a> Security Options
+`fs.azure.always.use.https`: Enforces the use of HTTPS instead of HTTP when set
+to true. Irrespective of this flag, AbfsClient uses HTTPS if the secure scheme
+(ABFSS) is used or if OAuth is used for authentication. The default value is
+true.
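+
+For illustration, the flag can be turned off as shown below. As described
+above, connections made through the `abfss` scheme or authenticated with OAuth
+still use HTTPS.
+
+```xml
+<property>
+  <name>fs.azure.always.use.https</name>
+  <value>false</value>
+</property>
+```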
+
+`fs.azure.ssl.channel.mode`: Initializes DelegatingSSLSocketFactory with the
+specified SSL channel mode. The value should be one of the
+DelegatingSSLSocketFactory.SSLChannelMode enum values. The default value is
+DelegatingSSLSocketFactory.SSLChannelMode.Default.
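+
+A sketch of the entry, assuming the value is given as the bare enum constant
+name rather than the fully qualified name:
+
+```xml
+<property>
+  <name>fs.azure.ssl.channel.mode</name>
+  <value>Default</value>
+</property>
+```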
+
+### <a name="serverconfigoptions"></a> Server Options
+When the config `fs.azure.io.read.tolerate.concurrent.append` is set to true,
+the If-Match header sent to the server for read calls is set to `*`; otherwise,
+it is set to the ETag of the file. With the ETag, a read fails if the file has
+been modified in the meantime, whereas `*` tolerates concurrent appends. This
+is the mechanism used to handle reads with optimistic concurrency.
+Please refer to the following links for further information.
+
+1. https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/read
+2. https://azure.microsoft.com/de-de/blog/managing-concurrency-in-microsoft-azure-storage-2/
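+
+For example (illustrative), to let reads tolerate concurrent appends to the
+file being read:
+
+```xml
+<property>
+  <name>fs.azure.io.read.tolerate.concurrent.append</name>
+  <value>true</value>
+</property>
+```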
+
+The listStatus API fetches the FileStatus information from the server page by
+page. The config `fs.azure.list.max.results` sets the maxResults URI parameter,
+which is the page size (maximum results per call). The value should be greater
+than 0. The default value is 500. The server caps this parameter at 5000, so
+even if the config is set above 5000, each response contains at most 5000
+entries. Please refer to the following link for further information.
+https://docs.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/path/list
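+
+As a worked example, assuming every page is full: listing a directory with
+12000 entries takes 12000 / 500 = 24 calls with the default page size, but
+only ceil(12000 / 5000) = 3 calls with the page size raised to the server
+maximum, as in this illustrative entry:
+
+```xml
+<property>
+  <name>fs.azure.list.max.results</name>
+  <value>5000</value>
+</property>
+```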
+
+### <a name="throttlingconfigoptions"></a> Throttling Options
+The ABFS driver can throttle read and write operations to maximize throughput
+by minimizing errors. Errors occur when the account ingress or egress limits
+are exceeded and the server throttles requests. Server-side throttling causes
+the retry policy to be used, but the retry policy sleeps for long periods of
+time, causing the total ingress or egress throughput to be as much as 35% lower
+than optimal. The retry policy also acts after the fact: it applies only after
+a request fails. In contrast, the client-side throttling implemented here
+happens before requests are made and sleeps just enough to minimize errors,
+allowing optimal ingress and/or egress throughput. Throttling is enabled by
+default in the driver and can be disabled by setting the config
+`fs.azure.enable.autothrottling` to false.
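+
+For example, client-side throttling can be disabled with the following entry:
+
+```xml
+<property>
+  <name>fs.azure.enable.autothrottling</name>
+  <value>false</value>
+</property>
+```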
+
+### <a name="renameconfigoptions"></a> Rename Options
+`fs.azure.atomic.rename.key`: Directories for atomic rename support can be
+specified as comma-separated values in this config. The driver logs the
+following warning if the source of a rename belongs to one of the configured
+directories: "The atomic rename feature is not supported by the ABFS scheme;
+however, rename, create and delete operations are atomic if Namespace is
+enabled for your Azure Storage account." The default value is "/hbase".
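+
+An illustrative entry listing two directories; `/data/tables` is a
+hypothetical path added next to the default `/hbase`:
+
+```xml
+<property>
+  <name>fs.azure.atomic.rename.key</name>
+  <value>/hbase,/data/tables</value>
+</property>
+```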
+
 ### <a name="perfoptions"></a> Perf Options
 
 #### <a name="abfstracklatencyoptions"></a> 1. HTTP Request Tracking Options

