Posted to common-issues@hadoop.apache.org by "John Zhuge (JIRA)" <ji...@apache.org> on 2017/09/05 05:19:00 UTC

[jira] [Updated] (HADOOP-14808) Hadoop keychain

     [ https://issues.apache.org/jira/browse/HADOOP-14808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Zhuge updated HADOOP-14808:
--------------------------------
    Attachment: HADOOP-14808.001.patch

Patch 001
* Common code to load the keychain from KeychainLoader services
* Add CredentialProviderKeychainLoader to load credentials from credential providers. The JCEKS stores can be encrypted.
* Add AzureCLIKeychainLoader to load credentials from the Azure CLI token cache file. This should probably be committed in a separate JIRA.
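
The common loading code can be sketched as follows: several sources each contribute alias-to-secret pairs, and these are merged into one keychain map. This is a minimal plain-JDK sketch under stated assumptions. The {{KeychainLoader}} interface shape, the {{of}} and {{loadKeychain}} helpers, and the "first loader wins" and "skip a failing loader" rules are hypothetical and are not taken from HADOOP-14808.001.patch. The alias names reuse the {{fs.s3a.*}} and {{fs.adl.oauth2.*}} configuration keys for illustration only.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeychainSketch {

    /** Hypothetical loader contract; the interface in the actual patch may differ. */
    interface KeychainLoader {
        String name();

        /** Returns alias -> secret pairs from one source (JCEKS store, Azure CLI cache, ...). */
        Map<String, char[]> load() throws Exception;
    }

    /** Wraps a fixed map as a loader, for demos and tests (hypothetical helper). */
    static KeychainLoader of(String name, Map<String, char[]> secrets) {
        return new KeychainLoader() {
            @Override public String name() { return name; }
            @Override public Map<String, char[]> load() { return secrets; }
        };
    }

    /**
     * Runs loaders in order. An alias that is already present is kept
     * (assumed rule: first loader wins), and a loader that throws is skipped.
     */
    static Map<String, char[]> loadKeychain(List<KeychainLoader> loaders) {
        Map<String, char[]> keychain = new LinkedHashMap<>();
        for (KeychainLoader loader : loaders) {
            try {
                for (Map.Entry<String, char[]> e : loader.load().entrySet()) {
                    keychain.putIfAbsent(e.getKey(), e.getValue());
                }
            } catch (Exception ex) {
                // One unavailable source should not block the others.
                System.err.println("skipping " + loader.name() + ": " + ex.getMessage());
            }
        }
        return Collections.unmodifiableMap(keychain);
    }

    public static void main(String[] args) {
        Map<String, char[]> keychain = loadKeychain(List.of(
                of("credential-provider", Map.of("fs.s3a.access.key", "AKIAEXAMPLE".toCharArray())),
                of("azure-cli", Map.of("fs.adl.oauth2.refresh.token", "example-token".toCharArray()))));
        System.out.println(keychain.keySet()); // [fs.s3a.access.key, fs.adl.oauth2.refresh.token]
    }
}
```

With the skip-on-failure rule, a missing Azure CLI token cache would not stop the JCEKS credentials from loading. The patch notes do not say whether the real loader code behaves this way.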

Testing Done
* Preparation
** Add S3A credentials to an encrypted JCEKS store: /Users/jzhuge/.config/hadoop/keychain.jceks
** Run the Azure CLI to log in as an end user. For details on Azure CLI 2.0, see https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest.
* List an S3 bucket
* List an ADLS file system
* DistCp from S3 bucket to ADLS

{noformat}
$ hadoop credential list -provider localjceks://file/Users/jzhuge/.config/hadoop/keychain.jceks
Listing aliases for CredentialProvider: localjceks://file/Users/jzhuge/.config/hadoop/keychain.jceks
fs.s3a.secret.key
fs.s3a.access.key

$ az login -u user@microsoft.com
Password: 

$ hadoop fs -Dfs.adl.oauth2.access.token.provider.type=RefreshToken -ls adl://store.azuredatalakestore.net/
2017-09-04 21:41:41,355 INFO security.CredentialProviderKeychainLoader: loading credentials from localjceks://file/Users/jzhuge/.config/hadoop/keychain.jceks
2017-09-04 21:41:41,568 INFO security.CredentialProviderKeychainLoader: loading secret fs.s3a.secret.key
2017-09-04 21:41:41,573 INFO security.CredentialProviderKeychainLoader: loading secret fs.s3a.access.key
2017-09-04 21:41:41,576 INFO adl.AzureCLIKeychainLoader: loading the credential from Azure CLI /Users/jzhuge/.azure/accessTokens.json
2017-09-04 21:41:41,749 INFO adl.AzureCLIKeychainLoader: loading refresh token for bob@adcloudera.onmicrosoft.com
Found 12 items
drwxr-xr-x+  - a8005e54-3276-4b9f-b500-0cf2272a0634 6c4d58c1-4e75-40e1-b7a2-e97ff15c6f31          0 2017-07-26 23:47 adl://store.azuredatalakestore.net/dict
drwxr-xr-x+  - a8005e54-3276-4b9f-b500-0cf2272a0634 6c4d58c1-4e75-40e1-b7a2-e97ff15c6f31          0 2017-08-03 00:43 adl://store.azuredatalakestore.net/keychain_test
drwxr-xr-x+  - a0c43012-fd2a-42a3-90e9-0649584176c0 6c4d58c1-4e75-40e1-b7a2-e97ff15c6f31          0 2017-08-13 00:35 adl://store.azuredatalakestore.net/test
-rw-r--r--+  1 a0c43012-fd2a-42a3-90e9-0649584176c0 6c4d58c1-4e75-40e1-b7a2-e97ff15c6f31          0 2017-08-04 23:28 adl://store.azuredatalakestore.net/testRmRootRecursive

$ hadoop fs -ls s3a://bucket/
2017-09-04 21:41:52,947 INFO security.CredentialProviderKeychainLoader: loading credentials from localjceks://file/Users/jzhuge/.config/hadoop/keychain.jceks
2017-09-04 21:41:53,200 INFO security.CredentialProviderKeychainLoader: loading secret fs.s3a.secret.key
2017-09-04 21:41:53,204 INFO security.CredentialProviderKeychainLoader: loading secret fs.s3a.access.key
2017-09-04 21:41:53,207 INFO adl.AzureCLIKeychainLoader: loading the credential from Azure CLI /Users/jzhuge/.azure/accessTokens.json
2017-09-04 21:41:53,388 INFO adl.AzureCLIKeychainLoader: loading refresh token for bob@adcloudera.onmicrosoft.com
2017-09-04 21:41:55,033 INFO Configuration.deprecation: fs.s3a.server-side-encryption-key is deprecated. Instead, use fs.s3a.server-side-encryption.key
Found 9 items
drwxrwxrwx   - jzhuge jzhuge          0 2017-09-04 21:41 s3a://bucket/Users
drwxrwxrwx   - jzhuge jzhuge          0 2017-09-04 21:41 s3a://bucket/keychain_test
drwxrwxrwx   - jzhuge jzhuge          0 2017-09-04 21:41 s3a://bucket/test
drwxrwxrwx   - jzhuge jzhuge          0 2017-09-04 21:41 s3a://bucket/tg

$ hadoop distcp -Dfs.adl.oauth2.access.token.provider.type=RefreshToken s3a://bucket/tg adl://store.azuredatalakestore.net/tg.cp
2017-09-04 21:53:35,082 INFO security.CredentialProviderKeychainLoader: loading credentials from localjceks://file/Users/jzhuge/.config/hadoop/keychain.jceks
2017-09-04 21:53:35,210 INFO security.CredentialProviderKeychainLoader: loading secret fs.s3a.secret.key
2017-09-04 21:53:35,214 INFO security.CredentialProviderKeychainLoader: loading secret fs.s3a.access.key
2017-09-04 21:53:35,216 INFO adl.AzureCLIKeychainLoader: loading the credential from Azure CLI /Users/jzhuge/.azure/accessTokens.json
2017-09-04 21:53:35,380 INFO adl.AzureCLIKeychainLoader: loading refresh token for bob@adcloudera.onmicrosoft.com
2017-09-04 21:53:36,541 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[s3a://bucket/tg], targetPath=adl://store.azuredatalakestore.net/tg.cp, filtersFile='null', blocksPerChunk=0, copyBufferSize=8192}, sourcePaths=[s3a://bucket/tg], targetPathExists=false, preserveRawXattrsfalse
...
{noformat}
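
The {{hadoop credential list}} output above shows a JCEKS store whose aliases are the S3A property names. The plain-JDK sketch below writes and reads a store of that shape with {{java.security.KeyStore}} to illustrate the layout. It assumes, without confirmation from the patch, that each secret is held as the UTF-8 bytes of an "AES"-labelled {{SecretKeySpec}}, which is how Hadoop's Java keystore providers are believed to store credentials. The class, its methods, and the password are hypothetical placeholders. The store is kept in memory instead of a {{localjceks://file/...}} path.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.Key;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;
import javax.crypto.spec.SecretKeySpec;

public class JceksCredentialSketch {

    /** Stores each alias/secret pair as a secret-key entry and returns the serialized JCEKS bytes. */
    static byte[] writeStore(char[] password, String[] aliases, String[] secrets) throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, password); // start from an empty store
        for (int i = 0; i < aliases.length; i++) {
            // Assumption: raw UTF-8 secret bytes wrapped in an "AES"-labelled SecretKeySpec.
            SecretKeySpec entry = new SecretKeySpec(secrets[i].getBytes(StandardCharsets.UTF_8), "AES");
            ks.setKeyEntry(aliases[i], entry, password, null); // entry is protected with the password
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ks.store(out, password);
        return out.toByteArray();
    }

    /** Loads a store from its serialized bytes; a wrong password fails the integrity check. */
    static KeyStore readStore(byte[] data, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(new ByteArrayInputStream(data), password);
        return ks;
    }

    /** Returns the secret for an alias, or null if the alias is absent. */
    static String getSecret(KeyStore ks, String alias, char[] password) throws Exception {
        Key key = ks.getKey(alias, password);
        return key == null ? null : new String(key.getEncoded(), StandardCharsets.UTF_8);
    }

    /** Sorted alias list, similar in spirit to "hadoop credential list". */
    static List<String> listAliases(KeyStore ks) throws Exception {
        List<String> aliases = Collections.list(ks.aliases());
        Collections.sort(aliases);
        return aliases;
    }

    public static void main(String[] args) throws Exception {
        char[] password = "placeholder-password".toCharArray();
        byte[] store = writeStore(password,
                new String[] {"fs.s3a.access.key", "fs.s3a.secret.key"},
                new String[] {"AKIAEXAMPLE", "example-secret"});
        KeyStore ks = readStore(store, password);
        System.out.println(listAliases(ks)); // [fs.s3a.access.key, fs.s3a.secret.key]
        System.out.println(getSecret(ks, "fs.s3a.access.key", password)); // AKIAEXAMPLE
    }
}
```

JCEKS encrypts each key entry under the supplied password. A store protected with a non-default password (in Hadoop, typically supplied through the {{HADOOP_CREDSTORE_PASSWORD}} environment variable) is presumably what the patch notes mean by an encrypted JCEKS store.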


> Hadoop keychain
> ---------------
>
>                 Key: HADOOP-14808
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14808
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: security
>    Affects Versions: 2.7.0
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>         Attachments: HADOOP-14808.001.patch
>
>
> Extend the idea from HADOOP-6520 "UGI should load tokens from the environment" to a generic lightweight "keychain" design. Load keys (secrets) into a keychain in UGI (secret map) at startup. YARN will distribute them securely into each container. The Hadoop code running in the container can then retrieve the credentials from UGI.
> The use case is Bring Your Own Key (BYOK) credentials for cloud connectors (adl, wasb, s3a, etc.), while Hadoop authentication is still Kerberos. No configuration changes and no admin involvement are needed. It will initially support YARN applications, e.g., DistCp, Tera Suite, and Spark-on-YARN.
> Implementation is surprisingly simple because almost all pieces are in place:
> * Retrieve secrets from UGI using {{conf.getPassword}} backed by the existing Credential Provider class {{UserProvider}}
> * Reuse the Credential Provider classes and interface to define a local permanent or transient credential store, e.g., {{LocalJavaKeyStoreProvider}}
> * New: create a transient Credential Provider that logs in to Azure Active Directory (AAD) with a username/password or a device code, and then puts the Client ID and Refresh Token into the keychain
> * New: create a permanent Credential Provider based on Hadoop configuration XML, for dev/testing purposes.
> Links
> * HADOOP-11766 Generic token authentication support for Hadoop
> * HADOOP-11744 Support OAuth2 in Hadoop
> * HADOOP-10959 A Kerberos based token authentication approach
> * HADOOP-9392 Token based authentication and Single Sign On
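
The description's first implementation point relies on {{conf.getPassword}} backed by {{UserProvider}}. The simplified pure-Java model below uses no Hadoop classes, and every name in it is a hypothetical stand-in. It shows the lookup order that {{Configuration.getPassword}} is understood to follow. Providers from {{hadoop.security.credential.provider.path}} are consulted in order, where a {{user:///}} entry selects {{UserProvider}} over the current UGI's secrets. Only after that is the clear-text property used, and only when {{hadoop.security.credential.clear-text-fallback}} is true (its default).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class GetPasswordSketch {

    /** Simplified stand-in for a credential provider: looks up one alias. */
    interface Provider {
        Optional<char[]> get(String alias);
    }

    /** Models a UserProvider-style source backed by the UGI secret map shipped into a container. */
    static Provider userProvider(Map<String, byte[]> ugiSecrets) {
        return alias -> Optional.ofNullable(ugiSecrets.get(alias))
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8).toCharArray());
    }

    /**
     * Models the lookup order: providers in path order, then the clear-text
     * property only if fallback is enabled. Returns null when nothing matches.
     */
    static char[] getPassword(String name, List<Provider> providerPath,
                              Map<String, String> conf, boolean clearTextFallback) {
        for (Provider provider : providerPath) {
            Optional<char[]> secret = provider.get(name);
            if (secret.isPresent()) {
                return secret.get();
            }
        }
        String plain = conf.get(name);
        return (clearTextFallback && plain != null) ? plain.toCharArray() : null;
    }

    public static void main(String[] args) {
        Map<String, byte[]> ugiSecrets = new HashMap<>();
        ugiSecrets.put("fs.s3a.secret.key", "from-keychain".getBytes(StandardCharsets.UTF_8));
        Map<String, String> conf = Map.of(
                "fs.s3a.secret.key", "from-core-site",
                "fs.s3a.access.key", "AKIA-from-core-site");
        List<Provider> path = List.of(userProvider(ugiSecrets));

        System.out.println(new String(getPassword("fs.s3a.secret.key", path, conf, true))); // from-keychain
        System.out.println(new String(getPassword("fs.s3a.access.key", path, conf, true))); // AKIA-from-core-site
        System.out.println(getPassword("fs.s3a.access.key", path, conf, false) == null);    // true
    }
}
```

In this model, a provider listed earlier shadows the clear-text property. A secret loaded into the UGI map is therefore returned even when core-site.xml also sets the same property.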



