Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2022/02/15 18:42:00 UTC

[jira] [Comment Edited] (HADOOP-18073) Upgrade AWS SDK to v2

    [ https://issues.apache.org/jira/browse/HADOOP-18073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471377#comment-17471377 ] 

Steve Loughran edited comment on HADOOP-18073 at 2/15/22, 6:41 PM:
-------------------------------------------------------------------

That is going to be a pretty traumatic update. Currently we are just moving to the 1.12 SDK in HADOOP-18068.

I believe the API is radically different. One concern is that it drops the transfer manager, which we use for copy/rename and for uploading from the local FS. I see there is now a preview implementation of that... If it does not include any regressions then it should be possible to use. Otherwise someone is going to have to implement the parallelized block upload/copy in the S3A code.
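For reference, a rough sketch of what the upload and copy paths might look like on the v2 transfer manager. This is not the S3A code: the class and method names follow the GA-style S3TransferManager API and may differ in the preview release, and the bucket/key/file names are just examples.

  import java.nio.file.Paths;
  import software.amazon.awssdk.transfer.s3.S3TransferManager;

  public class TransferManagerSketch {
    public static void main(String[] args) {
      try (S3TransferManager tm = S3TransferManager.create()) {
        // Upload from the local FS: the kind of operation S3A needs when
        // writing a local file up to S3.
        tm.uploadFile(b -> b
            .source(Paths.get("/tmp/part-0000"))   // example local file
            .putObjectRequest(p -> p.bucket("example-bucket").key("dest/part-0000")))
          .completionFuture().join();

        // Server-side copy: what S3A's rename() is ultimately built on.
        tm.copy(c -> c.copyObjectRequest(r -> r
            .sourceBucket("example-bucket").sourceKey("dest/part-0000")
            .destinationBucket("example-bucket").destinationKey("renamed/part-0000")))
          .completionFuture().join();
      }
    }
  }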

I'm not going to volunteer for this. If you want to contribute it, it is certainly something we would ultimately like.

In the meantime, S3A does take session credentials. If you can use the SSO mechanism and the AWS CLI to generate a set, then you set the relevant properties (ideally in a JCEKS file) and use them for the life of the credentials. You will be able to use the session delegation tokens to propagate those secrets from your machine to the cluster, so the cluster can be deployed in EC2 with lower privileges than the users. You also have the option of providing your own AWS credential provider and delegation token implementation. FWIW, some of the Cloudera products do exactly this to let someone go from Kerberos auth to session credentials for their assigned roles.
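To illustrate the session-credential route, a minimal sketch: the property names are the documented S3A ones, the values are placeholders for credentials generated out-of-band (e.g. via AWS SSO and the AWS CLI), and in a real deployment the secrets would live in a JCEKS credential store rather than inline configuration.

  import org.apache.hadoop.conf.Configuration;

  public class SessionCredentialsSketch {
    public static Configuration s3aSessionConf() {
      Configuration conf = new Configuration();
      // Tell S3A to expect short-lived session credentials.
      conf.set("fs.s3a.aws.credentials.provider",
          "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider");
      conf.set("fs.s3a.access.key", "<temporary access key>");   // placeholder
      conf.set("fs.s3a.secret.key", "<temporary secret key>");   // placeholder
      conf.set("fs.s3a.session.token", "<session token>");       // placeholder
      // Any FileSystem created from this configuration authenticates with the
      // session credentials for as long as they remain valid.
      return conf;
    }
  }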


> Upgrade AWS SDK to v2
> ---------------------
>
>                 Key: HADOOP-18073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18073
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: auth, fs/s3
>    Affects Versions: 3.3.1
>            Reporter: xiaowei sun
>            Priority: Major
>
> We would like to access S3 with AWS SSO, which is supported in software.amazon.awssdk:sdk-core:2.*. 
> In particular, per [https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html], when setting 'fs.s3a.aws.credentials.provider', the class must implement "com.amazonaws.auth.AWSCredentialsProvider". We would like to support "software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider", which supports AWS SSO, so users only need to authenticate once.
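For illustration, what the requested behaviour looks like against the v2 SDK directly (a sketch only, not anything S3A supports today: the profile name is an example, and SSO-backed profiles also require the SDK's sso module on the classpath):

  import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
  import software.amazon.awssdk.regions.Region;
  import software.amazon.awssdk.services.s3.S3Client;

  public class SsoProfileSketch {
    public static void main(String[] args) {
      // "my-sso-profile" is an example profile in ~/.aws/config configured for AWS SSO;
      // after `aws sso login --profile my-sso-profile` the provider picks up the cached token.
      try (S3Client s3 = S3Client.builder()
          .region(Region.US_EAST_1)
          .credentialsProvider(ProfileCredentialsProvider.create("my-sso-profile"))
          .build()) {
        s3.listBuckets().buckets().forEach(b -> System.out.println(b.name()));
      }
    }
  }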



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
