Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/11/01 14:42:00 UTC

[jira] [Commented] (HADOOP-13887) Support for client-side encryption in S3A file system

    [ https://issues.apache.org/jira/browse/HADOOP-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234156#comment-16234156 ] 

Steve Loughran commented on HADOOP-13887:
-----------------------------------------

This initial patch was just about turning client-side encryption on. Doing that produces data whose real EOF may come slightly before len(block), which breaks all client code that seeks relative to EOF, assumes the length of the data is the amount it can copy, and so on. And if you lose the key, you are on your own.
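
To make the EOF/length mismatch concrete, here is a minimal standalone sketch. It is not the patch's code and does not touch S3A or the AWS SDK: AES/CBC with PKCS5 padding, the all-zero key and IV, and the class and method names are assumptions chosen purely for illustration. It shows a reader that trusts the stored length running off the end of the decrypted stream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.InputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EofLengthSketch {

    // Hypothetical all-zero key and IV, for illustration only; never do this for real data.
    private static final byte[] KEY = new byte[16];
    private static final byte[] IV = new byte[16];

    private static Cipher cipher(int mode) throws Exception {
        // Assumption: a padded block mode stands in for whatever the real client uses.
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(mode, new SecretKeySpec(KEY, "AES"), new IvParameterSpec(IV));
        return c;
    }

    /** Encrypts the plaintext; the result stands in for the object as stored. */
    public static byte[] encrypt(byte[] plaintext) throws Exception {
        return cipher(Cipher.ENCRYPT_MODE).doFinal(plaintext);
    }

    /** Opens a decrypting stream over the stored bytes. */
    public static InputStream open(byte[] stored) throws Exception {
        return new CipherInputStream(
            new ByteArrayInputStream(stored), cipher(Cipher.DECRYPT_MODE));
    }

    /** Client code that trusts the stored length; returns true if it hits EOF early. */
    public static boolean readTrustingLength(byte[] stored) throws Exception {
        byte[] buffer = new byte[stored.length]; // length as the store would report it
        try (DataInputStream in = new DataInputStream(open(stored))) {
            in.readFully(buffer); // expects stored.length bytes of decrypted data
            return false;
        } catch (EOFException e) {
            return true; // decrypted data ended before the reported length
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = encrypt(new byte[100]);
        System.out.println("plaintext=100 stored=" + stored.length
            + " earlyEof=" + readTrustingLength(stored));
    }
}
```

With a 100-byte plaintext the padded ciphertext is 112 bytes, so the read that trusts the stored length fails with an EOFException after 100 bytes. That is the kind of failure downstream code would see.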

At the same time, I can see the appeal of some form of support for this purely for a backup/restore process, e.g. encrypting data before it goes to Glacier and decrypting it as part of a copy. I think that can and should be done outside the s3a lib, because you can never reliably use client-side encrypted S3 data as a source in any MR, Hive, Tez, Spark &c operation. People will end up encrypting their data, then filing bugs/support calls trying to understand why their queries are all failing.

*Proposed*: change the title of this JIRA to "Encrypt S3A data client-side with AWS SDK" to make the goal clear, then close it as a wontfix with a clear explanation. It's not that we can't take on the code that Igor has done; it's that the assumption that EOF=Len(file) is so fundamental that we can't hand this data to downstream code and expect it to cope.

The other grand proposal is, well, big. And as it goes near KMS & encryption, it is beyond my scope. It also isn't going to interoperate with any other S3 client, which is a significant limitation. I'm certainly not going to go near it, and I wouldn't be in a position to review anything but the "how does this glue to the input stream" issue. And even there, fear would generally keep me away from it.

*Proposed*: create a new JIRA, "Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS", put that proposal there, and for now let people comment on the proposal & see where it goes.



> Support for client-side encryption in S3A file system
> -----------------------------------------------------
>
>                 Key: HADOOP-13887
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13887
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Jeeyoung Kim
>            Assignee: Igor Mazur
>            Priority: Minor
>         Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch, HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch, HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch, HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch, HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch, HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch, HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
>
> Expose the client-side encryption option documented in Amazon S3 documentation  - http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop but it is exposed as an option in AWS Java SDK, which Hadoop currently includes. It should be trivial to propagate this as a parameter passed to the S3client used in S3AFileSystem.java



