Posted to common-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2017/08/01 17:14:00 UTC

[jira] [Commented] (HADOOP-13887) Support for client-side encryption in S3A file system

    [ https://issues.apache.org/jira/browse/HADOOP-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109336#comment-16109336 ] 

Steve Loughran commented on HADOOP-13887:
-----------------------------------------

As I warned, all I'm worrying about in S3A right now is S3Guard and any regressions from things like AWS SDK updates, so I'm not looking at this in too much detail.

# I think the option of using object tags to preserve the data is not one to dismiss lightly, especially if it allows a "budget" operation without paying for DDB IO.
# I also think the use case "handle data saved to Glacier" is something that's needed; hopefully this can do it.
# There is also the opportunity gained from supporting S3-compatible object stores, as [~Thomas Demoor] and [~ehiggs] will point out.
I know you consider object tagging "wrong", but if we take over a couple of the tags for our own purposes (one for security, one for "other", where we can embed permissions, checksums, etc.), then that's something which can be managed, especially if we don't expose those object tags through the Hadoop APIs. You'd have to be running something alongside Hadoop to want to work with those tags.
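To make that concrete, something like the following is all it would take on the SDK side; the tag keys and values below are invented for illustration and aren't in any of the attached patches:

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectTaggingRequest;
import com.amazonaws.services.s3.model.ObjectTagging;
import com.amazonaws.services.s3.model.SetObjectTaggingRequest;
import com.amazonaws.services.s3.model.Tag;

import java.util.Arrays;
import java.util.List;

public class ReservedTagSketch {
  // Hypothetical tag keys reserved for S3A's own use.
  private static final String TAG_SECURITY = "x-s3a-security";
  private static final String TAG_OTHER = "x-s3a-metadata";

  public static void main(String[] args) {
    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    String bucket = "example-bucket";   // placeholder
    String key = "path/to/object";      // placeholder

    // Write the two reserved tags onto an existing object.
    List<Tag> tags = Arrays.asList(
        new Tag(TAG_SECURITY, "encryption-material-reference"),
        new Tag(TAG_OTHER, "permissions=...,checksum=..."));
    s3.setObjectTagging(new SetObjectTaggingRequest(bucket, key, new ObjectTagging(tags)));

    // Read them back; anything running alongside Hadoop still sees the full tag set.
    List<Tag> readBack =
        s3.getObjectTagging(new GetObjectTaggingRequest(bucket, key)).getTagSet();
    readBack.forEach(t -> System.out.println(t.getKey() + " = " + t.getValue()));
  }
}
{code}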

I have one more little problem to throw into the mix: in HADOOP-13786 we will have objects which don't come into existence until a job is committed (i.e. the multipart PUT is only completed then), *potentially on a different host*. If the OEMI info is attached as metadata in the PUT, this happens automatically. If a new DDB table is used to map path -> OEMI, then the OEMI data will have to be propagated with the pending commit data, and the job commit will have to create a new entry mapping the final pathname to the OEMI needed to decrypt the data.
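Very roughly, and purely as a sketch (none of these class or method names exist in the committer code), the pending commit data would need to carry something like:

{code:java}
import java.io.Serializable;

/**
 * Hypothetical sketch only: the shape the pending-commit data might take if the
 * encryption material info ("OEMI") has to travel from the task that uploaded
 * the parts to the host that performs the job commit.
 */
public class PendingCommitSketch implements Serializable {
  private String destinationPath;        // final path of the object once committed
  private String uploadId;               // multipart upload to complete at job commit
  private String encryptionMaterialRef;  // OEMI needed to decrypt the data later

  /**
   * Runs on the committing host: complete the multipart upload, then record
   * the finalPath -> encryptionMaterialRef mapping (e.g. in a DDB table).
   */
  public void commit(MetadataStoreSketch store) {
    // completeMultipartUpload(uploadId) would go here in the real committer
    store.putEncryptionEntry(destinationPath, encryptionMaterialRef);
  }

  /** Placeholder for whatever store ends up holding the path -> OEMI mapping. */
  public interface MetadataStoreSketch {
    void putEncryptionEntry(String path, String materialRef);
  }
}
{code}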

> Support for client-side encryption in S3A file system
> -----------------------------------------------------
>
>                 Key: HADOOP-13887
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13887
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Jeeyoung Kim
>            Assignee: Igor Mazur
>            Priority: Minor
>         Attachments: HADOOP-13887-002.patch, HADOOP-13887-007.patch, HADOOP-13887-branch-2-003.patch, HADOOP-13897-branch-2-004.patch, HADOOP-13897-branch-2-005.patch, HADOOP-13897-branch-2-006.patch, HADOOP-13897-branch-2-008.patch, HADOOP-13897-branch-2-009.patch, HADOOP-13897-branch-2-010.patch, HADOOP-13897-branch-2-012.patch, HADOOP-13897-branch-2-014.patch, HADOOP-13897-trunk-011.patch, HADOOP-13897-trunk-013.patch, HADOOP-14171-001.patch, S3-CSE Proposal.pdf
>
>
> Expose the client-side encryption option documented in the Amazon S3 documentation: http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingClientSideEncryption.html
> Currently this is not exposed in Hadoop, but it is exposed as an option in the AWS Java SDK, which Hadoop already includes. It should be trivial to propagate this as a parameter passed to the S3 client used in S3AFileSystem.java.
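For reference, a minimal, illustrative sketch of the SDK-side wiring being described; the KMS key id is a placeholder and the configuration plumbing is assumed, not part of any attached patch:

{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3EncryptionClientBuilder;
import com.amazonaws.services.s3.model.CryptoConfiguration;
import com.amazonaws.services.s3.model.CryptoMode;
import com.amazonaws.services.s3.model.KMSEncryptionMaterialsProvider;

public class CseClientSketch {
  public static void main(String[] args) {
    // Illustration only: the SDK's client-side encryption client, keyed off a
    // KMS CMK. An S3A integration would presumably pick the key id up from an
    // fs.s3a.* configuration option rather than hard-coding it.
    String kmsKeyId = "alias/example-key"; // placeholder, not a real key

    AmazonS3 s3 = AmazonS3EncryptionClientBuilder.standard()
        .withEncryptionMaterials(new KMSEncryptionMaterialsProvider(kmsKeyId))
        .withCryptoConfiguration(
            new CryptoConfiguration(CryptoMode.AuthenticatedEncryption))
        .build();

    // From here the client is used like any other AmazonS3 instance;
    // putObject() encrypts before upload, getObject() decrypts on read.
    System.out.println("encryption client created: " + s3);
  }
}
{code}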


