Posted to common-issues@hadoop.apache.org by "Steve Moist (JIRA)" <ji...@apache.org> on 2018/01/03 19:03:00 UTC

[jira] [Comment Edited] (HADOOP-15006) Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS

    [ https://issues.apache.org/jira/browse/HADOOP-15006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310101#comment-16310101 ] 

Steve Moist edited comment on HADOOP-15006 at 1/3/18 7:02 PM:
--------------------------------------------------------------

{quote}
Before worrying about these, why not conduct some experiments? You could take S3A and modify it to always encrypt client side with the same key, then run as many integration tests as you can against it (Hive, Spark, impala, ...), and see what fails. I think that should be a first step to anything client-side related
{quote}

I wrote a simple proof of concept back in May using the HDFS Crypto Streams to wrap the S3 streams with a fixed AES key and IV.  I was able to run the S3 integration tests without issue, run teragen/terasort/teravalidate without issue, and write various files (of differing sizes) and compare the checksums.  That gave me enough confidence at the time to move forward with writing the original proposal.  Unfortunately, I seem to have misplaced the work since it's been so long.  I'll work on re-creating it in the next few weeks and post it here; I have a deadline I need to focus on first.  After that I'll run some Hive/Impala/etc. integration tests.  Besides, AES/CTR/NoPadding should generate ciphertext the same size as the plaintext, unlike the AWS SDK's AES/CBC/PKCS5Padding, which changes the file size.
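From memory, the wrapping was roughly the following (a sketch only; the class name is illustrative, and the hard-coded key/IV were strictly for the experiment, never for real data):

{code:java}
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.CipherSuite;
import org.apache.hadoop.crypto.CryptoCodec;
import org.apache.hadoop.crypto.CryptoInputStream;
import org.apache.hadoop.crypto.CryptoOutputStream;

/** PoC helper: wrap S3A streams with Hadoop's crypto streams using a fixed key/IV. */
public class FixedKeyCryptoWrapper {
  // Hard-coded 128-bit key and 16-byte IV -- acceptable for a throwaway
  // experiment only.
  private static final byte[] KEY = new byte[16];
  private static final byte[] IV = new byte[16];

  public static OutputStream wrapForWrite(Configuration conf, OutputStream s3Out)
      throws Exception {
    CryptoCodec codec = CryptoCodec.getInstance(conf, CipherSuite.AES_CTR_NOPADDING);
    return new CryptoOutputStream(s3Out, codec, KEY, IV);
  }

  public static InputStream wrapForRead(Configuration conf, InputStream s3In)
      throws Exception {
    CryptoCodec codec = CryptoCodec.getInstance(conf, CipherSuite.AES_CTR_NOPADDING);
    return new CryptoInputStream(s3In, codec, KEY, IV);
  }
}
{code}

The size-preservation point is easy to check directly against the JCE; with a 1000-byte input, CTR emits 1000 bytes of ciphertext while CBC/PKCS5 pads up to the next 16-byte block:

{code:java}
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class CipherSizeCheck {
  public static void main(String[] args) throws Exception {
    byte[] key = new byte[16];
    byte[] iv = new byte[16];
    byte[] plain = new byte[1000]; // deliberately not a multiple of the block size

    Cipher ctr = Cipher.getInstance("AES/CTR/NoPadding");
    ctr.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
    System.out.println(ctr.doFinal(plain).length); // 1000 -- same as the plaintext

    Cipher cbc = Cipher.getInstance("AES/CBC/PKCS5Padding");
    cbc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
    System.out.println(cbc.doFinal(plain).length); // 1008 -- padded to the next block
  }
}
{code}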


> Encrypt S3A data client-side with Hadoop libraries & Hadoop KMS
> ---------------------------------------------------------------
>
>                 Key: HADOOP-15006
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15006
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs/s3, kms
>            Reporter: Steve Moist
>            Priority: Minor
>         Attachments: S3-CSE Proposal.pdf
>
>
> This is for the proposal to introduce Client Side Encryption to S3 in such a way that it can leverage HDFS transparent encryption, use the Hadoop KMS to manage keys, use the `hdfs crypto` command line tools to manage encryption zones in the cloud, and enable distcp to copy from HDFS to S3 (and vice-versa) with data still encrypted.
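As a rough illustration of the "use the Hadoop KMS to manage keys" goal above, fetching key material through Hadoop's standard KeyProvider API looks roughly like this (the KMS URI and key name are placeholders, not part of the proposal):

{code:java}
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;

public class KmsKeyLookup {
  /** Fetch the current version of a named key from the Hadoop KMS. */
  public static byte[] currentKeyMaterial(String kmsUri, String keyName) throws Exception {
    Configuration conf = new Configuration();
    // kmsUri is e.g. "kms://http@kms-host:9600/kms" -- placeholder address
    KeyProvider provider = KeyProviderFactory.get(new URI(kmsUri), conf);
    KeyProvider.KeyVersion version = provider.getCurrentKey(keyName);
    return version.getMaterial(); // raw key bytes of the current key version
  }
}
{code}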


