Posted to issues@orc.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2017/10/05 17:44:00 UTC
[jira] [Comment Edited] (ORC-14) Add column level encryption to ORC files
[ https://issues.apache.org/jira/browse/ORC-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193291#comment-16193291 ]
Owen O'Malley edited comment on ORC-14 at 10/5/17 5:43 PM:
-----------------------------------------------------------
*Laugh* Ignore my previous comment.
I am getting closer. I still need to write this up for the website, but the general direction is:
* Add support for encrypting columns, where the writer stores two alternatives in the file:
a. Encrypted original data
b. Unencrypted masked data
* The format change is backward compatible: old readers will get the unencrypted masked values.
* It will use the Hadoop KMS by default, although it may be overridden.
* Encryption will be AES (128 or 256 bit) in CTR mode, which allows seeks.
* Different columns may use different master keys. Each writer will generate a random file id that is used to create a unique encryption key for the column in that file. To read an encrypted column, the user will need to have the KMS decrypt the column's encryption key.
* The file and stripe statistics will be encrypted for the encrypted columns. However, the list of streams in the stripe footer will not be encrypted.
* Masking of data may take several forms:
a. Nullify - make all values null
b. Redact - replace strings and numbers with replacements based on character classes ('x' for letters, '9' for numbers, etc.)
c. SHA256 - replace strings and numbers with SHA256 of the value
d. Custom - user defined method
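
To illustrate why CTR mode allows seeks: the keystream for AES block n depends only on the IV and n, so a reader can start decrypting at any byte offset without touching the preceding data. The following is a rough sketch of the offset arithmetic only, not the ORC implementation. The function name ctr_seek is hypothetical, and it assumes the whole 16-byte IV is treated as a big-endian counter; the actual IV layout is not specified in this comment.

```python
from typing import Tuple

AES_BLOCK_SIZE = 16  # bytes; the same for 128-bit and 256-bit AES keys


def ctr_seek(iv: bytes, offset: int) -> Tuple[bytes, int]:
    """Return (counter_block, skip) for starting decryption at `offset`.

    Hypothetical helper. Assumes the full 16-byte IV acts as a
    big-endian counter that is incremented once per AES block.
    """
    if len(iv) != AES_BLOCK_SIZE:
        raise ValueError("IV must be exactly 16 bytes")
    if offset < 0:
        raise ValueError("offset must be non-negative")

    # Which AES block holds the target byte, and where inside that block.
    block_index, skip = divmod(offset, AES_BLOCK_SIZE)

    # Advance the counter by the number of whole blocks skipped,
    # wrapping at 2**128 like a fixed-width counter would.
    counter = (int.from_bytes(iv, "big") + block_index) % (1 << 128)
    return counter.to_bytes(AES_BLOCK_SIZE, "big"), skip
```

A reader would initialize the cipher with the returned counter block, then discard the first `skip` bytes of output before using the rest.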
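
The masking forms above could be sketched roughly as follows. This is only an illustration of the idea on string values, not the ORC API: the function names are hypothetical, leaving punctuation unchanged in the redact case is an assumption, and so is hex-encoding the SHA256 output.

```python
import hashlib
from typing import Callable, Dict, Optional


def nullify(value: Optional[str]) -> Optional[str]:
    """Nullify: every value becomes null, whatever the input was."""
    return None


def redact(value: str) -> str:
    """Redact: replace each character according to its character class."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("x")   # letters become 'x'
        elif ch.isdigit():
            out.append("9")   # digits become '9'
        else:
            out.append(ch)    # assumption: other characters pass through
    return "".join(out)


def sha256_mask(value: str) -> str:
    """SHA256: replace the value with the hex digest of its UTF-8 bytes."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Custom: any user-defined callable with the same shape can be registered.
MASKS: Dict[str, Callable[[str], Optional[str]]] = {
    "nullify": nullify,
    "redact": redact,
    "sha256": sha256_mask,
}
```

For example, redact("Ab-12") yields "xx-99", so the shape of the value survives while its content does not.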
> Add column level encryption to ORC files
> ----------------------------------------
>
> Key: ORC-14
> URL: https://issues.apache.org/jira/browse/ORC-14
> Project: ORC
> Issue Type: New Feature
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
>
> It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331.