Posted to issues@orc.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2017/10/05 17:44:00 UTC

[jira] [Comment Edited] (ORC-14) Add column level encryption to ORC files

    [ https://issues.apache.org/jira/browse/ORC-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193291#comment-16193291 ] 

Owen O'Malley edited comment on ORC-14 at 10/5/17 5:43 PM:
-----------------------------------------------------------

*Laugh* Ignore my previous comment.

I am getting closer. I need to write this up for the website, but the general direction is:

* Add support for encrypting columns, where the writer stores two variants of each encrypted column in the file:
   a. Encrypted original data
   b. Unencrypted masked data
* The format change is backwards compatible: old readers will get the unencrypted masked values.
* Encryption keys will come from the Hadoop KMS by default, although this may be overridden.
* Encryption will be AES (128 or 256 bit) in CTR mode, which allows seeks.
* Different columns may use different master keys. Each writer will generate a random file id that is used to create a unique encryption key for the column in that file. To read an encrypted column, the user will need to have the KMS decrypt the column's encryption key.
* The file and stripe statistics will be encrypted for the encrypted columns. However, the list of streams in the stripe footer will not be encrypted.
* Masking of data may take several forms:
  a. Nullify - make all values null
  b. Redact - replace each character of strings and numbers according to its character class ('x' for letters, '9' for digits, etc.)
  c. SHA256 - replace strings and numbers with the SHA-256 hash of the value
  d. Custom - a user-defined method
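
The masking options above can be sketched as small per-value functions. This is a minimal Python sketch, not ORC's implementation: the names `nullify`, `redact`, `sha256_mask` and `mask_column` are hypothetical, and three details are assumptions: redaction keeps signs and punctuation unchanged, numbers are hashed via their UTF-8 decimal text, and nulls stay null under every mask.

```python
import hashlib
from typing import Callable, List, Optional, Union

Value = Union[str, int]

def nullify(value: Value) -> None:
    # Nullify: every value becomes null.
    return None

def _redact_text(text: str) -> str:
    # Letters -> 'x', digits -> '9'. Other characters (signs, punctuation,
    # spaces) are kept as-is; that choice is an assumption of this sketch.
    return "".join("x" if c.isalpha() else "9" if c.isdigit() else c for c in text)

def redact(value: Value) -> Value:
    # Redact: strings stay strings and integers stay integers (305 -> 999).
    if isinstance(value, int) and not isinstance(value, bool):
        return int(_redact_text(str(value)))
    if isinstance(value, str):
        return _redact_text(value)
    raise TypeError(f"redact does not handle {type(value).__name__}")

def sha256_mask(value: Value) -> str:
    # SHA256: hex digest of the value's UTF-8 text form (assumed encoding).
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

def mask_column(values: List[Optional[Value]],
                mask: Callable[[Value], Optional[Value]]) -> List[Optional[Value]]:
    # Nulls stay null; every other value goes through the chosen mask.
    # A "custom" mask is simply any user-supplied callable.
    return [None if v is None else mask(v) for v in values]

if __name__ == "__main__":
    print(mask_column(["Ab-12", None, 305], redact))  # ['xx-99', None, 999]
    print(mask_column(["secret"], lambda v: str(v)[:1] + "***"))  # ['s***']
```

Passing the mask as a plain callable keeps the "Custom" option from needing any special case: a user-defined method is handled exactly like the built-in ones.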


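The key bullet says a random per-file id makes each column's encryption key unique to that file. It does not say exactly how the id and the master key are combined, and it describes readers asking the KMS to decrypt a stored column key. The sketch below substitutes a simpler HMAC-SHA256 derivation purely to illustrate the "unique key per column per file" property. The function names, the 16-byte id, and the derivation itself are assumptions. In a KMS-based design the master key stays inside the KMS, so a step like this would not run on the reader.

```python
import hashlib
import hmac
import os

def new_file_id() -> bytes:
    # Hypothetical: a random 128-bit id the writer generates once per file.
    return os.urandom(16)

def derive_column_key(master_key: bytes, file_id: bytes, column_id: int,
                      key_bits: int = 128) -> bytes:
    """Derive a per-file, per-column AES key (assumed HMAC-SHA256 derivation).

    Because the random file id is part of the input, the same column in two
    files gets two different keys, even under a single master key.
    """
    if key_bits not in (128, 256):
        raise ValueError("AES keys are 128 or 256 bits")
    info = file_id + column_id.to_bytes(4, "big")
    return hmac.new(master_key, info, hashlib.sha256).digest()[: key_bits // 8]

if __name__ == "__main__":
    master = os.urandom(32)  # in practice held by the KMS, not by readers
    f1, f2 = new_file_id(), new_file_id()
    print(derive_column_key(master, f1, 3) != derive_column_key(master, f2, 3))  # True
```
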
> Add column level encryption to ORC files
> ----------------------------------------
>
>                 Key: ORC-14
>                 URL: https://issues.apache.org/jira/browse/ORC-14
>             Project: ORC
>          Issue Type: New Feature
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>
> It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331.