Posted to issues@orc.apache.org by "Xinli Shang (JIRA)" <ji...@apache.org> on 2019/04/18 14:55:00 UTC

[jira] [Comment Edited] (ORC-14) Add column level encryption to ORC files

    [ https://issues.apache.org/jira/browse/ORC-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821188#comment-16821188 ] 

Xinli Shang edited comment on ORC-14 at 4/18/19 2:54 PM:
---------------------------------------------------------

Yes, I looked at it earlier and also looked at the one you published last year. They are great slides! 

I agree with you about the configuration. What we do in Parquet is similar: table properties in HMS (or other places) specify the column crypto settings (sensitivity, encryption algorithm, etc.). All of this information is sent through the technical stack (application layer, query engines, etc.) to Parquet (it could be ORC too) inside the schema. The Parquet plugin specified in Parquet-1396 consumes the crypto settings part of the schema and provisions the encryption properties that are needed by Parquet encryption (Parquet-1178).

The benefit of this solution is that it avoids changing much in the query engines. We just treat them as a tunnel that passes the crypto settings inside the schema through to Parquet (or ORC). Hence it eases and accelerates adoption.
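
To make the flow above concrete, here is a rough sketch in Python. It is only an illustration under assumptions: every class, function, and metadata key below (Column, ColumnEncryption, query_engine_passthrough, provision_encryption, "sensitive", "encryption.algorithm", "encryption.key.id") is hypothetical and is not the Parquet-1396, Parquet-1178, ORC, or HMS API. It shows the crypto settings riding inside the schema, the query engine forwarding them untouched, and a plugin at the file-format layer turning them into encryption properties.

```python
from dataclasses import dataclass, field

# NOTE: all names here are hypothetical stand-ins, not real Parquet/ORC/HMS APIs.


@dataclass
class Column:
    name: str
    type: str
    # Per-column metadata carried inside the schema, e.g. copied from table properties.
    metadata: dict = field(default_factory=dict)


@dataclass
class ColumnEncryption:
    column: str
    algorithm: str
    key_id: str


def query_engine_passthrough(schema):
    """Stand-in for a query engine: it forwards the schema without interpreting the crypto settings."""
    return schema


def provision_encryption(schema):
    """Stand-in for the plugin: derive encryption properties from the schema's crypto settings."""
    props = []
    for col in schema:
        if col.metadata.get("sensitive") == "true":
            props.append(
                ColumnEncryption(
                    column=col.name,
                    algorithm=col.metadata["encryption.algorithm"],
                    key_id=col.metadata["encryption.key.id"],
                )
            )
    return props


# Example schema: only "email" is marked sensitive (values are placeholders).
schema = [
    Column("user_id", "bigint"),
    Column(
        "email",
        "string",
        {
            "sensitive": "true",
            "encryption.algorithm": "example-algorithm",
            "encryption.key.id": "example-key",
        },
    ),
]

props = provision_encryption(query_engine_passthrough(schema))
for p in props:
    print(p.column, p.algorithm, p.key_id)
```

In this sketch the engine never needs to understand the settings, which is the point of treating it as a tunnel; only the plugin next to the writer interprets them.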

I talked about Parquet-1396 at this year's Apache Hadoop Contributor Meetup. You can find the link here: [https://www.youtube.com/watch?v=W38CrTUJ3YM&t=140s]

The detailed design of Parquet-1396 can be found here [https://docs.google.com/document/d/17GTQAezl1ZC1pMNHjYU_bPVxMU6DIPjtXOiLclXUlyA/edit#heading=h.r9wntu3s8swd]

 



> Add column level encryption to ORC files
> ----------------------------------------
>
>                 Key: ORC-14
>                 URL: https://issues.apache.org/jira/browse/ORC-14
>             Project: ORC
>          Issue Type: New Feature
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>            Priority: Major
>         Attachments: columnEncryption.png
>
>
> It would be useful to support column level encryption in ORC files. Since each column and its associated index is stored separately, encrypting a column separately isn't difficult. In terms of key distribution, it would make sense to use an external server like the one in HADOOP-9331.


