Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/26 13:26:32 UTC

[GitHub] [iceberg] ggershinsky opened a new pull request #2639: Parquet: Support parquet modular encryption

ggershinsky opened a new pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639


   Implements #1413 
   
   Based on https://docs.google.com/document/d/1kkcjr9KrlB9QagRX3ToulG_Rf-65NMSlVANheDNzJq4/edit?usp=sharing
   
   @rdblue @shangxinli  @andersonm-ibm @RussellSpitzer @flyrain @aokolnychyi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ggershinsky commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-860623039


   > I'm approving this to run unit tests. Thanks for working on this, @ggershinsky!
   
   Thanks @rdblue!




[GitHub] [iceberg] ggershinsky commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-860627950


   >  Can you provide a quick summary of how to plug into Parquet encryption and what this does?
   
   Certainly. parquet-mr 1.12.0 has two encryption interfaces: a low-level one (a direct implementation of the spec, with maximum flexibility and no key management) and a high-level one (a layer on top of the low-level interface, with library-local key management tools driven by Hadoop properties). In Iceberg, we'll use the low-level Parquet encryption API directly, for two reasons: key management will be done by Iceberg, in a similar fashion for all formats; and Iceberg has a centralized manifest capability, which makes key management more efficient than running library-local nodes in each worker process.
   Since Iceberg also taps directly into the general (non-encryption) low-level Parquet API, this PR makes it possible to link that path to the encryption feature, and translates the general column encryption configuration (TBD) into the Parquet encryption configuration.
   
   > it provides Parquet's equivalent of an EncryptionManager that gets the file AAD and necessary key material from Iceberg's key_metadata field.
   
   Well, here we have the gap that I described in the other PR. To encrypt data (or delete) files, we need the AES keys: the DEKs, "data encryption keys", which are used to actually encrypt the data and metadata modules (there must be a unique DEK per file/column). But `key_metadata` is a binary field in the manifest entry for a data (or delete) file that keeps a "wrapped" version of these DEKs (encrypted with master keys, MEKs, in the user's KMS). It doesn't and shouldn't keep raw DEKs. Therefore, sending `key_metadata` to Parquet file writers (or any other writers) doesn't help. Per my proposal in the last sync, we can reverse this process: generate random DEKs at the (Parquet) writers, use them for encryption, and send them in the `DataFile`/`DeleteFile`/`ContentFile` objects back to the manifest writer. This also seems to fit the current Iceberg model, where manifest entries are written after collecting `ContentFile` objects from file writers. At this point, the manifest writer process will contact the KMS to wrap these DEKs, and package them into the `key_metadata` field in the manifest file (for the readers).
   
   In a future version, we might want to generate the DEKs (or get them from the KMS) in the manifest writer process, and then distribute them to the data/delete file writers, with a unique DEK per file (or a set of unique DEKs per file, for column encryption). This seems to be more complicated, and a poorer fit for the current Iceberg flow; my suggestion would be to start with the reverse approach described above.
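   
   The reverse flow proposed above can be sketched with plain JDK crypto, as a rough illustration rather than the code in this PR. The class `DekFlowSketch` and its method names are hypothetical, AES key wrap with a local MEK stands in for the real KMS call, and the nonce-plus-ciphertext layout is an assumption made for the example, not the Parquet module format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: names and layout are illustrative assumptions.
final class DekFlowSketch {
  private static final SecureRandom RANDOM = new SecureRandom();
  private static final int NONCE_LENGTH = 12;   // bytes
  private static final int TAG_LENGTH_BITS = 128;

  // File writer side: generate a fresh random 128-bit DEK for one file.
  static byte[] generateDek() {
    byte[] dek = new byte[16];
    RANDOM.nextBytes(dek);
    return dek;
  }

  // File writer side: AES-GCM encrypt one module with the DEK.
  // The AAD ties the ciphertext to its file. Returns nonce || ciphertext+tag.
  static byte[] encryptModule(byte[] dek, byte[] aad, byte[] plaintext) {
    try {
      byte[] nonce = new byte[NONCE_LENGTH];
      RANDOM.nextBytes(nonce);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(dek, "AES"),
          new GCMParameterSpec(TAG_LENGTH_BITS, nonce));
      cipher.updateAAD(aad);
      byte[] ciphertext = cipher.doFinal(plaintext);
      byte[] out = Arrays.copyOf(nonce, NONCE_LENGTH + ciphertext.length);
      System.arraycopy(ciphertext, 0, out, NONCE_LENGTH, ciphertext.length);
      return out;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("module encryption failed", e);
    }
  }

  // Reader side: decrypt a module produced by encryptModule.
  static byte[] decryptModule(byte[] dek, byte[] aad, byte[] encrypted) {
    try {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(dek, "AES"),
          new GCMParameterSpec(TAG_LENGTH_BITS, encrypted, 0, NONCE_LENGTH));
      cipher.updateAAD(aad);
      return cipher.doFinal(encrypted, NONCE_LENGTH, encrypted.length - NONCE_LENGTH);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("module decryption failed", e);
    }
  }

  // Manifest writer side: wrap the DEK with a MEK (stand-in for a KMS call).
  // Only this wrapped form would be stored in key_metadata.
  static byte[] wrapDek(byte[] mek, byte[] dek) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.WRAP_MODE, new SecretKeySpec(mek, "AES"));
      return cipher.wrap(new SecretKeySpec(dek, "AES"));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("DEK wrapping failed", e);
    }
  }

  // Reader side: unwrap the DEK read from key_metadata (again a KMS stand-in).
  static byte[] unwrapDek(byte[] mek, byte[] wrappedDek) {
    try {
      Cipher cipher = Cipher.getInstance("AESWrap");
      cipher.init(Cipher.UNWRAP_MODE, new SecretKeySpec(mek, "AES"));
      return cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY).getEncoded();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("DEK unwrapping failed", e);
    }
  }

  public static void main(String[] args) {
    byte[] mek = new byte[16];                       // placeholder MEK; a real one lives in the KMS
    byte[] aad = "data-file-0001".getBytes(StandardCharsets.UTF_8);
    byte[] page = "column page bytes".getBytes(StandardCharsets.UTF_8);

    byte[] dek = generateDek();                      // 1. writer generates the DEK
    byte[] encrypted = encryptModule(dek, aad, page); // 2. writer encrypts with it
    byte[] wrapped = wrapDek(mek, dek);              // 3. manifest writer wraps it

    byte[] recovered = decryptModule(unwrapDek(mek, wrapped), aad, encrypted);
    System.out.println("round trip ok: " + Arrays.equals(page, recovered));
  }
}
```

   In this sketch the raw DEK exists only in the file writer and, briefly, in the manifest writer; the manifest would carry only the wrapped bytes, which matches the point that `key_metadata` should never hold raw DEKs.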




[GitHub] [iceberg] rdblue commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-860125973








[GitHub] [iceberg] ggershinsky commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-1032935945


   Hi @shangxinli; frankly, the only dependency left is the parquet-mr update. This PR depends on the "uniform encryption" feature, which is already merged into the parquet-mr master but not released yet. We started this discussion in the last Parquet community sync, and it's on track. Once a release is cut from the parquet-mr master (as 1.12.3 or 1.13.0 :), this Iceberg PR will be able to pass CI and be ready for review.




[GitHub] [iceberg] flyrain commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-866406346


   Hi @ggershinsky, can you rebase this PR to include #2441?




[GitHub] [iceberg] shangxinli commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
shangxinli commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-1032854500


   @ggershinsky Let me know when it is ready for review. I would love to review this change. 




[GitHub] [iceberg] ggershinsky commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-866738636


   Hi @flyrain, will do, thanks for the notice. 




[GitHub] [iceberg] ggershinsky commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-869142556


   Per the comments above, there will be changes in this pull request (and in #2638, #2640). Converting the three to drafts.




[GitHub] [iceberg] liujinhui1994 commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
liujinhui1994 commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-1006309018


   @ggershinsky Hello, is there any recent progress on this feature? We are looking forward to it.




[GitHub] [iceberg] ggershinsky commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
ggershinsky commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-1006487455


   Hi @liujinhui1994, yes, this feature is under active development. Per the community discussions, the encryption PRs will be updated in Q1'22; I'll probably have the commits ready later this month or in February.




[GitHub] [iceberg] shangxinli commented on pull request #2639: Parquet: Support parquet modular encryption

Posted by GitBox <gi...@apache.org>.
shangxinli commented on pull request #2639:
URL: https://github.com/apache/iceberg/pull/2639#issuecomment-1032938484


   OK. I will have a look once I have time.
   
   
   -- 
   Xinli Shang
   

