You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/24 11:06:14 UTC

[GitHub] [iceberg] wynjauu opened a new pull request #2370: In response to the issue of ICEBERG-2369, code optimization for the c…

wynjauu opened a new pull request #2370:
URL: https://github.com/apache/iceberg/pull/2370


   This fix the issue: https://github.com/apache/iceberg/issues/2369


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue edited a comment on pull request #2370: In response to the issue of ICEBERG-2369, code optimization for the c…

Posted by GitBox <gi...@apache.org>.
rdblue edited a comment on pull request #2370:
URL: https://github.com/apache/iceberg/pull/2370#issuecomment-806070509


   Thanks for looking into this, @wynjauu. It's always good to check places that may benefit from a refactor. Here, I don't think that this refactor is a good idea. There are good reasons why the code is structured this way.
   
   First, the call to `encryption.decrypt` passes all of the encrypted input files at the same time by passing a single iterator. That allows an encryption manager to do a single batch call to fetch the keys for those files, which is much more important than avoiding extra loops.
   
   Second, the same delete file may be used for more than one data file. In the rewrite, `decrypt` would be called once for each copy of the same file rather than just once, and a duplicate would result in an exception from the immutable map builder. That's why we use a mutable map of location to key metadata and then create encrypted input files from the entries.
   
   While this does more loops, it is actually more performant and correct to do it this way. It is much more important to deduplicate decryption and to use the batch decrypt call than to avoid an extra loop or two over a small collection. I'm going to close this and #2369, but feel free to reopen if you want to discuss it more. Thanks for contributing!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #2370: In response to the issue of ICEBERG-2369, code optimization for the c…

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #2370:
URL: https://github.com/apache/iceberg/pull/2370


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2370: In response to the issue of ICEBERG-2369, code optimization for the c…

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2370:
URL: https://github.com/apache/iceberg/pull/2370#issuecomment-806070509


   Thanks for looking into this, @wynjauu. It's always good to check places that may benefit from a refactor. Here, I don't think that this refactor is a good idea. There are good reasons why the code is structured this way.
   
   First, the call to `encryption.decrypt` passes all of the encrypted input files at the same time by passing a single iterator. That allows an encryption manager to do a single batch call to fetch the keys for those files, which is much more important than avoiding extra loops.
   
   Second, the same delete file may be used for more than one data file. In the rewrite, `decrypt` would be called once for each copy of the same file rather than just once, and a duplicate would result in an exception from the immutable map builder. That's why we use a mutable map of location to key metadata and then create encrypted input files from the entries.
   
   While this does more loops, it is actually more performant and correct to do it this way. It is much more important to deduplicate decryption and to use the batch decrypt call than to avoid an extra loop or two over a small collection.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org