You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/24 18:47:38 UTC

[GitHub] [iceberg] rdblue commented on pull request #2370: In response to the issue of ICEBERG-2369, code optimization for the c…

rdblue commented on pull request #2370:
URL: https://github.com/apache/iceberg/pull/2370#issuecomment-806070509

Thanks for looking into this, @wynjauu. It's always good to check places that may benefit from a refactor. Here, I don't think that this refactor is a good idea. There are good reasons why the code is structured this way.

First, the call to `encryption.decrypt` passes all of the encrypted input files at the same time by passing a single iterator. That allows an encryption manager to do a single batch call to fetch the keys for those files, which is much more important than avoiding extra loops.

Second, the same delete file may be used for more than one data file. In the rewrite, `decrypt` would be called once for each copy of the same file rather than just once, and a duplicate would result in an exception from the immutable map builder. That's why we use a mutable map of location to key metadata and then create encrypted input files from the entries.

While this does more loops, it is actually more performant and correct to do it this way. It is much more important to deduplicate decryption and to use the batch decrypt call than to avoid an extra loop or two over a small collection.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org