
Posted to dev@iceberg.apache.org by GitBox <gi...@apache.org> on 2018/11/30 22:19:59 UTC

[GitHub] mccheah commented on issue #20: Encryption in Data Files

URL: https://github.com/apache/incubator-iceberg/issues/20#issuecomment-443356982

I was considering using Palantir's [hadoop-crypto library](https://github.com/palantir/hadoop-crypto) to handle the actual encryption. What do you think of this package?

Column encryption is interesting, but we haven't explored it on our side yet, so we can't handle per-column encryption and, for now, need to encrypt at the whole-file level. Our internal storage solution can't store multiple keys for decrypting different portions of the same file, and the hadoop-crypto library has the same limitation. Whatever solution we come up with should therefore support either whole-file encryption _or_ per-column encryption. I suppose a given file would be encrypted strictly one way or the other, though: if we encrypt the whole file, we more or less lose all the benefits of per-column encryption.
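To make the either/or constraint concrete, here is a minimal sketch of a per-file encryption descriptor, assuming each data file records exactly one mode. `FileEncryptionSpec`, `Mode`, `wholeFile`, `perColumn`, and `requiredKeyIds` are hypothetical names for illustration only, not part of Iceberg or hadoop-crypto. The point it shows is that a whole-file encrypted file needs a single key, while a per-column encrypted file may need several, which is exactly what our storage backend can't yet handle.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical descriptor: a data file is encrypted either as a whole or per column, never both.
public final class FileEncryptionSpec {
    public enum Mode { WHOLE_FILE, PER_COLUMN }

    private final Mode mode;
    private final String fileKeyId;                  // used only in WHOLE_FILE mode
    private final Map<String, String> columnKeyIds;  // column name -> key id, used only in PER_COLUMN mode

    private FileEncryptionSpec(Mode mode, String fileKeyId, Map<String, String> columnKeyIds) {
        this.mode = mode;
        this.fileKeyId = fileKeyId;
        this.columnKeyIds = columnKeyIds;
    }

    // One key protects the entire file (the only option our backend supports today).
    public static FileEncryptionSpec wholeFile(String keyId) {
        return new FileEncryptionSpec(Mode.WHOLE_FILE, Objects.requireNonNull(keyId), Map.of());
    }

    // Each encrypted column carries its own key id; several columns may share a key.
    public static FileEncryptionSpec perColumn(Map<String, String> columnKeyIds) {
        if (columnKeyIds.isEmpty()) {
            throw new IllegalArgumentException("per-column mode needs at least one column key");
        }
        return new FileEncryptionSpec(Mode.PER_COLUMN, null, Map.copyOf(columnKeyIds));
    }

    public Mode mode() {
        return mode;
    }

    // Distinct key ids a reader must fetch to decrypt the whole file.
    public Set<String> requiredKeyIds() {
        if (mode == Mode.WHOLE_FILE) {
            return Set.of(fileKeyId);
        }
        return new TreeSet<>(columnKeyIds.values());
    }
}
```

Because the mode is fixed when the descriptor is built, there is no way to express a file that is both whole-file and per-column encrypted, matching the "strictly one way or the other" assumption.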

Additionally, a key part of performance is minimizing the number of round trips to the key storage backend, particularly if the backend supports batch operations. Ideally, the `KeyManager` would support getting and putting multiple keys in a single call, and the Spark Data Source and other Iceberg clients would be written to contact the backend as few times as possible.
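A minimal sketch of what a batch-capable `KeyManager` might look like, assuming keys are identified by string ids and returned as raw bytes. The comment only names `KeyManager`, so the method names `getKeys`/`putKeys`, the `InMemoryKeyManager` stand-in, and the `fetchForFiles` helper are all hypothetical. The in-memory store substitutes for a real backend and simply counts requests, so you can see that collecting every key id up front costs one round trip instead of one per file.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public final class BatchingKeyManagerSketch {

    /** Hypothetical batch-oriented interface; not an existing Iceberg API. */
    public interface KeyManager {
        Map<String, byte[]> getKeys(Collection<String> keyIds);

        void putKeys(Map<String, byte[]> keys);
    }

    /** In-memory stand-in for a remote key store; each call counts as one round trip. */
    public static final class InMemoryKeyManager implements KeyManager {
        private final Map<String, byte[]> store = new HashMap<>();
        private int roundTrips = 0;

        @Override
        public Map<String, byte[]> getKeys(Collection<String> keyIds) {
            roundTrips++; // one request no matter how many ids are asked for
            Map<String, byte[]> result = new LinkedHashMap<>();
            for (String id : keyIds) {
                byte[] key = store.get(id);
                if (key == null) {
                    throw new IllegalArgumentException("unknown key id: " + id);
                }
                result.put(id, key.clone()); // defensive copy
            }
            return result;
        }

        @Override
        public void putKeys(Map<String, byte[]> keys) {
            roundTrips++; // all keys stored in a single request
            keys.forEach((id, key) -> store.put(id, key.clone()));
        }

        public int roundTrips() {
            return roundTrips;
        }
    }

    // Client-side pattern: gather key ids for every file first, drop duplicates,
    // then issue a single batched lookup instead of one lookup per file.
    public static Map<String, byte[]> fetchForFiles(KeyManager keyManager, List<String> fileKeyIds) {
        return keyManager.getKeys(new LinkedHashSet<>(fileKeyIds));
    }

    private BatchingKeyManagerSketch() {}
}
```

For example, a scan planner that touches files keyed `k1`, `k2`, `k1`, `k3` would make one `getKeys` call for three distinct ids. Backends without native batch support could still implement `getKeys` with a loop, so the interface doesn't force batching on them.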
