Posted to dev@iceberg.apache.org by Matt Cheah <mc...@palantir.com> on 2018/12/12 19:43:56 UTC
Iceberg Encryption Proposal
Hi everyone,
Encrypting data written to Iceberg tables is crucial for using this technology securely in industry settings. Towards that end, I’ve proposed an API for supporting encryption, including how users can implement their own custom encryption key providers and the metadata we’ll need to store in manifests.
You can find the full spec here: https://docs.google.com/document/d/1LptmFB7az2rLnou27QK_KKHgjcA5vKza0dWj4h8fkno/edit
The GitHub ticket tracking this is here: https://github.com/apache/incubator-iceberg/issues/20
Feel free to provide feedback in comments on the document.
Thanks!
-Matt Cheah
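The proposal describes user-implementable encryption key providers and per-file metadata stored in manifests. The actual API is defined in the linked spec and is not reproduced here. What follows is a minimal Python sketch of the general shape, using assumed names (`KeyProvider`, `InMemoryKeyProvider`, and `ManifestEntry` are hypothetical, not Iceberg classes). A writer asks a provider for a new data key plus some key metadata, the metadata is recorded with the data file, and a reader later passes that metadata back to the provider to recover the key.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod
from dataclasses import dataclass


class KeyProvider(ABC):
    """Hypothetical pluggable key provider (not Iceberg's actual interface)."""

    @abstractmethod
    def create_key(self) -> tuple[bytes, bytes]:
        """Return (data_key, key_metadata) for a new data file."""

    @abstractmethod
    def get_key(self, key_metadata: bytes) -> bytes:
        """Recover a data key from the key metadata stored with a file."""


class InMemoryKeyProvider(KeyProvider):
    """Toy provider: a dict stands in for a real key management service."""

    def __init__(self) -> None:
        self._keys: dict[bytes, bytes] = {}

    def create_key(self) -> tuple[bytes, bytes]:
        key_id = os.urandom(16)    # opaque reference; the only value the table would record
        data_key = os.urandom(32)  # 256-bit data key; never written to manifests
        self._keys[key_id] = data_key
        return data_key, key_id

    def get_key(self, key_metadata: bytes) -> bytes:
        return self._keys[key_metadata]


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical manifest entry: a data file path plus its key metadata."""

    path: str
    key_metadata: bytes


# Usage: the writer gets a key and records its metadata; a reader recovers the key.
provider = InMemoryKeyProvider()
data_key, key_metadata = provider.create_key()
entry = ManifestEntry("data/file-0.parquet", key_metadata)
assert provider.get_key(entry.key_metadata) == data_key
```

In this sketch the table only stores and returns `key_metadata`. How a provider encodes it (a key ID, a wrapped key, or several references) remains the provider's concern.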
Re: Iceberg Encryption Proposal
Posted by Matt Cheah <mc...@palantir.com>.
Hi Ryan,
I believe I’ve addressed most of the questions below, at least in the Google Doc. I’ll go ahead and start on the implementation as outlined in the current document.
Let me know if there are any further concerns.
-Matt Cheah
From: Ryan Blue <rb...@netflix.com>
Reply-To: "rblue@netflix.com" <rb...@netflix.com>
Date: Monday, December 24, 2018 at 11:12 AM
To: Matt Cheah <mc...@palantir.com>
Cc: Iceberg Dev List <de...@iceberg.apache.org>, "Yifei Huang (PD)" <yi...@palantir.com>, Vinoo Ganesh <vg...@palantir.com>
Subject: Re: Iceberg Encryption Proposal
Hi Matt,
Thanks for putting this proposal together! It all seems reasonable to me. I just have a few questions and comments about scope and use:
· Encrypted Iceberg metadata is out of scope?
· Authentication tags are out of scope? (like those used in Parquet)
· I think one requirement should be that Iceberg doesn’t have to leak the association between data files and keys. For that reason, I’d prefer an opaque byte array of “key metadata” instead of the existing struct. That allows encrypting the key metadata later to avoid the leak.
· Using an opaque byte array would also support storing more than one encryption key reference for per-column encryption. If that were done, the key returned by the get/put API might need to be more flexible.
· This should also describe how to pass the key metadata to file formats for those that support encryption (or explicitly state that’s out of scope).
· I’d like a little more detail on how this could look up keys on the driver and distribute them to tasks safely to avoid the thundering herd problem on the key server.
Thanks!
rb
--
Ryan Blue
Software Engineer
Netflix
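Ryan’s points about opaque key metadata and looking up keys on the driver can be illustrated together. The following is a minimal sketch under assumed names (`CountingKeyServer`, `FileTask`, and `plan_with_keys` are hypothetical, not Iceberg or Spark APIs). The driver collects the distinct key metadata values across all planned file tasks, resolves them in one batched request, and attaches to each task only the key it needs. As a result, the key server sees one request per scan rather than one per task. The sketch does not show how keys are protected in transit to executors, which a real implementation would also need.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CountingKeyServer:
    """Hypothetical stand-in for a remote key server that counts requests."""

    keys: dict[bytes, bytes]
    requests: int = 0

    def fetch_many(self, key_metadata: set[bytes]) -> dict[bytes, bytes]:
        # Resolve every requested key reference in a single round trip.
        self.requests += 1
        return {meta: self.keys[meta] for meta in key_metadata}


@dataclass(frozen=True)
class FileTask:
    """Hypothetical scan task for one data file."""

    path: str
    key_metadata: bytes  # opaque bytes copied from the manifest entry


def plan_with_keys(
    tasks: list[FileTask], server: CountingKeyServer
) -> list[tuple[FileTask, dict[bytes, bytes]]]:
    """Resolve keys once on the driver, then pair each task with only its own key."""
    # Deduplicate: several files may share a key, so request each one once.
    distinct = {task.key_metadata for task in tasks}
    resolved = server.fetch_many(distinct)
    # Each task ships with just the key it needs, not the whole key set.
    return [(task, {task.key_metadata: resolved[task.key_metadata]}) for task in tasks]


# Usage: 1,000 tasks that share two keys.
server = CountingKeyServer(keys={b"k1": b"A" * 32, b"k2": b"B" * 32})
tasks = [FileTask(f"data/file-{i}.parquet", b"k1" if i % 2 else b"k2") for i in range(1000)]
planned = plan_with_keys(tasks, server)
print(server.requests)  # 1
```

If a key server offers no batch call, issuing one request per distinct key on the driver still bounds the load by the number of keys rather than the number of tasks.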