Posted to dev@iceberg.apache.org by Matt Cheah <mc...@palantir.com> on 2018/12/12 19:43:56 UTC

Iceberg Encryption Proposal

Hi everyone,

Encrypting data written to Iceberg tables is crucial for using this technology securely in industry settings. Towards that end, I’ve proposed an API for supporting encryption, including how users can implement their own custom encryption key providers and the metadata we’ll need to store in manifests.

You can find the full spec here: https://docs.google.com/document/d/1LptmFB7az2rLnou27QK_KKHgjcA5vKza0dWj4h8fkno/edit

The GitHub ticket tracking this is here: https://github.com/apache/incubator-iceberg/issues/20

Feel free to provide feedback in comments on the document.

Thanks!

-Matt Cheah


Re: Iceberg Encryption Proposal

Posted by Matt Cheah <mc...@palantir.com>.
Hi Ryan,

I believe I’ve addressed most of the questions below, at least in the Google Doc. I’ll go ahead and start on the implementation as outlined in the current document.

Let me know if there are any further concerns.

-Matt Cheah

From: Ryan Blue <rb...@netflix.com>
Reply-To: "rblue@netflix.com" <rb...@netflix.com>
Date: Monday, December 24, 2018 at 11:12 AM
To: Matt Cheah <mc...@palantir.com>
Cc: Iceberg Dev List <de...@iceberg.apache.org>, "Yifei Huang (PD)" <yi...@palantir.com>, Vinoo Ganesh <vg...@palantir.com>
Subject: Re: Iceberg Encryption Proposal

Hi Matt,

Thanks for putting this proposal together! It all seems reasonable to me. I just have a few questions and comments about scope and use:

   - Encrypted Iceberg metadata is out of scope?
   - Authentication tags are out of scope? (like those used in Parquet)
   - I think one requirement should be that Iceberg doesn’t necessarily leak the association of data files to keys. In that case, I’d prefer an opaque byte array of “key metadata” instead of the existing struct. That allows encrypting the key metadata later to avoid the leak.
   - Using an opaque byte array would also support storing more than one encryption key reference for per-column encryption. If that were done, the key returned by the get/put API might need to be more flexible.
   - This should also describe how to pass the key metadata to file formats for those that support encryption (or explicitly state that’s out of scope)
   - I’d like a little more detail on how this could look up keys on the driver and distribute them to tasks safely to avoid the thundering herd problem on the key server

Thanks!

rb

On Wed, Dec 12, 2018 at 11:44 AM Matt Cheah <mc...@palantir.com> wrote:

> Hi everyone,
>
> Encrypting data written to Iceberg tables is crucial for using this technology securely in industry settings. Towards that end, I’ve proposed an API for supporting encryption, including how users can implement their own custom encryption key providers and the metadata we’ll need to store in manifests.
>
> You can find the full spec here: https://docs.google.com/document/d/1LptmFB7az2rLnou27QK_KKHgjcA5vKza0dWj4h8fkno/edit
>
> The GitHub ticket tracking this is here: https://github.com/apache/incubator-iceberg/issues/20
>
> Feel free to provide feedback in comments on the document.
>
> Thanks!
>
> -Matt Cheah

-- 
Ryan Blue
Software Engineer
Netflix


Re: Iceberg Encryption Proposal

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Hi Matt,

Thanks for putting this proposal together! It all seems reasonable to me. I
just have a few questions and comments about scope and use:

   - Encrypted Iceberg metadata is out of scope?
   - Authentication tags are out of scope? (like those used in Parquet)
   - I think one requirement should be that Iceberg doesn’t necessarily
   leak the association of data files to keys. In that case, I’d prefer an
   opaque byte array of “key metadata” instead of the existing struct. That
   allows encrypting the key metadata later to avoid the leak.
   - Using an opaque byte array would also support storing more than one
   encryption key reference for per-column encryption. If that were done, the
   key returned by the get/put API might need to be more flexible.
   - This should also describe how to pass the key metadata to file formats
   for those that support encryption (or explicitly state that’s out of scope)
   - I’d like a little more detail on how this could look up keys on the
   driver and distribute them to tasks safely to avoid the thundering herd
   problem on the key server

Thanks!

rb

On Wed, Dec 12, 2018 at 11:44 AM Matt Cheah <mc...@palantir.com> wrote:

> Hi everyone,
>
> Encrypting data written to Iceberg tables is crucial for using this
> technology securely in industry settings. Towards that end, I’ve proposed
> an API for supporting encryption, including how users can implement their
> own custom encryption key providers and the metadata we’ll need to store in
> manifests.
>
> You can find the full spec here:
> https://docs.google.com/document/d/1LptmFB7az2rLnou27QK_KKHgjcA5vKza0dWj4h8fkno/edit
>
> The GitHub ticket tracking this is here:
> https://github.com/apache/incubator-iceberg/issues/20
>
> Feel free to provide feedback in comments on the document.
>
> Thanks!
>
> -Matt Cheah
>


-- 
Ryan Blue
Software Engineer
Netflix
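
To make two of the points raised in this thread concrete (opaque "key metadata" that can reference more than one key for per-column encryption, and resolving keys on the driver so tasks do not all hit the key server), here is a rough Python sketch. It is not the API from the proposal document: every name in it (encode_key_metadata, CountingKeyServer, plan_on_driver) is hypothetical, JSON is used only as a stand-in encoding for the opaque bytes, and the in-memory key server is a placeholder for a real service. It also does not address how the resolved keys are protected in transit from driver to tasks, which a real design would need to cover.

```python
import json
from typing import Dict, Iterable

# All names below are hypothetical and exist only for illustration.

def encode_key_metadata(column_key_ids: Dict[str, str]) -> bytes:
    """Pack per-column key references into an opaque byte array.

    The manifest would store only these bytes, so the layout (or an
    encrypted form of it) can change without changing the table format.
    """
    return json.dumps(column_key_ids, sort_keys=True).encode("utf-8")

def decode_key_metadata(blob: bytes) -> Dict[str, str]:
    """Inverse of encode_key_metadata: bytes -> {column name: key id}."""
    return json.loads(blob.decode("utf-8"))

class CountingKeyServer:
    """In-memory stand-in for a remote key server; counts round trips."""

    def __init__(self, keys: Dict[str, bytes]):
        self._keys = keys
        self.requests = 0

    def get_keys(self, key_ids: Iterable[str]) -> Dict[str, bytes]:
        self.requests += 1
        return {key_id: self._keys[key_id] for key_id in key_ids}

def plan_on_driver(file_key_metadata: Dict[str, bytes],
                   server: CountingKeyServer) -> Dict[str, Dict[str, bytes]]:
    """Resolve all keys for a scan in one batched request on the driver.

    Returns, per data file, a {column name: key bytes} payload that could
    be shipped with the task, so tasks never contact the key server.
    """
    decoded = {path: decode_key_metadata(blob)
               for path, blob in file_key_metadata.items()}
    # De-duplicate key ids across files before asking the server.
    needed = sorted({key_id for refs in decoded.values()
                     for key_id in refs.values()})
    resolved = server.get_keys(needed)
    return {path: {col: resolved[key_id] for col, key_id in refs.items()}
            for path, refs in decoded.items()}

# Usage: two files sharing the same two keys need one server request.
server = CountingKeyServer({"k1": b"\x01" * 16, "k2": b"\x02" * 16})
manifest = {
    "a.parquet": encode_key_metadata({"id": "k1", "ssn": "k2"}),
    "b.parquet": encode_key_metadata({"id": "k1", "ssn": "k2"}),
}
payloads = plan_on_driver(manifest, server)
```

Because the manifest entry is just bytes, the same slot could later hold an encrypted blob, which is the property that avoids leaking the file-to-key association.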