Posted to dev@parquet.apache.org by nicolas paris <ni...@riseup.net> on 2022/10/26 22:01:27 UTC

Modular encryption to return null values instead of Crypto exception when bad key provided

hello,

as mentioned in several places [1], from a data analyst's point of view,
having null values for encrypted columns when one has no key to decrypt
them is better than getting exceptions, and eases data exploration by
allowing select * instead of listing each allowed column.

I have been digging through the crypto source code to find an easy way to
catch the crypto exception and turn values to null from the
DecryptionPropertiesFactory that can be passed to the query engine
through hadoop configs.

I might be missing something, but I haven't found a way to tell the
ParquetReader to put nulls and go ahead reading un-encrypted columns
when something goes wrong with the KMS.

Is such behavior available, or are you willing to add such a feature at
the parquet level in the future?

Thanks


[1]
https://www.uber.com/en-FR/blog/one-stone-three-birds-finer-grained-encryption-apache-parquet/

Re: Modular encryption to return null values instead of Crypto exception when bad key provided

Posted by nicolas paris <ni...@riseup.net>.
thanks again for your guidance, and work around this.
that makes sense

On Thu, 2022-10-27 at 10:45 +0300, Gidon Gershinsky wrote:
> trying to project columns without authorization can be very costly, for two
> reasons:
> - unnecessary per-column/file calls to the (remote) KMS service, plus the
> cost of per-call authorization checks
> - red-flagging unauthorized calls and triggering "breach attempt" alerts
>
> IMO, the best way to handle this is to have a layer on top of parquet -
> that gets the list of authorized columns for the reader (eg from a policy
> engine), and allows to project only them (returning nulls for the others)
> 
> Cheers, Gidon


Re: Modular encryption to return null values instead of Crypto exception when bad key provided

Posted by Gidon Gershinsky <gg...@gmail.com>.
Trying to project columns without authorization can be very costly, for two
reasons:
- unnecessary per-column/file calls to the (remote) KMS service, plus the
cost of per-call authorization checks
- red-flagging unauthorized calls and triggering "breach attempt" alerts

IMO, the best way to handle this is to have a layer on top of parquet
that gets the list of authorized columns for the reader (e.g. from a policy
engine) and allows projecting only those (returning nulls for the others).

Cheers, Gidon
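
A minimal sketch of the layering idea above, in plain Python and under
stated assumptions: it is not parquet code and not anyone's actual
implementation. The names read_with_nulls, read_columns and fake_reader are
hypothetical. read_columns stands in for whatever reader the query engine
uses, and the authorized set stands in for the answer from a policy engine.
The point it illustrates is that only authorized columns are ever requested
from the underlying reader, and the remaining columns are filled with nulls
afterwards.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Set

# Hypothetical reader signature: given the column names to project, yield one
# dict per row containing only those columns. A real implementation would
# wrap an actual Parquet reader here.
ColumnReader = Callable[[Sequence[str]], Iterable[Dict[str, object]]]


def read_with_nulls(
    all_columns: Sequence[str],
    authorized: Set[str],
    read_columns: ColumnReader,
) -> List[Dict[str, Optional[object]]]:
    """Project only the authorized columns and fill the others with None."""
    # Only these column names are passed to the underlying reader.
    projected = [c for c in all_columns if c in authorized]
    rows = []
    for partial in read_columns(projected):
        # Rebuild the full row shape; unauthorized columns become None.
        rows.append(
            {c: (partial.get(c) if c in authorized else None) for c in all_columns}
        )
    return rows


# Stand-in reader over in-memory data, recording which columns were requested.
requested: List[List[str]] = []


def fake_reader(columns: Sequence[str]) -> Iterable[Dict[str, object]]:
    data = [
        {"id": 1, "name": "a", "salary": 100},
        {"id": 2, "name": "b", "salary": 200},
    ]
    requested.append(list(columns))
    for row in data:
        yield {c: row[c] for c in columns}


demo_rows = read_with_nulls(
    all_columns=["id", "name", "salary"],
    authorized={"id", "name"},  # assumed to come from a policy engine
    read_columns=fake_reader,
)
```

In this sketch the "salary" column is never requested from the reader, so a
select * style result can still be returned with nulls in its place.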

