Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/04/22 16:48:00 UTC

[jira] [Commented] (PARQUET-2223) Parquet Data Masking for Column Encryption

    [ https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715305#comment-17715305 ] 

ASF GitHub Bot commented on PARQUET-2223:
-----------------------------------------

shangxinli commented on PR #1016:
URL: https://github.com/apache/parquet-mr/pull/1016#issuecomment-1518702874

   @zhangjiashen The current change is incomplete. You only ported the utility changes that hide the columns in the schema, but you also need to actually hide them in readFooter(). Before that, you need to mark those columns as hidden when an access-denied exception is thrown from the KMS.
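
   The two missing steps could look roughly like the sketch below. It is only an illustration under stated assumptions, not the parquet-mr implementation: `KmsAccessDenied`, `markHiddenColumns` and `visibleFooterColumns` are hypothetical names, columns are modeled as plain strings, and the KMS call is replaced by a caller-supplied function.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class HiddenColumnSketch {

    /** Hypothetical stand-in for the exception a KMS client throws when access is denied. */
    static class KmsAccessDenied extends RuntimeException {
        KmsAccessDenied(String message) {
            super(message);
        }
    }

    /**
     * Step 1: try to fetch the key of every encrypted column and record the
     * columns for which the (hypothetical) KMS call reports access denied.
     */
    static Set<String> markHiddenColumns(List<String> encryptedColumns,
                                         Function<String, byte[]> keyFetcher) {
        Set<String> hidden = new LinkedHashSet<>();
        for (String column : encryptedColumns) {
            try {
                keyFetcher.apply(column);
            } catch (KmsAccessDenied e) {
                // Do not fail the read; remember the column so it can be hidden later.
                hidden.add(column);
            }
        }
        return hidden;
    }

    /** Step 2: drop the hidden columns from the column list exposed by the footer. */
    static List<String> visibleFooterColumns(List<String> footerColumns, Set<String> hidden) {
        List<String> visible = new ArrayList<>();
        for (String column : footerColumns) {
            if (!hidden.contains(column)) {
                visible.add(column);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Fake KMS: access to the key of "ssn" is denied, every other key is returned.
        Function<String, byte[]> fakeKms = column -> {
            if (column.equals("ssn")) {
                throw new KmsAccessDenied("no permission for " + column);
            }
            return new byte[16];
        };
        Set<String> hidden = markHiddenColumns(List.of("name", "ssn"), fakeKms);
        System.out.println(visibleFooterColumns(List.of("id", "name", "ssn"), hidden));
    }
}
```

   The point of splitting the work this way is that the read path never sees the denied columns at all, so decryption is never attempted for them.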




> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
>                 Key: PARQUET-2223
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2223
>             Project: Parquet
>          Issue Type: Task
>            Reporter: Jiashen Zhang
>            Priority: Minor
>
> h1. Background
> h2. What is Data Masking?
> Data masking is the process of obfuscating sensitive data. Instead of revealing PII data, masking allows us to return NULLs, hashes or redacted data in its place. With data masking, users who are in the correct permission groups can retrieve the original data and users without permissions will receive masked data.
> h2. Why do we need it?
>  * Fine-Grained Access Control
> h2. Why do we want to enhance data masking?
>  
> Users might not have permissions for all columns, and the existing code does not support skipping columns that users do not have permission to access. This enhancement adds that support, so that users can choose to skip some columns and avoid decryption errors.
> h1. Design Requirements
>  # Users can skip some columns with a configuration
> h1. Proposed solution
> The key idea is to modify the requested schema by removing the skipped columns from it.
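
The proposed solution can be illustrated with a small, self-contained sketch. It is not the parquet-mr API: the requested schema is modeled as a list of dotted column paths instead of a real schema type, the skip list is assumed to arrive as a comma-separated configuration value, and `RequestSchemaPruner`, `parseSkippedColumns` and `prune` are hypothetical names. Treating a skipped group as covering all of its nested fields is also an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class RequestSchemaPruner {

    /** Parses a comma-separated configuration value into a set of column paths. */
    static Set<String> parseSkippedColumns(String configValue) {
        if (configValue == null || configValue.trim().isEmpty()) {
            return Collections.emptySet();
        }
        return Arrays.stream(configValue.split(","))
                .map(String::trim)
                .filter(path -> !path.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    /** A column is skipped if it, or one of its parent groups, is in the skip set. */
    static boolean isSkipped(String columnPath, Set<String> skipped) {
        for (String path : skipped) {
            if (columnPath.equals(path) || columnPath.startsWith(path + ".")) {
                return true;
            }
        }
        return false;
    }

    /** Returns the requested columns with every skipped column removed, order preserved. */
    static List<String> prune(List<String> requestedColumns, Set<String> skipped) {
        return requestedColumns.stream()
                .filter(column -> !isSkipped(column, skipped))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> requested = List.of("id", "ssn", "address.city", "address.zip");
        Set<String> skipped = parseSkippedColumns("ssn, address.zip");
        System.out.println(prune(requested, skipped));
    }
}
```

Because the skipped columns never appear in the requested schema, the reader does not try to fetch their keys, which is what avoids the decryption error described above.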



--
This message was sent by Atlassian Jira
(v8.20.10#820010)