Posted to dev@parquet.apache.org by "Jiashen Zhang (Jira)" <ji...@apache.org> on 2023/01/04 23:26:00 UTC

[jira] [Updated] (PARQUET-2223) Parquet Data Masking for Column Encryption

     [ https://issues.apache.org/jira/browse/PARQUET-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiashen Zhang updated PARQUET-2223:
-----------------------------------
    Description: 
h1. Background
h2. What is Data Masking?

Data masking is the process of obfuscating sensitive data. Instead of revealing personally identifiable information (PII), masking returns NULLs, hashes, or redacted values in its place. Users in the correct permission groups can retrieve the original data, while users without permissions receive masked data.
h2. Why do we need it?
 * Fine-Grained Access Control

h2. Why do we want to enhance data masking?

Users might not have permission to access every column, but the existing code provides no way to skip the columns a user cannot access, so reading those columns results in a decryption error. This enhancement adds that support: users can choose to skip specific columns and thereby avoid the decryption error.
h1. Design Requirements
 # Users can skip selected columns via a configuration setting

h1. Proposed solution

The key idea is to modify the requested schema by removing the skipped columns from it.
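
The following is a minimal, self-contained sketch of that idea, not the actual parquet-mr change. It assumes the requested schema can be modeled as a flat list of dot-separated column paths and that the skip list arrives as a comma-separated configuration string; the class name, both method names, and the configuration format are hypothetical. A real implementation would more likely prune the requested `MessageType` before it is handed to the reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of parquet-mr: the schema is modeled as a flat
// list of dot-separated column paths instead of a real MessageType.
class SchemaPruningSketch {

    // Parses a comma-separated skip list such as "ssn, address.zip".
    // This configuration format is an assumption made for the sketch.
    static Set<String> parseSkipList(String configValue) {
        Set<String> skipped = new LinkedHashSet<>();
        if (configValue == null) {
            return skipped;
        }
        for (String part : configValue.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                skipped.add(trimmed);
            }
        }
        return skipped;
    }

    // Returns the requested columns minus every skipped column, including
    // columns nested under a skipped group ("address" also drops "address.zip").
    static List<String> pruneRequestedColumns(List<String> requested, Set<String> skipped) {
        List<String> kept = new ArrayList<>();
        for (String column : requested) {
            boolean drop = false;
            for (String s : skipped) {
                if (column.equals(s) || column.startsWith(s + ".")) {
                    drop = true;
                    break;
                }
            }
            if (!drop) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> requested =
            Arrays.asList("id", "name", "ssn", "address.street", "address.zip");
        Set<String> skipped = parseSkipList("ssn, address.zip");
        // In the proposed design, only this pruned column set would reach the
        // reader, so columns the user cannot decrypt are never requested.
        System.out.println(pruneRequestedColumns(requested, skipped));
    }
}
```

Running `main` prints `[id, name, address.street]`. Matching on `s + "."` rather than a bare prefix means skipping `address` removes the whole group, while a similarly named column such as `addr` is left untouched.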

> Parquet Data Masking for Column Encryption
> ------------------------------------------
>
>                 Key: PARQUET-2223
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2223
>             Project: Parquet
>          Issue Type: Task
>            Reporter: Jiashen Zhang
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)