Posted to dev@parquet.apache.org by "Atri Sharma (JIRA)" <ji...@apache.org> on 2017/11/30 17:56:00 UTC

[jira] [Commented] (PARQUET-1155) Support for GDPR erase requirements

    [ https://issues.apache.org/jira/browse/PARQUET-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273057#comment-16273057 ] 

Atri Sharma commented on PARQUET-1155:
--------------------------------------

Is this issue being actively worked on? If it's still open, I would like to work on it.

> Support for GDPR erase requirements
> -----------------------------------
>
>                 Key: PARQUET-1155
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1155
>             Project: Parquet
>          Issue Type: Wish
>          Components: parquet-format
>    Affects Versions: 1.8.2
>            Reporter: Machiel Groeneveld
>
> As I understand it, Parquet is a write-once format, so mutating data inside Parquet files is not an option. A new EU-wide law (the GDPR) comes into effect in May 2018 that requires companies to delete a customer's data if that customer asks them to.
> Our case is quite simple: our biggest Parquet tables collect 7.5 billion rows a month, so removing data by duplicating a table while filtering out the unwanted customer's rows is not feasible.
> Perhaps there is some way to remove particular data? Or perhaps there is an efficient way to do read/filter/write? Zeroing out the data might also be an option, since it would not change the layout of the files.
> I'm not sure this is the right place to start this discussion, but I think more people will run into this once it becomes clear that data must be deleted everywhere, including in Parquet files. Companies face multi-million-dollar fines if they don't comply with the GDPR.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)