Posted to dev@parquet.apache.org by Xinli shang <sh...@uber.com.INVALID> on 2020/09/22 17:49:14 UTC

Parquet sync meeting notes - 9/22/2020

9/22/2020

Hi all,

Attendees:  Ashish Singh, Julien, Gidon, Gabor, Xinli

   1. Column Encryption
      1. PRs are all merged.
      2. Data masking
         1. This feature should take a top-down approach, starting from the
            service. Maybe we can start with parquet-tools.
         2. Reducing the storage overhead for data masking: there are some
            ideas about different compression schemes and tools.
         3. Testing: writing more tests and testing with real data are
            usually good approaches. Maybe we can ask some third parties
            who don’t know this feature to test it working only from the
            specification. Another way is to run interoperability tests
            between the C++ and Java implementations. Gidon has done some
            of this in his work. It would be great to automate it so that
            we can test more.
   2. Parquet 1.12 release
      1. Created a Jira ticket for the release.
      2. There is concern about the adoption of Parquet 1.11.0 because of
         the Avro version. Action item (owner: Xinli): bring the people
         dealing with this upgrade effort to the next meeting to discuss
         the next steps.
   3. PARQUET-313: Implement 3-level list.
      1. This is a good initiative. It is a little hard for the current
         committers to review because we don’t use it or aren’t familiar
         with it. Check whether somebody from Twitter can help with the
         review and verification. Action item (owner: Julien): Julien will
         try to find someone.
      2. Use a flag in the code to make it optional, to reduce the risk.
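
To make the flag idea for the 3-level list concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the function name list_schema and the parameter use_three_level are hypothetical and are not part of parquet-mr, and the actual change is in Java. The two schema layouts follow the LIST rules in the Parquet format specification (standard 3-level layout, and the legacy 2-level layout with the "_tuple" suffix). The flag defaults to off, so existing behavior stays the same unless a user opts in.

```python
# Hypothetical sketch: names are illustrative, not parquet-mr API.

def list_schema(name: str, element_type: str, use_three_level: bool = False) -> str:
    """Return a Parquet-style schema string for a required list of required elements."""
    if use_three_level:
        # Standard 3-level layout: LIST group -> repeated group "list" -> "element".
        return (
            f"required group {name} (LIST) {{\n"
            f"  repeated group list {{\n"
            f"    required {element_type} element;\n"
            f"  }}\n"
            f"}}"
        )
    # Legacy 2-level layout: the repeated field is the element itself,
    # named "<list name>_tuple".
    return (
        f"required group {name} (LIST) {{\n"
        f"  repeated {element_type} {name}_tuple;\n"
        f"}}"
    )


if __name__ == "__main__":
    print(list_schema("tags", "binary"))                        # default: legacy layout
    print(list_schema("tags", "binary", use_three_level=True))  # opt-in: 3-level layout
```

Keeping the default on the legacy layout means readers that only understand the old structure are not affected until the new path has been reviewed and verified.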


Please let me know if you have any questions.

Xinli Shang | Tech Lead Manager @ Uber Data Infra


-- 
Xinli Shang