Posted to dev@parquet.apache.org by Xinli shang <sh...@uber.com.INVALID> on 2020/10/27 17:14:41 UTC

Parquet sync meeting notes - Oct 2020

10/27/2020


Hi all,


Attendees: Michael, Gidon, Gabor, Xinli, Chao Sun

   1. Column Encryption
      1. PARQUET-1396 is approved and will be merged.
      2. So far, all the planned changes around column encryption are done!
      3. Data masking
         1. Set up a call with Sri and Gidon for it.
         2. PR <https://github.com/apache/parquet-mr/pull/819/> for replacing
            the column with null is in review.
   2. Questions around Parquet v1 and v2
      1. Encodings are not bound to v1 or v2
      2. The v2 roadmap is not clear
      3. Documentation of v1 and v2 - Owner: Michael
      4. Discussion of retiring v2
   3. Parquet 1.11.x adoption in Presto
      1. PR <https://github.com/prestodb/presto/pull/14960> has been created,
         but it has a unit test failure.
   4. Parquet 1.11.x adoption in Iceberg
      1. Upgraded to 1.11.1 now
      2. Integrating the Column Index feature into it. Some issues have come
         up, and Xinli is working on them.
   5. Parquet 1.11.x adoption in Spark
      1. Parquet 1.11.x introduced a new version (1.9) of Avro, which removed
         some APIs. This breaks the upgrade.
      2. Solutions - Fix Avro in Hive
         1. Upgrade to Avro 1.9: attempted, but it introduced a lot of issues.
         2. Replace the deprecated API call in Hive with a new API. Hive
            remains on Avro 1.8. This has to be in Hive 2.3, along with an
            upgrade of Avro from 1.7 to 1.8.
      3. Tracking tickets
         1. https://issues.apache.org/jira/browse/SPARK-27733
         2. https://issues.apache.org/jira/browse/SPARK-26346
   6. Parquet 1.12.0
      1. Some companies want to adopt column encryption, which is only
         available in Parquet 1.12.
      2. Parquet 1.12.0 will be on Avro 1.10. Need to know if this is a
         problem. Chao will try it out.


Please let me know if you have any questions.


Xinli Shang | Tech Lead Manager @ Uber Data Infra