Posted to dev@parquet.apache.org by Julien Le Dem <ju...@gmail.com> on 2018/03/13 17:01:56 UTC

Parquet sync starting now

https://meet.google.com/jpy-mump-ngc

Re: Parquet sync starting now

Posted by Julien Le Dem <ju...@gmail.com>.
Notes:

Attendees:

   - Julien (WeWork): proto, release
   - Marcel: Iceberg
   - Zoltan, Gabor, Anna (Cloudera): bugs, null values.
      - https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1222
      - https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1217
   - Lars, Zoltan Borok-nagy (Cloudera Impala): new way of merging changes
   after moving to gitbox.
   - Deepak (Vertica): encryption in C++
   - Benoit, Singhue (Criteo): protobuf, merging
      - https://github.com/apache/parquet-mr/pull/411
      - PARQUET-968
   - Chao (Uber): encryption, native Rust implementation.
   - Gidon (IBM): encryption jira, status and next steps.



   - Protobuf:
      - https://github.com/apache/parquet-mr/pull/411
      - In use for a few weeks.
      - Introduces a breaking change:
         - Empty maps become null maps
      - Will add a flag to avoid the compatibility break
   - Rust:
      - Has been in the works for 1 year
      - 2 contributors.
      - Read implementation only for now.
      - Want to contribute it to the Parquet project.
      - Plan to have Parquet-rust use Arrow-rust
      - Personal project.
   - Encryption: https://issues.apache.org/jira/browse/PARQUET-1178
      - Needs review: https://github.com/apache/parquet-format/pull/84/files
      - Chao: Hive tables use the Parquet format. Different engines (e.g.
      Presto) use the data, so security should be implemented at the node
      level.
      - Deepak: make sure there are no incompatibility issues.
      - Gidon: has been looking at the C++ implementation. Cross
      compatibility is working.
      - Action:
         - Provide feedback on the PR and doc.
         - Gidon to share the Java implementation.
         - Deepak to take a look and provide the C++ point of view.
   - Bugs:
      - PARQUET-1222
      <https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1222?filter=allopenissues>:
      handling of NaN and +0/-0:
         - 1: fix current behavior (ignore NaN in stats and +0/-0)
         - 2: provide a better total ordering including NaN etc.
      - PARQUET-1217: if null_count is populated but min/max are not, old
      Parquet readers use a default of 0 for min and max on numeric columns.
         - Need a fix in parquet-mr
         - Old readers will have problems:
            - Possibly provide a 1.8.3 release with the bug fix for projects
            depending on an old version.
            - For example Spark:
            https://github.com/apache/spark/blob/34811e0b908449fd59bca476604612b1d200778d/pom.xml#L132
            - Will reach out to the Spark team to see if they can upgrade.
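For reference, option 1 for PARQUET-1222 could look roughly like the sketch below. This is only one reading of "ignore NaN in stats and +0/-0", not the agreed fix: NaN values are skipped when computing min/max, and a zero bound is widened so it holds whichever sign of zero is in the data. The function name float_min_max is made up for illustration and is not a parquet-mr API.

```python
import math
from typing import Iterable, Optional, Tuple


def float_min_max(values: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Return (min, max) over values, skipping NaN; None if nothing is left."""
    lo: Optional[float] = None
    hi: Optional[float] = None
    for v in values:
        if math.isnan(v):
            # NaN does not order against anything, so keep it out of the stats.
            continue
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    if lo is None or hi is None:
        # Only NaN (or no values at all): no usable min/max.
        return None
    # +0.0 == -0.0 in float comparison, so which zero was kept depends on
    # input order. Assumption: widen the bounds so they are valid for both.
    if lo == 0.0:
        lo = -0.0
    if hi == 0.0:
        hi = 0.0
    return lo, hi
```

A reader that compares with a total ordering (option 2) would then still see -0.0 <= x <= +0.0 for any zero in the column.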

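The PARQUET-1217 problem can be illustrated with a small reader-side check. This is a sketch under assumptions, not parquet-mr code: ColumnStats and may_contain are hypothetical names, and the point is only that missing min/max must be treated as "unknown" rather than as 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnStats:
    # Hypothetical stand-in for column chunk statistics, not a Parquet API.
    null_count: Optional[int] = None
    min_value: Optional[int] = None
    max_value: Optional[int] = None


def may_contain(stats: ColumnStats, target: int) -> bool:
    """Return False only when min/max prove that target is absent."""
    if stats.min_value is None or stats.max_value is None:
        # min/max were never written: do not fall back to 0, keep the chunk.
        return True
    return stats.min_value <= target <= stats.max_value
```

With stats that carry only null_count, may_contain(ColumnStats(null_count=3), 42) keeps the chunk, whereas a reader that defaults min and max to 0 would wrongly decide that 42 cannot be present and skip it.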

