Posted to dev@parquet.apache.org by Julien Le Dem <ju...@gmail.com> on 2018/04/24 16:05:18 UTC

Parquet sync

Happening now:
https://meet.google.com/esu-yiit-mun

Re: Parquet sync

Posted by Julien Le Dem <ju...@gmail.com>.
Notes:
attendees/agenda:
Ryan (Netflix):

   - Spark update to Parquet 1.10 pending

Nandor, Zoltan, Gabor, Anna (Cloudera):

   - Backport schema description language. New logical types introduce
   parameters, so the schema parser needs to evolve.
   - Need review on column indexes (PARQUET-1211), PR 456:
      - https://github.com/apache/parquet-mr/pull/456

Gidon:

   - Encryption

Benoit, xinhui: protobuf

   - Jackson shading
   - Parquet version in Spark
   - https://issues.apache.org/jira/browse/PARQUET-968

Julien (WeWork)

Notes:
Parquet version in Spark:

   - PR in Spark: https://github.com/apache/spark/pull/21070
      - Would like it in 2.4.
      - Databricks would like a TPC-DS run.

Jackson shading:

   - Debug level => job crashes
   - Prettify schema.
   - Hadoop uses Jackson 1.8
   - https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1281

Parquet-proto:

   - https://issues.apache.org/jira/browse/PARQUET-968
   - Tested on the Amazon Presto fork.
   - https://github.com/apache/parquet-mr/pull/411 is ready to merge.

Encryption:

   - Needs a pluggable mechanism
   - Will open PRs on parquet-format/parquet-mr
   - For review: https://github.com/apache/parquet-format/pull/84

Schema language for new logical types:

   - The timestamp type has 2 parameters.
   - It is OK to have a breaking change in the Parquet schema text
   representation.
   - Will be added as a follow up.
   - https://github.com/apache/parquet-mr/pull/463
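To make the parser change concrete, here is a minimal sketch of reading a parameterized annotation out of a schema string. It assumes a hypothetical NAME(param,param) notation such as TIMESTAMP(MILLIS,true), and assumes the two timestamp parameters are a time unit and an isAdjustedToUTC flag (the fields of TimestampType in parquet-format). The names Annotation, parse_annotation and timestamp_params are illustrative only, not the parquet-mr API, and the syntax settled in PR 463 may differ.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Hypothetical grammar: NAME or NAME(param, param, ...), e.g. "UTF8" or
# "TIMESTAMP(MILLIS,true)". The syntax actually chosen for parquet-mr may differ.
_ANNOTATION = re.compile(r"^\s*([A-Z][A-Z0-9_]*)\s*(?:\((.*)\))?\s*$")

# Assumed unit names, mirroring the TimeUnit values in parquet-format.
_TIME_UNITS = {"MILLIS", "MICROS", "NANOS"}


@dataclass(frozen=True)
class Annotation:
    name: str
    params: Tuple[str, ...]


def parse_annotation(text: str) -> Annotation:
    """Split an annotation into its name and a (possibly empty) parameter tuple."""
    match = _ANNOTATION.match(text)
    if match is None:
        raise ValueError(f"malformed annotation: {text!r}")
    name, raw = match.groups()
    if raw is None or not raw.strip():
        return Annotation(name, ())
    return Annotation(name, tuple(p.strip() for p in raw.split(",")))


def timestamp_params(ann: Annotation) -> Tuple[str, bool]:
    """Validate a TIMESTAMP annotation and return (unit, is_adjusted_to_utc)."""
    if ann.name != "TIMESTAMP" or len(ann.params) != 2:
        raise ValueError(f"expected TIMESTAMP(unit,isAdjustedToUTC), got {ann}")
    unit, adjusted = ann.params
    if unit not in _TIME_UNITS:
        raise ValueError(f"unknown time unit: {unit}")
    if adjusted not in ("true", "false"):
        raise ValueError(f"isAdjustedToUTC must be true or false, got {adjusted}")
    return unit, adjusted == "true"
```

Old parameterless annotations such as UTF8 still parse, which is one way the text format could grow parameters; annotations that previously took none but now require them are where the breaking change mentioned above would show up.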

Column index implementation:

   - Please review: https://github.com/apache/parquet-mr/pull/456/files
   - Write path only for now
   - More PRs are blocked by it.
   - Will work on read path soon.
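The split between a write path and a read path can be pictured with a deliberately simplified sketch: the writer records per-page min/max and null counts, and a reader later uses them to skip pages that cannot match a predicate. PageStats, build_column_index and pages_for_equality are hypothetical names. This is not the layout from PR 456; the real parquet-format design pairs a ColumnIndex with an OffsetIndex that locates each page in the file.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class PageStats:
    """Per-page entry of a simplified, hypothetical column index."""
    min_value: Optional[int]  # None when the page holds only nulls
    max_value: Optional[int]
    null_count: int


def build_column_index(pages: Sequence[Sequence[Optional[int]]]) -> List[PageStats]:
    """Write path: record min/max/null count for each page as it is written."""
    index = []
    for page in pages:
        non_null = [v for v in page if v is not None]
        index.append(PageStats(
            min_value=min(non_null) if non_null else None,
            max_value=max(non_null) if non_null else None,
            null_count=len(page) - len(non_null),
        ))
    return index


def pages_for_equality(index: Sequence[PageStats], value: int) -> List[int]:
    """Read path: ordinals of pages that may contain `value`; the rest can be skipped."""
    return [
        i for i, stats in enumerate(index)
        if stats.min_value is not None and stats.min_value <= value <= stats.max_value
    ]
```

Only the first function corresponds to what the notes describe as done; the second is the kind of consumer the follow-up read-path work would add.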

Parquet 1.8.3: PARQUET-1277

   - PARQUET-1217 Incorrect handling of missing values in Statistics
   - PARQUET-1246 Ignore float/double statistics in case of NaN
   - Will be used for a Spark patch release
   - No other tickets requested
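Why NaN is a problem for float/double statistics (PARQUET-1246) can be shown with a small sketch. It assumes a running min/max updated with plain < and > comparisons, where every comparison involving NaN is false, so the resulting bounds depend on where the NaN appears. The second function follows the idea in the ticket title (drop the statistics when a NaN is seen). It is an illustration under that assumption, not the parquet-mr patch, and the function names are hypothetical.

```python
import math
from typing import Iterable, Optional, Tuple


def naive_min_max(values: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Running min/max using < and >; every comparison involving NaN is False."""
    lo = hi = None
    for v in values:
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    return None if lo is None else (lo, hi)


def min_max_or_none(values: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Return (min, max), or None ("no statistics") as soon as a NaN is seen."""
    lo = hi = None
    for v in values:
        if math.isnan(v):
            return None  # drop the statistics instead of writing misleading bounds
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    return None if lo is None else (lo, hi)
```

With the naive version, [NaN, 1.0, 2.0] yields NaN for both bounds, while [1.0, 2.0, NaN] yields (1.0, 2.0): the same values give different statistics depending on order. Returning None lets a reader fall back to scanning instead of trusting those bounds.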