Posted to dev@parquet.apache.org by Julien Le Dem <ju...@gmail.com> on 2018/03/13 17:01:56 UTC
Parquet sync starting now
https://meet.google.com/jpy-mump-ngc
Re: Parquet sync starting now
Posted by Julien Le Dem <ju...@gmail.com>.
Notes:
Attendees:
- Julien (WeWork): proto, release
- Marcel: Iceberg
- Zoltan, Gabor, Anna (Cloudera): bugs around null values:
- https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1222
- https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1217
- Lars, Zoltan Borok-nagy (Cloudera Impala): new way of merging changes
after moving to gitbox.
- Deepak (Vertica): encryption in C++
- Benoit, Singhue (Criteo): protobuf. Merging
- https://github.com/apache/parquet-mr/pull/411
- PARQUET-968
- Chao (Uber): encryption, native Rust implementation.
- Gidon (IBM): encryption jira, status and next steps.
- Protobuf:
- https://github.com/apache/parquet-mr/pull/411
- In use for a few weeks.
- Introduces a breaking change:
- Empty maps become null maps
- Will add flag to avoid compatibility break
- Rust:
- Been working for 1 year
- 2 contributors.
- Read implementation only for now.
- Want to contribute to the parquet project.
- Plan to have Parquet-rust using Arrow-rust
- Personal project.
- Encryption: https://issues.apache.org/jira/browse/PARQUET-1178
- Need review: https://github.com/apache/parquet-format/pull/84/files
- Chao: Hive tables use the Parquet format. Different engines (e.g. Presto)
use the data, so security should be implemented at the node level.
- Deepak: make sure there are no incompatibility issues.
- Gidon: has been looking at the C++ implementation. Cross
compatibility working.
- Action:
- Provide feedback on PR and doc.
- Gidon to share the Java code.
- Deepak to take a look and provide the C++ point of view.
- Bugs:
- PARQUET-1222 <https://issues.apache.org/jira/projects/PARQUET/issues/PARQUET-1222?filter=allopenissues>: handling of NaN and +0/-0:
- 1: fix current behavior (ignore NaN in stats, and +0/-0)
- 2: provide a better total ordering, including NaN etc.
- PARQUET-1217: if null_count is populated but min/max are not, old
Parquet readers use a default of 0 for min and max on numeric columns.
- Need a fix in parquet-mr
- Old readers will have problems:
- Possibly provide a 1.8.3 release with the bug fix for projects
depending on an old version.
- For example Spark:
https://github.com/apache/spark/blob/34811e0b908449fd59bca476604612b1d200778d/pom.xml#L132
- Will reach out to the Spark team to see if they can upgrade.
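
To make the two statistics bugs above concrete, here is a minimal sketch in Python. It is not parquet-mr code. The function names, the dict layout for the stats, and the exact zero handling are assumptions for illustration only: the writer side skips NaN and nulls and widens a zero bound so that min is -0.0 and max is +0.0 (one possible reading of option 1 for PARQUET-1222), and the reader side treats min/max as unknown when they are not set instead of defaulting to 0 (the PARQUET-1217 case).

```python
import math


def float_min_max(values):
    """Writer side (hypothetical helper): min/max for a float column chunk.

    Assumptions, not taken from the notes: None stands for a null value,
    NaN and nulls are skipped, and a zero bound is widened so that
    min is -0.0 and max is +0.0.
    Returns None when there is no usable value, meaning "write no min/max".
    """
    lo = hi = None
    for v in values:
        if v is None or math.isnan(v):
            continue  # nulls and NaN never take part in min/max
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    if lo is None:
        return None
    # -0.0 == 0.0 under IEEE 754 comparison, so the sign of a zero bound
    # found by the loop is arbitrary; pick the widest interval.
    if lo == 0.0:
        lo = -0.0
    if hi == 0.0:
        hi = 0.0
    return lo, hi


def usable_min_max(stats):
    """Reader side (hypothetical helper): stats is a dict that may hold
    'min', 'max' and 'null_count'. Missing min/max means "unknown",
    never an implicit 0.
    """
    if stats.get("min") is None or stats.get("max") is None:
        return None
    return stats["min"], stats["max"]
```

A reader that gets None from usable_min_max would simply skip min/max-based filtering for that chunk, which is safe even when null_count is present.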