Posted to dev@parquet.apache.org by Julien Le Dem <ju...@wework.com> on 2018/02/14 17:02:35 UTC
parquet sync
starting now on google hangout:
https://meet.google.com/nhj-cvpt-atx
Re: parquet sync
Posted by Julien Le Dem <ju...@wework.com>.
Notes:
Attendees, Agenda:
Lars (Cloudera Impala): Zoltan's proposal to get to a more stable release
or feature flags
Qinghui, Benoit, Miguel, Justin (Criteo): Pull request. Parquet-proto.
PARQUET-968
Gidon (IBM): encryption JIRA. On track
Ryan (Netflix): getting 1.10 out
Zoltan (Cloudera): column index fixes from Gabor, ideas on list
Anna (Cloudera): Compatibility issues.
Discussion:
Compatibility issues and flags:
- Define standard flags for features that are supported or not:
  - New compression algorithms: Brotli, ZStandard, ...
  - New encodings (since v1): Delta-int, …
- Flags are standard across Parquet implementations, to limit usage of
  features to a set supported across all components
- Define (a few) profiles with the sets of features supported for a
  given version (1.0, 2.0, 3.0)
  - These are goals for any implementation to support.
- To be discussed: optional features that can be ignored and don't
  prevent reading the file (e.g. bloom filters, page index)
- Zoltan: create JIRA and Google doc with a design proposal
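To make the flags-and-profiles idea concrete, here is a minimal sketch in Python. It is not a design from the sync: the names `Feature`, `IGNORABLE`, `PROFILES`, `blocking_features` and `writable_under` are hypothetical, and the contents of each profile are placeholders chosen only for illustration. Whether bloom filters and page indexes count as ignorable is still listed above as "to be discussed".

```python
from enum import Enum, auto


class Feature(Enum):
    # Feature names taken from the discussion notes.
    BROTLI = auto()
    ZSTD = auto()
    DELTA_INT = auto()
    BLOOM_FILTER = auto()
    PAGE_INDEX = auto()


# Assumption: optional features that a reader may skip and still read the file.
IGNORABLE = frozenset({Feature.BLOOM_FILTER, Feature.PAGE_INDEX})

# Placeholder profile contents, NOT decisions made in the sync.
PROFILES = {
    "1.0": frozenset(),
    "2.0": frozenset({Feature.DELTA_INT}),
    "3.0": frozenset({Feature.DELTA_INT, Feature.BROTLI, Feature.ZSTD}),
}


def blocking_features(used, profile):
    """Return the features in `used` that a reader limited to `profile` cannot handle."""
    allowed = PROFILES[profile] | IGNORABLE
    return set(used) - allowed


def writable_under(used, profile):
    """True if a file using `used` stays readable by any reader of `profile`."""
    return not blocking_features(used, profile)
```

A writer configured for a given profile could call `writable_under` before enabling a codec or encoding, and report `blocking_features` to the user when the check fails.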
Parquet-proto:
- Criteo to validate and give +1:
  https://github.com/apache/parquet-mr/pull/411
- New feature needed:
  - Support for empty list vs. null list.
  - Criteo will create JIRA and submit new PR
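As background for the empty-vs-null item, the sketch below shows how the two cases differ at the Parquet level. It assumes the standard three-level LIST layout with a required element, i.e. `optional group f (LIST) { repeated group list { required int32 element; } }`. The function `list_levels` is a hypothetical helper, not part of parquet-mr, and the notes do not say how parquet-proto will expose the distinction.

```python
def list_levels(value):
    """Return (repetition_level, definition_level, element) triples for one record.

    Assumed schema: optional LIST group with a required int32 element,
    so the maximum definition level is 2.
    """
    if value is None:
        # Null list: the optional outer group itself is not defined.
        return [(0, 0, None)]
    if len(value) == 0:
        # Empty list: outer group defined, but no repeated entry present.
        return [(0, 1, None)]
    # Non-empty list: first element starts the record (rep 0), the rest repeat (rep 1).
    return [(0 if i == 0 else 1, 2, v) for i, v in enumerate(value)]
```

The null and empty cases differ only in definition level (0 vs. 1), which is the information a converter has to preserve or deliberately collapse.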
Column indexes: (By Gabor) PR: https://github.com/apache/parquet-mr/pull/456
- Needs modification in parquet-format utils (not the thrift metadata)
=> new release
- first version writing into parquet-mr
- Action:
  - Ryan to review
  - Ryan and Zoltan to follow up on making parquet-format release
On Wed, Feb 14, 2018 at 9:02 AM, Julien Le Dem <ju...@wework.com>
wrote:
> starting now on google hangout:
> https://meet.google.com/nhj-cvpt-atx
>