Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2017/02/06 17:57:02 UTC
parquet sync
starting in 5 mins. at 10 am PT
https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
--
Julien
Re: parquet sync
Posted by Julien Le Dem <ju...@dremio.com>.
Notes:
Attendees/Agenda setting:
- Deepak (Vertica): works on parquet-cpp.
- add mechanism to ignore corrupt int96 statistics in c++.
- Lars (Cloudera): working on Impala.
- finalize Min/Max Statistics
- Tim (Cloudera): working on Impala.
- min max stats.
- Uwe (Blue Yonder): data scientist.
- Parquet-cpp 1.0 release
- Wes (Two Sigma, NY): working on parquet-cpp. Refactoring for arrow-0.2 and
parquet-cpp-1.0 releases.
- Julien (Dremio):
- min-max stats
- releases
- Alex (Twitter):
- min-max stats
- Ryan (Netflix):
- stats
- humongous allocation
- 1.9.1 release.
- parquet-cli.
Min-Max/Ordering:
- We are adding a ColumnOrdering field that explicitly defines the collation
of a column in relation to its logical type.
- The absence of this field implies the old statistics format, which makes
the change backwards compatible.
- => if there is no Ordering for a column, its statistics use signed comparison.
- Summary:
- A column has a single order in the footer of the file.
- A row group can be sorted by a list of columns.
- Custom Ordering:
- as defined by Unicode, SQL
- need to clarify what exact spec we follow here.
- Action: reach out to Andrew Duffy, Julian Hyde for collation definition
- Action: move definition of orderings of Logical types to parquet-format
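
To illustrate why an explicit ordering matters for min/max, here is a minimal Python sketch (not parquet code; the helper names signed_key and unsigned_key are made up for this example). It assumes the legacy statistics compare binary values byte by byte as signed bytes, as the notes above state, and shows that the same three UTF-8 strings get a different min and max under unsigned byte-wise comparison:

```python
# Hypothetical helpers for illustration only; not part of any Parquet library.

def signed_key(value: bytes):
    # Interpret each byte as a signed 8-bit integer (-128..127),
    # which is the assumed behaviour of the legacy statistics.
    return [b - 256 if b > 127 else b for b in value]

def unsigned_key(value: bytes):
    # Interpret each byte as an unsigned 8-bit integer (0..255).
    return list(value)

values = [b"a", "é".encode("utf-8"), b"z"]

signed_min = min(values, key=signed_key)
signed_max = max(values, key=signed_key)
unsigned_min = min(values, key=unsigned_key)
unsigned_max = max(values, key=unsigned_key)

print("signed:  ", signed_min, signed_max)
print("unsigned:", unsigned_min, unsigned_max)
```

A reader that assumes the wrong comparison could wrongly skip a row group when filtering on such a column, which is the motivation for recording the ordering explicitly.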
Deprecate int96:
- Impala deprecated writing timestamps and decimals
- TODO: add JIRA
Dictionary is ordered:
- Julien: there is metadata for this already:
https://github.com/apache/parquet-format/blob/65e851eae174c1635a5d5ca38f69c81e7ecc7092/src/main/thrift/parquet.thrift#L371
Add mechanism to ignore corrupt int96 statistics in C++:
- agreed
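
One possible shape for such a mechanism, sketched in Python rather than C++ for brevity. This is not the parquet-cpp API: the names UNTRUSTED_STATS_TYPES and usable_min_max are hypothetical, and the only assumption is that a reader drops min/max values for physical types whose statistics are known to be unreliable, so that no filtering decision is based on them:

```python
from typing import Optional, Tuple

# Hypothetical: physical types whose stored min/max should not be trusted.
UNTRUSTED_STATS_TYPES = {"INT96"}

def usable_min_max(
    physical_type: str,
    min_max: Optional[Tuple[bytes, bytes]],
) -> Optional[Tuple[bytes, bytes]]:
    # Return the stored (min, max) pair only when it is present and the
    # physical type is not on the untrusted list; otherwise return None so
    # callers fall back to reading the data without statistics.
    if min_max is None or physical_type in UNTRUSTED_STATS_TYPES:
        return None
    return min_max

print(usable_min_max("INT96", (b"\x00", b"\x01")))
print(usable_min_max("INT32", (b"\x00", b"\x01")))
```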
--
Julien