Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2017/02/06 17:57:02 UTC

parquet sync

starting in 5 mins. at 10 am PT
https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up

-- 
Julien

Re: parquet sync

Posted by Julien Le Dem <ju...@dremio.com>.
Notes:
Attendees/Agenda setting:
 - Deepak (Vertica): works on parquet-cpp.
   - add mechanism to ignore corrupt int96 statistics in c++.
- Lars (Cloudera): working on Impala.
   - finalize Min/Max Statistics
- Tim (Cloudera): working on Impala.
   - min max stats.
- Uwe (Blue Yonder): data scientist.
   - Parquet-cpp 1.0 release
- Wes (Two Sigma, NY): working on parquet-cpp; refactoring for the arrow-0.2 and
parquet-cpp-1.0 releases.
- Julien (Dremio):
   - min-max stats
   - releases
- Alex (Twitter):
   - min-max stats
- Ryan (Netflix):
  - stats
  - humongous allocation
  - 1.9.1 release.
  - parquet-cli.

Min-Max/Ordering:
 - We are adding a ColumnOrdering field that explicitly defines the collation
for a column in relation to its logical type.
   - The absence of this field implies the old statistics format, which makes
the change backwards compatible.
   - => if no Ordering is set for a column, its statistics use signed comparison.
   - Summary:
      - A column has a single order in the footer of the file.
      - A row group can be sorted by a list of columns.
 - Custom Ordering:
   - as defined by Unicode, SQL
   - need to clarify what exact spec we follow here.
 - Action: reach out to Andrew Duffy, Julian Hyde for collation definition
 - Action: move definition of orderings of Logical types to parquet-format

Deprecate int96:
  - Impala deprecated writing timestamps and decimal
  - TODO: add JIRA

Dictionary is ordered:
 - Julien: there is metadata for this already:
https://github.com/apache/parquet-format/blob/65e851eae174c1635a5d5ca38f69c81e7ecc7092/src/main/thrift/parquet.thrift#L371

Add mechanism to ignore corrupt int96 statistics in C++:
  - agreed
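
A rough sketch of what such a reader-side guard could look like, written in
Python for brevity rather than C++. The function name usable_min_max and the
dict-based statistics are hypothetical and are not the parquet-cpp API; the
sketch simply assumes that all int96 statistics are treated as untrustworthy:

```python
from typing import Optional, Tuple


def usable_min_max(physical_type: str,
                   stats: Optional[dict]) -> Optional[Tuple[object, object]]:
    """Return (min, max) if the statistics can be trusted, else None.

    Hypothetical helper: assumes every INT96 statistic is ignored.
    """
    if stats is None:
        return None  # no statistics were written for this column chunk
    if physical_type == "INT96":
        return None  # ignore potentially corrupt int96 statistics
    if "min" not in stats or "max" not in stats:
        return None  # incomplete statistics cannot be used for filtering
    return stats["min"], stats["max"]


print(usable_min_max("INT64", {"min": 1, "max": 9}))
print(usable_min_max("INT96", {"min": b"\x00" * 12, "max": b"\xff" * 12}))
```

Returning None makes the caller fall back to reading the data instead of
filtering on statistics it cannot trust.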





-- 
Julien