Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2016/10/27 16:59:40 UTC
Parquet sync-up starting now
https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
--
Julien
Re: Parquet sync-up starting now
Posted by Wes McKinney <we...@gmail.com>.
Same. Thanks
On Fri, Oct 28, 2016 at 2:36 PM, Deepak Majeti <ma...@gmail.com> wrote:
> Julien,
>
> Can you please add me to the calendar invite for the sync-up meetings?
> Thanks.
> --
> regards,
> Deepak Majeti
Re: Parquet sync-up starting now
Posted by Deepak Majeti <ma...@gmail.com>.
Julien,
Can you please add me to the calendar invite for the sync-up meetings?
Thanks.
--
regards,
Deepak Majeti
Re: Parquet sync-up starting now
Posted by Julien Le Dem <ju...@dremio.com>.
Attendees/Agenda
Julien (Dremio):
- parquet-format: Arrow type parity.
- parquet-mr: Parquet-Arrow schema converter PR.
Ryan (Netflix):
- present the new Parquet CLI.
- Parquet sort order proposal.
Gabor, Zoltan (Cloudera, file formats team):
- getting started.
Uwe (Blue Yonder):
- parquet-cpp getting close to a release.
- discussion of type changes with Arrow.

Parquet logical types:
- Julien proposed new logical types to bring parity with Arrow: Union,
  Interval types, Null, and half-precision floats.
- TODO(Julien): add LogicalType docs for the new types.
- Union:
  - differentiate between a null union and a projection of another member,
    using the optionality of the union itself together with its optional fields.
  - describe union type constraints.
- Null: a type for values that are always null, for example data coming from
  schema discovery on JSON where a field is always null.
- Interval type:
  - use the SQL standard's interval units.
  - deprecate the existing INTERVAL logical type.
- Half-precision float: punt on that for now.
  - defined in Arrow metadata,
  - but not actually implemented in arrow-cpp or arrow-java.
  - possibly add a physical type for half-precision values.
  - add encodings? See Ryan's PR for float encoding.
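
For context on the INTERVAL type slated for deprecation: the parquet-format spec defines it as a 12-byte FIXED_LEN_BYTE_ARRAY holding three little-endian unsigned 32-bit integers (months, days, milliseconds). The sketch below, assuming that layout, decodes one value and regroups the fields the way the SQL standard separates year-month intervals from day-time intervals. The function names and the regrouping are illustrative assumptions, not part of any proposal in these notes.

```python
import struct


def decode_legacy_interval(raw: bytes) -> tuple[int, int, int]:
    """Decode a 12-byte INTERVAL value into (months, days, milliseconds).

    Assumed layout: three little-endian unsigned 32-bit integers,
    in the order months, days, milliseconds.
    """
    if len(raw) != 12:
        raise ValueError("INTERVAL values are exactly 12 bytes")
    return struct.unpack("<III", raw)


def to_sql_style(months: int, days: int, millis: int) -> dict:
    # Hypothetical regrouping into the two SQL interval families:
    # YEAR TO MONTH carries only months; DAY TO SECOND carries days and time.
    return {
        "year_month": {"years": months // 12, "months": months % 12},
        "day_time": {"days": days, "milliseconds": millis},
    }


raw = struct.pack("<III", 14, 3, 5_000)
print(to_sql_style(*decode_legacy_interval(raw)))
```

A single legacy value mixes month and day granularity, which SQL keeps in separate interval types; that mismatch is one way to read the "use the SQL standard's interval units" item.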

- Uwe: TIMESTAMP_NANOS?
  - used in pandas.
  - used in Hive (through Parquet's loosely defined INT96).
  - debate whether we should support it or not.
  - possibly use an int64 or a fixed-length byte array to store it.
  - TODO(Uwe): open a JIRA; Ryan to comment.
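
The INT96 timestamps mentioned above have no formal definition in parquet-format; in practice, Hive and Impala write 8 bytes of little-endian nanoseconds within the day followed by a 4-byte little-endian Julian day number. The sketch below assumes that de facto layout, converts it to nanoseconds since the Unix epoch (what an int64 nanosecond column would store directly), and checks the range such an int64 can cover. All function and constant names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of the Unix epoch, 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
NANOS_PER_DAY = 86_400 * 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Bounds of a signed 64-bit nanoseconds-since-epoch value.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def int96_to_epoch_nanos(raw: bytes) -> int:
    """Decode the assumed INT96 layout: int64 nanos-of-day, then int32 Julian day."""
    if len(raw) != 12:
        raise ValueError("INT96 values are exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def epoch_nanos_to_datetime(nanos: int) -> datetime:
    # Python datetimes have microsecond precision, so the sub-microsecond
    # part of the nanosecond value is floored away here.
    return EPOCH + timedelta(microseconds=nanos // 1_000)


# Half a second past midnight on 1970-01-02.
raw = struct.pack("<qi", 500_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
print(int96_to_epoch_nanos(raw))  # 86400500000000
print(epoch_nanos_to_datetime(INT64_MIN), epoch_nanos_to_datetime(INT64_MAX))
```

The int64 range works out to roughly 1677-09-21 through 2262-04-11, the same bounds pandas uses for nanosecond timestamps; a wider fixed-length byte array would trade that limit for a less compact encoding.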

Parquet-cli:
- Ryan's new parquet-cli:
  - makes it easier to try encodings.
  - lets you look at the data.
  - some code comes from the Kite project (Apache 2.0 licensed).

Parquet sort order:
- current proposal: have two separate min/max pairs in the stats block.
- Ryan to create a pull request.
- how to formally specify the sort order (comparator/collation)?
- standard database collations? Look into Calcite?
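
Why the sort order needs a formal specification: the same byte strings produce different min/max statistics depending on whether bytes are compared as unsigned or signed values (Java's byte type is signed, so a naive byte-by-byte comparator written in Java behaves like the signed case). The Python sketch below is illustrative only; the helper names are hypothetical and it does not reproduce any particular implementation.

```python
from typing import Callable, Iterable


def unsigned_key(value: bytes) -> bytes:
    # Python compares bytes objects lexicographically as unsigned octets.
    return value


def signed_key(value: bytes) -> tuple[int, ...]:
    # Reinterpret each octet as a signed byte in -128..127, like Java's byte.
    return tuple(b - 256 if b > 127 else b for b in value)


def min_max(values: Iterable[bytes], key: Callable) -> tuple[bytes, bytes]:
    # Compute the min/max statistics a writer would record under `key`.
    vals = list(values)
    return min(vals, key=key), max(vals, key=key)


values = ["a".encode("utf-8"), "é".encode("utf-8"), "z".encode("utf-8")]
print(min_max(values, unsigned_key))
print(min_max(values, signed_key))
```

Under the unsigned order the min is b'a' and the max is b'\xc3\xa9' ("é"); under the signed order the min is b'\xc3\xa9' and the max is b'z'. A reader that assumes a different order than the writer used could skip row groups that actually contain matching values.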

Parquet-cpp release?
- fix bugs.
- release JIRA.

Next sync-up in two weeks.
--
Julien