Posted to dev@parquet.apache.org by Julien Le Dem <ju...@dremio.com> on 2016/10/27 16:59:40 UTC

Parquet sync-up starting now

https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up

-- 
Julien

Re: Parquet sync-up starting now

Posted by Wes McKinney <we...@gmail.com>.
Same. Thanks

On Fri, Oct 28, 2016 at 2:36 PM, Deepak Majeti <ma...@gmail.com> wrote:
> Julien,
>
> Can you please add me to the calendar invite for the sync-up meetings?
> Thanks.
>
> --
> regards,
> Deepak Majeti

Re: Parquet sync-up starting now

Posted by Deepak Majeti <ma...@gmail.com>.
Julien,

Can you please add me to the calendar invite for the sync-up meetings?
Thanks.

-- 
regards,
Deepak Majeti

Re: Parquet sync-up starting now

Posted by Julien Le Dem <ju...@dremio.com>.
 Attendees/Agenda
Julien (Dremio):
 - Parquet-format: arrow types parity.
 - parquet-mr: Parquet-Arrow schema converter PR
Ryan (Netflix):
 - present new Parquet CLI
 - Parquet sort order proposal
Gabor, Zoltan (Cloudera, file formats team):
 - getting started
Uwe (Blue Yonder):
 - parquet-cpp getting close to release
 - type changes with arrow discussion

Parquet logical types:
 - Julien proposed new logical types to bring parity with Arrow: Union,
Interval types, Null, Half-precision floats
 - TODO(Julien): add LogicalType doc for new types.
 - Union:
    - differentiate between a null union and a projection of another value,
using the union's own optional fields.
    - describe union type constraints.
 - Null: type for things that are always null. For example, data coming from
schema discovery on JSON with a field that is always null.
 - Interval Type:
   - uses actual SQL spec for interval units
   - deprecate existing Interval logical type.
 - Half precision float: punt on that for now.
   - defined in Arrow metadata
   - actually not implemented in arrow-cpp and arrow-java
   - possibly add physical type for half precision types.
   - add encodings?  See Ryan’s PR for float encoding

 - Uwe: TIMESTAMP_NANOS ?
   - used in Pandas
   - used in Hive (through Parquet’s loosely defined int96)
   - debate whether we should support it or not.
   - Possibly have an int64 or fixed length byte array to store it.
   - TODO(Uwe): open a JIRA, Ryan comment
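
To make the int64 versus int96 storage question concrete, here is a minimal
sketch of both encodings. It is not taken from the meeting. The int96 layout
shown (8 bytes of nanoseconds within the day followed by a 4-byte Julian day
number, little-endian) is my understanding of the convention used by
Hive-style writers, and should be treated as an assumption since the notes
call it loosely defined. The function names are hypothetical.

```python
import struct

# Assumption: int96 timestamps hold nanoseconds-of-day (8 bytes) followed by
# a Julian day number (4 bytes), both little-endian.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000


def to_int64(epoch_nanos):
    """Store nanoseconds since the Unix epoch in a single signed 64-bit value."""
    return struct.pack("<q", epoch_nanos)


def to_int96(epoch_nanos):
    """Split the timestamp into (nanoseconds of day, Julian day), 12 bytes total."""
    days, nanos_of_day = divmod(epoch_nanos, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, days + JULIAN_DAY_OF_UNIX_EPOCH)


def from_int96(raw):
    """Inverse of to_int96: rebuild nanoseconds since the Unix epoch."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


ts = 1_477_587_580 * 1_000_000_000 + 123_456_789  # 2016-10-27 16:59:40.123456789 UTC
assert from_int96(to_int96(ts)) == ts
```

The int64 form is 4 bytes smaller per value and compares as a plain integer,
at the cost of a range of roughly 292 years on either side of 1970.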

Parquet-cli:
  - Ryan's new parquet-cli
  - easier to try encodings.
  - look at data.
  - some code comes from the Kite project, under the Apache 2 license.

Parquet sort order:
  - current proposal: to have 2 separate min and max in stats block
  - Ryan: to create a Pull Request.
  - how to formally specify sort order (comparator/collation)
  - standard database collations? Look into Calcite?
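
As an illustration of why the sort order needs a formal definition, the
sketch below computes min and max for the same binary values under two
different comparators. It is only an example of the problem, not the
proposal discussed in the meeting. The helper names are hypothetical.

```python
def signed_key(value):
    """Compare bytes as signed 8-bit values, as a signed-byte comparator would."""
    return [b - 256 if b >= 128 else b for b in value]


def min_max(values, key=None):
    """Return (min, max) of the values under the given ordering."""
    return min(values, key=key), max(values, key=key)


values = [s.encode("utf-8") for s in ["apple", "zebra", "école"]]

# Python compares bytes as unsigned, so this is the unsigned lexicographic order.
unsigned_stats = min_max(values)

# Signed comparison puts any byte >= 0x80 (such as the UTF-8 lead byte of "é")
# before all ASCII bytes, so the statistics change.
signed_stats = min_max(values, key=signed_key)

print([v.decode("utf-8") for v in unsigned_stats])
print([v.decode("utf-8") for v in signed_stats])
```

A reader that assumes one ordering while the writer used the other could
wrongly skip data when filtering on these statistics, which is why the
comparator or collation has to be part of the specification.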

Parquet-cpp release?
  - fix bugs.
  - release JIRA.

Next sync-up in two weeks.

On Thu, Oct 27, 2016 at 9:59 AM, Julien Le Dem <ju...@dremio.com> wrote:

> https://hangouts.google.com/hangouts/_/dremio.com/parquet-sync-up
>
> --
> Julien
>



-- 
Julien