Posted to dev@parquet.apache.org by Lars Volker <lv...@cloudera.com> on 2017/09/11 14:55:51 UTC

Date and time for next Parquet Sync

Hi All,

I'd like to propose holding the next Parquet Sync on Wednesday, Sep 13th,
at 9am Pacific time (PDT). Possible topics are the pull request to add a page
index to the format and the ongoing work on bloom filters.

If Wednesday does not work for you, please propose another date and time.
Otherwise I'll send out a meeting request later today.

Cheers, Lars

Re: Date and time for next Parquet Sync

Posted by Julien Le Dem <ju...@gmail.com>.
Notes:
Parquet Sync Sept 13 2017:

Lars (Cloudera Impala - CA): wants feedback on Puja’s pull request for the
page index
Anna (Cloudera - Hungary)
Jim (Cloudera - CA): Bloom Filters
Ryan (Netflix - CA): parquet-cli with zstd/lz4 to try out; parquet-format
release; logical type PR.
Junjie (Intel - Shanghai): Bloom filter status
Bikramjeet (Cloudera Impala - CA): clarify the specification for column stats
and the type used for min/max storage
Wes (Two Sigma - NY): C++
Julien (CA): patch release of parquet-mr

TZs: GMT-7, GMT-4, GMT+2, GMT+8 (daylight saving time in SF, NY and Budapest)
Time: 9am (SF), 12pm (NY), 6pm (Budapest), midnight (Shanghai)!
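
On 13 Sep 2017 daylight saving time was in effect in SF, NY and Budapest but not in Shanghai (China does not observe DST), so the UTC offsets that day were GMT-7, GMT-4, GMT+2 and GMT+8. A minimal sketch for double-checking the conversion, assuming those hard-coded offsets rather than looking them up in a time zone database; `OFFSETS` and `local_times` are illustrative names:

```python
from datetime import datetime, timedelta, timezone

# Assumed UTC offsets (hours) in effect on 13 Sep 2017.
OFFSETS = {
    "SF": -7,        # PDT
    "NY": -4,        # EDT
    "Budapest": 2,   # CEST
    "Shanghai": 8,   # China Standard Time, no DST
}

def local_times(start, offsets):
    """Convert an aware datetime into each city's local wall-clock time."""
    return {city: start.astimezone(timezone(timedelta(hours=h)))
            for city, h in offsets.items()}

# Meeting start: 9am in SF (PDT, UTC-7).
start = datetime(2017, 9, 13, 9, 0, tzinfo=timezone(timedelta(hours=-7)))
for city, t in local_times(start, OFFSETS).items():
    print(f"{city:9} {t:%a %b %d %H:%M}")
```

Running it shows 12:00 in NY, 18:00 in Budapest and 00:00 on Thursday, Sep 14 in Shanghai. Using fixed offsets keeps the sketch independent of the system's tzdata; for other dates, the IANA zone names (e.g. via `zoneinfo`) would pick up DST transitions automatically.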

 - Bloom Filter:
   - Junjie submitted pull requests for parquet-format and parquet-mr: bloom
     filter utility + tests.
     - https://github.com/apache/parquet-format/pull/62/files
       - not to be merged right away; looking for feedback
     - https://github.com/apache/parquet-mr/pull/425/files
       - move to package-protected classes or to tests so we can start an
         incremental merge without making it public
     - Need review: Ryan, Julien, Jim
   - Compatibility, integration tests?
     - old compatibility test repo:
       https://github.com/Parquet/parquet-compatibility
     - Arrow integration tests:
       https://github.com/apache/arrow/tree/master/integration
     - Action: Anna, Lars to follow up with Cloudera
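
For readers new to the topic: a bloom filter answers either "definitely not present" or "possibly present" for a value, which could let a reader skip data that cannot contain a looked-up value. The sketch below is a generic textbook bloom filter using double hashing over SHA-256. The class name, sizes and hash choice are illustrative assumptions and do not reflect the design proposed in the parquet-format or parquet-mr pull requests above:

```python
import hashlib

class BloomFilter:
    """Generic bloom filter (illustrative only; not Parquet's on-disk format)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: bytes):
        # Double hashing: derive k bit positions from two 64-bit
        # halves of a single SHA-256 digest.
        digest = hashlib.sha256(value).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: bytes) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: bytes) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))

bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in (b"parquet", b"arrow", b"impala"):
    bf.add(word)
print(bf.might_contain(b"parquet"))  # True
```

The trade-off is the usual one: more bits per value and a well-chosen number of hash functions lower the false-positive rate, while false negatives cannot occur.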

 - Build: travis-ci is broken by a thrift-7 incompatibility with the latest Linux
   - parquet-mr should move to thrift-9: PARQUET-1103
   - pin thrift to a fixed version in the build, as parquet-format does.

 - Page Index: https://github.com/apache/parquet-format/pull/63
   - Action: review by end of next week (Julien, Ryan, Marcel)
   - TODO (Lars?): move the design doc to markdown in parquet-format
   - should add (brief) comments in the thrift definition (clarify in review)

 - zstd/lz4:
   - Ryan has a version of parquet-cli working with zstd, lz4 and brotli
     for experimentation
   - building with zstd backported was difficult (provides a Hadoop jar)
   - anyone interested in running their own tests?
   - Lars to check at Cloudera.
   - Ryan to send it out on the list
   - Wes built benchmarking fixtures in C++; TODO: write tests.
   - use a shareable dataset for validation (NY Taxi dataset?).

 - Logical type PR: https://github.com/apache/parquet-format/pull/51
   - TODO: feedback
   - reviewers: Julien

 - clarification of min max storage:
   - https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L215
   - the format of min and max values is the same as defined by the type.
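
The `Statistics` struct in parquet.thrift describes `min_value`/`max_value` as PLAIN-encoded, except that variable-length byte arrays carry no length prefix. A minimal sketch of that byte layout for a few physical types; `encode_stat` is a hypothetical helper, not a parquet-mr or Arrow API, and it ignores sort order and logical types:

```python
import struct

def encode_stat(physical_type: str, value) -> bytes:
    """Hypothetical helper: byte layout of a min/max statistics value.

    Fixed-width types follow PLAIN encoding (little-endian); BYTE_ARRAY
    values are stored raw, without PLAIN's 4-byte length prefix.
    """
    if physical_type == "INT32":
        return struct.pack("<i", value)
    if physical_type == "INT64":
        return struct.pack("<q", value)
    if physical_type == "FLOAT":
        return struct.pack("<f", value)
    if physical_type == "DOUBLE":
        return struct.pack("<d", value)
    if physical_type == "BYTE_ARRAY":
        return bytes(value)  # no length prefix
    raise ValueError(f"unsupported physical type: {physical_type}")

print(encode_stat("INT32", 1).hex())            # 01000000
print(encode_stat("INT64", -1).hex())           # ffffffffffffffff
print(encode_stat("BYTE_ARRAY", b"abc").hex())  # 616263
```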

 - making releases:
   - want a parquet-format release for:
     - logical types (not merged yet)
     - page indexes (not merged yet)
     - sort order (merged)
   - we won’t block on bloom filters. We can make another release as soon as
     they are ready.
   - Ryan to run the parquet-format release.
   - need a volunteer for the parquet-mr release.



On Wed, Sep 13, 2017 at 8:58 AM, Julien Le Dem <ju...@gmail.com>
wrote:

> The Parquet sync is starting now at:
> https://meet.google.com/ent-mvhf-twr
>
> On Tue, Sep 12, 2017 at 8:55 PM, Julien Le Dem <ju...@gmail.com>
> wrote:
>
>> +1
>>
>> On Mon, Sep 11, 2017 at 8:36 PM, Lars Volker <lv...@cloudera.com> wrote:
>>
>>> There were no objections so I sent out a meeting invite to everyone who was
>>> on the last invite. If you'd like to participate, too, please reply to this
>>> email.
>>>
>>> Cheers, Lars
>>>
>>> On Mon, Sep 11, 2017 at 11:06 AM, Ryan Blue <rb...@netflix.com.invalid>
>>> wrote:
>>>
>>> > That works for me.
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Ryan Blue
>>> > Software Engineer
>>> > Netflix
>>> >
>>>
>>
>>
>

Re: Date and time for next Parquet Sync

Posted by Julien Le Dem <ju...@gmail.com>.
The Parquet sync is starting now at:
https://meet.google.com/ent-mvhf-twr


Re: Date and time for next Parquet Sync

Posted by Julien Le Dem <ju...@gmail.com>.
+1


Re: Date and time for next Parquet Sync

Posted by Lars Volker <lv...@cloudera.com>.
There were no objections so I sent out a meeting invite to everyone who was
on the last invite. If you'd like to participate, too, please reply to this
email.

Cheers, Lars


Re: Date and time for next Parquet Sync

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
That works for me.
