Posted to dev@parquet.apache.org by Ryan Blue <rb...@netflix.com.INVALID> on 2019/09/06 21:00:04 UTC

Re: [VOTE] Parquet Bloom filter spec sign-off

+1 on the current spec. Is everyone else still +1?

Sorry for the delay, I didn't realize that everything had been addressed
and I didn't see the email from Jim in my inbox.

On Wed, Aug 28, 2019 at 10:13 AM Jim Apple <jb...@apache.org> wrote:

> We've got +1's from Zoltan and Gabor. Ryan, you've committed a few BF
> patches that were written in response to your feedback on this list. Are
> you in a position to vote +1 now, or do you have further concerns we could
> address?
>
> On 2019/07/31 02:17:15, 俊杰陈 <cj...@gmail.com> wrote:
> > Dear Parquet developers
> >
> > We still need your vote!
> >
> >
> > On Wed, Jul 24, 2019 at 9:30 PM 俊杰陈 <cj...@gmail.com> wrote:
> > >
> > > Hi @Ryan Blue  @Wes McKinney
> > >
> > > We need your valuable vote, any feedback is welcome as well.
> > >
> > > On Tue, Jul 23, 2019 at 1:24 PM 俊杰陈 <cj...@gmail.com> wrote:
> > > >
> > > > Call for voting again.
> > > >
> > > > On Fri, Jul 19, 2019 at 1:17 PM 俊杰陈 <cj...@gmail.com> wrote:
> > > > >
> > > > > Dear Parquet developers
> > > > >
> > > > > We need more votes, please help to vote on this.
> > > > >
> > > > > On Wed, Jul 17, 2019 at 3:42 PM Gabor Szadovszky
> > > > > <ga...@cloudera.com.invalid> wrote:
> > > > > >
> > > > > > > After getting in PARQUET-1625 I vote again for having bloom filter spec and
> > > > > > > the thrift file update as is in parquet-format master.
> > > > > > > +1 (binding)
> > > > > >
> > > > > > On Mon, Jul 15, 2019 at 3:23 PM 俊杰陈 <cj...@gmail.com> wrote:
> > > > > >
> > > > > > > > Thanks Gabor, it's never too late to make it better. We don't have to
> > > > > > > > rush it; it has been in development for a long time already. :)
> > > > > > > >
> > > > > > > > The thrift file does indeed lag a bit behind the spec. As the spec
> > > > > > > > defines, the bloom filter data is stored near the footer, which means
> > > > > > > > we don't have to handle it like a page. Therefore, I just opened a
> > > > > > > > jira to remove bloom_filter_page_header from the PageHeader structure, while
> > > > > > > > the BloomFilterHeader is kept intentionally for convenience. Since the
> > > > > > > > spec and the thrift should eventually be aligned with each other,
> > > > > > > > the vote target is both of them.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jul 15, 2019 at 7:48 PM Gabor Szadovszky
> > > > > > > <ga...@cloudera.com.invalid> wrote:
> > > > > > > >
> > > > > > > > Hi Junjie,
> > > > > > > >
> > > > > > > > > Sorry for bringing this up a bit late, but I have some problems with the
> > > > > > > > > format update. The parquet.thrift file is updated to have the bloom filters
> > > > > > > > > as a page (just as dictionaries and data pages). Meanwhile, the spec
> > > > > > > > > (BloomFilter.md) says that the bloom filter is stored near the footer. So,
> > > > > > > > > if the bloom filter is not part of the row-groups (like column indexes) I
> > > > > > > > > would not add it as a page. See the struct ColumnIndex in the thrift file.
> > > > > > > > > This struct is not referenced anywhere in the file, only declared. It was done
> > > > > > > > > this way because we don't parse it in the same way as we parse the pages.
> > > > > > > > >
> > > > > > > > > Currently, I am not 100% sure about the target of this vote. If it is a
> > > > > > > > > vote about adding bloom filters in general, then it is a +1 (binding). If it
> > > > > > > > > is about adding the bloom filters to parquet-format as is, then it is a -1
> > > > > > > > > (binding) until we fix the issue above.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Gabor
> > > > > > > >
> > > > > > > > > On Mon, Jul 15, 2019 at 11:45 AM Gidon Gershinsky <gg5070@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > >
> > > > > > > > > > On Mon, Jul 15, 2019 at 12:08 PM Zoltan Ivanfi <zi@cloudera.com.invalid> wrote:
> > > > > > > > > > >
> > > > > > > > > > > +1 (binding)
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jul 15, 2019 at 9:57 AM 俊杰陈 <cj...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Dear Parquet developers
> > > > > > > > > > > >
> > > > > > > > > > > > I'd like to resume this vote, you can start to vote now. Thanks for your time.
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jul 10, 2019 at 9:29 PM 俊杰陈 <cjjnjust@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I see, will resume this next week.  Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jul 10, 2019 at 5:26 PM Zoltan Ivanfi <zi...@cloudera.com.invalid> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Junjie,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Since there are ongoing improvements addressing review comments, I would
> > > > > > > > > > > > > > hold off with the vote for a few more days until the specification settles.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Br,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Zoltan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Jul 10, 2019 at 9:32 AM 俊杰陈 <cjjnjust@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Parquet committers and developers
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We are waiting for your important ballot:)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Jul 9, 2019 at 10:21 AM 俊杰陈 <cjjnjust@gmail.com> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Yes, there are some public benchmark results, such as the official
> > > > > > > > > > > > > > > > benchmark from the xxHash site (http://www.xxhash.com/) and the published
> > > > > > > > > > > > > > > > comparison from the smhasher project (https://github.com/rurban/smhasher/).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Tue, Jul 9, 2019 at 5:25 AM Wes McKinney <wesmckinn@gmail.com> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Do you have any benchmark data to support the choice of hash function?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Wed, Jul 3, 2019 at 8:41 AM 俊杰陈 <cjjnjust@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Dear Parquet developers
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > To simplify the voting, I'd like to update the voting content to the spec
> > > > > > > > > > > > > > > > > > with the xxHash hash strategy. Now you can reply with +1 or -1.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks for your participation.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > On Tue, Jul 2, 2019 at 10:23 AM 俊杰陈 <cjjnjust@gmail.com> wrote:
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Dear Parquet developers
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > The Parquet Bloom filter has been in development for a while. Per the
> > > > > > > > > > > > > > > > > > > discussion on the mailing list, it's time to call a vote for the spec to
> > > > > > > > > > > > > > > > > > > move forward. The current spec can be found at
> > > > > > > > > > > > > > > > > > > https://github.com/apache/parquet-format/blob/master/BloomFilter.md.
> > > > > > > > > > > > > > > > > > > There are a few different options for the Bloom filter's internal hash
> > > > > > > > > > > > > > > > > > > function, and the PR addresses that concern.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > So I'd like to propose voting on the spec + hash option, for example:
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > +1 to spec and xxHash
> > > > > > > > > > > > > > > > > > > +1 to spec and murmur3
> > > > > > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Please help to vote; any feedback is also welcome in the discussion thread.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks & Best Regards


-- 
Ryan Blue
Software Engineer
Netflix

Re: [VOTE] Parquet Bloom filter spec sign-off

Posted by Jim Apple <jb...@apache.org>.

On 2019/09/18 20:52:30, Wes McKinney <we...@gmail.com> wrote: 
> We have just worked to move almost all hashing in Apache Arrow to xxh3
> -- I may have lost it in the mix, but are we dropping murmur3?

Yep: https://github.com/apache/parquet-format/commit/8f1783ec0b273e89c884b46c0f527d0a48321826#diff-d96aef0e8954afde569c8b40b8748081

Re: [VOTE] Parquet Bloom filter spec sign-off

Posted by Wes McKinney <we...@gmail.com>.
We have just worked to move almost all hashing in Apache Arrow to xxh3
-- I may have lost it in the mix, but are we dropping murmur3? Per
discussion in ARROW-3298 that would definitely be our preference.

On Mon, Sep 9, 2019 at 2:17 PM Wes McKinney <we...@gmail.com> wrote:
>
> +1 from me

Re: [VOTE] Parquet Bloom filter spec sign-off

Posted by Wes McKinney <we...@gmail.com>.
+1 from me
