Posted to dev@parquet.apache.org by 俊杰陈 <cj...@gmail.com> on 2019/05/31 02:11:04 UTC

[vote] Merge bloom-filter branch to master

Hi developers

I'd like to propose a vote on merging the Bloom filter branch into master.
The feature on the branch currently works well and can be used directly.
It is also an independent feature that won't break existing features or
ongoing feature development, such as the dictionary filter and page index,
when the switch is off. Although there are optimizations we should still
make, such as xxhash and folding Bloom filters, I think we can handle
further optimization on master.

Please help to vote on this.



Thanks & Best Regards

Re: [vote] Merge bloom-filter branch to master

Posted by Jim Apple <jb...@apache.org>.
https://github.com/apache/parquet-format/commit/f0eab9d64c3563e14cf2c4959f345372e1ba0c8f is now merged. Todd's xxhash proposal hasn't received an update recently; I was convinced by Junjie's argument that extensibility allows us to add it later.

Should we discuss a parquet-format release?

Re: [vote] Merge bloom-filter branch to master

Posted by 俊杰陈 <cj...@gmail.com>.
Thanks Zoltan

I planned to fix the naming issue when we make another update. Let me open a
JIRA for this.

As for the hash choice Todd raised, xxh3 and murmur3 can coexist, so I
plan to add xxh3 later, since it takes some effort to implement and
benchmark.



On Tue, Jun 11, 2019 at 7:27 PM Zoltan Ivanfi <zi...@cloudera.com.invalid>
wrote:

> Hi,
>
> It has been merged into master but has not been released yet. In fact,
> I asked for a minor change before releasing it:
>
> https://github.com/apache/parquet-format/commit/54839ad5e04314c944fed8aa4bc6cf15e4a58698#r31084264
> It may seem like a nit, but I think the naming of the parquet
> structures is important. I also see that Todd raised some concerns
> about the choice of the hash function.
>
> Br,
>
> Zoltan


-- 
Thanks & Best Regards

Re: [vote] Merge bloom-filter branch to master

Posted by Zoltan Ivanfi <zi...@cloudera.com.INVALID>.
Hi,

It has been merged into master but has not been released yet. In fact,
I asked for a minor change before releasing it:
https://github.com/apache/parquet-format/commit/54839ad5e04314c944fed8aa4bc6cf15e4a58698#r31084264
It may seem like a nit, but I think the naming of the parquet
structures is important. I also see that Todd raised some concerns
about the choice of the hash function.

Br,

Zoltan


On Fri, Jun 7, 2019 at 9:30 PM Jim Apple <jb...@apache.org> wrote:
>
> On 2019/05/31 16:01:54, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> > -1
> >
> > Junjie, I think we need to vote to adopt the proposed spec before
> > committing code that implements it.
>
> Ryan, it seems like Junjie and I think that the spec has already been adopted and is in the repo:
>
> https://github.com/apache/parquet-format/commit/54839ad5e04314c944fed8aa4bc6cf15e4a58698
>
> Is your view that this is in the repo but has not been adopted?
>
> If it has, do you see other blockers to a vote on the implementations?

Re: [vote] Merge bloom-filter branch to master

Posted by Jim Apple <jb...@apache.org>.
On 2019/05/31 16:01:54, Ryan Blue <rb...@netflix.com.INVALID> wrote: 
> -1
> 
> Junjie, I think we need to vote to adopt the proposed spec before
> committing code that implements it.

Ryan, it seems like Junjie and I think that the spec has already been adopted and is in the repo:

https://github.com/apache/parquet-format/commit/54839ad5e04314c944fed8aa4bc6cf15e4a58698

Is your view that this is in the repo but has not been adopted?

If it has, do you see other blockers to a vote on the implementations?

Re: [vote] Merge bloom-filter branch to master

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
-1

Junjie, I think we need to vote to adopt the proposed spec before
committing code that implements it.

On Fri, May 31, 2019 at 7:40 AM Wes McKinney <we...@gmail.com> wrote:

> Are there any tools or scripts to assist with integration testing the
> Bloom filter across Java and C++ (or is it a manual process)? I think
> we are relying on a binary file in the C++ library to test
> deserialization.


-- 
Ryan Blue
Software Engineer
Netflix

Re: [vote] Merge bloom-filter branch to master

Posted by Wes McKinney <we...@gmail.com>.
Are there any tools or scripts to assist with integration testing the
Bloom filter across Java and C++ (or is it a manual process)? I think
we are relying on a binary file in the C++ library to test
deserialization.

On Thu, May 30, 2019 at 9:11 PM 俊杰陈 <cj...@gmail.com> wrote:
>
> Hi developers
>
> I'd like to propose a vote on merging the Bloom filter branch into master.
> The feature on the branch currently works well and can be used directly.
> It is also an independent feature that won't break existing features or
> ongoing feature development, such as the dictionary filter and page index,
> when the switch is off. Although there are optimizations we should still
> make, such as xxhash and folding Bloom filters, I think we can handle
> further optimization on master.
>
> Please help to vote on this.
>
>
>
> Thanks & Best Regards