
Posted to dev@parquet.apache.org by 俊杰陈 <cj...@gmail.com> on 2018/09/04 15:41:50 UTC

Re: [VOTE] Finalizing the design and moving forward to read/write implementation

I agree with Jim that we might discover more while implementing the
reader/writer, but there should be no major change to parquet-format,
because:

What type of Bloom filter should we use?
We use a block-based Bloom filter now, and supporting other types later
requires no major change: we just add them to the algorithm union that is
already defined.
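For anyone less familiar with the block-based design, here is a minimal Python sketch. It assumes the caller already has a 64-bit hash per value (blake2b via `hash64` is only a stand-in for whatever hash the format settles on), and the salt constants and names are illustrative, not normative. Each value selects one 256-bit block and sets exactly one bit in each of that block's eight 32-bit words.

```python
import hashlib

# Illustrative odd 32-bit multipliers ("salts"), one per word of a block.
SALT = (0x47B6137B, 0x44974D91, 0x8824AD5B, 0xA2B7289D,
        0x705495C7, 0x2DF1424B, 0x9EFC4947, 0x5C6BFB31)
WORDS_PER_BLOCK = 8  # 8 x 32-bit words = one 256-bit block


def hash64(value: bytes) -> int:
    # Stand-in 64-bit hash (assumption); the format may specify another one.
    return int.from_bytes(hashlib.blake2b(value, digest_size=8).digest(), "little")


class BlockBloomFilter:
    def __init__(self, num_blocks: int) -> None:
        if num_blocks < 1:
            raise ValueError("need at least one block")
        self.num_blocks = num_blocks
        self.blocks = [[0] * WORDS_PER_BLOCK for _ in range(num_blocks)]

    def _block_index(self, h: int) -> int:
        # Upper 32 bits pick the block (multiply-shift rather than modulo).
        return ((h >> 32) * self.num_blocks) >> 32

    @staticmethod
    def _mask(h: int) -> list:
        # Lower 32 bits, times each salt, choose one bit (0..31) per word.
        key = h & 0xFFFFFFFF
        return [1 << (((key * s) & 0xFFFFFFFF) >> 27) for s in SALT]

    def insert(self, h: int) -> None:
        block = self.blocks[self._block_index(h)]
        for i, bit in enumerate(self._mask(h)):
            block[i] |= bit

    def might_contain(self, h: int) -> bool:
        # False means "definitely absent"; True may be a false positive.
        block = self.blocks[self._block_index(h)]
        return all(block[i] & bit for i, bit in enumerate(self._mask(h)))
```

Since an insert or lookup touches only one small block, each probe stays local in memory, which is the usual motivation for blocked Bloom filters. A different filter type would be a different union member rather than a change to this layout.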

Where should they go in the file?
At the beginning of the row group. The location is given by an offset in the
column chunk metadata, so parquet-format would not need to change if we later
want to place them somewhere else.
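To make the offset point concrete, here is a minimal Python sketch. `ColumnChunkMeta`, `bloom_filter_offset` and `read_bloom_filter` are hypothetical names, and the filter length is passed in as an assumption (in practice it would come from the filter's own header). The reader only follows the offset, so two writers can place the same filter in different spots and both files remain readable.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional


@dataclass
class ColumnChunkMeta:
    # Hypothetical stand-in for column chunk metadata; only the offset is modelled.
    bloom_filter_offset: Optional[int]  # absolute file offset, None if absent


def read_bloom_filter(f: BinaryIO, meta: ColumnChunkMeta, length: int) -> Optional[bytes]:
    # Follow the offset; never assume the filter sits at the row group start.
    if meta.bloom_filter_offset is None:
        return None  # this column chunk has no Bloom filter
    f.seek(meta.bloom_filter_offset)
    data = f.read(length)
    if len(data) != length:
        raise EOFError("truncated Bloom filter")
    return data


# Two hypothetical writers that place the same filter in different spots.
payload = b"\xaa" * 32
before = io.BytesIO(payload + b"column-data")  # filter before the data
after = io.BytesIO(b"column-data" + payload)   # filter after the data
assert read_bloom_filter(before, ColumnChunkMeta(0), len(payload)) == payload
assert read_bloom_filter(after, ColumnChunkMeta(11), len(payload)) == payload
```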

What should the Thrift object contain?
The current Thrift definition contains enough information to read a
block-based Bloom filter. We may need to add more fields if we decide to
support other types of Bloom filters in the future.
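To illustrate why a header plus an algorithm union should be enough, here is a hypothetical Python model. `BloomFilterHeader`, `usable_bitset` and the string-valued `algorithm` field are illustrative names, not the actual Thrift definition, and the sketch assumes a reader treats an unrecognised algorithm as if no filter were present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BloomFilterHeader:
    # Hypothetical model of the Thrift header: size of the bitset plus the
    # name of whichever algorithm-union member the writer set.
    num_bytes: int
    algorithm: str  # e.g. "BLOCK"; a newer writer might set another member


KNOWN_ALGORITHMS = {"BLOCK"}


def usable_bitset(header: BloomFilterHeader, body: bytes) -> Optional[bytes]:
    # Unknown algorithm -> treat as "no filter"; the reader falls back to
    # scanning the data, so a new union member does not break old readers.
    if header.algorithm not in KNOWN_ALGORITHMS:
        return None
    if len(body) != header.num_bytes:
        raise ValueError("bitset size does not match header")
    return body
```

Under this assumption, supporting a new filter type means adding a union member (and any fields it needs) without changing how existing block-based filters are read.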

I can submit the reader/writer PR on the Java side to make this clear once we
finish the Bloom filter utility PR on the Java side.

Jim Apple <jb...@apache.org> wrote on Sat, Sep 1, 2018 at 12:26 AM:

> On 2018/08/30 19:41:59, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> > Jim, do you think that the implementation is going to make major changes to
> > the design of how bloom filters are stored in files?
>
> I don't foresee any problems with the current layout.
>

-- 
Thanks & Best Regards