Posted to dev@parquet.apache.org by "Ferdinand Xu (JIRA)" <ji...@apache.org> on 2015/06/23 09:24:01 UTC

[jira] [Comment Edited] (PARQUET-41) Add bloom filters to parquet statistics

    [ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597222#comment-14597222 ] 

Ferdinand Xu edited comment on PARQUET-41 at 6/23/15 7:23 AM:
--------------------------------------------------------------

Hi,
Any suggestions or comments on my current solution?
I'm also thinking about using the Bloom Filter API from Guava instead of implementing it ourselves.
[http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/hash/BloomFilter.html#create(com.google.common.hash.Funnel, int, double)]
As a first step, we should finalize what to store in parquet-format.
If we use Guava, we would store the expected insertions and the false positive probability, which could differ from the current solution.
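
For illustration, here is a minimal sketch of how those two parameters (expected insertions and false positive probability) could determine a filter's size, using the textbook Bloom filter sizing formulas. The class and method names are hypothetical, and this is an assumption about the general technique rather than a statement of what Guava or parquet-format actually stores.

```java
/**
 * Hypothetical helper showing how a Bloom filter could be sized from
 * the expected number of insertions and the target false positive
 * probability. Uses the standard textbook formulas; not taken from
 * Guava or Parquet source code.
 */
public final class BloomSizing {
    private BloomSizing() {}

    /** Number of bits: m = -n * ln(p) / (ln 2)^2, rounded up. */
    public static long optimalNumOfBits(long expectedInsertions, double fpp) {
        if (expectedInsertions <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2));
    }

    /** Number of hash functions: k = (m / n) * ln 2, rounded, at least 1. */
    public static int optimalNumOfHashFunctions(long expectedInsertions, long numBits) {
        double k = (double) numBits / expectedInsertions * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        long n = 1_000_000L;   // expected insertions (example value)
        double p = 0.01;       // target false positive probability (example value)
        long bits = optimalNumOfBits(n, p);
        int hashes = optimalNumOfHashFunctions(n, bits);
        System.out.println("bits=" + bits + " hashFunctions=" + hashes);
    }
}
```

Under this assumption, storing the pair (expected insertions, false positive probability) is enough for a reader to re-derive the bit count and the number of hash functions, whereas a format that stores the bit count and hash count directly would not need the formulas at read time.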

Regarding the comments from [~dwhite], we could discuss support for multiple strategies here. We could also discuss how to achieve the fallback for the bloom filter, as [~spena] suggests.

Thank you!


was (Author: ferd):
Hi,
I'm thinking about using the Bloom Filter API from Guava instead of implementing it ourselves. Any suggestions or comments?
[http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/hash/BloomFilter.html#create(com.google.common.hash.Funnel, int, double)]
Thank you!

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. This could be very useful in filtering entire row groups.


