Posted to dev@parquet.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/05/03 11:45:00 UTC

[jira] [Commented] (PARQUET-41) Add bloom filters to parquet statistics

    [ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462312#comment-16462312 ] 

ASF GitHub Bot commented on PARQUET-41:
---------------------------------------

BenoitHanotte commented on issue #425: PARQUET-41:Add Bloom Filter for parquet
URL: https://github.com/apache/parquet-mr/pull/425#issuecomment-386268112
 
 
   I am quite interested in this topic. Is there any plan to move forward with it?
   
   Since we already depend on Guava 18.0 in the parquet-hadoop module, why don't we simply use the BloomFilter class it provides? (https://google.github.io/guava/releases/18.0/api/docs/com/google/common/hash/BloomFilter.html)
   It is very efficient and well tested. If it fits our needs, I would suggest not re-implementing the filters, as that is not a trivial task.
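   
   To make the idea concrete, here is a minimal, self-contained sketch of how a per-row-group Bloom filter could let a reader skip row groups. It is only an illustration: the class name RowGroupBloomSketch, the FNV-1a hashing, and the sizes are assumptions made for this example and are neither Guava's implementation nor anything from the pull request. The method names put and mightContain simply mirror those of Guava's BloomFilter.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical toy Bloom filter; not Guava's class and not Parquet's format.
public class RowGroupBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public RowGroupBloomSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a with a seed mixed into the offset basis (assumption: any
    // reasonable hash would do for this illustration).
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Double hashing: derive the i-th bit index from two base hashes.
    private int index(byte[] data, int i) {
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9747b28c);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Record a column value seen while writing the row group.
    public void put(String value) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(data, i));
        }
    }

    // false means the value is definitely absent; true means it may be present.
    public boolean mightContain(String value) {
        byte[] data = value.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(data, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RowGroupBloomSketch filter = new RowGroupBloomSketch(1024, 3);
        String[] columnValues = {"alice", "bob", "carol"};
        for (String v : columnValues) {
            filter.put(v);
        }
        // A reader evaluating `col = 'dave'` can skip the row group when this is false.
        String probe = "dave";
        System.out.println(filter.mightContain(probe)
                ? "must read row group (possible match)"
                : "can skip row group (definitely no match)");
    }
}
```

   A value that was inserted is always reported as possibly present, so skipping a row group on a negative answer never drops matching rows; the only cost of a false positive is reading a row group that turns out to contain no match.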
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>            Priority: Major
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)