Posted to jira@arrow.apache.org by "Ben Kietzman (Jira)" <ji...@apache.org> on 2021/02/22 18:29:00 UTC

[jira] [Commented] (ARROW-11384) [C++][Dataset] Support bloom filters in predicate pushdown

    [ https://issues.apache.org/jira/browse/ARROW-11384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288556#comment-17288556 ] 

Ben Kietzman commented on ARROW-11384:
--------------------------------------

The first step here would be to provide a compute function that tests inputs against a Bloom filter. This function could then be referenced by (for example) the expressions extracted from row group statistics. Finally, a special case would be added to expression simplification to test whether a filter can still be satisfied given a Bloom filter guarantee. For example:

{code}
SimplifyGivenGuarantee(equal(field_ref("a"), literal(1)), bloom_filter(field_ref("a"), ...))
{code}

would either return {{literal(false)}} to indicate that the filter is unsatisfiable or pass through {{equal(field_ref("a"), literal(1))}} to indicate that the Bloom filter does not exclude the value 1.
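
To make the two outcomes concrete, here is a minimal standalone sketch of the decision. It does not use any Arrow or Parquet API: {{ToyBloomFilter}}, {{Simplified}} and {{SimplifyEqualGivenBloom}} are hypothetical names invented for illustration, the filter is a plain bitset rather than Parquet's split-block layout, and only an equality against an integer literal is handled.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>

// Hypothetical toy Bloom filter over int64 values (illustration only;
// this is not the Parquet split-block Bloom filter format).
class ToyBloomFilter {
 public:
  // Set one bit per hash function for the inserted value.
  void Insert(int64_t value) {
    for (uint64_t seed = 0; seed < kNumHashes; ++seed) {
      bits_.set(Slot(value, seed));
    }
  }

  // false: the value was definitely never inserted.
  // true:  the value may have been inserted (false positives are possible).
  bool MightContain(int64_t value) const {
    for (uint64_t seed = 0; seed < kNumHashes; ++seed) {
      if (!bits_.test(Slot(value, seed))) return false;
    }
    return true;
  }

 private:
  static constexpr std::size_t kNumBits = 1024;
  static constexpr uint64_t kNumHashes = 3;

  // splitmix64-style mixing, with the seed selecting the hash function.
  static std::size_t Slot(int64_t value, uint64_t seed) {
    uint64_t x = static_cast<uint64_t>(value) + (seed + 1) * 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return static_cast<std::size_t>(x % kNumBits);
  }

  std::bitset<kNumBits> bits_;
};

// Hypothetical stand-in for the two results described above:
// kUnsatisfiable plays the role of literal(false), kUnchanged the role of
// passing the original equality expression through untouched.
enum class Simplified { kUnsatisfiable, kUnchanged };

// Simplify "field == literal" given a Bloom filter built over that field.
Simplified SimplifyEqualGivenBloom(int64_t literal, const ToyBloomFilter& guarantee) {
  return guarantee.MightContain(literal) ? Simplified::kUnchanged
                                         : Simplified::kUnsatisfiable;
}
```

With a filter built from a row group's values for "a", a {{kUnsatisfiable}} result means the row group can be skipped outright. A {{kUnchanged}} result only means the filter cannot rule the value out, so the equality still has to be evaluated against the data.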

> [C++][Dataset] Support bloom filters in predicate pushdown
> ----------------------------------------------------------
>
>                 Key: ARROW-11384
>                 URL: https://issues.apache.org/jira/browse/ARROW-11384
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ben Kietzman
>            Priority: Major
>              Labels: dataset, parquet
>
> The Parquet spec includes Bloom filters, which can be useful during filtering. In the context of dataset::, this would be expressed as additional Parquet statistics expressions on each row group, allowing entirely excluded row groups to be skipped more aggressively.
> Prerequisite: https://issues.apache.org/jira/browse/PARQUET-1327 (reader/writer support for bloom filters)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)