Posted to issues@kudu.apache.org by "Bankim Bhavsar (Jira)" <ji...@apache.org> on 2020/06/02 19:45:00 UTC

[jira] [Commented] (KUDU-3140) Add heuristics to disable predicate evaluation/filtering for Bloom filter predicate

    [ https://issues.apache.org/jira/browse/KUDU-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124250#comment-17124250 ] 

Bankim Bhavsar commented on KUDU-3140:
--------------------------------------

The HDFS scanner for Parquet in Impala maintains per-predicate stats. Every 16 blocks, it checks the effectiveness of the filter, and if the rejection ratio is less than 10% (by default), the filter is disabled.
Code pointers: 
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-scanner.cc#L775
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-scanner.h#L138
https://github.com/apache/impala/blob/master/be/src/exec/hdfs-scanner-ir.cc#L102
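
A minimal sketch of the heuristic described above, for discussion only. It is not the Impala code and not a proposed Kudu patch. The class name FilterStats, its methods, and the constructor parameters are hypothetical. It also assumes that the counters are cumulative across checks and that a disabled filter stays disabled, neither of which is stated above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-predicate statistics (illustrative names only).
class FilterStats {
 public:
  // check_interval_blocks: how often to re-evaluate the filter (16 in the
  // description above). min_reject_ratio: threshold below which the filter
  // is considered ineffective (0.10 in the description above).
  FilterStats(int64_t check_interval_blocks, double min_reject_ratio)
      : check_interval_blocks_(check_interval_blocks),
        min_reject_ratio_(min_reject_ratio) {
    assert(check_interval_blocks > 0);
  }

  // Callers skip predicate evaluation once this returns false.
  bool enabled() const { return enabled_; }

  // Record the outcome of applying the filter to one block of rows.
  void RecordBlock(int64_t rows_considered, int64_t rows_rejected) {
    if (!enabled_) return;
    considered_ += rows_considered;
    rejected_ += rows_rejected;
    ++blocks_seen_;
    // Only check effectiveness once per interval to keep the check cheap.
    if (blocks_seen_ % check_interval_blocks_ != 0) return;
    if (considered_ == 0) return;
    // Assumption: the ratio is computed over all rows seen so far.
    const double reject_ratio =
        static_cast<double>(rejected_) / static_cast<double>(considered_);
    if (reject_ratio < min_reject_ratio_) enabled_ = false;
  }

 private:
  const int64_t check_interval_blocks_;
  const double min_reject_ratio_;
  int64_t blocks_seen_ = 0;
  int64_t considered_ = 0;
  int64_t rejected_ = 0;
  bool enabled_ = true;
};
```

With FilterStats(16, 0.10), a filter that rejects 1% of rows is disabled at the 16th block, while one that rejects 50% of rows stays enabled.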

> Add heuristics to disable predicate evaluation/filtering for Bloom filter predicate
> -----------------------------------------------------------------------------------
>
>                 Key: KUDU-3140
>                 URL: https://issues.apache.org/jira/browse/KUDU-3140
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, util
>    Affects Versions: 1.12.0
>            Reporter: Bankim Bhavsar
>            Assignee: Bankim Bhavsar
>            Priority: Major
>
> KUDU-2483 introduced support for Bloom filter predicate.
> However, as observed with TPCH, query 9 exhibits a regression when Bloom filter predicates are pushed down to Kudu.
> See excerpt from performance analysis of TPCH run by [~wzhou].
> https://gist.github.com/bbhavsar/811ccbe0cd144090f82bdabcd801f827



--
This message was sent by Atlassian Jira
(v8.3.4#803005)