Posted to dev@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2020/10/09 11:44:00 UTC

[jira] [Created] (HIVE-24251) Improve bloom filter size estimation for multi column semijoin reducers

Stamatis Zampetakis created HIVE-24251:
------------------------------------------

             Summary: Improve bloom filter size estimation for multi column semijoin reducers
                 Key: HIVE-24251
                 URL: https://issues.apache.org/jira/browse/HIVE-24251
             Project: Hive
          Issue Type: Improvement
          Components: Query Planning
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


There are various cases where the expected size of the bloom filter is largely underestimated, making the semijoin reducer completely ineffective. This is more relevant for multi-column semijoin reducers since the current [code|https://github.com/apache/hive/blob/d61c9160ffa5afbd729887c3db690eccd7ef8238/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBloomFilter.java#L273] does not take them into account.
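
To illustrate why an underestimated size makes the filter ineffective, here is a minimal sketch based on the standard textbook bloom filter formulas. It is not the Hive implementation: the class name, method names, and the numbers (1M expected entries, 5% target false-positive probability, 10M actual entries) are assumptions chosen purely for illustration.

```java
// Hypothetical illustration only; not taken from GenericUDAFBloomFilter.
final class BloomSizingSketch {

    // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits for n expected entries.
    static long optimalNumBits(long expectedEntries, double fpp) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    // Standard choice of hash function count: k = (m / n) * ln 2, at least 1.
    static int optimalNumHashes(long expectedEntries, long numBits) {
        double k = (double) numBits / expectedEntries * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    // Approximate false-positive rate after inserting actualEntries distinct
    // keys into a filter with numBits bits and numHashes hash functions:
    // (1 - e^(-k * n / m))^k.
    static double falsePositiveRate(long numBits, int numHashes, long actualEntries) {
        double exponent = -(double) numHashes * actualEntries / numBits;
        return Math.pow(1.0 - Math.exp(exponent), numHashes);
    }

    public static void main(String[] args) {
        long expected = 1_000_000L;   // assumed planner estimate
        long actual = 10_000_000L;    // assumed real number of distinct keys
        double targetFpp = 0.05;      // assumed target false-positive probability

        long bits = optimalNumBits(expected, targetFpp);
        int hashes = optimalNumHashes(expected, bits);

        System.out.printf("bits=%d hashes=%d%n", bits, hashes);
        System.out.printf("fpp with estimate correct: %.4f%n",
                falsePositiveRate(bits, hashes, expected));
        System.out.printf("fpp with 10x underestimate: %.4f%n",
                falsePositiveRate(bits, hashes, actual));
    }
}
```

Under these assumptions, a filter sized for the estimate stays close to the target false-positive rate, while the same filter loaded with ten times as many keys lets nearly every probe row through, so the semijoin reducer filters almost nothing.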



--
This message was sent by Atlassian Jira
(v8.3.4#803005)