Posted to dev@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2020/10/09 11:44:00 UTC
[jira] [Created] (HIVE-24251) Improve bloom filter size estimation for multi column semijoin reducers
Stamatis Zampetakis created HIVE-24251:
------------------------------------------
Summary: Improve bloom filter size estimation for multi column semijoin reducers
Key: HIVE-24251
URL: https://issues.apache.org/jira/browse/HIVE-24251
Project: Hive
Issue Type: Improvement
Components: Query Planning
Reporter: Stamatis Zampetakis
Assignee: Stamatis Zampetakis
There are various cases where the expected size of the bloom filter is significantly underestimated, making the semijoin reducer completely ineffective. This is more relevant for multi-column semijoin reducers, since the current [code|https://github.com/apache/hive/blob/d61c9160ffa5afbd729887c3db690eccd7ef8238/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFBloomFilter.java#L273] does not take them into account.
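To illustrate why an underestimated size makes the filter ineffective, here is a minimal sketch based on the textbook bloom filter formulas. It is not Hive's implementation: the class name BloomSizingSketch and all of its methods are hypothetical, and the multi-column estimate (product of per-column distinct counts, capped by the row count) is only one possible heuristic, assumed here for illustration.

```java
// Illustrative only: textbook bloom filter math, not the code in GenericUDAFBloomFilter.
final class BloomSizingSketch {

    // Number of bits m for n expected entries and target false-positive rate p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalBits(long expectedEntries, double fpp) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    // Number of hash functions k = (m / n) * ln 2, with a minimum of 1.
    static int optimalHashes(long bits, long expectedEntries) {
        return Math.max(1, (int) Math.round((double) bits / expectedEntries * Math.log(2)));
    }

    // Approximate false-positive rate once actualEntries have been inserted:
    // (1 - e^(-k * n / m))^k
    static double actualFpp(long bits, int hashes, long actualEntries) {
        return Math.pow(1 - Math.exp(-(double) hashes * actualEntries / bits), hashes);
    }

    // Hypothetical heuristic for a multi-column key: the number of distinct
    // combinations is at most the product of the per-column distinct counts
    // (NDVs) and can never exceed the row count.
    static long estimateCompositeEntries(long rowCount, long... columnNdvs) {
        long product = 1;
        for (long ndv : columnNdvs) {
            if (ndv <= 0) {
                return rowCount; // unknown statistics: fall back to the upper bound
            }
            if (product > rowCount / ndv) {
                return rowCount; // product would exceed the cap (also avoids overflow)
            }
            product *= ndv;
        }
        return Math.min(product, rowCount);
    }

    public static void main(String[] args) {
        long bits = optimalBits(1_000, 0.05);   // filter sized for 1,000 entries
        int hashes = optimalHashes(bits, 1_000);
        System.out.println("fpp at expected load: " + actualFpp(bits, hashes, 1_000));
        System.out.println("fpp at 100x the load: " + actualFpp(bits, hashes, 100_000));
        System.out.println("composite estimate:   " + estimateCompositeEntries(1_000_000, 100, 50));
    }
}
```

When the filter is sized for far fewer entries than it actually receives, nearly all of its bits end up set, so almost every probe passes and the semijoin reducer filters out close to nothing.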
--
This message was sent by Atlassian Jira
(v8.3.4#803005)