Posted to dev@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2020/10/09 12:06:00 UTC

[jira] [Created] (HIVE-24252) Improve decision model for using semijoin reducers

Stamatis Zampetakis created HIVE-24252:
------------------------------------------

             Summary: Improve decision model for using semijoin reducers
                 Key: HIVE-24252
                 URL: https://issues.apache.org/jira/browse/HIVE-24252
             Project: Hive
          Issue Type: Improvement
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


After a few experiments with the TPC-DS 10TB dataset, we observed that in some cases semijoin reducers were not effective: they either did not reduce the number of records at all or reduced the relation only slightly.

In some cases we can make a semijoin reducer more effective by adding more columns, but this also requires a bigger Bloom filter, so deciding how many columns to include in the filter becomes more delicate.
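To make the size side of this trade-off concrete, here is a minimal Java sketch using the textbook Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2, where n is the number of distinct keys inserted and p is the target false-positive probability. The class name, the NDV values and the 5% false-positive rate are illustrative assumptions. This is not Hive's implementation, which may size its filters differently.

```java
// Hypothetical illustration, not Hive code: sizes a Bloom filter with the
// textbook formula m = -n * ln(p) / (ln 2)^2.
public final class BloomSizing {

    /** Optimal number of bits for n expected distinct keys and false-positive probability p. */
    static long optimalBits(long n, double p) {
        if (n <= 0 || p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    public static void main(String[] args) {
        double fpp = 0.05;                  // assumed target false-positive probability
        long singleColumnNdv = 1_000_000L;  // assumed NDV of one join column
        long multiColumnNdv = 20_000_000L;  // assumed NDV of the column combination

        long singleBits = optimalBits(singleColumnNdv, fpp);
        long multiBits = optimalBits(multiColumnNdv, fpp);
        System.out.printf("single-column filter: %d bits (~%.1f MB)%n", singleBits, singleBits / 8.0 / 1e6);
        System.out.printf("multi-column filter:  %d bits (~%.1f MB)%n", multiBits, multiBits / 8.0 / 1e6);
    }
}
```

Because m grows linearly with n at a fixed false-positive rate, a column combination with 20 times more distinct values needs a filter roughly 20 times larger to keep the same accuracy.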

The current decision model always chooses a multi-column semijoin reducer when one is available, but this is not always beneficial: a single column may already reduce the target relation significantly.
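One possible way to weigh the candidates, sketched below in Java, is a simple net-benefit score: rows removed from the target relation minus a penalty proportional to the Bloom filter size. This is a hypothetical formulation for illustration only, not Hive's decision model or the one proposed in this issue; the `Candidate` record, the `bitCostInRows` weight, and all numbers are assumptions.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical decision sketch, not Hive's actual semijoin cost model.
public final class SemijoinChooser {

    /**
     * A semijoin reducer candidate. survivingFraction is the estimated fraction of
     * target rows that pass the filter (true matches plus false positives).
     */
    record Candidate(String name, double survivingFraction, long bloomBits) {}

    /** Rows eliminated from the target, minus filter size converted to "row-equivalents" of work. */
    static double netBenefit(Candidate c, long targetRows, double bitCostInRows) {
        double rowsEliminated = targetRows * (1.0 - c.survivingFraction());
        return rowsEliminated - c.bloomBits() * bitCostInRows;
    }

    /** Picks the candidate with the highest net benefit. */
    static Candidate choose(List<Candidate> candidates, long targetRows, double bitCostInRows) {
        return candidates.stream()
                .max(Comparator.comparingDouble(c -> netBenefit(c, targetRows, bitCostInRows)))
                .orElseThrow();
    }

    public static void main(String[] args) {
        long targetRows = 1_000_000_000L; // assumed size of the target relation
        List<Candidate> candidates = List.of(
                new Candidate("single-column", 0.10, 6_300_000L),
                new Candidate("multi-column", 0.08, 125_000_000L));
        // bitCostInRows is an assumed tuning weight: how much one filter bit "costs" in rows.
        for (double bitCost : new double[] {1.0, 0.01}) {
            System.out.println("bitCost=" + bitCost + " -> " + choose(candidates, targetRows, bitCost).name());
        }
    }
}
```

With these made-up numbers the program picks `single-column` for a weight of 1.0, since two extra points of reduction do not pay for a much larger filter, and `multi-column` for a weight of 0.01, where filter size is cheap. The point is only that the choice should depend on both the reduction and the filter cost, rather than always preferring more columns.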



--
This message was sent by Atlassian Jira
(v8.3.4#803005)